HySDS UI has a minimal dashboard in Kibana that displays the current status of its services (supervisord / systemd).
Kibana is running on https://<MOZART_IP_ADDRESS>/metrics/app/home
Accessing the HySDS Cluster Health page
HySDS Cluster health dashboard
The table above [Fig 4] shows the status of systemd services (Elasticsearch, Redis, RabbitMQ, etc.). Check the systemd.SubState and systemd.ActiveStateTimestamp columns for the current status and the time it was last updated.
The table above [Fig 5] shows the status of supervisord services (celery workers, REST APIs, Logstash, etc.). Check the supervisord.status and supervisord.uptime columns for the current status and uptime.
Cluster Health backend
There are 2 processes running under supervisord; they check for:
supervisord services
    celery workers:
        job workers (factotum)
        user rules
        orchestrator
    REST APIs (grq2, mozart, pele, etc.)
    Logstash
    Docker registry
    sdswatch
    Kibana
    Filebeat
    worker timeouts
systemd services
    Elasticsearch (grq, mozart & metrics)
    Redis (mozart & metrics)
    RabbitMQ
    httpd (proxy)
The scripts check the service statuses once every minute. Their supervisord configuration is:
[program:watch_supervisord_services]
directory={{ OPS_HOME }}
command={{ OPS_HOME }}/mozart/bin/watch_supervisord_services.py --host mozart
process_name=%(program_name)s
priority=999
numprocs=1
numprocs_start=0
redirect_stderr=true
startretries=0
stdout_logfile=%(here)s/../log/%(program_name)s.fulldict.sdswatch.log
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=10
startsecs=10

[program:watch_systemd_services]
directory={{ OPS_HOME }}
command={{ OPS_HOME }}/mozart/bin/watch_systemd_services.py --host mozart -s elasticsearch redis rabbitmq-server httpd
process_name=%(program_name)s
priority=999
numprocs=1
numprocs_start=0
redirect_stderr=true
startretries=0
stdout_logfile=%(here)s/../log/%(program_name)s.fulldict.sdswatch.log
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=10
startsecs=10
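The watcher scripts themselves are not reproduced on this page. As a rough sketch of how a periodic systemd check could work, the example below polls `systemctl show` and parses its `Key=Value` output. The function names (`parse_systemctl_show`, `check_service`, `watch`) are hypothetical, and using `systemctl show` at all is an assumption; the real `watch_systemd_services.py` may collect and log its data differently.

```python
import subprocess
import time


def parse_systemctl_show(output):
    """Turn the `Key=Value` lines printed by `systemctl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def check_service(name):
    """Query one systemd unit and return its state properties (hypothetical helper)."""
    result = subprocess.run(
        ["systemctl", "show", name,
         "--property=ActiveState,SubState,ActiveEnterTimestamp"],
        capture_output=True, text=True, check=False,
    )
    return parse_systemctl_show(result.stdout)


def watch(services, interval=60):
    """Print one status line per service, then sleep `interval` seconds, forever."""
    while True:
        for name in services:
            print(name, check_service(name))
        time.sleep(interval)
```

Calling `watch(["elasticsearch", "redis", "rabbitmq-server", "httpd"])` would mirror the service list passed with `-s` in the configuration above, with the 60-second default matching the one-minute check interval.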
Future plans for the Cluster Health page
The current page is very bare-bones, but some ideas for the future are:
Color-coding the status column in the table:
    RED: service is down
    GREEN: service is running
    YELLOW: service is starting
Integration with CloudWatch logs (if possible), or just a simple link
Graphs & visualization
Potentially moving it to a proper frontend/React application with better UI/UX, so that it won't be limited by Kibana
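The color scheme proposed above could be sketched as a small lookup, shown here only as an illustration. The function name `status_color` and the grouping of states are assumptions: it treats supervisord's `RUNNING` and systemd's `running` as up, supervisord's `STARTING` and systemd's `start-pre` / `start` / `start-post` as starting, and everything else as down, which is a simplification.

```python
# Hypothetical mapping from a reported service state to the proposed colors.
GREEN, YELLOW, RED = "GREEN", "YELLOW", "RED"

# States compared case-insensitively, so supervisord's upper-case names
# and systemd's lower-case sub-states can share one table.
RUNNING_STATES = {"running"}
STARTING_STATES = {"starting", "start-pre", "start", "start-post"}


def status_color(state):
    """Return the color for a supervisord status or systemd SubState string."""
    normalized = state.strip().lower()
    if normalized in RUNNING_STATES:
        return GREEN
    if normalized in STARTING_STATES:
        return YELLOW
    # Assumption: any other state (stopped, fatal, dead, failed, ...) counts as down.
    return RED
```

A real implementation would probably need finer distinctions, for example between a service that was stopped deliberately and one that failed.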