HySDS UI has a minimal dashboard in Kibana that displays the current status of its services (supervisord and systemd).

Kibana is available at https://<MOZART_IP_ADDRESS>/metrics/app/home

Accessing the HySDS Cluster Health page

Fig 1. Link to Kibana in HySDS UI

Fig 2. Sidebar display button and link to the Kibana dashboards

Fig 3. Link to the HySDS cluster health page

HySDS Cluster Health dashboard

Fig 4. systemd service(s) status

The table above [Fig 4] shows the status of systemd services (Elasticsearch, Redis, RabbitMQ, etc.). Check the systemd.SubState and systemd.ActiveStateTimestamp columns for each service's current state and when it was last updated.
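To illustrate where values like these can come from, the sketch below reads unit properties with `systemctl show --property=...` and parses its `KEY=VALUE` output. It is not necessarily how `watch_systemd_services.py` works. `SubState` is a standard systemd property; the dashboard's `systemd.ActiveStateTimestamp` field is assumed here to correspond roughly to systemd's `StateChangeTimestamp` property, which is what the sketch reads instead. The helper names `parse_systemctl_show` and `unit_status` are hypothetical.

```python
import subprocess

# Properties read for each unit. StateChangeTimestamp is a standard systemd unit
# property; treating it as the dashboard's "last updated" value is an assumption.
PROPERTIES = ["ActiveState", "SubState", "StateChangeTimestamp"]


def parse_systemctl_show(output: str) -> dict:
    """Turn the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key] = value
    return props


def unit_status(unit: str) -> dict:
    """Return selected properties of one systemd unit (hypothetical helper)."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=" + ",".join(PROPERTIES)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_systemctl_show(result.stdout)
```

For example, `unit_status("redis")` returns a dict whose `SubState` is `running` while the Redis service is up.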

Fig 5. supervisord service(s) status

The table above [Fig 5] shows the status of supervisord services (Celery workers, REST APIs, Logstash, etc.). Check the supervisord.status and supervisord.uptime columns for each service's current status and uptime.
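supervisord exposes per-process state through its XML-RPC interface: `supervisor.getAllProcessInfo()` returns one dict per process, including `name`, `statename`, `start` and `now` (Unix timestamps). The minimal sketch below derives a status and an uptime similar to the dashboard columns. It assumes supervisord's `[inet_http_server]` is enabled at `http://localhost:9001/RPC2`, which is an assumption; the HySDS watcher may read these values differently. The names `summarize` and `supervisord_status` are hypothetical.

```python
import xmlrpc.client
from datetime import timedelta


def summarize(info: dict) -> dict:
    """Reduce one getAllProcessInfo() entry to name, status and uptime."""
    running = info["statename"] == "RUNNING"
    # Uptime only makes sense for a running process; report zero otherwise.
    seconds = info["now"] - info["start"] if running else 0
    return {
        "name": info["name"],
        "status": info["statename"],
        "uptime": str(timedelta(seconds=seconds)),
    }


def supervisord_status(url: str = "http://localhost:9001/RPC2") -> list:
    """Query supervisord over XML-RPC (the URL is an assumed default)."""
    server = xmlrpc.client.ServerProxy(url)
    return [summarize(info) for info in server.supervisor.getAllProcessInfo()]
```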

Cluster Health backend

Two supervisord programs run the checks. Together they monitor:

supervisord services
- Celery workers:
  - job workers (factotum)
  - user rules
  - orchestrator
- REST APIs (grq2, mozart, pele, etc.)
- Logstash
- Docker registry
- sdswatch
- Kibana
- Filebeat
- worker timeouts
systemd services
- Elasticsearch (grq, mozart & metrics)
- Redis (mozart & metrics)
- RabbitMQ
- httpd (proxy)

Both scripts check the status of their services every minute. They run as the following supervisord programs:

[program:watch_supervisord_services]
directory={{ OPS_HOME }}
command={{ OPS_HOME }}/mozart/bin/watch_supervisord_services.py --host mozart
process_name=%(program_name)s
priority=999
numprocs=1
numprocs_start=0
redirect_stderr=true
startretries=0
stdout_logfile=%(here)s/../log/%(program_name)s.fulldict.sdswatch.log
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=10
startsecs=10

[program:watch_systemd_services]
directory={{ OPS_HOME }}
command={{ OPS_HOME }}/mozart/bin/watch_systemd_services.py --host mozart -s elasticsearch redis rabbitmq-server httpd
process_name=%(program_name)s
priority=999
numprocs=1
numprocs_start=0
redirect_stderr=true
startretries=0
stdout_logfile=%(here)s/../log/%(program_name)s.fulldict.sdswatch.log
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=10
startsecs=10
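The scripts' internals are not shown on this page. As a rough sketch of the overall shape, the example below parses flags like those on the command lines above (`--host` and `-s`) and then polls every service once per interval, writing one JSON object per line to stdout, which the supervisord config redirects to a `*.fulldict.sdswatch.log` file. The argument parser, the `watch`/`make_record` functions and the record fields are all hypothetical and are not taken from the HySDS code.

```python
import argparse
import json
import time
from datetime import datetime, timezone


def parse_args(argv=None):
    """Parse flags shaped like the command lines in the config above (hypothetical)."""
    parser = argparse.ArgumentParser(description="Sketch of a service watcher")
    parser.add_argument("--host", required=True, help="host label added to each record")
    parser.add_argument("-s", dest="services", nargs="+", default=[],
                        help="names of the services to check")
    return parser.parse_args(argv)


def make_record(host, service, status):
    """Build one status record; the field names are illustrative only."""
    return {
        "host": host,
        "service": service,
        "status": status,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def watch(host, services, check, interval=60.0, iterations=None,
          emit=lambda line: print(line, flush=True)):
    """Check every service, emit one JSON line per result, then sleep.

    iterations=None loops forever, as a long-running supervisord program would;
    a finite value is useful for testing.
    """
    done = 0
    while iterations is None or done < iterations:
        for service in services:
            emit(json.dumps(make_record(host, service, check(service))))
        done += 1
        # Sleep between rounds, but not after the final round of a bounded run.
        if iterations is None or done < iterations:
            time.sleep(interval)
```

A real watcher would pass a `check` function that queries systemd or supervisord, e.g. `args = parse_args(); watch(args.host, args.services, check=my_check)`, where `my_check` is a placeholder for any function that returns a status string.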

Future plans for the Cluster Health page

The current page is bare-bones; ideas for future improvements include:

Color-coding the status column in the table:
- RED: service is down
- GREEN: service is running
- YELLOW: service is starting
Integration with CloudWatch Logs (if possible), or simply a link to them
Graphs and visualizations (if possible)
Potentially moving the page to a dedicated frontend (e.g. a React application) with better UI/UX
- no longer limited by Kibana
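The proposed color coding amounts to a small lookup from raw status values to colors. A minimal sketch, covering both supervisord state names (e.g. `RUNNING`, `STARTING`, `BACKOFF`) and systemd `SubState` values (e.g. `running`, `start`, `auto-restart`); which raw states count as "starting" rather than "down" is an assumption made for illustration, and the name `status_color` is hypothetical.

```python
# Hypothetical mapping for the proposed RED / GREEN / YELLOW status coloring.
# supervisord uses upper-case state names; systemd SubState values are lower-case.
GREEN_STATES = {"RUNNING", "running"}
# Treating these transitional states as "starting" is an assumption.
YELLOW_STATES = {"STARTING", "BACKOFF", "start-pre", "start", "start-post", "auto-restart"}


def status_color(status: str) -> str:
    """Return GREEN if running, YELLOW if starting, otherwise RED (down)."""
    if status in GREEN_STATES:
        return "GREEN"
    if status in YELLOW_STATES:
        return "YELLOW"
    return "RED"
```

Defaulting unknown values to RED keeps the table conservative: a state the mapping does not recognize is flagged rather than hidden.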
