...
HySDS UI has a minimal dashboard in Kibana that displays the current status of its services (supervisor / systemd).
Kibana is running on https://<MOZART_IP_ADDRESS>/metrics/app/home
Accessing the HySDS Cluster Health page
...
HySDS Cluster health dashboard
...
The table above [Fig 4] shows the status of systemd services (Elasticsearch, Redis, RabbitMQ, etc.). Check the systemd.SubState and systemd.ActiveStateTimestamp columns for the current status and when it was last updated.
The table above [Fig 5] shows the status of supervisord services (celery workers, REST APIs, Logstash, etc.). Check the supervisord.status and supervisord.uptime columns for the current status and uptime.
Cluster Health backend
There are 2 supervisord processes running; they check for:
supervisord services
  celery workers:
    job workers (factotum)
    user rules
    orchestrator
  REST APIs (grq2, mozart, pele, etc.)
  Logstash
  docker registry
  sdswatch
  Kibana
  Filebeats
  worker timeouts
systemd services
  Elasticsearch (grq, mozart & metrics)
  Redis (mozart & metrics)
  RabbitMQ
  httpd (proxy)
The scripts check the service statuses once every minute. Their supervisord configuration is shown below:
[program:watch_supervisord_services]
directory={{ OPS_HOME }}
command={{ OPS_HOME }}/mozart/bin/watch_supervisord_services.py --host mozart
process_name=%(program_name)s
priority=999
numprocs=1
numprocs_start=0
redirect_stderr=true
startretries=0
stdout_logfile=%(here)s/../log/%(program_name)s.fulldict.sdswatch.log
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=10
startsecs=10
[program:watch_systemd_services]
directory={{ OPS_HOME }}
command={{ OPS_HOME }}/mozart/bin/watch_systemd_services.py --host mozart -s elasticsearch redis rabbitmq-server httpd
process_name=%(program_name)s
priority=999
numprocs=1
numprocs_start=0
redirect_stderr=true
startretries=0
stdout_logfile=%(here)s/../log/%(program_name)s.fulldict.sdswatch.log
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=10
startsecs=10
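The watcher scripts themselves are not reproduced on this page. As a rough illustration of the kind of check described above, the Python sketch below polls `systemctl show` and `supervisorctl status` and prints one JSON record per service. It is not the HySDS implementation: the function names, the record layout, and the choice of shelling out to the two CLIs are all assumptions made for this example. Only the systemd properties (`SubState`, `ActiveStateTimestamp`), the service names, and the one-minute interval come from this page.

```python
import json
import re
import subprocess
import time

# systemd properties matching the dashboard columns (plus ActiveState).
SYSTEMD_PROPS = ["ActiveState", "SubState", "ActiveStateTimestamp"]


def parse_systemctl_show(output):
    """Turn the KEY=VALUE lines printed by `systemctl show` into a dict."""
    status = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            status[key.strip()] = value.strip()
    return status


def parse_supervisorctl_status(output):
    """Parse `supervisorctl status` lines into {name: {"status", "uptime"}}."""
    services = {}
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue
        name, state = parts[0], parts[1]
        detail = parts[2] if len(parts) > 2 else ""
        # Running processes report "pid N, uptime H:MM:SS"; others do not.
        match = re.search(r"uptime\s+(.+)$", detail)
        uptime = match.group(1).strip() if match else None
        services[name] = {"status": state, "uptime": uptime}
    return services


def check_systemd(service):
    """Query one systemd unit (hypothetical helper, requires systemctl)."""
    result = subprocess.run(
        ["systemctl", "show", service, "--property=" + ",".join(SYSTEMD_PROPS)],
        capture_output=True, text=True, check=False,
    )
    return parse_systemctl_show(result.stdout)


def check_supervisord():
    """Query all supervisord programs (hypothetical helper)."""
    # supervisorctl may exit non-zero when a program is not running,
    # so the return code is deliberately not checked.
    result = subprocess.run(
        ["supervisorctl", "status"],
        capture_output=True, text=True, check=False,
    )
    return parse_supervisorctl_status(result.stdout)


def watch(systemd_services, host="mozart", interval=60):
    """Emit one JSON line per service, then sleep for `interval` seconds."""
    while True:
        for service in systemd_services:
            record = {"host": host, "service": service,
                      "systemd": check_systemd(service)}
            print(json.dumps(record), flush=True)
        for name, info in check_supervisord().items():
            record = {"host": host, "service": name, "supervisord": info}
            print(json.dumps(record), flush=True)
        time.sleep(interval)
```

Calling `watch(["elasticsearch", "redis", "rabbitmq-server", "httpd"])` would loop indefinitely, printing to stdout. Under supervisord, that output would end up in the configured `stdout_logfile`. The two parsing functions are kept free of side effects so they can be tested without a running systemd or supervisord.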
Future plans for the Cluster Health page
The current page is bare-bones, but some ideas for the future are:
Color-coding the status column in the table:
  RED: service is down
  GREEN: service is running
  YELLOW: service is starting
Integration with CloudWatch logs (if possible), or just a simple link
Graphs & visualization
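The color-coding idea above could start from a small mapping like the one sketched here. This is only an illustration: the function name `status_color` is hypothetical, and the set of values treated as "starting" is an assumption that mixes supervisord process states (`STARTING`) with systemd states (`activating`, `start`, `start-pre`, `start-post`). Anything that is neither running nor starting is treated as down, which may be too strict for some services.

```python
# Assumed "starting" values from supervisord and systemd, lowercased.
STARTING_STATES = {"starting", "activating", "start", "start-pre", "start-post"}


def status_color(status):
    """Map a reported service status to RED, GREEN, or YELLOW."""
    normalized = status.strip().lower()
    if normalized == "running":
        return "GREEN"   # service is running
    if normalized in STARTING_STATES:
        return "YELLOW"  # service is starting
    return "RED"         # everything else is treated as down


for state in ["RUNNING", "STARTING", "FATAL", "dead"]:
    print(state, status_color(state))
```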