
The HySDS UI includes a minimal dashboard in Kibana that displays the current status of the cluster's services (supervisord and systemd).

Kibana is available at https://<MOZART_IP_ADDRESS>/metrics/app/home

Accessing the HySDS Cluster Health page

HySDS Cluster Health dashboard

The table in [Fig 4] shows the status of the systemd services (Elasticsearch, Redis, RabbitMQ, etc.); check the systemd.SubState and systemd.ActiveStateTimestamp columns for the current state and the time it last changed.

The table in [Fig 5] shows the status of the supervisord services (Celery workers, REST APIs, Logstash, etc.); check the supervisord.status and supervisord.uptime columns for the current status and uptime.
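
These tables are driven by SDSWatch log documents indexed into the metrics Elasticsearch. As a rough illustration, the sketch below pulls the most recent documents that carry a systemd status using the Python Elasticsearch client. The host/port, the sdswatch-* index pattern, and the exact document layout are assumptions that may differ in your deployment; only the column names systemd.SubState and systemd.ActiveStateTimestamp come from the dashboard above.

# Minimal sketch (assumed host, index pattern, and document layout);
# adjust to match your metrics cluster. Uses elasticsearch-py 8.x style.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # metrics Elasticsearch (assumed)

resp = es.search(
    index="sdswatch-*",                          # assumed SDSWatch index pattern
    size=20,
    sort=[{"@timestamp": {"order": "desc"}}],
    query={"exists": {"field": "systemd.SubState"}},
)

for hit in resp["hits"]["hits"]:
    src = hit["_source"]
    # Field names taken from the dashboard columns; nesting is an assumption.
    systemd = src.get("systemd", {})
    print(systemd.get("SubState"), systemd.get("ActiveStateTimestamp"))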

Cluster Health backend

There are two supervisord-managed watcher processes running; they check the status of:

  • supervisord services

    • Celery workers:

      • job workers (factotum)

      • user rules

      • orchestrator

    • REST APIs (grq2, mozart, pele, etc.)

    • Logstash

    • Docker registry

    • sdswatch

    • Kibana

    • Filebeat

    • worker timeouts

  • systemd services

    • Elasticsearch (grq, mozart & metrics)

    • Redis (mozart & metrics)

    • RabbitMQ

    • httpd (proxy)

The watcher scripts check the status of these services once a minute. Their supervisord configurations are shown below, each followed by a minimal sketch of what the watcher does.

[program:watch_supervisord_services]
directory={{ OPS_HOME }}
command={{ OPS_HOME }}/mozart/bin/watch_supervisord_services.py --host mozart
process_name=%(program_name)s
priority=999
numprocs=1
numprocs_start=0
redirect_stderr=true
startretries=0
stdout_logfile=%(here)s/../log/%(program_name)s.fulldict.sdswatch.log
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=10
startsecs=10
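
For reference, a watcher along the lines of watch_supervisord_services.py can be approximated with supervisord's XML-RPC API. The sketch below is not the actual HySDS script; it assumes supervisord exposes its standard RPC interface at localhost:9001 (inet_http_server), and the printed log-line format and the "mozart" host label are illustrative only.

#!/usr/bin/env python
# Minimal sketch of a supervisord watcher (not the actual HySDS script).
# Assumes supervisord's XML-RPC interface is reachable at localhost:9001.
import time
from datetime import datetime
from xmlrpc.client import ServerProxy

SUPERVISOR_URL = "http://localhost:9001/RPC2"  # assumed inet_http_server endpoint


def check_once(server):
    """Print one status line per supervisord-managed process."""
    for proc in server.supervisor.getAllProcessInfo():
        uptime = proc["now"] - proc["start"] if proc["statename"] == "RUNNING" else 0
        # Illustrative log line; Logstash/Filebeat would ship these to metrics ES.
        print("%s,mozart,supervisord,%s,status=%s uptime=%ss"
              % (datetime.utcnow().isoformat(), proc["name"],
                 proc["statename"], uptime))


if __name__ == "__main__":
    server = ServerProxy(SUPERVISOR_URL)
    while True:
        check_once(server)
        time.sleep(60)  # the real watchers poll every minute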

[program:watch_systemd_services]
directory={{ OPS_HOME }}
command={{ OPS_HOME }}/mozart/bin/watch_systemd_services.py --host mozart -s elasticsearch redis rabbitmq-server httpd
process_name=%(program_name)s
priority=999
numprocs=1
numprocs_start=0
redirect_stderr=true
startretries=0
stdout_logfile=%(here)s/../log/%(program_name)s.fulldict.sdswatch.log
stdout_logfile_maxbytes=100MB
stdout_logfile_backups=10
startsecs=10
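
Similarly, a systemd watcher can be approximated by polling systemctl show for each unit. The sketch below is illustrative only (not the actual watch_systemd_services.py); it uses the service names from the config above, and the printed log-line format and "mozart" host label are assumptions.

#!/usr/bin/env python
# Minimal sketch of a systemd watcher (not the actual HySDS script).
# Polls `systemctl show` for the services named in the supervisord config above.
import subprocess
import time
from datetime import datetime

SERVICES = ["elasticsearch", "redis", "rabbitmq-server", "httpd"]


def show(service):
    """Return SubState and ActiveStateTimestamp for one systemd unit."""
    out = subprocess.run(
        ["systemctl", "show", service,
         "-p", "SubState", "-p", "ActiveStateTimestamp"],
        capture_output=True, text=True, check=True,
    ).stdout
    props = dict(line.split("=", 1) for line in out.splitlines() if "=" in line)
    return props.get("SubState"), props.get("ActiveStateTimestamp")


if __name__ == "__main__":
    while True:
        for svc in SERVICES:
            sub_state, ts = show(svc)
            print("%s,mozart,systemd,%s,SubState=%s ActiveStateTimestamp=%s"
                  % (datetime.utcnow().isoformat(), svc, sub_state, ts))
        time.sleep(60)  # the real watcher polls every minute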

Future plans for the Cluster Health page

The current page is fairly bare-bones, but some ideas for the future are:

  • Color-coding the status column in the tables:

    • RED: service is down

    • GREEN: service is running

    • YELLOW: service is starting

  • Integration with CloudWatch logs (if possible), or just a simple link

  • Graphs & visualization
