When we use Amazon Web Services autoscaling, we need to configure an alarm that is triggered when a certain condition is met. Usually, this involves checking whether a CloudWatch metric falls within or outside a certain range.
Because we want autoscaling to trigger when a large number of jobs are waiting in the system, Mozart needs to report the job count to Amazon so that CloudWatch can use it. CloudWatch supports custom metrics, and Mozart publishes this one with the following script: https://github.jpl.nasa.gov/hysds-org/hysds/blob/master/scripts/sync_ec2_job_metric.py
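To illustrate what publishing a custom metric involves, here is a minimal sketch using the boto3 CloudWatch client. It is not the code of sync_ec2_job_metric.py. The namespace "HySDS", the metric name pattern "JobsWaiting-<queue>", the "Queue" dimension, and both function names are hypothetical choices made for this example only.

```python
from datetime import datetime, timezone


def build_job_metric(queue, job_count, namespace="HySDS"):
    """Build the keyword arguments for CloudWatch put_metric_data.

    The namespace, metric name, and dimension are illustrative assumptions,
    not the values used by the real sync script.
    """
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": f"JobsWaiting-{queue}",
                "Dimensions": [{"Name": "Queue", "Value": queue}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(job_count),
                "Unit": "Count",
            }
        ],
    }


def publish_job_metric(queue, job_count, client=None):
    """Send the job count for one queue to CloudWatch."""
    if client is None:
        # Imported lazily so the payload builder works without boto3 installed.
        import boto3

        client = boto3.client("cloudwatch")
    return client.put_metric_data(**build_job_metric(queue, job_count))
```

A CloudWatch alarm attached to an autoscaling policy can then watch this metric and fire when it crosses the configured threshold.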
The script runs under supervisord on Mozart. It is not enabled by default, because its configuration depends on which queues are being monitored. You need to add a new program block for each queue you are monitoring.
This is the templated version kept in GitHub for ARIA Ops:
hysds_cluster_fab/files/supervisord.conf.mozart
[program:sync_ec2_job_metric-{{ MONITORED_QUEUE_5 }}]
directory=/home/ops/mozart/ops/hysds/scripts
command=/home/ops/mozart/ops/hysds/scripts/sync_ec2_job_metric.py --interval 60 {{ MONITORED_QUEUE_5 }}
process_name=%(program_name)s
priority=1
numprocs=1
numprocs_start=0
redirect_stderr=true
stdout_logfile=%(here)s/../log/%(program_name)s.log
startsecs=10
MONITORED_QUEUE_5 is a template variable defined in hysds_cluster_fab/context.sh
export MONITORED_QUEUE_5=spyddder-extract
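The substitution step can be sketched as follows. This is a simplified stand-in for the templating, not the mechanism Fabric actually uses. The render function is hypothetical, the template is shortened to two lines, and the context is assumed to define MONITORED_QUEUE_5 as spyddder-extract.

```python
import re

# Shortened copy of the program block template, for illustration only.
TEMPLATE = """[program:sync_ec2_job_metric-{{ MONITORED_QUEUE_5 }}]
command=/home/ops/mozart/ops/hysds/scripts/sync_ec2_job_metric.py --interval 60 {{ MONITORED_QUEUE_5 }}
"""


def render(template, context):
    """Replace each {{ NAME }} placeholder with context[NAME].

    A variable missing from the context raises KeyError, so a queue that was
    never exported cannot silently render as an empty name.
    """
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: context[m.group(1)], template)


rendered = render(TEMPLATE, {"MONITORED_QUEUE_5": "spyddder-extract"})
print(rendered)
```

Failing loudly on an undefined variable is a deliberate choice here: a program block with an empty queue name would start a sync process that monitors nothing.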
Once Fabric pushes the configuration to Mozart, it can be found on Mozart at: /home/ops/mozart/etc/supervisord.conf
[program:sync_ec2_job_metric-spyddder-extract]
directory=/home/ops/mozart/ops/hysds/scripts
command=/home/ops/mozart/ops/hysds/scripts/sync_ec2_job_metric.py --interval 60 spyddder-extract
process_name=%(program_name)s
priority=1
numprocs=1
numprocs_start=0
redirect_stderr=true
stdout_logfile=%(here)s/../log/%(program_name)s.log
startsecs=10
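To confirm which queues ended up with a sync program after the push, the rendered file can be inspected with Python's standard configparser. The sketch below is an illustrative check, not part of HySDS. The monitored_queues helper is hypothetical, and the sample text is a shortened copy of the block above. Interpolation is disabled because supervisord expands %(program_name)s and %(here)s itself.

```python
import configparser

# Shortened sample of a rendered supervisord.conf, for illustration only.
SAMPLE_CONF = """
[program:sync_ec2_job_metric-spyddder-extract]
directory=/home/ops/mozart/ops/hysds/scripts
command=/home/ops/mozart/ops/hysds/scripts/sync_ec2_job_metric.py --interval 60 spyddder-extract
process_name=%(program_name)s
stdout_logfile=%(here)s/../log/%(program_name)s.log
"""

PREFIX = "program:sync_ec2_job_metric-"


def monitored_queues(conf_text):
    """Return the queue names that have a sync_ec2_job_metric program block."""
    # interpolation=None keeps configparser from trying to expand %(...)s values.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return [name[len(PREFIX):] for name in parser.sections() if name.startswith(PREFIX)]


print(monitored_queues(SAMPLE_CONF))
```

On Mozart, the same function could be given the contents of /home/ops/mozart/etc/supervisord.conf instead of the sample string.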