Highlights

  • Scale-up is triggered by an auto-scaling metric alarm that checks the queue size.

  • We want to reduce auto-scaling AZ re-balancing, which results in instance terminations. It is therefore better to keep the fleet balanced evenly across availability zones (AZs).

  • When using spot fleet, we can ensure that scale-out happens in multiples of the number of AZs.

  • Set a scaling policy per ASG.

    • Example: when the alarm metric is greater than 1 for more than 60 seconds:

      • Add 1 instance when JobsWaiting-grfn-job_worker-large is in [1,10)

      • Add 10 instances when JobsWaiting-grfn-job_worker-large is in [10,∞)
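
The example policy above can be sketched as a small lookup. This is only an illustration, not the ASG configuration itself: the function names are hypothetical, the intervals are taken literally from the example, and rounding up to a multiple of the AZ count is one assumed way to keep scale-out balanced across AZs.

```python
import math

# Step adjustments from the example policy:
# (lower bound inclusive, upper bound exclusive, instances to add)
STEP_ADJUSTMENTS = [
    (1, 10, 1),
    (10, math.inf, 10),
]


def instances_to_add(jobs_waiting, steps=STEP_ADJUSTMENTS):
    """Return the step adjustment whose interval contains jobs_waiting."""
    for lower, upper, adjustment in steps:
        if lower <= jobs_waiting < upper:
            return adjustment
    return 0  # below every interval: no scale-out


def round_up_to_az_multiple(count, num_azs):
    """Round a scale-out count up to the next multiple of the AZ count."""
    if count <= 0:
        return 0
    return math.ceil(count / num_azs) * num_azs


if __name__ == "__main__":
    for waiting in (0, 5, 10, 250):
        print(waiting, instances_to_add(waiting))
    # Assumes 3 AZs purely for illustration.
    print(round_up_to_az_multiple(instances_to_add(250), num_azs=3))
```

Rounding up rather than down errs toward slightly more capacity in exchange for an even spread, which is the trade-off the re-balancing note above suggests.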

Optimization

  • Auto-scaling optimizations

    • Current public batch size: 20 instances per 5-minute cool-down

    • Default cool-down: 300 seconds

  • Default internal batch rate: 10 instances per 30 seconds

  • AWS ASG will increase our max batch rate to 100 instances per 30 seconds

  • We could manually set the desired group size to 100

  • Logs indicate our queue_size metric alarm is only firing every 10 minutes.

  • Recommendations

    • Change the cool-down to 1 minute

    • Try a batch size of 100 instances

    • Set the custom queue_size metric to be checked by CloudWatch every 1 minute

  • With these recommended changes, we estimate that ramping up to 1000 instances will take 55 minutes.
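
To see how batch size and cool-down interact, the following is a rough back-of-the-envelope model. It assumes one batch of instances is added per cool-down period and nothing else limits the rate; the function name is hypothetical. It ignores alarm evaluation delay, instance boot time, and spot capacity, so it gives an idealized lower bound and is not expected to reproduce the 55-minute estimate above.

```python
import math


def ramp_up_minutes(target_instances, batch_size, cooldown_seconds):
    """Idealized minutes to reach target_instances.

    Assumes each scaling activity adds batch_size instances and is
    followed by one full cool-down before the next activity can start.
    """
    batches = math.ceil(target_instances / batch_size)
    return batches * cooldown_seconds / 60.0


if __name__ == "__main__":
    # Current settings from the list above: 20 instances per 300-second cool-down.
    print("current:", ramp_up_minutes(1000, batch_size=20, cooldown_seconds=300))
    # Recommended settings: 100 instances per 60-second cool-down.
    print("recommended:", ramp_up_minutes(1000, batch_size=100, cooldown_seconds=60))
```

Under this model the cool-down dominates: shrinking it from 5 minutes to 1 minute helps as much as a five-fold increase in batch size.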

Self-Termination

References

AWS Autoscaling


AWS CloudWatch


boto: Python interface to Amazon Web Services



>>> import boto.ec2.cloudwatch
>>> c = boto.ec2.cloudwatch.connect_to_region('us-west-2')
>>> metrics = c.list_metrics()
>>> metrics
[Metric:DiskReadBytes,
 Metric:CPUUtilization,
 Metric:DiskWriteOps,
 Metric:DiskWriteOps,
 Metric:DiskReadOps,
 Metric:DiskReadBytes,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskWriteOps,
 Metric:NetworkIn,
 Metric:NetworkOut,
 Metric:NetworkIn,
 Metric:DiskReadBytes,
 Metric:DiskWriteBytes,
 Metric:DiskWriteBytes,
 Metric:NetworkIn,
 Metric:NetworkIn,
 Metric:NetworkOut,
 Metric:NetworkOut,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskWriteBytes,
 Metric:DiskWriteBytes,
 Metric:DiskReadBytes,
 Metric:NetworkOut,
 Metric:DiskWriteOps]
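
The same kind of boto CloudWatch connection could also publish the custom queue_size metric that the scale-up alarm checks. The sketch below is a minimal illustration under assumptions: the "HySDS" namespace and the helper name are hypothetical, and the caller is expected to run it once a minute (for example from cron) to match the 1-minute recommendation. It relies on boto's `put_metric_data(namespace, name, value=..., unit=...)` call on the connection object.

```python
def publish_queue_size(conn, queue_size, namespace="HySDS", metric_name="queue_size"):
    """Publish the current queue size as a custom CloudWatch metric.

    conn is a boto CloudWatch connection, e.g. the object returned by
    boto.ec2.cloudwatch.connect_to_region('us-west-2').
    The default namespace is a placeholder, not the deployed value.
    """
    if queue_size < 0:
        raise ValueError("queue_size must be non-negative")
    conn.put_metric_data(
        namespace=namespace,
        name=metric_name,
        value=queue_size,
        unit="Count",
    )
    return queue_size


# Usage (needs AWS credentials and network access, so it is left commented out):
#   import boto.ec2.cloudwatch
#   c = boto.ec2.cloudwatch.connect_to_region('us-west-2')
#   publish_queue_size(c, 42)
```

Passing the connection in, rather than creating it inside the helper, keeps the function easy to exercise with a stub object before pointing it at a real account.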

Code

