HySDS AWS Autoscaling


Confidence Level TBD  This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.

Highlights

  • Scale-up is triggered by a scale-up metric alarm that checks the queue size.

  • It is desirable to reduce auto-scaling AZ rebalancing, because rebalancing results in instance terminations. It is therefore better to keep the fleet balanced evenly across availability zones (AZs).

  • When using a spot fleet, scale-out can be kept to multiples of the number of AZs.

  • Set a scaling policy per auto-scaling group (ASG).

    • Example: if the alarm metric is greater than 1 for more than 60 seconds:

      • Add 1 instance when JobsWaiting-grfn-job_worker-large is [1,10)

      • Add 10 instances when JobsWaiting-grfn-job_worker-large is [10,∞)
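
The step policy in the example above can be sketched in plain Python. This is only a model of the decision logic, not a call to the AWS API: the names `STEP_ADJUSTMENTS`, `instances_to_add`, and `round_up_to_az_multiple` are hypothetical, and the rounding helper is one possible reading of "scale out in multiples of the number of AZs".

```python
import math

# Steps from the example policy above:
# (lower bound inclusive, upper bound exclusive, instances to add).
STEP_ADJUSTMENTS = [
    (1, 10, 1),          # JobsWaiting in [1, 10)  -> add 1 instance
    (10, math.inf, 10),  # JobsWaiting in [10, inf) -> add 10 instances
]


def instances_to_add(jobs_waiting, steps=STEP_ADJUSTMENTS):
    """Return how many instances the step policy would add for a queue size."""
    for lower, upper, add in steps:
        if lower <= jobs_waiting < upper:
            return add
    return 0  # below the first step: no scale-up


def round_up_to_az_multiple(count, num_azs):
    """Round a scale-out count up to a multiple of the number of AZs."""
    if count <= 0:
        return 0
    return math.ceil(count / num_azs) * num_azs


if __name__ == "__main__":
    for queue_size in (0, 1, 9, 10, 500):
        print(queue_size, instances_to_add(queue_size))
    # With 3 AZs (an assumed value), a request for 10 instances becomes 12.
    print(round_up_to_az_multiple(10, 3))
```

Rounding up rather than down keeps the fleet evenly divisible across zones, at the cost of occasionally launching a few more instances than the step asks for.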

Optimization

  • Auto-scaling optimizations

    • Currently the public batch size is 20 instances per 5-minute cool-down.

    • The default cool-down is 300 seconds.

  • The default internal batch rate is 10 instances per 30 seconds.

  • AWS ASG will increase our maximum batch rate to 100 instances per 30 seconds.

  • The desired group size could be set manually to 100.

  • Logs indicate that our queue_size metric alarm is only firing every 10 minutes.

  • Recommendations

    • Change the cool-down to 1 minute.

    • Try a batch size of 100 instances.

    • Set the custom queue_size metric to be checked every 1 minute in CloudWatch.

  • If these recommended changes are made, the estimate is that 1000 instances will take 55 minutes to ramp up.
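
To see how batch size, cool-down, and alarm frequency interact, the following is a minimal back-of-the-envelope sketch. It assumes one batch is launched per interval, where the interval is the longer of the cool-down and the alarm period. The function name `ramp_up_minutes` and this model are assumptions for illustration only. The model ignores instance launch latency, spot availability, and any other limits, so it gives an idealized lower bound and does not attempt to reproduce the 55-minute estimate above.

```python
import math


def ramp_up_minutes(target_instances, batch_size, cooldown_s, alarm_period_s=0):
    """Idealized ramp-up time: one batch per interval, nothing else limiting."""
    batches = math.ceil(target_instances / batch_size)
    # A new batch cannot start before both the cool-down has expired
    # and the alarm has fired again.
    interval_s = max(cooldown_s, alarm_period_s)
    return batches * interval_s / 60


if __name__ == "__main__":
    # Current settings: 20 instances per 300-second cool-down.
    print(ramp_up_minutes(1000, 20, 300))        # 250.0
    # Same settings, but the alarm only fires every 10 minutes.
    print(ramp_up_minutes(1000, 20, 300, 600))   # 500.0
    # Recommended settings: 100 instances, 1-minute cool-down and alarm check.
    print(ramp_up_minutes(1000, 100, 60, 60))    # 10.0
```

The second case shows why the alarm period matters: when the alarm fires less often than the cool-down expires, shortening the cool-down alone does not speed up the ramp.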

Self-Termination

 

References

AWS Autoscaling


AWS CloudWatch


boto: Python interface to Amazon Web Services




>>> import boto.ec2.cloudwatch
>>> c = boto.ec2.cloudwatch.connect_to_region('us-west-2')
>>> metrics = c.list_metrics()
>>> metrics
[Metric:DiskReadBytes,
 Metric:CPUUtilization,
 Metric:DiskWriteOps,
 Metric:DiskWriteOps,
 Metric:DiskReadOps,
 Metric:DiskReadBytes,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskWriteOps,
 Metric:NetworkIn,
 Metric:NetworkOut,
 Metric:NetworkIn,
 Metric:DiskReadBytes,
 Metric:DiskWriteBytes,
 Metric:DiskWriteBytes,
 Metric:NetworkIn,
 Metric:NetworkIn,
 Metric:NetworkOut,
 Metric:NetworkOut,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskWriteBytes,
 Metric:DiskWriteBytes,
 Metric:DiskReadBytes,
 Metric:NetworkOut,
 Metric:DiskWriteOps]

Code

 


 

Have Questions? Ask a HySDS Developer:

Anyone can join our public Slack channel to learn more about HySDS. JPL employees can join #HySDS-Community

JPLers can also ask HySDS questions at Stack Overflow Enterprise

Subject Matter Expert:

@Gerald Manipon

@Hook Hua

Find an Error?

Is this document outdated or inaccurate? Please contact the assigned Page Maintainer:

@Gerald Manipon
