HySDS AWS Autoscaling
Confidence Level: TBD. This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.
Highlights
Scale-up is triggered by a CloudWatch metric alarm on the job queue size.
We want to reduce auto-scaling AZ rebalancing, because rebalancing terminates instances. It is therefore better to keep the fleet balanced evenly across Availability Zones (AZs).
When using Spot Fleet, you can ensure that scale-out happens in multiples of the number of AZs.
Set a scaling policy per auto-scaling group (ASG).
Example: the alarm fires when the queue metric is 1 or more for a 60-second period, and the policy then adds instances as follows:
Add 1 instance when JobsWaiting-grfn-job_worker-large is in [1, 10)
Add 10 instances when JobsWaiting-grfn-job_worker-large is in [10, ∞)
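The example policy above maps naturally onto an AWS step scaling policy. The sketch below builds the keyword arguments for the boto3 `autoscaling` client's `put_scaling_policy` call without contacting AWS, so it runs offline.

It rests on several assumptions. The ASG name and policy naming scheme are hypothetical, and the alarm threshold is taken to be 1. The helper `round_up_to_az_multiple` is also hypothetical; it illustrates the note about scaling out in multiples of the AZ count.

AWS expresses step bounds relative to the alarm threshold, which is why the threshold is subtracted from each range. For an above-threshold alarm, the lower bound is inclusive and the upper bound exclusive, which matches the [1, 10) and [10, ∞) notation.

```python
import math

# Hypothetical: the alarm fires when JobsWaiting-grfn-job_worker-large >= 1.
ALARM_THRESHOLD = 1.0


def round_up_to_az_multiple(count, num_azs):
    """Round an instance count up to the next multiple of the AZ count so
    each scale-out can add the same number of instances to every AZ."""
    if num_azs < 1:
        raise ValueError("num_azs must be >= 1")
    return math.ceil(count / num_azs) * num_azs


def step_adjustments(steps, threshold, num_azs=1):
    """Convert absolute metric ranges into StepAdjustments entries.

    steps: list of (lower, upper, add) tuples in absolute metric units;
    upper is None for an open-ended range. Bounds are made relative to the
    alarm threshold, as step scaling requires.
    """
    result = []
    for lower, upper, add in steps:
        adjustment = {
            "MetricIntervalLowerBound": lower - threshold,
            "ScalingAdjustment": round_up_to_az_multiple(add, num_azs),
        }
        if upper is not None:
            adjustment["MetricIntervalUpperBound"] = upper - threshold
        result.append(adjustment)
    return result


def step_policy_params(asg_name, num_azs=1):
    """Keyword arguments for autoscaling put_scaling_policy (not sent here)."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-scale-up",  # hypothetical naming scheme
        "PolicyType": "StepScaling",
        "AdjustmentType": "ChangeInCapacity",
        # Add 1 instance for [1, 10) jobs waiting, 10 instances for [10, inf).
        "StepAdjustments": step_adjustments(
            [(1, 10, 1), (10, None, 10)], ALARM_THRESHOLD, num_azs
        ),
    }
```

To apply it, call `boto3.client("autoscaling").put_scaling_policy(**step_policy_params("my-asg"))`, where `my-asg` is a placeholder. With `num_azs=3`, the two steps become 3 and 12 instances.

For a Spot Fleet, scaling policies are registered through Application Auto Scaling rather than the ASG API, so the request shape differs.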
Optimization
Auto-scaling optimizations
Current public batch size: 20 instances per 5-minute cooldown
Default cooldown: 300 seconds
Default internal batch rate: 10 instances per 30 seconds
AWS will increase our ASG maximum batch rate to 100 instances per 30 seconds
We could manually set the desired group size to 100
Logs indicate that our queue_size metric alarm is firing only every 10 minutes.
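The rates listed above can be turned into rough ramp-up times with simple arithmetic: the number of batches needed (target divided by batch size, rounded up) times the interval between batches.

This is a minimal model. It ignores instance boot time and alarm evaluation delay, and it does not attempt to reproduce the 55-minute estimate under Recommendations, whose assumptions are not spelled out on this page. The function name is illustrative.

```python
import math


def ramp_up_minutes(target, batch_size, interval_minutes):
    """Minutes to launch `target` instances when at most `batch_size`
    instances are added per `interval_minutes`. Ignores boot time and
    alarm evaluation delay."""
    batches = math.ceil(target / batch_size)
    return batches * interval_minutes


# Rates taken from the list above; 30 seconds is 0.5 minutes.
for label, batch, interval in [
    ("public: 20 per 5-minute cooldown", 20, 5),
    ("internal default: 10 per 30 seconds", 10, 0.5),
    ("raised max: 100 per 30 seconds", 100, 0.5),
]:
    print(f"{label}: {ramp_up_minutes(1000, batch, interval):g} min")
```

For 1000 instances this gives 250, 50, and 5 minutes respectively. An alarm that fires only every 10 minutes would stretch the effective interval well beyond the cooldown.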
Recommendations
Change the cooldown to 1 minute.
Try a batch size of 100 instances.
Publish the custom queue_size metric to CloudWatch every minute so the alarm can evaluate it every minute.
If we make these changes, we estimate that ramping up to 1000 instances will take 55 minutes.
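The recommendations can be sketched as boto3 keyword arguments. These cover publishing the queue metric each minute, alarming on a 60-second period, and setting a 60-second cooldown.

Several pieces are assumptions rather than HySDS's actual configuration: the `HySDS` namespace, the `JobsWaiting-<queue>` metric naming, the alarm name, and the helper functions themselves. `policy_arn` stands for the scale-up policy's ARN, which `put_scaling_policy` returns as `PolicyARN`.

AWS applies the ASG default cooldown to simple scaling policies. Step scaling policies use an instance warm-up setting instead, so check which policy type the ASG uses.

```python
import time

NAMESPACE = "HySDS"   # hypothetical CloudWatch namespace
PERIOD_SECONDS = 60   # recommended: evaluate the queue size every minute


def metric_data_params(queue_name, jobs_waiting):
    """Keyword arguments for CloudWatch put_metric_data."""
    return {
        "Namespace": NAMESPACE,
        "MetricData": [{
            "MetricName": f"JobsWaiting-{queue_name}",
            "Value": float(jobs_waiting),
            "Unit": "Count",
        }],
    }


def alarm_params(queue_name, policy_arn):
    """Keyword arguments for CloudWatch put_metric_alarm: fire when at least
    one job is waiting during a single 60-second period."""
    return {
        "AlarmName": f"JobsWaiting-{queue_name}-scale-up",  # hypothetical
        "Namespace": NAMESPACE,
        "MetricName": f"JobsWaiting-{queue_name}",
        "Statistic": "Maximum",
        "Period": PERIOD_SECONDS,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [policy_arn],
    }


def cooldown_params(asg_name, seconds=60):
    """Keyword arguments for autoscaling update_auto_scaling_group."""
    return {"AutoScalingGroupName": asg_name, "DefaultCooldown": seconds}


def publish_loop(cloudwatch, get_queue_size, queue_name,
                 iterations=None, sleep=time.sleep):
    """Publish the queue size once per period. `cloudwatch` is any object
    with a put_metric_data method, e.g. boto3.client("cloudwatch").
    Runs forever when iterations is None."""
    count = 0
    while iterations is None or count < iterations:
        cloudwatch.put_metric_data(
            **metric_data_params(queue_name, get_queue_size()))
        count += 1
        sleep(PERIOD_SECONDS)
```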
Self-Termination
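Suspend the AZRebalance process on each auto-scaling group so that, after instances scale down by terminating themselves, AZ rebalancing does not terminate further instances.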
http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SuspendResume.html
The suspend command turns off AZ rebalancing for an ASG. It only needs to be run once per ASG, and the suspended process then appears in the Details tab of the ASG.
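Suspending AZ rebalancing can be sketched with the boto3 `autoscaling` client's `suspend_processes` call. The helper names and ASG names are hypothetical.

The second helper assumes that a worker removes itself with `terminate_instance_in_auto_scaling_group` and `ShouldDecrementDesiredCapacity=True`, so the group shrinks instead of replacing it. This page does not confirm that HySDS workers self-terminate this way. Each helper takes a client object, so the sketch runs without AWS credentials.

```python
def suspend_az_rebalance_params(asg_name):
    """Keyword arguments for autoscaling suspend_processes that stop AZ
    rebalancing for one ASG."""
    return {"AutoScalingGroupName": asg_name,
            "ScalingProcesses": ["AZRebalance"]}


def suspend_az_rebalance(autoscaling, asg_names):
    """Suspend AZRebalance on each ASG. The suspension persists until it is
    resumed, so running this once per ASG is enough."""
    for name in asg_names:
        autoscaling.suspend_processes(**suspend_az_rebalance_params(name))


def self_terminate(autoscaling, instance_id):
    """Hypothetical worker self-termination: remove this instance and
    decrement desired capacity so the ASG does not launch a replacement."""
    return autoscaling.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=True,
    )
```

With boto3, pass `boto3.client("autoscaling")` as `autoscaling`. The AWS CLI equivalent of the suspend step is `aws autoscaling suspend-processes --auto-scaling-group-name <asg-name> --scaling-processes AZRebalance`, and `resume-processes` undoes it.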
References
AWS Autoscaling
AWS CloudWatch
boto: Python interface to Amazon Web Services
Have Questions? Ask a HySDS Developer:
Anyone can join our public Slack channel to learn more about HySDS. JPL employees can join #HySDS-Community.
JPLers can also ask HySDS questions at Stack Overflow Enterprise.
Page Information:
Subject Matter Experts: @Gerald Manipon, @Hook Hua
Find an error? Is this document outdated or inaccurate? Please contact the assigned Page Maintainer: @Gerald Manipon