...
Confidence Level: TBD. This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.
Highlights
Scale-up is triggered by an auto-scaling metric alarm that checks the queue size.
We want to reduce auto-scaling AZ rebalancing, which results in instance terminations. It is therefore better to keep the fleet evenly balanced across availability zones (AZs).
...
When using a spot fleet, we can scale out in multiples of the number of AZs so that the fleet stays balanced.
Set a scaling policy per ASG.
Example: if the alarm threshold of 1 is breached for more than 60 seconds:
Add 1 instance when JobsWaiting-grfn-job_worker-large is [1,10)
Add 10 instances when JobsWaiting-grfn-job_worker-large is [10,∞)
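The two steps above could be expressed as a step scaling policy. The following is a minimal sketch, assuming boto3 (not the older boto package listed in the references) and hypothetical ASG and policy names. In step scaling, the step bounds are offsets from the alarm threshold rather than absolute metric values, so the helper converts the intervals [1,10) and [10,∞) for a threshold of 1.

```python
def step_adjustments(threshold, steps):
    """Convert absolute intervals [lo, hi) into step adjustments.

    Step scaling expects bounds relative to the alarm threshold;
    hi=None means the step has no upper bound.
    """
    result = []
    for lo, hi, add in steps:
        adj = {
            "MetricIntervalLowerBound": float(lo - threshold),
            "ScalingAdjustment": add,
        }
        if hi is not None:
            adj["MetricIntervalUpperBound"] = float(hi - threshold)
        result.append(adj)
    return result


def scale_up_policy_request(asg_name, policy_name="queue-scale-up"):
    """Build keyword arguments for put_scaling_policy (names are hypothetical)."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": policy_name,
        "PolicyType": "StepScaling",
        "AdjustmentType": "ChangeInCapacity",
        # Add 1 instance for [1,10), add 10 instances for [10, infinity).
        "StepAdjustments": step_adjustments(1, [(1, 10, 1), (10, None, 10)]),
    }


def apply_policy(request):
    """Send the request to AWS; requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builders above run without AWS access
    return boto3.client("autoscaling").put_scaling_policy(**request)
```

Calling apply_policy(scale_up_policy_request("my-asg")) would create the policy; the returned policy ARN is what the CloudWatch alarm would reference as its alarm action.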
Optimization
...
Auto-scaling optimizations
...
The current public batch size is 20 instances per 5-minute cool-down.
...
The default cool-down is 300 seconds.
...
The default internal batch rate is 10 instances per 30 seconds.
AWS (ASG) will increase our maximum batch rate to 100 instances per 30 seconds.
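To get a feel for what these batch rates mean, here is a rough back-of-the-envelope sketch. It assumes one full interval elapses per batch and ignores instance boot time, alarm latency, and cool-downs, so it is a lower-level illustration rather than a reproduction of any estimate on this page. The function name is hypothetical.

```python
import math


def ramp_up_seconds(target_instances, batch_size, interval_seconds):
    """Naive ramp-up time: number of batches times the interval per batch."""
    batches = math.ceil(target_instances / batch_size)
    return batches * interval_seconds


# 1000 instances at the default internal rate (10 per 30 seconds)
default_rate = ramp_up_seconds(1000, 10, 30)    # 3000 seconds = 50 minutes
# 1000 instances at the increased rate (100 per 30 seconds)
increased_rate = ramp_up_seconds(1000, 100, 30)  # 300 seconds = 5 minutes
```

Under this simple model, the batch rate alone accounts for tens of minutes of ramp-up at the default rate, which is why the increased limit matters for large fleets.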
...
We could manually set the desired group size to 100.
...
Logs indicate that our queue_size metric alarm is only firing every 10 minutes.
...
Recommendations
Change the cool-down to 1 minute.
Try a batch size of 100 instances.
Set the custom queue_size metric so that CloudWatch checks it every 1 minute.
With these changes, we estimate that ramping up to 1000 instances will take 55 minutes.
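The 1-minute metric check and 1-minute cool-down could be configured roughly as follows. This is a sketch assuming boto3; the namespace, alarm name, and ASG name are hypothetical placeholders, and only the metric name queue_size comes from this page.

```python
NAMESPACE = "Custom/JobQueue"  # hypothetical namespace for the custom metric


def queue_metric_request(queue_size):
    """Keyword arguments for cloudwatch.put_metric_data (publish every minute)."""
    return {
        "Namespace": NAMESPACE,
        "MetricData": [
            {"MetricName": "queue_size", "Value": float(queue_size), "Unit": "Count"}
        ],
    }


def queue_alarm_request(scale_up_policy_arn, alarm_name="queue-size-scale-up"):
    """Keyword arguments for cloudwatch.put_metric_alarm with a 60-second period."""
    return {
        "AlarmName": alarm_name,
        "Namespace": NAMESPACE,
        "MetricName": "queue_size",
        "Statistic": "Maximum",
        "Period": 60,             # evaluate the metric every 1 minute
        "EvaluationPeriods": 1,   # fire after a single breaching period
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [scale_up_policy_arn],
    }


def cooldown_request(asg_name, seconds=60):
    """Keyword arguments for autoscaling.update_auto_scaling_group."""
    return {"AutoScalingGroupName": asg_name, "DefaultCooldown": seconds}


def apply_all(asg_name, scale_up_policy_arn):
    """Send the alarm and cool-down changes; requires boto3 and credentials."""
    import boto3  # imported lazily so the builders above run without AWS access
    boto3.client("cloudwatch").put_metric_alarm(**queue_alarm_request(scale_up_policy_arn))
    boto3.client("autoscaling").update_auto_scaling_group(**cooldown_request(asg_name))
```

The job that measures the queue would call put_metric_data with queue_metric_request(...) once per minute; an alarm with a 60-second period cannot react faster than the metric is published.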
...
Self-Termination
...
AZ rebalancing needs to be suspended on the auto-scaling group so that scale-down does not trigger rebalancing:
http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SuspendResume.html
...
Command for ASG to turn off AZ rebalancing
...
This only needs to be run once per ASG; the suspended process will then show up in the Details tab of the ASG.
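The original command is not preserved on this page. As one possible equivalent, the sketch below suspends the AZRebalance process through boto3 (an assumption; the page itself references boto) and checks that the suspension is recorded on the group. The ASG name passed in is a placeholder.

```python
def suspend_az_rebalance(client, asg_name):
    """Suspend only the AZRebalance process on the given ASG."""
    client.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=["AZRebalance"],
    )


def is_az_rebalance_suspended(client, asg_name):
    """Return True if AZRebalance is listed among the ASG's suspended processes."""
    response = client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )
    return any(
        process["ProcessName"] == "AZRebalance"
        for group in response["AutoScalingGroups"]
        for process in group.get("SuspendedProcesses", [])
    )
```

Usage would look like client = boto3.client("autoscaling") followed by suspend_az_rebalance(client, "my-asg"). The suspension persists until resume_processes is called for the same process.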
References
AWS Autoscaling
AWS CloudWatch
boto: Python interface to Amazon Web Services