- Scale-up is triggered by an auto-scaling metric alarm that checks the queue size.
- We want to reduce auto-scaling AZ re-balancing, which results in terminations. It is therefore better to keep the fleet balanced evenly across availability zones (AZs).
- When using a spot fleet, we can ensure that scale-out happens in multiples of the number of AZs.
- Set a scaling policy per ASG.
- Example: when the alarm condition (threshold greater than 1) holds for more than 60 seconds:
- Add 1 instance when JobsWaiting-grfn-job_worker-large is [1,10)
- Add 10 instances when JobsWaiting-grfn-job_worker-large is [10,∞)
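The step policy above, combined with the goal of scaling out in multiples of the AZ count, can be sketched in plain Python. This is only an illustration of the arithmetic, not how the ASG evaluates its policy. The names `STEP_ADJUSTMENTS`, `instances_to_add`, `round_up_to_az_multiple` and the parameter `num_azs` are hypothetical and do not come from our configuration.

```python
import math

# Step adjustments from the example policy above.
# Each entry is (lower bound, upper bound, instances to add), with the
# JobsWaiting metric matched against the half-open interval [lower, upper).
STEP_ADJUSTMENTS = [
    (1, 10, 1),
    (10, math.inf, 10),
]


def instances_to_add(jobs_waiting, steps=STEP_ADJUSTMENTS):
    """Return the adjustment for the step whose interval contains jobs_waiting."""
    for lower, upper, adjustment in steps:
        if lower <= jobs_waiting < upper:
            return adjustment
    # No step matched (for example, an empty queue): add nothing.
    return 0


def round_up_to_az_multiple(count, num_azs):
    """Round count up to the next multiple of num_azs (hypothetical helper)."""
    if count <= 0:
        return 0
    return math.ceil(count / num_azs) * num_azs
```

For example, with 3 AZs (an assumed value), a queue of 25 waiting jobs maps to an adjustment of 10, which rounds up to 12 instances so that each AZ receives 4.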
Optimization
- Auto-scaling optimizations
- Our current public batch size is 20 instances per 5-minute cool-down.
- The default cool-down is 300 seconds.
- The default internal batch rate is 10 instances per 30 seconds.
- AWS ASG will increase our maximum batch rate to 100 instances per 30 seconds.
- We could manually set the desired group size to 100.
- Logs indicate that our queue_size metric alarm is only firing every 10 minutes.
- Recommendations
- Change the cool-down to 1 minute.
- Try a batch size of 100 instances.
- Set the custom queue_size metric to be checked every 1 minute in CloudWatch.
- If we make these recommended changes, we estimate that 1000 instances will take 55 minutes to ramp up.
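To compare batch size and cool-down settings, a rough back-of-the-envelope model can help. The sketch below assumes one batch is launched per cool-down period and ignores instance boot time, the alarm evaluation period, and the AWS internal batch rate. It is a simplification and does not attempt to reproduce the 55-minute estimate above. The function name `naive_ramp_up_minutes` is hypothetical.

```python
import math


def naive_ramp_up_minutes(target_instances, batch_size, cooldown_minutes):
    """Estimate ramp-up time, charging one cool-down period per batch.

    Assumption: every scaling action adds exactly batch_size instances and
    the next action can only start after the cool-down has elapsed.
    """
    batches = math.ceil(target_instances / batch_size)
    return batches * cooldown_minutes
```

Under these assumptions, 1000 instances at the current setting of 20 instances per 5-minute cool-down need 50 batches, or 250 minutes. Reducing the cool-down to 1 minute with the same batch size gives 50 minutes, and a batch size of 100 with a 1-minute cool-down gives 10 minutes.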
self-termination
References
AWS Autoscaling
AWS CloudWatch
boto: Python interface to Amazon Web Services
>>> import boto.ec2.cloudwatch
>>> c = boto.ec2.cloudwatch.connect_to_region('us-west-2')
>>> metrics = c.list_metrics()
>>> metrics
[Metric:DiskReadBytes,
Metric:CPUUtilization,
Metric:DiskWriteOps,
Metric:DiskWriteOps,
Metric:DiskReadOps,
Metric:DiskReadBytes,
Metric:DiskReadOps,
Metric:CPUUtilization,
Metric:DiskWriteOps,
Metric:NetworkIn,
Metric:NetworkOut,
Metric:NetworkIn,
Metric:DiskReadBytes,
Metric:DiskWriteBytes,
Metric:DiskWriteBytes,
Metric:NetworkIn,
Metric:NetworkIn,
Metric:NetworkOut,
Metric:NetworkOut,
Metric:DiskReadOps,
Metric:CPUUtilization,
Metric:DiskReadOps,
Metric:CPUUtilization,
Metric:DiskWriteBytes,
Metric:DiskWriteBytes,
Metric:DiskReadBytes,
Metric:NetworkOut,
Metric:DiskWriteOps]