


  • Scale-up is triggered by an auto-scaling metric alarm that checks the queue size.
  • We want to reduce auto-scaling AZ rebalancing, because it results in terminations. It is therefore better to keep the fleet balanced evenly across availability zones (AZs).
    • When using a spot fleet, we can ensure scale-out happens in multiples of the number of AZs.
  • Set a scaling policy per ASG
    • Example: if the alarm threshold of 1 is breached for more than 60 seconds
      • Add 1 instance when JobsWaiting-grfn-job_worker-large is in [1,10)
      • Add 10 instances when JobsWaiting-grfn-job_worker-large is in [10,∞)
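
The step policy above can be sketched as plain Python to make the interval boundaries explicit. This is only an illustration of the decision logic, not the AWS API: the names STEP_ADJUSTMENTS, instances_to_add and round_up_to_az_multiple are hypothetical, and the rounding helper assumes we want every scale-out to be a multiple of the AZ count.

```python
import math

# Step adjustments from the example policy:
# (lower bound inclusive, upper bound exclusive, instances to add).
STEP_ADJUSTMENTS = [
    (1, 10, 1),
    (10, math.inf, 10),
]


def instances_to_add(jobs_waiting, steps=STEP_ADJUSTMENTS):
    """Return how many instances the step policy adds for a queue size."""
    for lower, upper, adjustment in steps:
        if lower <= jobs_waiting < upper:
            return adjustment
    # Queue size is below every step: no scale-up.
    return 0


def round_up_to_az_multiple(count, az_count):
    """Round a scale-out count up to the next multiple of the AZ count.

    Assumption: keeping each scale-out a multiple of the number of AZs
    keeps the fleet evenly balanced and avoids rebalancing terminations.
    """
    if az_count <= 0:
        raise ValueError("az_count must be positive")
    if count <= 0:
        return 0
    return math.ceil(count / az_count) * az_count
```

For example, with 3 AZs a queue size of 12 falls in [10,∞), so the policy adds 10 instances, and round_up_to_az_multiple(10, 3) would raise that to 12.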

Optimization

  • Auto-scaling optimizations
    • Current public batch size: 20 instances per 5-minute cool-down
    • Default cool-down: 300 seconds
  • Default internal batch rate: 10 instances per 30 seconds
  • AWS ASG will increase our maximum batch rate to 100 instances per 30 seconds
  • We could manually set the desired group size to 100
  • Logs indicate our queue_size metric alarm is firing only every 10 minutes.
  • Recommendations
    • Change the cool-down to 1 minute
    • Try a batch size of 100 instances
    • Set the custom queue_size metric to be checked every 1 minute in CloudWatch
  • If we make these recommended changes, we estimate that 1000 instances will take 55 minutes to ramp up
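
A rough way to compare cool-down and batch-size settings is to count how many batches are needed and multiply by the cool-down. The sketch below assumes exactly one batch launches per cool-down period and ignores alarm evaluation delay, instance boot time and AWS-side batch-rate limits, so it gives a lower bound and is not expected to match the 55-minute estimate above. The function name estimate_ramp_up_minutes is hypothetical.

```python
import math


def estimate_ramp_up_minutes(target_instances, batch_size, cooldown_minutes):
    """Naive lower bound: one batch of instances per cool-down period."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if cooldown_minutes < 0:
        raise ValueError("cooldown_minutes must not be negative")
    # Number of scaling activities needed to reach the target.
    batches = math.ceil(target_instances / batch_size)
    return batches * cooldown_minutes


# Current settings: 20 instances per 5-minute cool-down.
current = estimate_ramp_up_minutes(1000, batch_size=20, cooldown_minutes=5)
# Recommended settings: 100 instances per 1-minute cool-down.
recommended = estimate_ramp_up_minutes(1000, batch_size=100, cooldown_minutes=1)
print(current, recommended)  # prints "250 10"
```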

Self-termination


References

AWS Autoscaling


AWS CloudWatch


boto: Python interface to Amazon Web Services



>>> import boto.ec2.cloudwatch
>>> c = boto.ec2.cloudwatch.connect_to_region('us-west-2')
>>> metrics = c.list_metrics()
>>> metrics
[Metric:DiskReadBytes,
 Metric:CPUUtilization,
 Metric:DiskWriteOps,
 Metric:DiskWriteOps,
 Metric:DiskReadOps,
 Metric:DiskReadBytes,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskWriteOps,
 Metric:NetworkIn,
 Metric:NetworkOut,
 Metric:NetworkIn,
 Metric:DiskReadBytes,
 Metric:DiskWriteBytes,
 Metric:DiskWriteBytes,
 Metric:NetworkIn,
 Metric:NetworkIn,
 Metric:NetworkOut,
 Metric:NetworkOut,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskWriteBytes,
 Metric:DiskWriteBytes,
 Metric:DiskReadBytes,
 Metric:NetworkOut,
 Metric:DiskWriteOps]


Code