

Highlights

  • Scale-up is triggered by an auto-scaling metric alarm that checks the queue size.

  • We want to reduce auto-scaling AZ rebalancing, which results in instance terminations. It is therefore better to keep the fleet balanced evenly across availability zones (AZs).

...

  • When using a spot fleet, we can ensure that scale-out happens in multiples of the number of AZs.

  • Set a scaling policy per ASG

    • Example: if the alarm threshold is greater than 1 for more than 60 seconds

      • Add 1 instance when JobsWaiting-grfn-job_worker-large is [1,10)

      • Add 10 instances when JobsWaiting-grfn-job_worker-large is [10,∞)
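The step policy above, together with the note about scaling out in multiples of the number of AZs, can be sketched in plain Python. This is only an illustration of the arithmetic, not the ASG configuration itself: the names `STEP_ADJUSTMENTS`, `instances_to_add`, and `round_up_to_az_multiple` are hypothetical, and rounding up (rather than down) to an AZ multiple is an assumption.

```python
import math

# Step adjustments from the example policy: (lower, upper, instances to add).
# Each step covers the half-open interval [lower, upper) of the JobsWaiting metric.
STEP_ADJUSTMENTS = [
    (1, 10, 1),
    (10, math.inf, 10),
]


def instances_to_add(jobs_waiting, steps=STEP_ADJUSTMENTS):
    """Return how many instances the example step policy would add."""
    for lower, upper, adjustment in steps:
        if lower <= jobs_waiting < upper:
            return adjustment
    return 0  # metric is below the first step, so no scale-out


def round_up_to_az_multiple(count, num_azs):
    """Round a scale-out count up to a multiple of the AZ count (assumed rounding rule)."""
    if count <= 0:
        return 0
    return math.ceil(count / num_azs) * num_azs
```

For example, with 3 AZs a step of 10 instances would become 12 under this rounding rule, so each AZ receives the same number of new instances.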

Optimization

...

  • Auto-scaling optimizations

...

    • Current public batch size: 20 instances per 5-minute cool-down

...

    • Default cool-down: 300 seconds

...

  • Default internal batch rate: 10 instances per 30 seconds

  • AWS ASG will increase our max batch rate to 100 instances per 30 seconds

...

  • We could manually set the desired group size to 100

...

  • Logs indicate that our queue_size metric alarm is only firing every 10 minutes.

...

  • Recommendations

    • Change the cool-down to 1 minute

    • Try a batch size of 100 instances

    • Set the custom queue_size CloudWatch metric to check every 1 minute

  • If these recommended changes are made, we estimate that 1000 instances will take 55 minutes to ramp up
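To compare the batch-size and cool-down settings listed above, a naive back-of-envelope model can help. The sketch below assumes exactly one batch is launched per cool-down period and ignores alarm evaluation delay, instance boot time, and spot capacity limits. The function name `naive_ramp_up_minutes` is hypothetical, and this simple model is not expected to reproduce the 55-minute estimate, which presumably accounts for factors not listed on this page.

```python
import math


def naive_ramp_up_minutes(target_instances, batch_size, cooldown_minutes):
    """Minutes to reach the target if exactly one batch launches per cool-down."""
    batches = math.ceil(target_instances / batch_size)
    return batches * cooldown_minutes


# Current settings: 20 instances per 5-minute cool-down.
current = naive_ramp_up_minutes(1000, batch_size=20, cooldown_minutes=5)

# Recommended settings: 100 instances per 1-minute cool-down.
recommended = naive_ramp_up_minutes(1000, batch_size=100, cooldown_minutes=1)
```

Under these assumptions the model treats the cool-down as the only bottleneck, so it is best read as a lower bound for comparing settings rather than as a prediction.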

...

Self-Termination

...

...

  • Command for ASG to turn off AZ rebalancing

aws autoscaling suspend-processes --auto-scaling-group-name ${yourASGname} --scaling-processes AZRebalance
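The same operation can be scripted from Python. The sketch below uses boto3 (the successor to the legacy boto library used elsewhere on this page), which is a swapped-in choice rather than something this page prescribes. The helper name `suspend_az_rebalance` is hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
def suspend_az_rebalance(autoscaling_client, asg_name):
    """Suspend only the AZRebalance process on one Auto Scaling group."""
    return autoscaling_client.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=["AZRebalance"],
    )


# Usage, assuming boto3 is installed and AWS credentials are configured:
#
#   import boto3
#   client = boto3.client("autoscaling", region_name="us-west-2")
#   suspend_az_rebalance(client, "yourASGname")
```

Other scaling processes such as Launch and Terminate are left untouched, so the group continues to scale normally.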

...


>>> import boto.ec2.cloudwatch

>>> c = boto.ec2.cloudwatch.connect_to_region('us-west-2')

>>> metrics = c.list_metrics()

>>> metrics

[Metric:DiskReadBytes,

...

 Metric:CPUUtilization,

...

 Metric:DiskWriteOps,

...

 Metric:DiskWriteOps,

...

 Metric:DiskReadOps,

...

 Metric:DiskReadBytes,

...

 Metric:DiskReadOps,

...

 Metric:CPUUtilization,

...

 Metric:DiskWriteOps,

...

 Metric:NetworkIn,

...

 Metric:NetworkOut,

...

 Metric:NetworkIn,

...

 Metric:DiskReadBytes,

...

 Metric:DiskWriteBytes,

...

 Metric:DiskWriteBytes,

...

 Metric:NetworkIn,

...

 Metric:NetworkIn,

...

 Metric:NetworkOut,

...

 Metric:NetworkOut,

...

 Metric:DiskReadOps,

...

 Metric:CPUUtilization,

...

 Metric:DiskReadOps,

...

 Metric:CPUUtilization,

...

 Metric:DiskWriteBytes,

...

 Metric:DiskWriteBytes,

...

 Metric:DiskReadBytes,

...

 Metric:NetworkOut,

...
