


Confidence Level: TBD. This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.

Highlights

  • Scale-up is triggered by an auto-scaling metric alarm that checks the queue size.

  • We want to reduce auto-scaling AZ rebalancing, because it results in instance terminations. It is therefore better to keep the fleet balanced evenly across availability zones (AZs).

...

  • When using a spot fleet, scale-out can be kept to multiples of the number of AZs.

  • Set a scaling policy per ASG.

    • Example: if the alarm metric is greater than 1 for more than 60 seconds:

      • Add 1 instance when JobsWaiting-grfn-job_worker-large is [1,10)

      • Add 10 instances when JobsWaiting-grfn-job_worker-large is [10,∞)
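The step policy in the example above, together with the idea of scaling out in multiples of the number of AZs, can be sketched in plain Python. This is only an illustration, not the ASG configuration itself: the function names are hypothetical, and the sketch assumes a backlog of 1 or more counts as "in alarm", following the step intervals listed above.

```python
def instances_to_add(jobs_waiting):
    """Return the step adjustment for a given JobsWaiting value.

    Assumption: steps follow the intervals in the example,
    [1, 10) -> +1 instance and [10, inf) -> +10 instances.
    """
    if jobs_waiting < 1:
        return 0
    if jobs_waiting < 10:
        return 1
    return 10


def round_up_to_az_multiple(instance_count, num_azs):
    """Round a scale-out amount up to the next multiple of the AZ count."""
    if num_azs <= 0:
        raise ValueError("num_azs must be positive")
    # Ceiling division without floats, then scale back up.
    return -(-instance_count // num_azs) * num_azs


if __name__ == "__main__":
    for backlog in (0, 1, 9, 10, 250):
        step = instances_to_add(backlog)
        # 3 AZs is an illustrative value, not taken from the source.
        print(backlog, step, round_up_to_az_multiple(step, 3))
```

Rounding up to a multiple of the AZ count means each scale-out can be spread evenly, which is one way to avoid giving AZ rebalancing a reason to terminate instances.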

Optimization

...

  • Auto-scaling optimizations

...

    • The current public batch size is 20 instances per 5-minute cool-down.

...

    • The default cool-down is 300 seconds.

...

  • The default internal batch rate is 10 instances per 30 seconds.

  • AWS ASG will increase our maximum batch rate to 100 instances per 30 seconds.

...

  • The desired group size could be set manually to 100.

...

  • Logs indicate that our queue_size metric alarm fires only every 10 minutes.

...

  • Recommendations

    • Change the cool-down to 1 minute.

    • Try a batch size of 100 instances.

    • Set the custom queue_size metric to be checked every 1 minute in CloudWatch.

  • If these recommended changes are made, ramping up to 1000 instances is estimated to take 55 minutes.
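For a rough sense of how batch size and cool-down interact, the following back-of-envelope sketch may help. It assumes one batch is launched per cool-down period and ignores instance boot time, alarm evaluation delay, and AWS rate limits, so it gives a lower bound only. The function name is hypothetical, and the source does not state how the 55-minute estimate was derived, so this model is not claimed to reproduce it.

```python
import math


def ramp_up_minutes(target_instances, batch_size, cooldown_seconds):
    """Lower-bound ramp-up time, assuming one batch per cool-down period."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = math.ceil(target_instances / batch_size)
    return batches * cooldown_seconds / 60.0


if __name__ == "__main__":
    # (batch size, cool-down in seconds) combinations mentioned in the notes above.
    for batch, cooldown in ((20, 300), (20, 60), (100, 60)):
        print(batch, cooldown, ramp_up_minutes(1000, batch, cooldown))
```

Under this simplified model, 20 instances per 5-minute cool-down needs 250 minutes to reach 1000 instances, 20 per 1-minute cool-down needs 50 minutes, and 100 per 1-minute cool-down needs 10 minutes.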

...

Self-Termination

...

...

  • Open question: what is the command to turn off AZ rebalancing for an ASG?
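One possible answer, offered as a sketch rather than a verified procedure for this deployment: the Auto Scaling API lets a group's AZRebalance process be suspended. The example below assumes boto3 (the transcript on this page uses the older boto library instead), and the helper name and group name are hypothetical.

```python
def suspend_az_rebalance(autoscaling_client, asg_name):
    """Suspend only the AZRebalance process for one Auto Scaling group.

    autoscaling_client is expected to be a boto3 "autoscaling" client
    (an assumption; any object with a compatible suspend_processes works).
    """
    return autoscaling_client.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=["AZRebalance"],
    )


# Usage against AWS (requires boto3 and credentials; group name is a placeholder):
#   import boto3
#   client = boto3.client("autoscaling", region_name="us-west-2")
#   suspend_az_rebalance(client, "my-asg-name")
```

Passing the client in as an argument keeps the helper easy to try against a stub before running it against a live group. Suspending only AZRebalance leaves the other scaling processes, such as launch and terminate, active.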

...

...


>>> import boto.ec2.cloudwatch

>>> c = boto.ec2.cloudwatch.connect_to_region('us-west-2')

>>> metrics = c.list_metrics()

>>> metrics

[Metric:DiskReadBytes,

...

 Metric:CPUUtilization,

...

 Metric:DiskWriteOps,

...

 Metric:DiskWriteOps,

...

 Metric:DiskReadOps,

...

 Metric:DiskReadBytes,

...

 Metric:DiskReadOps,

...

 Metric:CPUUtilization,

...

 Metric:DiskWriteOps,

...

 Metric:NetworkIn,

...

 Metric:NetworkOut,

...

 Metric:NetworkIn,

...

 Metric:DiskReadBytes,

...

 Metric:DiskWriteBytes,

...

 Metric:DiskWriteBytes,

...

 Metric:NetworkIn,

...

 Metric:NetworkIn,

...

 Metric:NetworkOut,

...

 Metric:NetworkOut,

...

 Metric:DiskReadOps,

...

 Metric:CPUUtilization,

...

 Metric:DiskReadOps,

...

 Metric:CPUUtilization,

...

 Metric:DiskWriteBytes,

...

 Metric:DiskWriteBytes,

...

 Metric:DiskReadBytes,

...

 Metric:NetworkOut,

...
