Confidence Level TBD This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.

Highlights

Scale-up is driven by an auto-scaling metric alarm that checks the queue size.
We want to reduce auto-scaling AZ rebalancing, which results in instance terminations. It is therefore better to keep the fleet balanced evenly across availability zones (AZs).
When using a spot fleet, we can ensure that scale-out happens in multiples of the number of AZs.
Set a scaling policy per ASG
- example: if the alarm metric is greater than 1 for more than 60 seconds
  - Add 1 instance when JobsWaiting-grfn-job_worker-large is [1,10)
  - Add 10 instances when JobsWaiting-grfn-job_worker-large is [10,∞)
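
The step policy above can be mirrored locally to reason about how many instances a given queue size would add. The sketch below is illustrative only: it is not the code AWS runs, and the function names `instances_to_add` and `round_up_to_az_multiple` are hypothetical helpers, not part of HySDS or boto. The second helper shows one way to keep scale-out in multiples of the number of AZs, assuming the AZ count is known.

```python
import math

# Step adjustments from the example policy: (lower bound, upper bound, instances to add).
# Bounds apply to the JobsWaiting metric; the upper bound is exclusive.
STEP_ADJUSTMENTS = [
    (1, 10, 1),
    (10, math.inf, 10),
]


def instances_to_add(jobs_waiting, steps=STEP_ADJUSTMENTS):
    """Return the number of instances the step policy would add for a queue size."""
    for lower, upper, adjustment in steps:
        if lower <= jobs_waiting < upper:
            return adjustment
    # Queue size is below every step: no scale-out.
    return 0


def round_up_to_az_multiple(count, num_azs):
    """Round a positive instance count up to the next multiple of the AZ count."""
    if count <= 0:
        return 0
    return math.ceil(count / num_azs) * num_azs
```

For example, with 3 AZs (an assumed value), a queue size of 25 falls in the [10,∞) step and adds 10 instances, which rounds up to 12 so that each AZ receives the same number.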

Optimization

Auto-scaling optimizations
- Current public batch size: 20 instances per 5-minute cool-down
- Default cool-down: 300 seconds
Default internal batch rate is 10 instances per 30 seconds
AWS ASG will increase our max batch rate to 100 instances per 30 seconds
Could manually set the desired group size to 100
Logs indicate our queue_size metric alarm is only firing every 10 minutes.
Recommendations
- change the cool-down to 1 minute
- try a batch size of 100 instances
- set the custom queue_size metric to be checked every 1 minute in CloudWatch
If these recommended changes are made, ramping up to 1000 instances is estimated to take 55 minutes.
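
Getting the alarm to evaluate every minute requires the custom queue metric to be published at least that often. A minimal sketch of such a publisher follows, assuming boto3 rather than the older boto package referenced on this page. The namespace `HySDS`, the `JobsWaiting-<queue>` naming (inferred from the example alarm above), and both function names are assumptions for illustration, not the actual HySDS implementation.

```python
import datetime


def build_queue_size_datum(queue_name, queue_size, timestamp=None):
    """Build one CloudWatch MetricData entry for the current queue size."""
    if timestamp is None:
        timestamp = datetime.datetime.now(datetime.timezone.utc)
    return {
        # Metric naming is an assumption based on JobsWaiting-grfn-job_worker-large.
        "MetricName": "JobsWaiting-" + queue_name,
        "Timestamp": timestamp,
        "Value": float(queue_size),
        "Unit": "Count",
    }


def publish_queue_size(queue_name, queue_size, client=None, namespace="HySDS"):
    """Publish the queue size as a custom metric; namespace is hypothetical."""
    if client is None:
        # Imported lazily so the payload builder works without boto3 installed.
        import boto3
        client = boto3.client("cloudwatch")
    client.put_metric_data(
        Namespace=namespace,
        MetricData=[build_queue_size_datum(queue_name, queue_size)],
    )
```

Calling `publish_queue_size("grfn-job_worker-large", n)` once a minute, for example from cron, would give the alarm a fresh data point for each 1-minute period.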

Self-Termination

Need to suspend AZ rebalancing on the auto-scaling group so that scale-down through self-termination does not trigger rebalancing terminations.
http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SuspendResume.html
Command for ASG to turn off AZ rebalancing
This only needs to be run once per ASG; the suspended process then shows up in the Details tab of the ASG.
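
One way to suspend AZ rebalancing programmatically is sketched below, assuming boto3 (not the legacy boto package used elsewhere on this page). `AZRebalance` is the Auto Scaling process name for AZ rebalancing; the two function names and the ASG name in the usage note are hypothetical.

```python
def suspend_az_rebalance(asg_name, client=None):
    """Suspend only the AZRebalance process on the given Auto Scaling group."""
    if client is None:
        # Imported lazily so the helpers can be exercised with a stub client.
        import boto3
        client = boto3.client("autoscaling")
    client.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=["AZRebalance"],
    )


def is_az_rebalance_suspended(asg_name, client):
    """Return True if AZRebalance is listed among the group's suspended processes."""
    response = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    for group in response["AutoScalingGroups"]:
        for process in group.get("SuspendedProcesses", []):
            if process["ProcessName"] == "AZRebalance":
                return True
    return False
```

A call such as `suspend_az_rebalance("my-worker-asg")` would be made once per ASG; `is_az_rebalance_suspended` reads back the same state that the console shows.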

References

AWS Autoscaling

http://boto.readthedocs.org/en/latest/autoscale_tut.html

AWS CloudWatch

boto: Python interface to Amazon Web Services

>>> import boto.ec2.cloudwatch
>>> c = boto.ec2.cloudwatch.connect_to_region('us-west-2')
>>> metrics = c.list_metrics()
>>> metrics
[Metric:DiskReadBytes,
 Metric:CPUUtilization,
 Metric:DiskWriteOps,
 Metric:DiskReadOps,
 Metric:DiskReadBytes,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskWriteOps,
 Metric:NetworkIn,
 Metric:NetworkOut,
 Metric:NetworkIn,
 Metric:DiskReadBytes,
 Metric:DiskWriteBytes,
 Metric:NetworkIn,
 Metric:NetworkOut,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskWriteBytes,
 Metric:DiskReadBytes,
 Metric:NetworkOut,
 Metric:DiskWriteOps]

Code

http://gitlab:8000/browser/trunk/HySDS/cluster_fab/aria-jobs-dev/test_autoscale.py

Subject Matter Expert:

Gerald Manipon

Hook Hua
