2017-08-31 Death Valley for HySDS v2

(The following was documented on https://github.jpl.nasa.gov/hysds-org/general/issues/456)

Use a spot fleet instead of an Auto Scaling group so that harikiri is also tested at scale.

Depends on “verdi event stream on figaro” https://github.jpl.nasa.gov/hysds-org/general/milestone/55

New features tested:

  • job drain
  • no-clobber dataset publishing from verdi
  • stability checks on compute instances
  • docker daemon
  • event stream back to mozart
  • spot fleet


Observed load at around 2000 active worker nodes:

  • mozart on r4.8xlarge
  • CPU around 20%
  • network in: 40 MB/s
  • network out: 20 MB/s

[Screenshot: mozart node, 2017-08-31 2:53:21 pm]

  • metrics on r4.4xlarge
  • CPU around 2%
  • network in: 4 MB/s
  • network out: 1 MB/s

[Screenshot: metrics node, 2017-08-31 2:59:35 pm]

  • grq on r4.4xlarge
  • CPU around 70%
  • network in: 4 MB/s
  • network out: 2 MB/s

[Screenshot: grq node, 2017-08-31 3:03:00 pm]

  • factotum on r4.4xlarge
  • CPU around 1%
  • network in: 1 MB/s
  • network out: 2 MB/s

[Screenshot: factotum node, 2017-08-31 3:01:24 pm]
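As a rough back-of-the-envelope reading of these numbers, the network figures can be normalized per worker. The Python sketch below only restates the approximate values listed above; `per_worker_kb_s` is a hypothetical helper, and it assumes that each node's traffic is spread evenly over ~2000 workers and that 1 MB = 1000 KB. Since not all of a node's traffic necessarily comes from workers, the results are best read as upper bounds on per-worker load.

```python
# Approximate per-node figures reported above at ~2000 active worker nodes.
# cpu_pct is percent CPU; net_in / net_out are in MB/s.
WORKERS = 2000

nodes = {
    "mozart":   {"instance": "r4.8xlarge", "cpu_pct": 20, "net_in": 40, "net_out": 20},
    "metrics":  {"instance": "r4.4xlarge", "cpu_pct": 2,  "net_in": 4,  "net_out": 1},
    "grq":      {"instance": "r4.4xlarge", "cpu_pct": 70, "net_in": 4,  "net_out": 2},
    "factotum": {"instance": "r4.4xlarge", "cpu_pct": 1,  "net_in": 1,  "net_out": 2},
}


def per_worker_kb_s(mb_per_s: float, workers: int = WORKERS) -> float:
    """Hypothetical helper: average KB/s per worker, assuming an even
    spread of traffic across workers and 1 MB = 1000 KB."""
    return mb_per_s * 1000 / workers


# Print per-worker rates for each node, then the combined node traffic.
for name, m in nodes.items():
    print(f"{name:<9} ({m['instance']}, cpu ~{m['cpu_pct']}%): "
          f"in {per_worker_kb_s(m['net_in']):.1f} KB/s, "
          f"out {per_worker_kb_s(m['net_out']):.1f} KB/s per worker")

total_in = sum(m["net_in"] for m in nodes.values())
total_out = sum(m["net_out"] for m in nodes.values())
print(f"total: in {total_in} MB/s, out {total_out} MB/s")
```

For mozart this works out to about 20 KB/s in and 10 KB/s out per worker; each of the other nodes sees at most about 2 KB/s per worker, and grq stands out on CPU rather than on network.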


Successfully tested using the following trinity mode configuration:

  • mozart (rabbitmq node) => r4.8xlarge
  • mozart (ES node) => r4.8xlarge
  • mozart (redis node) => r4.8xlarge
  • grq => r4.8xlarge
  • factotum => r4.4xlarge
  • ci => r4.xlarge

mozart trinity nodes:

[Graph: mozart trinity network out (blue: rabbitmq, orange: ES, green: redis)]

[Graph: mozart trinity network in (blue: rabbitmq, orange: ES, green: redis)]

[Graph: mozart trinity CPU utilization (blue: rabbitmq, orange: ES, green: redis)]

metrics node:

[Graph: metrics CPU utilization]

[Graph: metrics network out]

[Graph: metrics network in]


grq node:

[Graph: grq CPU utilization]

[Graph: grq network out]

[Graph: grq network in]
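The two node layouts listed on this page can be compared mechanically. The Python sketch below simply restates the instance types from the lists above (`ci` appears only in the trinity list, and `metrics` only in the first); `changed_roles` is a hypothetical helper for illustration, not part of HySDS. It shows that the single r4.8xlarge mozart became three r4.8xlarge nodes (rabbitmq, ES, redis) and that grq moved from r4.4xlarge to r4.8xlarge.

```python
# Instance types as listed on this page for the two layouts.
first_layout = {
    "mozart": "r4.8xlarge",
    "metrics": "r4.4xlarge",
    "grq": "r4.4xlarge",
    "factotum": "r4.4xlarge",
}

trinity_layout = {
    "mozart (rabbitmq node)": "r4.8xlarge",
    "mozart (ES node)": "r4.8xlarge",
    "mozart (redis node)": "r4.8xlarge",
    "grq": "r4.8xlarge",
    "factotum": "r4.4xlarge",
    "ci": "r4.xlarge",
}


def changed_roles(before: dict, after: dict) -> dict:
    """Hypothetical helper: roles present in both layouts whose
    instance type differs, mapped to (before, after)."""
    return {
        role: (before[role], after[role])
        for role in sorted(before.keys() & after.keys())
        if before[role] != after[role]
    }


# In trinity mode, mozart's services run on separate nodes.
mozart_nodes = [role for role in trinity_layout if role.startswith("mozart")]
print("mozart split into:", mozart_nodes)
print("resized:", changed_roles(first_layout, trinity_layout))
```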

