2018-01-31 bulk reprocessing 50Gbps in trinity mode

bulk reprocessing prep for KACST (and GRFN 50Gbps metrics): take-2 in trinity mode

The goal is to load-test at NISAR scales.

  • Data Rates and Volumes
    • Forward “keep-up” processing: 1X at 86TB per day
    • Bulk processing: 4X at 344TB per day
    • Concurrent load: 5X at 430TB per day
  • Processed rate
    • Data delivery rate to DAAC
  • SDS-DAAC Network
    • SDS baseline currently in AWS Oregon region
    • NGAP currently in AWS Virginia region
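
The daily volumes above can be sanity-checked against the line rates quoted elsewhere on this page with a quick conversion (decimal TB, sustained over 24 hours):

```python
# Convert the daily data volumes above into sustained line rates.
def tb_per_day_to_gbps(tb_per_day: float) -> float:
    """Terabytes/day (decimal TB) to sustained gigabits/second."""
    return tb_per_day * 1e12 * 8 / 86400 / 1e9

for label, tb in [("1X keep-up", 86), ("4X bulk", 344), ("5X concurrent", 430)]:
    print(f"{label}: {tb} TB/day ~ {tb_per_day_to_gbps(tb):.1f} Gbps")
# 5X (430 TB/day) works out to ~40 Gbps sustained
```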

Cloud-scale Cost estimate

  • Previously spiked at 430TB per day at 8,200 concurrent compute instances
    • With current c5.9xlarge spot pricing around $0.55/hr, that is about $4,500 per hour, ~$110K per day in spot
  • But note NISAR baseline is GPU-enabled
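
The spot-cost estimate above is simple arithmetic on the peak fleet size:

```python
# Rough spot-cost model for the peak fleet described above.
instances = 8200      # peak concurrent compute instances
spot_price = 0.55     # $/hr, c5.9xlarge spot in us-west-2 (early 2018)

hourly = instances * spot_price
daily = hourly * 24
print(f"${hourly:,.0f}/hour, ${daily:,.0f}/day")  # ~$4,510/hour, ~$108,240/day
```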

Bulk Processing Test Sample

  • 9-hour production test
  • 31K jobs
  • 27K x L2 data products generated (~48.4TB)
  • Spiked at the NISAR 5X data production rate (40Gbps)
    • About 170 S1-IFG L2 data products processed per minute
  • Sustained at the NISAR 2.5X rate
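
A rough back-of-the-envelope check on the test-sample figures (the ~170 products/minute figure was the peak; the numbers below are averages over the whole 9-hour run):

```python
# Average rates over the 9-hour production test.
products = 27_000   # L2 data products generated
volume_tb = 48.4    # total output volume, decimal TB
hours = 9

per_min = products / (hours * 60)                        # ~50 products/min average
avg_gbps = volume_tb * 1e12 * 8 / (hours * 3600) / 1e9   # ~12 Gbps average output
print(f"{per_min:.0f} products/min average, {avg_gbps:.1f} Gbps average")
```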


Peaked at ~8200 concurrent compute instances

The following shows the number of concurrent compute nodes over time. At the NISAR 5X rate, the fleet peaked at 8,200+ concurrent compute nodes.

In the middle of the run, the system nominally auto-scaled to the NISAR 2.5X rate.

Compute diversification

To achieve this total number of compute instances, a diversified mix of compute instances had to be used. Both Auto Scaling and Spot Fleet were used to spread compute across the following EC2 instance types:
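
A minimal sketch of what such a diversified Spot Fleet request could look like with boto3. The AMI ID, account number, and IAM fleet role below are placeholders, not values from the actual test:

```python
# Sketch: one Spot Fleet request spread across the instance types in the
# table below. Identifiers (AMI, IAM role ARN) are placeholders.
INSTANCE_TYPES = ["c5.9xlarge", "c5.4xlarge", "c3.4xlarge",
                  "c3.8xlarge", "i3.4xlarge", "i3.xlarge"]

def fleet_config(target_capacity: int) -> dict:
    return {
        "IamFleetRole": "arn:aws:iam::123456789012:role/spot-fleet-role",  # placeholder
        "AllocationStrategy": "lowestPrice",
        "TargetCapacity": target_capacity,
        "LaunchSpecifications": [
            {"ImageId": "ami-00000000", "InstanceType": t}  # placeholder AMI
            for t in INSTANCE_TYPES
        ],
    }

# To actually submit (requires AWS credentials):
# import boto3
# boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=fleet_config(8200))
```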

| Name | API Name | Memory | vCPUs | Instance Storage | Network | On-demand | Spot (us-west-2) |
|------|----------|--------|-------|------------------|---------|-----------|------------------|
| C5 High-CPU 9xlarge | c5.9xlarge | 72.0 GiB | 36 vCPUs | EBS only | 10 Gigabit | $1.530 hourly | $0.55 hourly |
| C5 High-CPU Quadruple Extra Large | c5.4xlarge | 32.0 GiB | 16 vCPUs | EBS only | Up to 10 Gigabit | $0.68 hourly | $0.33 hourly |
| C3 High-CPU Quadruple Extra Large | c3.4xlarge | 30.0 GiB | 16 vCPUs | 320 GiB (2 * 160 GiB SSD) | High | $0.840 hourly | $0.25 hourly |
| C3 High-CPU Eight Extra Large | c3.8xlarge | 60.0 GiB | 32 vCPUs | 640 GiB (2 * 320 GiB SSD) | 10 Gigabit | $1.680 hourly | $0.49 hourly |
| I3 High I/O Quadruple Extra Large | i3.4xlarge | 122.0 GiB | 16 vCPUs | 3800 GiB (2 * 1900 GiB NVMe SSD) | Up to 10 Gigabit | $1.248 hourly | $0.50 hourly |
| I3 High I/O Extra Large | i3.xlarge | 30.5 GiB | 4 vCPUs | 950 GiB NVMe SSD | Up to 10 Gigabit | $0.312 hourly | $0.10 hourly |
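
Using the spot prices from the table above, a quick comparison of cost per vCPU-hour shows why a CPU-bound workload leans on the C-family types:

```python
# Spot price per vCPU-hour for the instance types in the table above
# (vCPUs, spot $/hr in us-west-2 at test time).
fleet = {
    "c5.9xlarge": (36, 0.55),
    "c5.4xlarge": (16, 0.33),
    "c3.4xlarge": (16, 0.25),
    "c3.8xlarge": (32, 0.49),
    "i3.4xlarge": (16, 0.50),
    "i3.xlarge":  (4,  0.10),
}
for name, (vcpus, spot) in sorted(fleet.items(), key=lambda kv: kv[1][1] / kv[1][0]):
    print(f"{name}: ${spot / vcpus:.4f} per vCPU-hour")
# c5.9xlarge comes out cheapest per vCPU-hour at these prices
```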

The AWS console during peak, showing the diversity of the fleet:

EC2-S3 performance

S3-to-c5.9xlarge transfers sustained the instance's 10Gbps network to S3. This is an improvement over the S3 transfer rates measured from *.8xlarge-class instances in prior years.

The following plot shows download rates from S3 onto c5.9xlarge instances when staging input data for each compute job.
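
Download rates like these can be measured by timing the staging step on each worker. A minimal sketch (the bucket, key, and local path in the commented boto3 example are placeholders):

```python
import time

def measure_download_gbps(fetch, nbytes: int) -> float:
    """Time a download callable and return the achieved rate in Gbps."""
    t0 = time.monotonic()
    fetch()
    elapsed = time.monotonic() - t0
    return nbytes * 8 / elapsed / 1e9

# Example with boto3 (bucket/key/path are placeholders):
# import boto3
# s3 = boto3.client("s3")
# size = s3.head_object(Bucket="my-bucket", Key="input.dat")["ContentLength"]
# gbps = measure_download_gbps(
#     lambda: s3.download_file("my-bucket", "input.dat", "/data/input.dat"), size)
```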


Related content

  • Trinity Mode for Larger Scales
  • 2017-08-31 Death Valley for HySDS v2
  • HySDS AWS Autoscaling
  • 2016-10-28 HySDS v2 large scale 1M dumby-landsat test run
  • SDSWatch Metrics
  • Welcome to the HySDS Wiki
Note: JPL employees can also get answers to HySDS questions at Stack Overflow Enterprise: