2018-01-31 bulk reprocessing 50Gbps in trinity mode

bulk reprocessing prep for KACST (and GRFN 50Gbps metrics): take-2 in trinity mode

The goal is to load test at NISAR scale.

  • Data Rates and Volumes
    • Forward “keep-up” processing: 1X at 86TB per day
    • Bulk processing: 4X at 344TB per day
    • Concurrent load: 5X at 430TB per day
  • Processed rate
    • Data delivery rate to DAAC
  • SDS-DAAC Network
    • SDS baseline currently in AWS Oregon region
    • NGAP currently in AWS Virginia region
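The daily volumes listed above can be converted into sustained network rates, which is how the 5X load relates to the 40Gbps spike reported further down. The following is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and a perfectly even 24-hour spread; neither assumption is stated on this page, and the function name is made up for illustration.

```python
# Convert daily data volumes into the sustained network rate needed to move them.
# Assumption: decimal units (1 TB = 10**12 bytes) and an even spread over 24 hours.

SECONDS_PER_DAY = 24 * 60 * 60
BITS_PER_TB = 8 * 10**12


def tb_per_day_to_gbps(tb_per_day: float) -> float:
    """Return the sustained rate in Gbps needed to move tb_per_day in one day."""
    return tb_per_day * BITS_PER_TB / SECONDS_PER_DAY / 10**9


# Daily volumes taken from the list above.
daily_volumes_tb = {
    "forward keep-up (1X)": 86,
    "bulk (4X)": 344,
    "concurrent (5X)": 430,
}

for label, tb in daily_volumes_tb.items():
    print(f"{label}: {tb} TB/day is about {tb_per_day_to_gbps(tb):.1f} Gbps")
```

Under these assumptions the 5X concurrent load works out to just under 40 Gbps sustained.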

Cloud-scale Cost estimate

  • Previously spiked at 430TB per day at 8,200 concurrent compute instances
    • At the current c5.9xlarge spot price of around $0.55/hr, this needs about $4600 per hour, or about $110K per day, in spot instances
  • But note NISAR baseline is GPU-enabled
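The cost arithmetic above can be reproduced in a few lines. This is a rough sketch that assumes the entire fleet consists of c5.9xlarge instances at a flat spot price, which is a simplification: the actual fleet was spread across several instance types and spot prices fluctuate. The on-demand price used for comparison is the c5.9xlarge figure from the instance table on this page.

```python
# Back-of-the-envelope fleet cost at the peak instance count.
# Assumption: every instance is a c5.9xlarge billed at a constant hourly price.

PEAK_INSTANCES = 8200
SPOT_PRICE_USD_PER_HOUR = 0.55        # c5.9xlarge spot, us-west-2
ON_DEMAND_PRICE_USD_PER_HOUR = 1.53   # c5.9xlarge on-demand, for comparison

spot_hourly = PEAK_INSTANCES * SPOT_PRICE_USD_PER_HOUR
spot_daily = spot_hourly * 24
on_demand_daily = PEAK_INSTANCES * ON_DEMAND_PRICE_USD_PER_HOUR * 24

print(f"spot:      ${spot_hourly:,.0f}/hour, ${spot_daily:,.0f}/day")
print(f"on-demand: ${on_demand_daily:,.0f}/day")
```

With exactly 8,200 instances this gives about $4,510 per hour and about $108K per day, slightly below the rounded figures quoted above.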

Bulk Processing Test Sample

  • 9-hour production test
  • 31K jobs
  • 27K x L2 data products generated (~48.4TB)
  • Spiked at the NISAR 5X data production rate (40Gbps)
    • About 170 x S1-IFG L2 data products processed per minute
  • Sustained at NISAR 2.5X
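As a consistency check, the test sample figures above can be cross-referenced against each other. The sketch below assumes decimal units (1 TB = 1000 GB), treats the rounded "27K" as exactly 27,000 products, and assumes all products are roughly the same size, so the results are approximate.

```python
# Cross-check the bulk processing test figures.
# Assumptions: decimal units, 27K == 27,000, and uniform product size.

PRODUCTS = 27_000
VOLUME_TB = 48.4
DURATION_HOURS = 9
PEAK_PRODUCTS_PER_MIN = 170

# Average size of one L2 product in GB.
avg_product_gb = VOLUME_TB * 1000 / PRODUCTS

# Average production rate over the whole 9-hour test.
avg_products_per_min = PRODUCTS / (DURATION_HOURS * 60)

# Output data rate implied by the peak products-per-minute figure.
peak_gbps = PEAK_PRODUCTS_PER_MIN * avg_product_gb * 8 / 60

# Output data rate averaged over the whole test.
avg_gbps = VOLUME_TB * 8000 / (DURATION_HOURS * 3600)

print(f"average product size: {avg_product_gb:.2f} GB")
print(f"average rate: {avg_products_per_min:.0f} products/min, {avg_gbps:.1f} Gbps")
print(f"peak rate: {PEAK_PRODUCTS_PER_MIN} products/min, {peak_gbps:.1f} Gbps")
```

At roughly 1.8 GB per product, 170 products per minute corresponds to a little over 40 Gbps, which agrees with the reported spike.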


Peaked at ~8200 concurrent compute instances

The following plot shows the number of concurrent compute nodes over time. The fleet peaked at the NISAR 5X rate with 8200+ concurrent compute nodes.

In the middle of the run, the system auto-scaled to the nominal NISAR 2.5X rate.

 

Compute diversification

Reaching this number of concurrent compute instances required diversifying across instance types. Auto Scaling and Spot Fleet were used together to spread the compute across the following EC2 instance types:

Name                               API Name    Memory     vCPUs  Instance Storage                  Network           On-demand  Spot (us-west-2)
---------------------------------  ----------  ---------  -----  --------------------------------  ----------------  ---------  ----------------
C5 High-CPU 9xlarge                c5.9xlarge  72.0 GiB   36     EBS only                          10 Gigabit        $1.530/hr  $0.55/hr
C5 High-CPU Quadruple Extra Large  c5.4xlarge  32.0 GiB   16     EBS only                          Up to 10 Gigabit  $0.680/hr  $0.33/hr
C3 High-CPU Quadruple Extra Large  c3.4xlarge  30.0 GiB   16     320 GiB (2 x 160 GiB SSD)         High              $0.840/hr  $0.25/hr
C3 High-CPU Eight Extra Large      c3.8xlarge  60.0 GiB   32     640 GiB (2 x 320 GiB SSD)         10 Gigabit        $1.680/hr  $0.49/hr
I3 High I/O Quadruple Extra Large  i3.4xlarge  122.0 GiB  16     3800 GiB (2 x 1900 GiB NVMe SSD)  Up to 10 Gigabit  $1.248/hr  $0.50/hr
I3 High I/O Extra Large            i3.xlarge   30.5 GiB   4      950 GiB NVMe SSD                  Up to 10 Gigabit  $0.312/hr  $0.10/hr
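One way to compare the instance types above is by spot price per vCPU and by discount relative to on-demand. The sketch below uses only the prices listed on this page; the `InstanceType` class and its property names are made up for illustration. Note that a vCPU is not equivalent across instance generations (C3 vs C5), and that memory, local storage, and spot availability also drive the choice, so this is a rough comparison only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical record holding one instance type's listed hourly prices."""
    api_name: str
    vcpus: int
    on_demand: float  # USD per hour
    spot: float       # USD per hour, us-west-2

    @property
    def spot_per_vcpu(self) -> float:
        return self.spot / self.vcpus

    @property
    def spot_discount(self) -> float:
        """Fraction saved relative to the on-demand price."""
        return 1 - self.spot / self.on_demand


# Values copied from the instance table on this page.
FLEET = [
    InstanceType("c5.9xlarge", 36, 1.530, 0.55),
    InstanceType("c5.4xlarge", 16, 0.680, 0.33),
    InstanceType("c3.4xlarge", 16, 0.840, 0.25),
    InstanceType("c3.8xlarge", 32, 1.680, 0.49),
    InstanceType("i3.4xlarge", 16, 1.248, 0.50),
    InstanceType("i3.xlarge", 4, 0.312, 0.10),
]

# List instance types from cheapest to most expensive per vCPU-hour.
for t in sorted(FLEET, key=lambda t: t.spot_per_vcpu):
    print(f"{t.api_name:<11} ${t.spot_per_vcpu:.4f}/vCPU-hr  "
          f"{t.spot_discount:.0%} below on-demand")

cheapest = min(FLEET, key=lambda t: t.spot_per_vcpu)
```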

The AWS console during the peak, showing the diversity of the fleet:

EC2-S3 performance

Transfers from S3 to c5.9xlarge instances showed 10Gbps network throughput. This is an improvement over the throughput to S3 seen in prior years on *.8xlarge and larger instance types.
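To put the 10Gbps figure in context, the sketch below converts between transfer size, elapsed time, and throughput. The function names and the 5 GB input size are hypothetical and chosen only for illustration; the page does not state per-job input sizes. It assumes decimal units and ignores request latency and per-object overhead.

```python
# Relate transfer size, elapsed time, and throughput for S3 staging.
# Assumptions: decimal units (1 GB = 10**9 bytes), no latency or overhead.


def throughput_gbps(num_bytes: int, seconds: float) -> float:
    """Observed throughput in Gbps for num_bytes transferred in seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1e9


def staging_seconds(num_bytes: int, link_gbps: float = 10.0) -> float:
    """Ideal time to stage num_bytes over a link running at link_gbps."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    return num_bytes * 8 / (link_gbps * 1e9)


# Hypothetical example: a 5 GB input staged at the full 10 Gbps.
example_bytes = 5 * 10**9
print(f"ideal staging time: {staging_seconds(example_bytes):.1f} s")
print(f"throughput if it took 4 s: {throughput_gbps(example_bytes, 4.0):.1f} Gbps")
```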

The following plot shows download rates from S3 onto c5.9xlarge instances when staging input data for each compute job.

