2018-01-31 bulk reprocessing 50Gbps in trinity mode

bulk reprocessing prep for KACST (and GRFN 50Gbps metrics): take-2 in trinity mode

The goal is to load-test at NISAR scales.

  • Data Rates and Volumes
    • Forward “keep-up” processing: 1X at 86TB per day
    • Bulk processing: 4X at 344TB per day
    • Concurrent load: 5X at 430TB per day
  • Processed rate
    • Data delivery rate to DAAC
  • SDS-DAAC Network
    • SDS baseline currently in AWS Oregon region
    • NGAP currently in AWS Virginia region
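
The daily volumes above can be sanity-checked against the line rates quoted elsewhere on this page with a quick conversion (decimal TB, sustained over 24 hours):

```python
# Convert the daily data volumes above into sustained line rates.
def tb_per_day_to_gbps(tb_per_day: float) -> float:
    """Terabytes/day (decimal TB) to sustained gigabits/second."""
    return tb_per_day * 1e12 * 8 / 86400 / 1e9

for label, tb in [("1X keep-up", 86), ("4X bulk", 344), ("5X concurrent", 430)]:
    print(f"{label}: {tb} TB/day ~ {tb_per_day_to_gbps(tb):.1f} Gbps")
# 5X (430 TB/day) works out to ~40 Gbps sustained
```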

Cloud-scale Cost estimate

  • Previously spiked at 430TB per day at 8,200 concurrent compute instances
    • With current c5.9xlarge spot pricing around $0.55/hr, that is about $4,500 per hour, ~$110K per day in spot
  • But note NISAR baseline is GPU-enabled
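
The spot-cost estimate above is simple arithmetic on the peak fleet size:

```python
# Rough spot-cost model for the peak fleet described above.
instances = 8200      # peak concurrent compute instances
spot_price = 0.55     # $/hr, c5.9xlarge spot in us-west-2 (early 2018)

hourly = instances * spot_price
daily = hourly * 24
print(f"${hourly:,.0f}/hour, ${daily:,.0f}/day")  # ~$4,510/hour, ~$108,240/day
```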

Bulk Processing Test Sample

  • 9-hour production test
  • 31K jobs
  • 27K x L2 data products generated (~48.4TB)
  • Spiked at the NISAR 5X data production rate (40Gbps)
    • About 170 S1-IFG L2 data products processed per minute
  • Sustained at the NISAR 2.5X rate
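
A rough back-of-the-envelope check on the test-sample figures (the ~170 products/minute figure was the peak; the numbers below are averages over the whole 9-hour run):

```python
# Average rates over the 9-hour production test.
products = 27_000   # L2 data products generated
volume_tb = 48.4    # total output volume, decimal TB
hours = 9

per_min = products / (hours * 60)                        # ~50 products/min average
avg_gbps = volume_tb * 1e12 * 8 / (hours * 3600) / 1e9   # ~12 Gbps average output
print(f"{per_min:.0f} products/min average, {avg_gbps:.1f} Gbps average")
```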


Peaked at ~8200 concurrent compute instances

The following shows the number of concurrent compute nodes over time. At the NISAR 5X rate, the fleet peaked at 8,200+ concurrent compute nodes.

In the middle of the run, the system nominally auto-scaled to the NISAR 2.5X rate.

Compute diversification

To achieve this total number of compute instances, a diversified mix of compute instances had to be used. Both Auto Scaling and Spot Fleet were used to spread compute across the following EC2 instance types:
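
A minimal sketch of what such a diversified Spot Fleet request could look like with boto3. The AMI ID, account number, and IAM fleet role below are placeholders, not values from the actual test:

```python
# Sketch: one Spot Fleet request spread across the instance types in the
# table below. Identifiers (AMI, IAM role ARN) are placeholders.
INSTANCE_TYPES = ["c5.9xlarge", "c5.4xlarge", "c3.4xlarge",
                  "c3.8xlarge", "i3.4xlarge", "i3.xlarge"]

def fleet_config(target_capacity: int) -> dict:
    return {
        "IamFleetRole": "arn:aws:iam::123456789012:role/spot-fleet-role",  # placeholder
        "AllocationStrategy": "lowestPrice",
        "TargetCapacity": target_capacity,
        "LaunchSpecifications": [
            {"ImageId": "ami-00000000", "InstanceType": t}  # placeholder AMI
            for t in INSTANCE_TYPES
        ],
    }

# To actually submit (requires AWS credentials):
# import boto3
# boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=fleet_config(8200))
```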

| Name | API Name | Memory | vCPUs | Instance Storage | Network | On-demand | Spot (us-west-2) |
|------|----------|--------|-------|------------------|---------|-----------|------------------|
| C5 High-CPU 9xlarge | c5.9xlarge | 72.0 GiB | 36 vCPUs | EBS only | 10 Gigabit | $1.530 hourly | $0.55 hourly |
| C5 High-CPU Quadruple Extra Large | c5.4xlarge | 32.0 GiB | 16 vCPUs | EBS only | Up to 10 Gigabit | $0.68 hourly | $0.33 hourly |
| C3 High-CPU Quadruple Extra Large | c3.4xlarge | 30.0 GiB | 16 vCPUs | 320 GiB (2 * 160 GiB SSD) | High | $0.840 hourly | $0.25 hourly |
| C3 High-CPU Eight Extra Large | c3.8xlarge | 60.0 GiB | 32 vCPUs | 640 GiB (2 * 320 GiB SSD) | 10 Gigabit | $1.680 hourly | $0.49 hourly |
| I3 High I/O Quadruple Extra Large | i3.4xlarge | 122.0 GiB | 16 vCPUs | 3800 GiB (2 * 1900 GiB NVMe SSD) | Up to 10 Gigabit | $1.248 hourly | $0.50 hourly |
| I3 High I/O Extra Large | i3.xlarge | 30.5 GiB | 4 vCPUs | 950 GiB NVMe SSD | Up to 10 Gigabit | $0.312 hourly | $0.10 hourly |
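
Using the spot prices from the table above, a quick comparison of cost per vCPU-hour shows why a CPU-bound workload leans on the C-family types:

```python
# Spot price per vCPU-hour for the instance types in the table above
# (vCPUs, spot $/hr in us-west-2 at test time).
fleet = {
    "c5.9xlarge": (36, 0.55),
    "c5.4xlarge": (16, 0.33),
    "c3.4xlarge": (16, 0.25),
    "c3.8xlarge": (32, 0.49),
    "i3.4xlarge": (16, 0.50),
    "i3.xlarge":  (4,  0.10),
}
for name, (vcpus, spot) in sorted(fleet.items(), key=lambda kv: kv[1][1] / kv[1][0]):
    print(f"{name}: ${spot / vcpus:.4f} per vCPU-hour")
# c5.9xlarge comes out cheapest per vCPU-hour at these prices
```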

The AWS console during peak, showing the diversity of the fleet:

EC2-S3 performance

S3-to-c5.9xlarge transfers sustained the instance's 10Gbps network to S3. This is an improvement over the S3 transfer rates measured from *.8xlarge-class instances in prior years.

The following plot shows download rates from S3 onto c5.9xlarge instances when staging input data for each compute job.
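
Download rates like these can be measured by timing the staging step on each worker. A minimal sketch (the bucket, key, and local path in the commented boto3 example are placeholders):

```python
import time

def measure_download_gbps(fetch, nbytes: int) -> float:
    """Time a download callable and return the achieved rate in Gbps."""
    t0 = time.monotonic()
    fetch()
    elapsed = time.monotonic() - t0
    return nbytes * 8 / elapsed / 1e9

# Example with boto3 (bucket/key/path are placeholders):
# import boto3
# s3 = boto3.client("s3")
# size = s3.head_object(Bucket="my-bucket", Key="input.dat")["ContentLength"]
# gbps = measure_download_gbps(
#     lambda: s3.download_file("my-bucket", "input.dat", "/data/input.dat"), size)
```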


Related content

  • Trinity Mode for Larger Scales
  • 2017-08-31 Death Valley for HySDS v2
  • HySDS AWS Autoscaling
  • 2016-10-28 HySDS v2 large scale 1M dumby-landsat test run
  • SDSWatch Metrics
  • Welcome to the HySDS Wiki
Note: JPL employees can also get answers to HySDS questions at Stack Overflow Enterprise: