2018-01-31 bulk reprocessing 50Gbps in trinity mode
bulk reprocessing prep for KACST (and GRFN 50Gbps metrics): take-2 in trinity mode
Goal is to load test at NISAR scale
- Data Rates and Volumes
- Forward “keep-up” processing: 1X at 86TB per day
- Bulk processing: 4X at 344TB per day
- Concurrent load: 5X at 430TB per day (converted to sustained Gbps in the sketch after this list)
- Processed rate
- Data delivery rate to DAAC
- SDS-DAAC Network
- SDS baseline currently in AWS Oregon region
- NGAP currently in AWS Virginia region
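For reference, a back-of-envelope conversion of the daily volumes above to sustained network rates (assuming 1 TB = 10^12 bytes and an even 24-hour spread):

```python
# Convert NISAR-scale daily data volumes to sustained network rates (back-of-envelope).
# Assumes 1 TB = 1e12 bytes and a uniform 24-hour distribution.

SECONDS_PER_DAY = 86400

def tb_per_day_to_gbps(tb_per_day):
    """Convert a daily volume in TB/day to a sustained rate in Gbps."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9

for label, volume in [("1X forward keep-up", 86),
                      ("4X bulk", 344),
                      ("5X concurrent load", 430)]:
    print(f"{label}: {volume} TB/day ~= {tb_per_day_to_gbps(volume):.1f} Gbps")
# 1X forward keep-up: 86 TB/day ~= 8.0 Gbps
# 4X bulk: 344 TB/day ~= 31.9 Gbps
# 5X concurrent load: 430 TB/day ~= 39.8 Gbps
```

The 5X rate of ~40Gbps is what the test below spiked to, with the 50Gbps target leaving headroom.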
Cloud-scale Cost estimate
- Previously spiked at 430TB per day with 8,200 concurrent compute instances
- With current c5.9xlarge spot pricing around $0.55/hr, that works out to about $4,600 per hour, or ~$110K per day, in spot (arithmetic sketched below)
- But note that the NISAR baseline is GPU-enabled
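The spot-cost arithmetic behind those numbers, as a minimal sketch (fleet size and price taken from above; the real fleet mixed several instance types at different spot prices):

```python
# Back-of-envelope spot cost at the observed peak fleet size.
instances = 8200            # peak concurrent compute instances
spot_price_per_hr = 0.55    # approximate c5.9xlarge spot price (us-west-2), $/hr

hourly_cost = instances * spot_price_per_hr
daily_cost = hourly_cost * 24

print(f"~${hourly_cost:,.0f}/hr, ~${daily_cost / 1000:,.0f}K/day")
# ~$4,510/hr, ~$108K/day  (quoted above as roughly $4,600/hr and ~$110K/day)
```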
Bulk Processing Test Sample
- 9-hour production test
- 31K jobs
- 27K L2 data products generated (~48.4TB)
- Spiked at the NISAR 5X data production rate (40Gbps)
- ~170 S1-IFG L2 data products processed per minute
- Sustained at the NISAR 2.5X rate
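As a hedged sketch, the product count above could be pulled from the product catalog roughly like this, assuming the L2 products are indexed in an Elasticsearch-backed catalog; the endpoint, index pattern, and timestamp field below are hypothetical, not the actual system config:

```python
# Hypothetical sketch: count S1-IFG L2 products generated during the 9-hour test window.
# Endpoint, index pattern, and timestamp field are assumptions, not production values.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://grq-es.example.com:9200"])  # hypothetical catalog endpoint

query = {
    "query": {
        "range": {
            "creation_timestamp": {          # hypothetical timestamp field
                "gte": "2018-01-31T00:00:00Z",
                "lt":  "2018-01-31T09:00:00Z"
            }
        }
    }
}

resp = es.count(index="grq_*_s1-ifg*", body=query)  # hypothetical index pattern
print(f"L2 products generated: {resp['count']}")
```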
Peaked at ~8200 concurrent compute instances
The following shows the number of concurrent compute nodes over time. It peaked at the NISAR 5X rate with 8,200+ concurrent compute nodes in the fleet. In the middle of the run, the system nominally auto-scaled to the NISAR 2.5X rate.
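A minimal sketch of how the concurrent-instance count can be sampled at a point in time with boto3; the tag used to identify compute workers is an assumption:

```python
# Sample the number of running compute instances at a point in time (boto3 sketch).
# The tag filter is an assumption; the real fleet may be identified differently.
import boto3
from datetime import datetime, timezone

ec2 = boto3.client("ec2", region_name="us-west-2")

paginator = ec2.get_paginator("describe_instances")
pages = paginator.paginate(Filters=[
    {"Name": "instance-state-name", "Values": ["running"]},
    {"Name": "tag:Purpose", "Values": ["compute-worker"]},   # hypothetical tag
])

count = sum(len(r["Instances"]) for page in pages for r in page["Reservations"])
print(f"{datetime.now(timezone.utc).isoformat()} running compute instances: {count}")
```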
Compute diversification
To achieve this total number of compute instances, the compute had to be diversified across instance types. Both Auto Scaling and Spot Fleet were used to spread compute across the following EC2 instance types (a Spot Fleet request sketch follows the table):
| Name | API Name | Memory | vCPUs | Instance Storage | Network | On-demand | Spot (us-west-2) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| C5 High-CPU 9xlarge | c5.9xlarge | 72.0 GiB | 36 vCPUs | EBS only | 10 Gigabit | $1.530 hourly | $0.55 hourly |
| C5 High-CPU Quadruple Extra Large | c5.4xlarge | 32.0 GiB | 16 vCPUs | EBS only | Up to 10 Gigabit | $0.68 hourly | $0.33 hourly |
| C3 High-CPU Quadruple Extra Large | c3.4xlarge | 30.0 GiB | 16 vCPUs | 320 GiB (2 * 160 GiB SSD) | High | $0.840 hourly | $0.25 hourly |
| C3 High-CPU Eight Extra Large | c3.8xlarge | 60.0 GiB | 32 vCPUs | 640 GiB (2 * 320 GiB SSD) | 10 Gigabit | $1.680 hourly | $0.49 hourly |
| I3 High I/O Quadruple Extra Large | i3.4xlarge | 122.0 GiB | 16 vCPUs | 3800 GiB (2 * 1900 GiB NVMe SSD) | Up to 10 Gigabit | $1.248 hourly | $0.50 hourly |
| I3 High I/O Extra Large | i3.xlarge | 30.5 GiB | 4 vCPUs | 950 GiB NVMe SSD | Up to 10 Gigabit | $0.312 hourly | $0.10 hourly |
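A sketch of a diversified Spot Fleet request across the instance types in the table; the AMI, subnet, key pair, IAM fleet role, and target capacity are placeholders, not the values used in the test:

```python
# Sketch of a Spot Fleet request diversified across the instance types above.
# AMI, subnet, key pair, IAM role ARN, and target capacity are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

instance_types = ["c5.9xlarge", "c5.4xlarge", "c3.4xlarge",
                  "c3.8xlarge", "i3.4xlarge", "i3.xlarge"]

launch_specs = [
    {
        "ImageId": "ami-xxxxxxxx",         # placeholder worker AMI
        "InstanceType": itype,
        "SubnetId": "subnet-xxxxxxxx",     # placeholder subnet
        "KeyName": "worker-key",           # placeholder key pair
    }
    for itype in instance_types
]

response = ec2.request_spot_fleet(
    SpotFleetRequestConfig={
        "IamFleetRole": "arn:aws:iam::123456789012:role/spot-fleet-role",  # placeholder
        "AllocationStrategy": "diversified",   # spread capacity across the spot pools
        "TargetCapacity": 100,                 # placeholder; scaled far higher in the test
        "LaunchSpecifications": launch_specs,
        "TerminateInstancesWithExpiration": True,
        "Type": "maintain",
    }
)
print(response["SpotFleetRequestId"])
```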
The AWS console during peak, showing the diversity of the fleet:
EC2-S3 performance
S3-to-c5.9xlarge instance transfers showed 10Gbps network performance to S3. This is an increase over the *.8xlarge-to-S3 performance trends seen in prior years.
The following plot shows download rates from S3 onto c5.9xlarge instances when staging input data for each compute job.
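A minimal sketch of how the per-job staging throughput can be measured on an instance with boto3; the bucket, key, and transfer tuning values are placeholders:

```python
# Time a multipart S3 download onto the instance and report the observed throughput.
# Bucket, key, and transfer tuning values are placeholders/illustrative.
import os
import time

import boto3
from boto3.s3.transfer import TransferConfig

BUCKET = "my-input-bucket"          # placeholder
KEY = "inputs/S1-slc-granule.zip"   # placeholder
DEST = "input.zip"

s3 = boto3.client("s3")
config = TransferConfig(multipart_chunksize=64 * 1024 * 1024,  # 64 MiB parts
                        max_concurrency=20)                    # parallel range GETs

start = time.time()
s3.download_file(BUCKET, KEY, DEST, Config=config)
elapsed = time.time() - start

size_bits = os.path.getsize(DEST) * 8
print(f"downloaded {size_bits / 8 / 1e9:.2f} GB in {elapsed:.1f}s "
      f"-> {size_bits / elapsed / 1e9:.2f} Gbps")
```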