Confidence Level: Moderate. This article includes input from several JPLers. Multiple subject matter experts can indicate that a page is more frequently reviewed and updated.
This page outlines some of the key concepts of HySDS, and links to related resources to use while onboarding with HySDS.
see also: HySDS Intro: Reference Materials and Getting Started for related pages (review for consolidation)
HySDS (Hybrid Cloud Science Data System) is a custom hybrid compute system: a science data system designed to automate bulk processing of large volumes of data.
HySDS supports various NASA projects, including: Advanced Rapid Imaging and Analysis (ARIA), Surface Water and Ocean Topography (SWOT), Soil Moisture Active Passive (SMAP), and NASA-ISRO SAR Mission (NISAR).
HySDS can run on a variety of compute resources, including Amazon AWS, Microsoft Azure, Google Cloud, and NASA’s Pleiades supercomputer.
Operators mainly use Mozart (job management), GRQ (job data), Tosca, and Figaro.
Glossary of Common HySDS Terms
HySDS Level 1 Component Overview
HySDS Level 2 Component Overview
Glossary of HySDS specific & Third Party Components
The HySDS GUI for submitting jobs for data processing
The HySDS GUI for tracking and managing submitted jobs
Functionality enabling targeted searches based on job parameters
One-time job processing options for bulk data processing
Automated job processing based on custom pre-configured conditions
Lightweight jobs are common across all HySDS adaptations. They allow easy bulk processing of data via frequently used job tasks.
RabbitMQ is the third-party software that manages HySDS job queues. Operators use it for troubleshooting because it serves as the ultimate source of truth for job status.
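As a rough illustration of using RabbitMQ for troubleshooting, the sketch below reads queue backlogs from the RabbitMQ management HTTP API (the `/api/queues` endpoint of the management plugin). It assumes the management plugin is enabled and reachable; the host, port, and credentials are placeholders, and the helper names `fetch_queues` and `backlog_by_queue` are hypothetical, not part of HySDS.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url, user, password):
    """Return the parsed JSON list from the management API's /api/queues endpoint."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    # The management API uses HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def backlog_by_queue(queues):
    """Map each queue name to its ready and unacknowledged message counts."""
    return {
        q["name"]: {
            # Counts may be absent for a queue with no statistics yet; treat as 0.
            "ready": q.get("messages_ready", 0),
            "unacked": q.get("messages_unacknowledged", 0),
        }
        for q in queues
    }
```

With a reachable broker, `backlog_by_queue(fetch_queues("http://rabbitmq.example.com:15672", "user", "password"))` would return one entry per queue. The URL and credentials here are placeholders, so substitute the values for your own cluster. A queue whose "ready" count keeps growing usually means no worker is consuming from it.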
Repository for much of the HySDS state information
https://wiki.jpl.nasa.gov/display/S6MS/Loading+Data+Into+ElasticSearch
https://wiki.jpl.nasa.gov/display/NISARSDS/Elasticsearch+Service
Regions are geographically distinct groupings of AWS resources. Most AWS resources are region-specific, including S3, EC2, and ASG. Each availability zone consists of one or more physical data centers, and each region has at least 3 availability zones.
HySDS uses AWS S3 object storage for its data storage requirements.
Verdi and Factotum worker nodes use EC2 instances to provide scalable computing resources. Amazon offers on-demand, spot, and reserved pricing options.
HySDS uses both on-demand and spot instances.
Auto Scaling Groups are resizable clusters of EC2 instances used for large-scale data processing.
HySDS uses customized AMIs that are shared across JPL missions. Experienced HySDS users should confirm that an AMI is compatible with the HySDS configuration they are using.
HySDS monitors Mozart metrics via CloudWatch
Controls scaling behavior of the HySDS ASGs
Python library that allows you to communicate with AWS programmatically. HySDS developers use this library.
Allows users to interact with Amazon via the command line.
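The Python library for AWS described above is not named on this page. Assuming it is boto3 (the AWS SDK for Python), the sketch below shows one way to count running EC2 instances by pricing model, which relates to the on-demand and spot usage noted earlier. The function names are hypothetical, and any region passed in is an assumption to replace with the one hosting your cluster.

```python
from collections import Counter


def count_by_lifecycle(response):
    """Count instances in a describe_instances-style response by pricing model."""
    counts = Counter()
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # EC2 includes InstanceLifecycle only for spot (or scheduled)
            # instances; on-demand instances omit the key.
            counts[instance.get("InstanceLifecycle", "on-demand")] += 1
    return counts


def running_instances(region_name):
    """Fetch all running instances in one region (needs boto3 and AWS credentials)."""
    import boto3  # imported here so count_by_lifecycle works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region_name)
    paginator = ec2.get_paginator("describe_instances")
    reservations = []
    # describe_instances is paginated, so collect every page of results.
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        reservations.extend(page["Reservations"])
    return {"Reservations": reservations}
```

With valid credentials, `count_by_lifecycle(running_instances("us-west-2"))` returns a counter keyed by "spot" and "on-demand". Keeping the counting logic separate from the AWS call means it can be checked against a hand-built response without touching a live account.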
The main components: GRQ, Factotum, Docker
HySDS uses Docker containers to run Product Generating Executors (PGEs). An introduction to Docker.
Depends on the actual role, for example:
Software dev → Docker/PGE
Ops → How to SSH into the machines