HySDS APIs
Confidence Level TBD: This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.
Introduction
HySDS provides a number of APIs designed to give external users programmatic access to HySDS data and job information. The APIs are also meant to throttle or jitter requests to avoid hammering the backend servers and datastores.
As HySDS evolves, the most important role the APIs may play is to throttle or jitter requests from HySDS itself, which can otherwise hammer our backend services.
This page is an initial attempt to compile information about the existing APIs and to converge on best practices and designs for new APIs and for evolving existing ones.
We have tried to build on top of Swagger UI.
| API | Purpose | Version | Cognizant developer/expert user | Documentation | Application scenarios |
|---|---|---|---|---|---|
| Pele | External user access to datasets | | Gerald Manipon, Namrata Malarout | | Osiris (Urgent Response UI). Future: science users, ops. |
| Mozart | Gives external users the ability to query Mozart for job info or submit jobs to Mozart over REST; enables CI to register jobs and manage containers | | Gerald Manipon, Mohammed Karim, Justin Linick | | ASF uses it to submit on-demand jobs. It is a way to submit jobs when running HySDS standalone scripts on-premise. The CI machine uses it to register jobs and Mozart actions and to manage containers in the cluster. AWS Lambdas trigger job submission (currently used in GRFN for ASF delivery); could be used for scheduled Lambdas instead of cron jobs on factotum. |
| GRQ | Provides the ability to manage hysds-io on GRQ or add a new dataset index to GRQ | | Gerald Manipon, Mohammed Karim | | This API was meant to be the GRQ equivalent of the Mozart API. The CI machine uses it to manage GRQ actions. |
| Picasso | Powers SMAP's UI | | Sujen Shah | | SMAP user interface (currently mission-specific). Could this be refactored to use HySDS core APIs? It currently talks to both Mozart and GRQ and a mission-specific database. |
| Any others? | Manage/talk to/get the status of cloud resources? | | | | |
| OGC WPS | OGC WPS (XML serialization) front-end to Mozart | | Namrata Malarout | | Joint ESA-NASA MAAP |
Going forward
Even within the SDS, we need an API or middleware layer for communication between components, so that every developer does not invent their own way to talk to components or the underlying services.
Long-term rewards of investing in APIs now:
Extract best practices from existing code so they can be reused in a generic way
Then refactor existing code to call the API instead of using custom code
Questions
How do we separate traffic from external users from internal SDS machinery?
All internal SDS machinery calls would be made as the special "ops" user.
In HySDS core there is the "ops" super-user; everyone else has limited access, granted explicitly.
How do we get requirements from ops and science users?
Should everything be a REST API?
We use the boto Python library to talk to Amazon (AWS)
Where do we go with SDS Watch?
* It should be a product, should interface via an API, and may need to interact with AWS.
Suggestion:
Abstract the backend:
Uniform way of dealing with common issues and scenarios
Jitter and rate limits to avoid hammering the backend service
A common place to deal with anomalous service responses in a consistent manner
Distinguish between "no results returned" and "service unavailable" (see the sketch after this list)
Automatic retry with exponential backoff
We can write out the JSON query, but parsing Elasticsearch results should be abstracted and handled by the API
We can catch bad ES responses via the API
We could sanitize the JSON query against bad requests, but otherwise let the query through
This will make it easier to upgrade or swap backend services
Encapsulates best practices in a common place
No Elasticsearch code entangled with the HySDS and PGE codebases
Makes it easier for end users to develop tools and PGEs
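As one way to realize the "no results" vs. "service unavailable" distinction, here is a minimal sketch of a query helper. It assumes a plain Elasticsearch REST endpoint reachable over HTTP; the URL, the allowed query keys, and the `es_search`/`ServiceUnavailable` names are illustrative assumptions, not existing HySDS code.

```python
import requests

# Illustrative only: the real ES endpoint comes from the cluster's HySDS configuration.
ES_URL = "http://localhost:9200"

class ServiceUnavailable(Exception):
    """Backend failure, as opposed to a query that legitimately returned zero hits."""

def es_search(index, query, timeout=30):
    """Run a raw JSON query and separate 'no results returned' from 'service unavailable'."""
    # Minimal sanitization: pass through only recognized top-level query keys.
    allowed = {"query", "size", "from", "sort", "_source", "aggs"}
    body = {k: v for k, v in query.items() if k in allowed}

    try:
        resp = requests.post(f"{ES_URL}/{index}/_search", json=body, timeout=timeout)
    except requests.ConnectionError as err:
        raise ServiceUnavailable(f"cannot reach {ES_URL}: {err}")

    if resp.status_code >= 500:
        # Backend error: callers should retry with backoff, not treat this as "no data".
        raise ServiceUnavailable(f"ES returned {resp.status_code}: {resp.text[:200]}")
    resp.raise_for_status()  # propagate 4xx as a bad request from the caller

    # An empty list means "no results returned" -- a normal outcome, not an error.
    return resp.json().get("hits", {}).get("hits", [])
```

Callers can then treat an empty list as "no matches," while ServiceUnavailable feeds the retry/backoff behavior described next.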
The ES abstraction layer should have the following characteristics (build upon query-util?); a minimal sketch follows the list:
(1) jittering, with automated and finite exponential backoff to the backend (ES)
(2) rate limits on the API for throttling to the backend (ES)
(3) reuse of connections to the backend (ES) to improve performance
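Below is a minimal sketch of how those three characteristics could fit together in a single client class. The class name, defaults, and the simple interval-based rate limiter are assumptions for illustration, not the existing query-util behavior.

```python
import random
import time

import requests

class ESClient:
    """Sketch of an abstraction layer with (1) jittered finite backoff,
    (2) client-side rate limiting, and (3) connection reuse."""

    def __init__(self, base_url, max_retries=5, min_interval=0.1):
        self.base_url = base_url.rstrip("/")
        self.max_retries = max_retries        # finite backoff: give up eventually
        self.min_interval = min_interval      # simple rate limit between calls
        self.session = requests.Session()     # (3) reuse TCP connections
        self._last_call = 0.0

    def _throttle(self):
        # (2) crude rate limit: enforce a minimum interval between requests
        wait = self.min_interval - (time.time() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.time()

    def search(self, index, body):
        # (1) finite exponential backoff with jitter on backend failures
        for attempt in range(self.max_retries):
            self._throttle()
            try:
                resp = self.session.post(f"{self.base_url}/{index}/_search",
                                         json=body, timeout=30)
                if resp.status_code < 500:
                    resp.raise_for_status()   # 4xx is the caller's problem, not retried
                    return resp.json()
            except requests.ConnectionError:
                pass  # fall through to backoff and retry
            # jittered exponential backoff: ~1, 2, 4, ... seconds plus random jitter
            time.sleep((2 ** attempt) + random.random())
        raise RuntimeError(f"backend unavailable after {self.max_retries} attempts")
```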
For public APIs:
Register with email and password
Helps identify our users for announcements or directed emails
Results depend on user access control, i.e., what should be visible to them
Programmatic way to refresh tokens
Require authentication via tokens
Require time limits on tokens
Assign limits per user
Ability to throttle or limit misbehaving end users so they do not cause denial of service to others (see the sketch after this list)
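For illustration, here is a sketch of what token issuance, token time limits, programmatic refresh, and a per-user request limit could look like using PyJWT. The secret, TTL, limits, and in-memory bookkeeping are all assumptions, not the actual HySDS auth implementation.

```python
import time

import jwt  # PyJWT

SECRET = "change-me"      # assumed shared secret; real deployments would use a managed key
TOKEN_TTL = 3600          # time limit on tokens (seconds)
RATE_LIMIT = 100          # assumed per-user requests per hour
_request_counts = {}      # user -> (window_start, count); illustrative in-memory store

def issue_token(user):
    """Issue a short-lived token after the user registers/authenticates."""
    now = int(time.time())
    return jwt.encode({"sub": user, "iat": now, "exp": now + TOKEN_TTL},
                      SECRET, algorithm="HS256")

def refresh_token(token):
    """Programmatic refresh: exchange a still-valid token for a new one."""
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])  # raises if expired/invalid
    return issue_token(claims["sub"])

def check_request(token):
    """Authenticate a request and apply a per-user rate limit."""
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    user = claims["sub"]
    window_start, count = _request_counts.get(user, (time.time(), 0))
    if time.time() - window_start > 3600:
        window_start, count = time.time(), 0
    if count >= RATE_LIMIT:
        raise PermissionError(f"rate limit exceeded for {user}")
    _request_counts[user] = (window_start, count + 1)
    return user  # downstream queries can filter results by this user's access control
```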
Versioning of APIs to provide a path to updated APIs while deprecating old versions (see the sketch below)
Collect metrics regarding API use:
which API versions are used
which actions are called the most
performance analysis
Helpful for developers to debug
Also helpful for providing documentation to end users
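Since the HySDS APIs are Flask applications, one way to expose versioned paths and gather basic usage metrics is sketched below. The blueprint names, routes, and in-memory counter are illustrative assumptions rather than the current API layout.

```python
from collections import Counter

from flask import Blueprint, Flask, jsonify, request

app = Flask(__name__)
metrics = Counter()  # illustrative in-memory metrics: which actions are called the most

# Versioned blueprints give a path to new APIs while old versions are deprecated.
v01 = Blueprint("v0_1", __name__, url_prefix="/api/v0.1")
v02 = Blueprint("v0_2", __name__, url_prefix="/api/v0.2")

@app.before_request
def count_calls():
    # record version + endpoint so usage and performance can be analyzed later
    metrics[request.path] += 1

@v01.route("/job/list")
def list_jobs_v01():
    # old version kept alive but pointing users at its replacement
    return jsonify({"deprecation": "use /api/v0.2/job/list", "jobs": []})

@v02.route("/job/list")
def list_jobs_v02():
    return jsonify({"jobs": [], "total": 0})

app.register_blueprint(v01)
app.register_blueprint(v02)
```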
Additional requirements:
swagger-ui
Pele Links and Examples
Code can be found in hysds/pele.
Pele enables us to query datasets (clean up the terminology; Gerald may be documenting this in the SDD).
It was developed mainly for Osiris (the Urgent Response UI), based on what Namrata needed during development.
Currently, the main user is Osiris (the Urgent Response UI).
Future users: science users who want to pull data programmatically, and ops reporting.
Functionality
Dataset types
Datasets
List datasets of a certain type (are results paginated?)
Get IDs by dataset (so we can pull out more specific info later)
Get IDs by type
Get dataset by ID
Query for certain fields for a specified type and dataset
List datasets that overlap temporally (by ID) or spatially
Used to grab all acquisitions or product types that match an AOI's spatial and temporal extent
List overlapping datasets of particular types
Check for each type whether products exist or not
For example, only caring about COD, LAR, or SLCP
The Swagger API requires authentication
Need to authenticate for POSTs, but not GETs (confirm with Gerald)
Dataset and dataset type
Get the dataset type based on a dataset (a client sketch follows this list)
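A hedged sketch of how an external user might call Pele with the requests library. The base URL, endpoint paths, parameter names, response keys, and the dataset name below are placeholders to be checked against the Pele Swagger UI on your cluster.

```python
import requests

# Placeholder base URL and routes -- confirm the real paths and auth scheme
# in the Pele Swagger UI for your deployment.
PELE_URL = "https://<pele-host>/api/v0.1"
session = requests.Session()
session.headers["Accept"] = "application/json"
# session.headers["Authorization"] = "Bearer <token>"  # POSTs require auth; GETs may too

def get_json(path, **params):
    resp = session.get(f"{PELE_URL}{path}", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()

# list dataset types, page through IDs of a (hypothetical) dataset, then fetch one by ID
types = get_json("/pele/types")
ids = get_json("/pele/type/S1-GUNW/dataset_ids", offset=0, page_size=100)
granule = get_json(f"/pele/dataset/{ids['dataset_ids'][0]}")
print(types, granule)
```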
Mozart Links and Examples
Used in AWS Lambdas that trigger job submissions, mostly for ASF delivery. Will also be used for scheduled Lambdas (instead of cron jobs).
ASF has put jobs into our system.
FEMA could have put processing jobs into our system.
CI uses it to register jobs (including job specs) into Mozart and GRQ.
MyJobs????
Functionality
List queues
Manage job specs: add, list, remove, get by job type ID (given a job type, returns the actual job spec as JSON)
Manage jobs: get info by ID, list submitted jobs, get status by job ID, submit a job (a submission sketch follows this list)
Namrata could name the job, but HySDS will append a timestamp to that name (this is more common inside the HySDS machinery when using hysds utils)
Wish: an enhancement to allow job naming when submitting through the REST API
Wish: query jobs by user
Wish: list all known users
Manage containers (used mainly by CI)
hysds_io: manage Mozart actions only
event: gives the system the ability to publish events to log anomalies such as spot termination
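A hedged example of on-demand job submission through the Mozart REST API, the same path used by ASF and the Lambda triggers. The base URL, job type, queue, and parameter names are assumptions to be verified against your cluster's Mozart Swagger UI.

```python
import json

import requests

# Assumed Mozart base URL and submit route for illustration; confirm the exact
# path and parameter names in the Mozart Swagger UI for your release.
MOZART_URL = "https://<mozart-host>/mozart/api/v0.1"

payload = {
    "type": "job-hello_world:develop",           # hypothetical registered job type
    "queue": "factotum-job_worker-small",        # hypothetical queue name
    "priority": 5,
    "tags": json.dumps(["api_example"]),
    "params": json.dumps({"some_param": "value"}),  # job-spec inputs as a JSON string
}

resp = requests.post(f"{MOZART_URL}/job/submit", data=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # typically includes the submitted job's ID for later status queries
```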
GRQ Links and Examples
Used by CI
Functionality
Register jobs and actions to GRQ (a sketch follows this list)
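A hedged sketch of what the CI machine's registration call could look like against the GRQ REST API. The route and payload shape mirror the Mozart API, but both are assumptions to verify in the GRQ Swagger UI.

```python
import json

import requests

# Assumed GRQ REST base URL and route for illustration only; the actual path used
# by the CI machine should be confirmed in the GRQ Swagger UI for your cluster.
GRQ_URL = "https://<grq-host>/api/v0.1"

hysds_io = {
    "id": "hysds-io-hello_world:develop",            # hypothetical action ID
    "job-specification": "job-hello_world:develop",  # hypothetical job spec it wires to
    "params": [{"name": "some_param", "from": "submitter"}],
}

resp = requests.post(f"{GRQ_URL}/hysds_io/add",
                     data={"spec": json.dumps(hysds_io)}, timeout=60)
resp.raise_for_status()
print(resp.json())
```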
SMAP-API Links and Examples
SMAP-API
GitHub-FN
Functionality
Get product by ID
Get job by ID
Get docs in an index
List all half-orbit statuses using specific metadata
List all half orbits
Have Questions? Ask a HySDS Developer:
Anyone can join our public Slack channel to learn more about HySDS. JPL employees can join #HySDS-Community.
JPLers can also ask HySDS questions at Stack Overflow Enterprise.
Subject Matter Experts: @Dustin Lo @Hook Hua @Gerald Manipon
Find an Error? Is this document outdated or inaccurate? Please contact the assigned Page Maintainer: @Hook Hua