Step 8: (AWS) Create Resources for Data Staging Area
Confidence Level TBD: This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.
Setting Up AWS Resources for a Staging Area
For definitions of terminology used, please refer to our terminology reference.
In this tutorial, we use sdscli to create a staging area in your dataset bucket. The staging area is a prefix ("directory") at the top level of your dataset bucket where files and directories can be deposited to trigger arbitrary HySDS jobs. As an example, assume that in our .sds/config we've set DATASET_BUCKET to gman-swot-dataset-bucket. By default, the sdscli cloud storage create_staging_area command uses staging_area as the S3 key prefix, but a different prefix can be specified with the --prefix option. With the defaults, a file can be deposited to s3://gman-swot-dataset-bucket/staging_area/, for example:
aws s3 cp LC08_L1TP_149039_20170411_20170415_01_T1_B2.TIF s3://gman-swot-dataset-bucket/staging_area/
Other files or directories can be staged as well, but nothing is triggered until the signal file is deposited. By default, the sdscli cloud storage create_staging_area command uses .met.json as the S3 key suffix for signal files, but a different suffix can be specified with the --suffix option. The signal file should contain the information to feed into the HySDS job to be triggered, and its filename should be the name of the staged file or directory with the suffix appended. Continuing our example, we deposit the signal file:
echo '{ "id": "LC08_L1TP_149039_20170411_20170415_01_T1_B2", "geolocation": [ 1, 2, 3 ,4 ], "whatever": "okay" }' > LC08_L1TP_149039_20170411_20170415_01_T1_B2.TIF.met.json
aws s3 cp LC08_L1TP_149039_20170411_20170415_01_T1_B2.TIF.met.json s3://gman-swot-dataset-bucket/staging_area/
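The upload-then-signal sequence above can be wrapped in a small shell helper. This is a minimal sketch, not part of sdscli: the function names signal_url and stage_with_signal are hypothetical, it assumes a configured AWS CLI, and it assumes the default staging_area prefix and .met.json suffix (set PREFIX or SUFFIX if you passed --prefix or --suffix). It uploads the data first and the signal file last, because only the signal file's creation should trigger the job.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of sdscli) for depositing data plus its signal file.
# Assumes a configured AWS CLI and the default --prefix/--suffix values.
BUCKET="${BUCKET:-gman-swot-dataset-bucket}"  # DATASET_BUCKET from .sds/config
PREFIX="${PREFIX:-staging_area}"              # default --prefix
SUFFIX="${SUFFIX:-.met.json}"                 # default --suffix

# Print the S3 URL the signal file must have: <staged name><suffix>.
signal_url() {
  printf 's3://%s/%s/%s%s\n' "$BUCKET" "$PREFIX" "$(basename "$1")" "$SUFFIX"
}

# stage_with_signal <file-or-directory> <local-signal-json>
stage_with_signal() {
  local data="$1" met="$2" name
  name="$(basename "$data")"
  # 1. Stage the data; nothing is triggered yet.
  if [ -d "$data" ]; then
    aws s3 cp --recursive "$data" "s3://$BUCKET/$PREFIX/$name/" || return 1
  else
    aws s3 cp "$data" "s3://$BUCKET/$PREFIX/" || return 1
  fi
  # 2. Deposit the signal file last, named after the staged item; this fires the event.
  aws s3 cp "$met" "$(signal_url "$data")"
}
```

For example, stage_with_signal LC08_L1TP_149039_20170411_20170415_01_T1_B2.TIF my_met.json would upload the image and then deposit my_met.json as staging_area/LC08_L1TP_149039_20170411_20170415_01_T1_B2.TIF.met.json. Writing the signal under its final key, rather than relying on the local filename, keeps the naming rule in one place.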
Upon the s3:ObjectCreated:* event for s3://gman-swot-dataset-bucket/staging_area/LC08_L1TP_149039_20170411_20170415_01_T1_B2.TIF.met.json, an S3 event notification is sent to a pre-configured SNS topic. The subscriber of that SNS topic is a pre-configured Lambda function, which reads the S3 event payload (including the S3 path of the deposited signal file) and the contents of the signal file, and submits them to a HySDS job. The job itself is arbitrary, but in an SDS it usually extracts metadata from the staged data, creates a formal HySDS dataset, ingests it, and finally removes the staged data from the staging area.
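On the receiving side, the Lambda function only needs the signal file's bucket and key to locate the staged data. The sketch below shows that mapping in shell; it is an illustration under assumptions, not the deployed function's code. It assumes the bucket name and key have already been extracted from the event (in an S3 event notification they appear as Records[].s3.bucket.name and Records[].s3.object.key, and when the notification is delivered through SNS the S3 event arrives as a JSON string inside the SNS message). The name staged_url_from_signal is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the Lambda's key mapping (not the deployed function's code).
# Given the bucket and key of the signal file from the S3 event, print the S3 URL
# of the staged file or directory, i.e. the key with the signal suffix stripped.
# Note: object keys in S3 event notifications are URL-encoded; decode them first
# if staged names can contain spaces or special characters.
# staged_url_from_signal <bucket> <key> [suffix]
staged_url_from_signal() {
  local bucket="$1" key="$2" suffix="${3:-.met.json}"
  case "$key" in
    *"$suffix") printf 's3://%s/%s\n' "$bucket" "${key%"$suffix"}" ;;
    *) return 1 ;;  # not a signal file: ignore it, no job should be submitted
  esac
}
```

Stripping the suffix, rather than parsing the signal file's contents, means the staged path is recoverable even when the signal JSON carries only job-specific fields.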
We use sdscli to create all of the AWS resources needed to set up our staging area. You'll need the following information beforehand to facilitate provisioning:
AWS security groups to use for Lambda execution
AWS role to use for Lambda execution
HySDS job type to submit the data-staged event to, e.g. INGEST_L0A_LR_RAW
Release version of the HySDS job type, e.g. release-20180327
HySDS queue name to submit the job to, e.g. factotum-job_worker-small
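Collecting these values before starting avoids aborting partway through the prompts. The following is a minimal preflight sketch, assuming a configured AWS CLI; the variable names (LAMBDA_SECURITY_GROUPS, LAMBDA_ROLE, JOB_TYPE, JOB_RELEASE, JOB_QUEUE) and both function names are hypothetical, and sdscli does not read them. lookup_lambda_inputs lists candidate security groups and IAM roles with standard AWS CLI queries, and check_staging_inputs reports any value that is still empty.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check; sdscli does not read these variables,
# they only record the answers you plan to give at its prompts.
# Example values from this page (security groups and role are site-specific):
#   JOB_TYPE=INGEST_L0A_LR_RAW
#   JOB_RELEASE=release-20180327
#   JOB_QUEUE=factotum-job_worker-small

# List candidate security groups and IAM roles (requires a configured AWS CLI).
lookup_lambda_inputs() {
  aws ec2 describe-security-groups \
    --query 'SecurityGroups[].[GroupId,GroupName]' --output table
  aws iam list-roles --query 'Roles[].[RoleName,Arn]' --output table
}

# Report every required value that is still empty; return non-zero if any are.
check_staging_inputs() {
  local missing=0 var
  for var in LAMBDA_SECURITY_GROUPS LAMBDA_ROLE JOB_TYPE JOB_RELEASE JOB_QUEUE; do
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```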
We assume that you have an AWS account, have configured your HySDS cluster in EC2, and have completed the steps in Cluster Setup.
Log into the mozart instance and source the mozart virtual environment:
source ~/mozart/bin/activate
Run the sdscli cloud storage create_staging_area command to create the staging area. The command prompts you for the information needed to create all of the resources.
Navigate to your AWS dashboard and check that the S3 event notification on your dataset bucket was created.
Check that the SNS topic was created.
Finally, check that the Lambda function was created.
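The same checks can be made from the command line. This is a minimal sketch, assuming a configured AWS CLI pointed at the region where the resources were created; verify_staging_area is a hypothetical name, and because this page does not state the names sdscli gives the SNS topic and Lambda function, the sketch lists topics and functions rather than querying specific ones.

```shell
#!/usr/bin/env bash
# Hypothetical CLI equivalent of the dashboard checks (assumes a configured AWS CLI).
# verify_staging_area <dataset-bucket>
verify_staging_area() {
  local bucket="$1"
  # S3 event notification(s) configured on the dataset bucket.
  aws s3api get-bucket-notification-configuration --bucket "$bucket" || return 1
  # SNS topics in the current region; look for the one the notification targets.
  aws sns list-topics --query 'Topics[].TopicArn' --output text || return 1
  # Lambda functions in the current region; look for the staging-area handler.
  aws lambda list-functions --query 'Functions[].FunctionName' --output text
}
```

If the setup succeeded, running verify_staging_area gman-swot-dataset-bucket should show a topic configuration on the bucket whose topic ARN also appears in the SNS topic list.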
That's it!
Congratulations, you've created a staging area for your HySDS cluster.
Have Questions? Ask a HySDS Developer:
Anyone can join our public Slack channel to learn more about HySDS. JPL employees can join #HySDS-Community.
JPLers can also ask HySDS questions at Stack Overflow Enterprise.
Page Information:
Contribution History:
Subject Matter Expert: @Gerald Manipon @Hook Hua
Find an Error? Is this document outdated or inaccurate? Please contact the assigned Page Maintainer: @Lan Dang