= Dataset Specification (how to author a new dataset type) =

In order for HySDS to recognize a dataset, the dataset must follow certain conventions. These conventions are documented on this page and must be implemented by the PGE or the PGE wrapper.

== Dataset ID ==

Each product should have a dataset ID. This name is used to determine the type of the dataset and name all the important files for the dataset. A dataset ID is matched against entries found in the <code>datasets.json</code> file to determine its type.<br />
In this example, we shall use the dataset ID <code>dumby-product-20170101T000000Z-3lx0a</code>.
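As a rough illustration of the matching step, the sketch below tests a dataset ID against a regular expression. The pattern table and the <code>match_dataset_type</code> helper are hypothetical; they only stand in for whatever entries a deployment's <code>datasets.json</code> defines, since the format of that file is not shown on this page.

```python
import re

# Hypothetical mapping of dataset type -> ID pattern. These entries are
# illustrative assumptions, not the actual contents of datasets.json.
EXAMPLE_PATTERNS = {
    "dumby-product": r"^dumby-product-\d{8}T\d{6}Z-\w+$",
}


def match_dataset_type(dataset_id, patterns=EXAMPLE_PATTERNS):
    """Return the first dataset type whose pattern matches the ID, else None."""
    for dataset_type, pattern in patterns.items():
        if re.match(pattern, dataset_id):
            return dataset_type
    return None


print(match_dataset_type("dumby-product-20170101T000000Z-3lx0a"))
```

An ID that matches no pattern yields <code>None</code>, which in this sketch corresponds to a dataset whose type cannot be determined.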

== Directory ==

Any directory that contains the JSON files described below and is located within the working directory supplied to the PGE is considered a dataset. The directory must be named with the dataset's ID (see above):

<pre>$ pwd
/data/work/example_work_dir/dumby-product-20170101T000000Z-3lx0a
$ ls
dumby-product-20170101T000000Z-3lx0a.dataset.json
dumby-product-20170101T000000Z-3lx0a.met.json
dumby-product-20170101T000000Z-3lx0a.prov_es.json
dumby-product-20170101T000000Z-3lx0a.h5
pge_output_2.h5
errors.txt
other_metadata.xml
</pre>
''Note that any other PGE data files should be placed in the <Dataset ID> directory, as the whole directory is the dataset.''
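The naming convention above can be checked with a few lines of Python. This is a minimal sketch, not the HySDS implementation: the helper names are hypothetical, and it assumes that only the immediate children of the working directory are candidates and that the presence of the dataset JSON file is what marks a directory as a dataset.

```python
from pathlib import Path


def looks_like_dataset(directory):
    """Check that a directory holds a <Dataset ID>.dataset.json named after itself."""
    directory = Path(directory)
    dataset_id = directory.name  # the directory name doubles as the dataset ID
    return (directory / f"{dataset_id}.dataset.json").is_file()


def find_datasets(work_dir):
    """Return the sorted names of dataset directories directly under work_dir."""
    return sorted(
        child.name
        for child in Path(work_dir).iterdir()
        if child.is_dir() and looks_like_dataset(child)
    )
```

For the listing above, <code>find_datasets("/data/work/example_work_dir")</code> would be expected to include <code>dumby-product-20170101T000000Z-3lx0a</code>.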

== HySDS dataset and metadata JSON files ==

=== dataset JSON file ===

A product must produce a <Dataset ID>.dataset.json in the <Dataset ID> directory. This file contains the JSON-formatted metadata that is cataloged for the dataset:

<pre>$ cat dumby-product-20170101T000000Z-3lx0a.dataset.json
{
  "version": "v1.0",
  "label": "dumby product for 2017-01-01T00:00:00Z",
  "location": {
    "type": "polygon",
    "coordinates": [
      [
        [-122.9059682940358,40.47090915967475],
        [-121.6679748715316,37.84406528996276],
        [-120.7310161872557,38.28728069813177],
        [-121.7043611684245,39.94137004454238],
        [-121.9536916840953,40.67097860759095],
        [-122.3100379696548,40.7267890636145],
        [-122.7640648263371,40.5457010812299],
        [-122.9059682940358,40.47090915967475]
      ]
    ]
  },
  "starttime": "2017-01-01T00:00:00",
  "endtime": "2017-01-01T00:05:00"
}
</pre>
The required fields are:

* <code>version</code>

The optional fields are:

* <code>label</code>
* <code>location</code> (in GeoJSON format)
* <code>starttime</code>
* <code>endtime</code>
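A PGE wrapper could sanity-check the dataset JSON file before handing the product off. The sketch below is one possible check under stated assumptions: <code>check_dataset_json</code> is a hypothetical helper, the required-field list mirrors the one above, and the closed-ring test reflects the GeoJSON rule that a polygon ring's first and last positions are identical.

```python
import json

REQUIRED_FIELDS = ("version",)


def check_dataset_json(text):
    """Return a list of problems found in the text of a dataset JSON file."""
    doc = json.loads(text)  # raises ValueError if the text is not valid JSON
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in doc
    ]
    # Optional location: when it is a polygon, each ring must be closed.
    location = doc.get("location")
    if isinstance(location, dict) and str(location.get("type", "")).lower() == "polygon":
        for ring in location.get("coordinates", []):
            if ring and ring[0] != ring[-1]:
                problems.append("polygon ring is not closed")
    return problems
```

An empty list means the file passed these checks; it does not guarantee that HySDS will accept the dataset.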

=== metadata JSON file ===

In addition, other metadata can be added to a <Dataset ID>.met.json in the <Dataset ID> directory. As long as the file conforms to the JSON format, the dataset developer has free rein over what goes into this file:

<pre>$ cat dumby-product-20170101T000000Z-3lx0a.met.json
{
  "startingRange": 800026.4431219272,
  "sensor": "SAR-C Sentinel1",
  "esd_threshold": 0.85,
  "tiles": true,
  "reference": true,
  "trackNumber": 144,
  "lookDirection": "right",
  "beamMode": "IW",
  "direction": "descending",
  "inputFile": "sentinel.ini",
  "polarization": "VV",
  "imageCorners": {
    "maxLon": -117.56055555555555,
    "minLon": -119.06166666666667,
    "minLat"
</pre>

...