 How do you backup your Elasticsearch for your HySDS system?

Context:

I know that HySDS persists most of its state in Elasticsearch. How can I regularly backup my Elasticsearch indices?

Answer:

In case of catastrophic failure, you would be more likely to backup GRQ and Metrics Elasticsearch indices as your products and Metrics are long-term data stores. It can be problematic to backup Mozart ES because Mozart is supposed to be a snapshot of the processing state of the system. In a catastrophic failure, the snapshot that Mozart provides will be completely out-of-sync with reality and restoring it could cause redundant processing and incomplete knowledge of system state.

This answer covers backing up GRQ on a HySDS-hosted Elasticsearch service with an AWS backend.

The answer is via Sujen Shah on Slack:

For snapshot, you need two things, which you can initiate on the GRQ server on your HySDS cluster.

1 - A snapshot repository. To register the repository:

curl --location --request PUT 'http://localhost:9200/_snapshot/s3_backup' \
--header 'Content-Type: application/json' \
--data '{
  "type": "s3",
  "settings": {
    "bucket": "<YOUR_DATASET_BUCKET>",
    "region": "us-west-2",
    "base_path": "es_snapshot"
  }
}'

2 - A Snapshot

$ curl --location --request PUT 'http://localhost:9200/_snapshot/s3_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E'

(The snapshot name here is the URL-encoded date-math expression <snapshot-{now/d}>, which Elasticsearch expands to the current date, e.g. snapshot-2024.05.01.)

If you are following a tutorial that suggests other options, keep two things in mind:

General recommendation: do not use wait_for_completion=true, as that could time out your client once the indices grow large.

It is also better to include a timestamp in the name of the snapshot.
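Putting both recommendations together, a minimal sketch (assuming the s3_backup repository registered above; the snapshot name format is illustrative, and the timestamp is generated client-side rather than with Elasticsearch date math):

```shell
#!/bin/sh
# Build a timestamped snapshot name client-side. Elasticsearch snapshot
# names must be lowercase, hence the lowercase "t" separator.
SNAPSHOT="snapshot-$(date -u +%Y%m%dt%H%M%S)"

# Only talk to the cluster if it is actually reachable.
if curl -s -o /dev/null "http://localhost:9200"; then
  # Kick off the snapshot asynchronously; without wait_for_completion=true
  # the request returns immediately, so the client cannot time out while
  # large indices are copied to S3.
  curl --location --request PUT \
    "http://localhost:9200/_snapshot/s3_backup/${SNAPSHOT}?wait_for_completion=false"

  # Check on its progress later with:
  curl "http://localhost:9200/_snapshot/s3_backup/${SNAPSHOT}/_status"
fi
echo "$SNAPSHOT"
```

Run this from cron on the GRQ instance to get regular, uniquely named snapshots.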

Note: You may want to be selective about which indices you restore. For example, you may want to omit the following (some of which are adaptation-specific):

user_rules
user_runs_history
versions
product_counter
product_accountability
hysds_ios
orbits
orbits_status
*_triaged_job
grq_q16511*

You would do this:

curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "-user_rules,-user_runs_history,-versions,-product_counter,-product_accountability,-hysds_ios,-orbits,-orbits_status,-*_triaged_job,-grq_q16511*"
}
'

Also, before you restore an Elasticsearch snapshot, you need to close all indices. See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-close.html

POST /*/_close
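Putting the close-then-restore steps together, a sketch (assuming the my_backup repository and snapshot_1 names from the restore example above; the request body is syntax-checked locally before anything is sent to the cluster):

```shell
#!/bin/sh
# Keep the restore body in a variable so it can be sanity-checked before
# anything is sent to the cluster.
RESTORE_BODY='{
  "indices": "-user_rules,-user_runs_history,-versions,-product_counter,-product_accountability,-hysds_ios,-orbits,-orbits_status,-*_triaged_job,-grq_q16511*"
}'

# Fail early if the body is not valid JSON.
echo "$RESTORE_BODY" | python3 -m json.tool > /dev/null || exit 1

# Only proceed if the cluster is reachable.
if curl -s -o /dev/null "http://localhost:9200"; then
  # Close all indices, then kick off the restore.
  curl -X POST "http://localhost:9200/*/_close"
  curl -X POST "http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore?pretty" \
    -H 'Content-Type: application/json' -d "$RESTORE_BODY"
fi
```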



 How do you deal with offline jobs in HySDS?

Answer:

In Figaro, there is a category of job state called "Offline." What is the operator supposed to do with these Offline jobs?



 How do I update datasets.json on factotum?

Context:

As a developer, I need to update datasets.json on factotum so it knows about my new product type, but my changes keep getting overwritten.

How do I properly update datasets.json?


Answer:

It's my understanding that you just add your entry to ~/.sds/files, and then send the update out. If it only applies to factotum, then you only really need to update factotum.

If your datasets.json changes impact the verdi workers in autoscaling groups, then you will also need to run sds ship so that it remakes all the code bundles.

The steps:

  • Update ~/.sds/files/datasets.json

  • Run sds update factotum

  • Log in to factotum and make sure ~/verdi/etc/datasets.json was updated the way you expected.

Whether you stop and start factotum again depends on whether running processes need to be restarted to reread the new datasets.json.

Summary of commands:

  • sds stop mozart -f

  • sds status mozart

  • sds update mozart -f

  • sds ship # to workers

  • sds start mozart
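One easy mistake is shipping a datasets.json with a JSON syntax error, which then breaks every worker that receives it. A quick pre-flight check before running sds update or sds ship (assuming python3 is on the path):

```shell
#!/bin/sh
# Validate ~/.sds/files/datasets.json before pushing it out; a syntax
# error here would otherwise be copied to factotum and the verdi workers.
python3 -m json.tool ~/.sds/files/datasets.json > /dev/null \
  && echo "datasets.json is valid JSON" \
  || echo "datasets.json is BROKEN; fix it before running sds update/ship"
```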



 What is a valid datasets.json definition in HySDS which does not publish anything to S3?

Context:

To avoid no-clobber job errors when ingesting overlapping datasets, it has been suggested that JSON-only datasets can be configured to avoid publishing anything to S3 and instead rely on (overwritable) Elasticsearch for storage of the JSON data product.

The documentation does not make it clear which values (under the "publish" and "browse" attributes of the dataset definition) are optional and may be omitted, what valid null values for these attributes are, or whether the "publish" and "browse" attributes may be omitted entirely.


Answer:

Omitting the "publish" and "browse" attributes is valid, and is the suggested way of accomplishing this.
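For illustration, a hypothetical datasets.json entry with no "publish" or "browse" section, so the product's JSON metadata lives only in GRQ's Elasticsearch. All names and values here are illustrative; model the fields on the entries already in your own datasets.json:

```json
{
  "ipath": "hysds::data/my_json_only_product",
  "match_pattern": "/(?P<id>my_json_only_product-.+)$",
  "alias": null,
  "version": "v1.0",
  "level": "l2",
  "type": "my_json_only_product"
}
```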



 Is there a way to enable clobbering for a particular job type in HySDS?

Context:

Via Alex Dunn regarding a HySDS pge he maintains:

Due to how the PGE runs, there will be an overlap in the products it creates with products that are already in the system. In this case, the PGE will error out with a no-clobber error instead of completing.

Is there a way to enable clobbering for a particular job type or what have you?

I can’t see anything in the job-spec specs.


Answer:

This is not an answer to the question, but a bit of background as to where the no-clobber error message comes from.

Osaka is the HySDS module that handles dataset localization and publishing; it abstracts the various backends that HySDS can write to, such as S3 and Azure. By default, it is designed not to overwrite a dataset that already exists at the destination. This behavior is controlled by the no-clobber parameter, which defaults to True.

The complication regarding the question above is that dataset publication is usually not handled directly by the PGE, but by the verdi worker as a post-processing step after the PGE executes. Is there a way to communicate to the verdi worker that for this PGE, we do want the no-clobber parameter to be set to False? Or is the only way to override this behavior to avoid using the verdi automatic publication step and have the PGE handle the dataset publication?
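To make the behavior concrete, here is an illustrative sketch (not Osaka's actual code) of what the no-clobber rule amounts to, using local paths in place of an S3 or Azure backend:

```shell
#!/bin/sh
# publish SRC DEST [NOCLOBBER]: refuse to overwrite DEST when no-clobber
# is in effect (the default), mirroring Osaka's default behavior.
publish() {
  src="$1"; dest="$2"; noclobber="${3:-true}"
  if [ -e "$dest" ] && [ "$noclobber" = "true" ]; then
    echo "no-clobber error: $dest already exists" >&2
    return 1
  fi
  cp -r "$src" "$dest"
}
```

A second publish of the same product fails unless no-clobber is explicitly disabled, which is exactly the failure mode the PGE above runs into.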



 How do I remove a pge from the SDS?

Context:

I am working with a HySDS system. I am trying to remove a PGE that was installed on this HySDS system.

I tried using

sds ci remove https://github.com/aria-jpl/create_aoi_track

But it failed with an error involving jenkins. How do I fix it?


Answer:

Per Alex Torres:

If you want to deregister the PGE from Jenkins --> use sds ci remove_job

If you want to remove the package (tarball) from Mozart --> use sds pkg rm

you can find a list of packages using sds pkg ls

To answer the follow up question about how to re-add a package after removing it, Mohammed Karim added:

For sds pkg import to work, you need to have the tar file created by Jenkins with you; the import uploads it to the appropriate S3 bucket. You can get that tar by running sds pkg export. We generally use these commands to move a container built in one cluster to another.

Say you have built it in cluster A's Jenkins. After building, you can download it on cluster A's mozart using sds pkg export. Now you can move it to another cluster's mozart and run sds pkg import to get that container into that cluster.

sds ci is generally used to add, build, or remove a job in Jenkins.

