Context:
I know that HySDS persists most of its state in Elasticsearch. How can I regularly backup my Elasticsearch indices?
Answer:
In case of catastrophic failure, you would most likely want to back up the GRQ and Metrics Elasticsearch indices, because your products and metrics are long-term data stores. Backing up Mozart's Elasticsearch is problematic, because Mozart is meant to be a snapshot of the system's current processing state. After a catastrophic failure, that snapshot will be completely out of sync with reality, and restoring it could cause redundant processing and leave you with incomplete knowledge of the system state.
This answer covers backing up GRQ on a HySDS-hosted Elasticsearch service with an AWS backend, using an S3 bucket as the snapshot repository.
The following answer comes from Sujen Shah on Slack:
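To take a snapshot, you need two things, both of which you can set up from the GRQ server of your HySDS cluster.
1 - A snapshot repository. To register the repository (on Elasticsearch versions before 8.0, the "s3" repository type requires the repository-s3 plugin to be installed on every node):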
curl --location --request PUT 'http://localhost:9200/_snapshot/s3_backup' \
  --header 'Content-Type: application/json' \
  --data '{ "type": "s3", "settings": { "bucket": "<YOUR_DATASET_BUCKET>", "region": "us-west-2", "base_path": "es_snapshot" } }'
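After registering the repository, it can be worth confirming that the Elasticsearch nodes can actually reach the bucket. Elasticsearch exposes a repository verification endpoint (POST _snapshot/<repository>/_verify) for this. The sketch below is a minimal illustration, assuming Elasticsearch listens on localhost:9200 and the repository is named s3_backup as above; the helper names verify_url and verify_repo and the ES_URL/REPO variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; ES_URL defaults to the local GRQ Elasticsearch.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-s3_backup}"

# Build the URL of the repository verification endpoint for repository $1.
verify_url() {
  printf '%s/_snapshot/%s/_verify' "$ES_URL" "$1"
}

# Ask Elasticsearch to check that its nodes can access repository $1.
# --fail makes curl exit non-zero on an HTTP error response.
verify_repo() {
  curl --silent --show-error --fail --request POST "$(verify_url "$1")"
}
```

Running verify_repo s3_backup should list the nodes that successfully verified the repository; an error usually points at missing credentials, permissions, or plugin installation.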
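2 - A snapshot. The snapshot name in this request is the URL-encoded date-math expression <snapshot-{now/d}>, which resolves to the current date (UTC), for example snapshot-2024.03.01: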
$ curl --location --request PUT 'http://localhost:9200/_snapshot/s3_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E'
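Because <snapshot-{now/d}> only changes once per day, a second run on the same day fails: Elasticsearch rejects a snapshot whose name already exists in the repository. For regular backups, a full timestamp avoids this. The sketch below is a hypothetical wrapper suitable for a cron job; it assumes the s3_backup repository from step 1 and Elasticsearch on localhost:9200, and the function names snapshot_name and take_snapshot are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical snapshot wrapper; ES_URL and REPO can be overridden.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-s3_backup}"

# Lowercase, timestamped name (UTC), e.g. snapshot-2024.03.01-020000.
snapshot_name() {
  printf 'snapshot-%s' "$(date -u +%Y.%m.%d-%H%M%S)"
}

# Start a snapshot of all indices. wait_for_completion is left at its
# default (false), so Elasticsearch replies {"accepted":true} right away
# and the snapshot continues in the background.
take_snapshot() {
  local name
  name="$(snapshot_name)"
  curl --silent --show-error --fail --request PUT \
    "$ES_URL/_snapshot/$REPO/$name" || return 1
  # Print the snapshot name so a caller can poll it later.
  printf '\n%s\n' "$name"
}
```

A crontab entry could then source the file and call take_snapshot, for example 0 2 * * * bash -c 'source /path/to/es_snapshot.sh && take_snapshot' (the path is a placeholder).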
If you are following a tutorial or using other options, keep the following in mind:
As a general recommendation, do not use wait_for_completion=true, because the request can time out on the client side once the indices grow large.
It is also better to include a timestamp in the snapshot name.
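Without wait_for_completion=true, a scheduled job that needs to know whether a snapshot finished can poll it instead. GET _snapshot/<repository>/<snapshot> reports a state such as IN_PROGRESS, SUCCESS, PARTIAL, or FAILED. The sketch below assumes the s3_backup repository and Elasticsearch on localhost:9200; it extracts the state with grep rather than assuming jq is installed, and the function names snapshot_state and wait_for_snapshot are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical polling helpers; ES_URL and REPO can be overridden.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-s3_backup}"

# Read compact Elasticsearch JSON on stdin and print the first "state" value.
snapshot_state() {
  grep -o '"state":"[A-Z_]*"' | head -n 1 | cut -d '"' -f 4
}

# Poll snapshot $1 once a minute until it is no longer IN_PROGRESS.
# Prints the final state; returns 0 only if the state is SUCCESS.
wait_for_snapshot() {
  local name="$1" state
  while :; do
    state="$(curl --silent --fail "$ES_URL/_snapshot/$REPO/$name" | snapshot_state)"
    if [ "$state" != "IN_PROGRESS" ]; then
      echo "$state"
      [ "$state" = "SUCCESS" ]
      return
    fi
    sleep 60
  done
}
```

If the request itself fails, the extracted state is empty, so wait_for_snapshot stops and returns non-zero rather than looping forever.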
Note: You may want to be selective about which indices you restore. For example, to omit the following indices (some of which are adaptation-specific):
user_rules user_runs_history versions product_counter product_accountability hysds_ios orbits orbits_status *_triaged_job grq_q16511*
You would do this:
curl -X POST "localhost:9200/_snapshot/s3_backup/<SNAPSHOT_NAME>/_restore?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "indices": "-user_rules,-user_runs_history,-versions,-product_counter,-product_accountability,-hysds_ios,-orbits,-orbits_status,-*_triaged_job,-grq_q16511*" }'
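Typing the exclusion string by hand is error-prone as the list grows. The sketch below builds the same "indices" value from a plain list of names or patterns, prefixing each with "-". It assumes the s3_backup repository registered earlier and Elasticsearch on localhost:9200; exclusion_list, restore_body, and restore_excluding are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Join the given index names/patterns into "-a,-b,-c".
exclusion_list() {
  local out="" p
  for p in "$@"; do
    out+="${out:+,}-$p"
  done
  printf '%s' "$out"
}

# Build the JSON body for the _restore request.
restore_body() {
  printf '{"indices":"%s"}' "$(exclusion_list "$@")"
}

# Restore everything in snapshot $1 except the remaining arguments.
restore_excluding() {
  local snap="$1"
  shift
  curl --silent --show-error --fail --request POST \
    "${ES_URL:-http://localhost:9200}/_snapshot/${REPO:-s3_backup}/$snap/_restore" \
    --header 'Content-Type: application/json' \
    --data "$(restore_body "$@")"
}
```

Quote wildcard patterns when calling it, for example restore_excluding <SNAPSHOT_NAME> user_rules versions '*_triaged_job' 'grq_q16511*', so the shell does not expand them against local files.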
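Also, before you restore into a cluster that already contains indices with the same names, you need to close those indices, because Elasticsearch can only restore over an existing index that is closed. The simplest approach is to close all indices: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-close.html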
POST /*/_close
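The console request above can also be sent with curl. Indices that the restore brings back are reopened by Elasticsearch, but indices you closed and then excluded from the restore stay closed until you reopen them. The sketch below assumes Elasticsearch on localhost:9200; close_url, open_url, close_all_indices, and open_all_indices are hypothetical names, and expand_wildcards=closed limits the reopen to indices that are still closed.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"

# URL that closes every open index (same as POST /*/_close).
close_url() { printf '%s/*/_close' "$ES_URL"; }

# URL that reopens every index that is still closed.
open_url() { printf '%s/*/_open?expand_wildcards=closed' "$ES_URL"; }

# Run before the restore.
close_all_indices() {
  curl --silent --show-error --fail --request POST "$(close_url)"
}

# Run after the restore to reopen anything that was excluded.
open_all_indices() {
  curl --silent --show-error --fail --request POST "$(open_url)"
}
```

The URLs are single-quoted inside the functions' printf format strings, so the "*" reaches Elasticsearch unexpanded.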