Question:
How do you back up Elasticsearch for your HySDS system?
Context:
I know that HySDS persists most of its state in Elasticsearch. How can I regularly back up my Elasticsearch indices?
Answer:
To prepare for a catastrophic failure, the indices you most likely want to back up are the GRQ and Metrics Elasticsearch indices, since your products and metrics are long-term data stores. Backing up Mozart ES can be problematic because Mozart is supposed to be a snapshot of the processing state of the system. After a catastrophic failure, the snapshot that Mozart provides will be completely out of sync with reality, and restoring it could cause redundant processing and incomplete knowledge of system state.
This answer covers backing up GRQ on a HySDS-hosted Elasticsearch service with an AWS backend.
The answer is via Sujen Shah on Slack:
For a snapshot, you need two things, both of which you can set up from the GRQ server on your HySDS cluster.
1 - A snapshot repository. To register the repository:
curl --location --request PUT 'http://localhost:9200/_snapshot/s3_backup' \
--header 'Content-Type: application/json' \
--data '{
"type": "s3",
"settings": {
"bucket": "<YOUR_DATASET_BUCKET>",
"region": "us-west-2",
"base_path": "es_snapshot"
}
}'
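After registering, it can help to confirm that Elasticsearch stored the repository and that the nodes can reach the bucket. The following is a minimal sketch, assuming the same endpoint (http://localhost:9200) and repository name (s3_backup) as the command above; the function names show_repo and verify_repo are only illustrative.

```shell
# Assumed values: the endpoint and repository name used in the registration command.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-s3_backup}"

# Print the repository definition as Elasticsearch stored it.
show_repo() {
  curl --silent --request GET "${ES_URL}/_snapshot/${REPO}?pretty"
}

# Ask the cluster nodes to confirm they can reach the repository.
verify_repo() {
  curl --silent --request POST "${ES_URL}/_snapshot/${REPO}/_verify?pretty"
}
```

Run show_repo first, then verify_repo; an error from the second call usually points at bucket permissions or the region setting rather than at the registration itself.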
2 - A snapshot. To take one:
curl --location --request PUT 'http://localhost:9200/_snapshot/s3_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E'
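The snapshot name in that URL looks cryptic because it is the percent-encoded form of the Elasticsearch date math expression <snapshot-{now/d}>, which the server resolves to a name containing the current date. As a sketch of where the encoded string comes from, the hypothetical helper below (urlencode is not part of curl or Elasticsearch) percent-encodes every character outside the unreserved set:

```shell
# Percent-encode a string for use in a URL path segment (ASCII input assumed).
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;                   # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;           # otherwise emit %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

urlencode '<snapshot-{now/d}>'
# prints %3Csnapshot-%7Bnow%2Fd%7D%3E
```

The encoding is needed because characters such as <, {, and / would otherwise be misread by the shell or treated as URL syntax.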
If you are following a tutorial that uses other options, keep the following in mind:
As a general recommendation, do not use wait_for_completion=true, as that could time out your client once the indices grow large.
It is also better to include a timestamp in the name of the snapshot.
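One way to follow both recommendations is to start the snapshot without wait_for_completion and poll its state separately. The sketch below assumes the s3_backup repository and local endpoint used earlier; the function names, the timestamp format, and the 30-second polling interval are illustrative choices, not part of HySDS or Elasticsearch.

```shell
# Assumed values: the endpoint and repository name used earlier in this answer.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-s3_backup}"

# Build a lowercase, timestamped name such as snapshot-20240115-031500 (UTC).
snapshot_name() {
  printf 'snapshot-%s\n' "$(date -u +%Y%m%d-%H%M%S)"
}

# Start the snapshot; without wait_for_completion the call returns right away.
start_snapshot() {
  curl --silent --request PUT "${ES_URL}/_snapshot/${REPO}/$1"
}

# Pull the "state" value out of a snapshot info response read from stdin.
extract_state() {
  sed -n 's/.*"state" *: *"\([A-Z_]*\)".*/\1/p'
}

# Poll every 30 seconds until the snapshot leaves IN_PROGRESS, then print the state.
wait_for_snapshot() {
  while :; do
    state=$(curl --silent "${ES_URL}/_snapshot/${REPO}/$1" | extract_state)
    [ "$state" != "IN_PROGRESS" ] && break
    sleep 30
  done
  printf '%s\n' "$state"
}
```

A typical run would be name=$(snapshot_name), then start_snapshot "$name", then wait_for_snapshot "$name". A final state of SUCCESS means every shard was captured; PARTIAL or FAILED means the snapshot should not be relied on.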
Note: You may want to be selective about which indices you restore. For example, suppose you want to omit the following (some of which are adaptation-specific):
user_rules
user_runs_history
versions
product_counter
product_accountability
hysds_ios
orbits
orbits_status
*_triaged_job
grq_q16511*
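You would do this, substituting your own repository and snapshot names for my_backup and snapshot_1 (for example, the s3_backup repository registered earlier):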
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
"indices": "-user_rules,-user_runs_history,-versions,-product_counter,-product_accountability,-hysds_ios,-orbits,-orbits_status,-*_triaged_job,-grq_q16511*"
}
'
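Typing a long exclusion list by hand is error-prone, so it can be generated instead. The sketch below builds the same "-name,-name" string from a list of arguments and passes it to the restore endpoint; exclude_list and restore_excluding are hypothetical helper names, and the endpoint is assumed to be the local one used throughout this answer.

```shell
# Assumed endpoint, matching the other commands in this answer.
ES_URL="${ES_URL:-http://localhost:9200}"

# Join index names or patterns into a comma-separated exclusion list,
# prefixing each one with "-".
exclude_list() {
  out=
  for name in "$@"; do
    out="${out:+$out,}-$name"
  done
  printf '%s\n' "$out"
}

# Restore a snapshot, skipping the given indices.
# Usage: restore_excluding <repository> <snapshot> <index-or-pattern>...
restore_excluding() {
  repo=$1
  snap=$2
  shift 2
  curl --silent --request POST "${ES_URL}/_snapshot/${repo}/${snap}/_restore?pretty" \
    --header 'Content-Type: application/json' \
    --data "{\"indices\": \"$(exclude_list "$@")\"}"
}
```

For example, restore_excluding s3_backup snapshot_1 user_rules versions 'grq_q16511*' sends the body {"indices": "-user_rules,-versions,-grq_q16511*"}. Quote any pattern containing * so the shell does not expand it.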
Also, before you restore an ES snapshot, you need to close all indices, because Elasticsearch will not restore over an existing index that is open. See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-close.html
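Closing indices uses the close index API linked above. A minimal sketch, assuming the local endpoint used in the rest of this answer; the helper names are illustrative, and the index names you pass in are your own. Closing by explicit name is shown because, depending on cluster settings, wildcard or _all close requests may be refused.

```shell
# Assumed endpoint, matching the other commands in this answer.
ES_URL="${ES_URL:-http://localhost:9200}"

# Close one index by name so that a restore can overwrite it.
close_index() {
  curl --silent --request POST "${ES_URL}/$1/_close?pretty"
}

# Close several indices, one request per name.
close_indices() {
  for idx in "$@"; do
    close_index "$idx"
  done
}
```

Call close_indices with the names of the existing indices that the snapshot will restore, then run the restore request.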