How do you back up Elasticsearch for your HySDS system?

Context:

I know that HySDS persists most of its state in Elasticsearch. How can I regularly back up my Elasticsearch indices?

Answer:

To guard against catastrophic failure, the indices most worth backing up are the GRQ and Metrics Elasticsearch indices, because your products and metrics are long-term data stores. Backing up Mozart ES can be problematic because Mozart is supposed to be a snapshot of the processing state of the system. After a catastrophic failure, the snapshot that Mozart provides will be completely out of sync with reality, and restoring it could cause redundant processing and incomplete knowledge of system state.

This answer covers backing up GRQ on a HySDS-hosted Elasticsearch service with an AWS backend.

The answer is via Sujen Shah on Slack:

For a snapshot, you need two things, both of which you can initiate on the GRQ server of your HySDS cluster.

1 - A snapshot repository. To register the repository:

Code Block
curl --location --request PUT 'http://localhost:9200/_snapshot/s3_backup' \
--header 'Content-Type: application/json' \
--data '{
  "type": "s3",
  "settings": {
    "bucket": "<YOUR_DATASET_BUCKET>",
    "region": "us-west-2",
    "base_path": "es_snapshot"
  }
}'
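
After registering the repository, it can help to confirm that Elasticsearch stored the settings and that the nodes can actually reach the bucket. The sketch below assumes the same host (localhost:9200) and repository name (s3_backup) as the command above; the function names are hypothetical helpers, not part of HySDS.

```shell
# Base URL of a snapshot repository (host taken from this page).
repo_url() {
  printf 'http://localhost:9200/_snapshot/%s' "$1"
}

# Show the stored repository settings (type, bucket, region, base_path).
show_repo() {
  curl --silent --request GET "$(repo_url "$1")?pretty"
}

# Ask the cluster nodes to confirm they can access the repository.
verify_repo() {
  curl --silent --request POST "$(repo_url "$1")/_verify?pretty"
}
```

Calling show_repo s3_backup followed by verify_repo s3_backup on the GRQ server should surface bucket or permission problems before the first snapshot is attempted.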

2 - A snapshot. The URL-encoded name in the command below is the date-math expression <snapshot-{now/d}>:

Code Block
$ curl --location --request PUT 'http://localhost:9200/_snapshot/s3_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E'
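
The percent-encoded snapshot name can be hard to read and easy to mistype. As a minimal sketch, the hypothetical helper below encodes only the five characters that a date-math name uses; it is not a general-purpose URL encoder.

```shell
# Percent-encode the characters used by date-math snapshot names: < > { } /
# Assumption: the input contains no other characters that need encoding.
encode_date_math() {
  printf '%s' "$1" | sed -e 's/</%3C/g' -e 's/>/%3E/g' \
                         -e 's/{/%7B/g' -e 's/}/%7D/g' -e 's|/|%2F|g'
}

# Print the encoded form of the name used in the snapshot command above.
encode_date_math '<snapshot-{now/d}>'
```

Elasticsearch resolves the expression when the snapshot is created, so the stored snapshot name carries that day's date.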

If you are following a tutorial that uses other options, keep the following in mind:

General recommendation: do not use wait_for_completion=true, as that could time out your client once the indices grow large.

It is also better to include a timestamp in the name of the snapshot.
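
Both recommendations can be combined: start the snapshot without waiting, then check its state separately. This is a sketch assuming the s3_backup repository on localhost:9200 from above; the helper names and the timestamp format are illustrative choices, not an established convention.

```shell
# Build a lowercase, timestamped snapshot name in UTC.
snapshot_name() {
  printf 'snapshot-%s' "$(date -u +%Y%m%d-%H%M%S)"
}

# Start a snapshot and return immediately (wait_for_completion is left at its default of false).
start_snapshot() {
  curl --silent --request PUT "http://localhost:9200/_snapshot/s3_backup/$1"
}

# Fetch snapshot details; the "state" field reports values such as IN_PROGRESS or SUCCESS.
snapshot_info() {
  curl --silent --request GET "http://localhost:9200/_snapshot/s3_backup/$1?pretty"
}
```

A typical run would be name=$(snapshot_name), then start_snapshot "$name", then snapshot_info "$name" repeated at whatever interval suits the size of your indices.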

Note: You may want to be selective about which indices you restore. For example, suppose you wanted to omit the following (some of which are adaptation-specific):

Code Block
user_rules
user_runs_history
versions
product_counter
product_accountability
hysds_ios
orbits
orbits_status
*_triaged_job
grq_q16511*

You would do this (my_backup and snapshot_1 stand in for your own repository and snapshot names):

Code Block
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "-user_rules,-user_runs_history,-versions,-product_counter,-product_accountability,-hysds_ios,-orbits,-orbits_status,-*_triaged_job,-grq_q16511*"
}
'
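
Typing a long exclusion string by hand is error-prone. As a sketch, the hypothetical helper below turns a newline-separated list of index names or patterns into the comma-separated, dash-prefixed form used in the "indices" field above.

```shell
# Read index names or patterns from stdin, one per line, skip blank lines,
# prefix each with "-" and join them with commas.
build_exclusions() {
  sed -e '/^[[:space:]]*$/d' -e 's/^/-/' | paste -sd, -
}

# Example with three entries from the list above; quote patterns so the shell does not expand "*".
exclusions=$(printf '%s\n' user_rules versions '*_triaged_job' | build_exclusions)
printf '{"indices": "%s"}\n' "$exclusions"
```

The printed JSON can be passed to the restore request with -d, and the list itself can live in a plain text file that is piped into build_exclusions.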

Also, before you restore an ES snapshot, you need to close all indices. See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-close.html

Code Block
POST /*/_close
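
The request above is written in console syntax. A curl version, in the same style as the other commands on this page, might look like the sketch below. It assumes the localhost:9200 host used earlier, and the helper names are hypothetical. Note that some Elasticsearch versions reject wildcard closes unless the action.destructive_requires_name setting is false.

```shell
# Build the close-index URL for an index name or wildcard pattern.
close_url() {
  printf 'http://localhost:9200/%s/_close' "$1"
}

# Close the matching indices; quote the pattern so the shell does not expand "*".
close_indices() {
  curl --silent --request POST "$(close_url "$1")?pretty"
}
```

For example, close_indices '*' mirrors the request above, while a narrower pattern limits the close to the indices you intend to restore.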

...