I know that HySDS persists most of its state in Elasticsearch. How can I regularly back up my Elasticsearch indices?

To guard against catastrophic failure, you are most likely to want backups of the GRQ and Metrics Elasticsearch indices, because your products and metrics are long-term data stores. Backing up Mozart ES can be problematic because Mozart is supposed to be a snapshot of the processing state of the system. After a catastrophic failure, the snapshot that Mozart provides will be completely out of sync with reality, and restoring it could cause redundant processing and incomplete knowledge of system state.

This answer covers backing up GRQ on a HySDS-hosted Elasticsearch service with an AWS backend.

The answer is via Sujen Shah on Slack:

To take a snapshot, you need two things, both of which you can set up from the GRQ server on your HySDS cluster.

1 - A snapshot repository. To register the repository:

Code Block
curl --location --request PUT 'http://localhost:9200/_snapshot/s3_backup' \
--header 'Content-Type: application/json' \
--data '{
  "type": "s3",
  "settings": {
    "bucket": "<YOUR_DATASET_BUCKET>",
    "region": "us-west-2",
    "base_path": "es_snapshot"
  }
}'
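
Once the repository is registered, it can help to confirm that Elasticsearch can actually reach the bucket before relying on it. The following is a minimal sketch, assuming the same localhost:9200 endpoint and s3_backup repository name used above. The ES_URL and REPO variables and the function names are illustrative and not part of HySDS. It uses the standard get-repository and verify-repository endpoints.

```shell
#!/usr/bin/env bash
# Assumed defaults; override via the environment if your setup differs.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-s3_backup}"

# Print the URL of the snapshot repository (hypothetical helper).
repo_url() {
  printf '%s/_snapshot/%s\n' "$ES_URL" "$REPO"
}

# Show the repository definition as Elasticsearch stored it.
show_repo() {
  curl --silent --show-error --request GET "$(repo_url)?pretty"
}

# Ask the cluster nodes to check that they can access the repository.
verify_repo() {
  curl --silent --show-error --request POST "$(repo_url)/_verify?pretty"
}
```

Running show_repo and then verify_repo on the GRQ server should return the repository settings and the nodes that verified access. An error from verify_repo may point at bucket permissions or, depending on the Elasticsearch version, a missing S3 repository plugin.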

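2 - A snapshot. The URL-encoded name in the request below decodes to <snapshot-{now/d}>, an Elasticsearch date-math expression that resolves to the current date (for example, snapshot-2024.01.31):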

Code Block
curl --location --request PUT 'http://localhost:9200/_snapshot/s3_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E'

If you are following a tutorial that uses other options, keep the following in mind:

As a general recommendation, do not use wait_for_completion=true, because the client can time out once the indices grow large.

It is also better to include a timestamp in the name of the snapshot.
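
These two recommendations can be combined into one small script: name the snapshot with a UTC timestamp, start it without wait_for_completion=true, and poll the snapshot state separately. This is only a sketch under stated assumptions. ES_URL, REPO, the function names, and the 30-second poll interval are illustrative choices, and the state is extracted with grep rather than a JSON parser to avoid extra dependencies.

```shell
#!/usr/bin/env bash
# Assumed defaults; override via the environment if your setup differs.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-s3_backup}"

# Build a snapshot name from the current UTC time.
# Snapshot names must be lowercase, so only digits, dots and hyphens are added.
snapshot_name() {
  date -u +"snapshot-%Y.%m.%d-%H%M%S"
}

# Read a "get snapshot" JSON response on stdin and print the first state found
# (for example IN_PROGRESS, SUCCESS, PARTIAL or FAILED).
snapshot_state() {
  grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Start a snapshot and return immediately; it continues in the background.
start_snapshot() {
  curl --silent --show-error --request PUT "$ES_URL/_snapshot/$REPO/$1"
}

# Poll until the snapshot is no longer IN_PROGRESS, then print the last state.
# An empty result means the state could not be read from the response.
wait_for_snapshot() {
  local name="$1" interval="${2:-30}" state
  while true; do
    state="$(curl --silent --show-error "$ES_URL/_snapshot/$REPO/$name" | snapshot_state)"
    [ "$state" != "IN_PROGRESS" ] && break
    sleep "$interval"
  done
  printf '%s\n' "$state"
}
```

A run would store the result of snapshot_name in a variable, pass it to start_snapshot, and then pass the same name to wait_for_snapshot. For regular backups, a cron entry on the GRQ server could call a wrapper script built from these functions; the schedule and script location are up to you.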

Note: You may want to be selective about which indices you restore. For example, suppose you want to omit the following (some of which are adaptation-specific):

Code Block
user_rules
user_runs_history
versions
product_counter
product_accountability
hysds_ios
orbits
orbits_status
*_triaged_job
grq_q16511*

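You would prefix each one with a dash in the restore request. Substitute your own repository and snapshot names for my_backup and snapshot_1 (the repository registered above is s3_backup):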

Code Block
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "-user_rules,-user_runs_history,-versions,-product_counter,-product_accountability,-hysds_ios,-orbits,-orbits_status,-*_triaged_job,-grq_q16511*"
}
'
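
Writing the exclusion string by hand becomes error-prone as the list grows. As a sketch, the hypothetical helpers below build the same comma-separated, dash-prefixed string from a list of patterns, list the snapshots in the repository so you can pick one, and send the restore request. ES_URL, REPO, and all function names are assumptions for illustration; the endpoints are the standard cat-snapshots and restore APIs.

```shell
#!/usr/bin/env bash
# Assumed defaults; override via the environment if your setup differs.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-s3_backup}"

# Turn the arguments "a b c" into "-a,-b,-c" for the "indices" field.
build_exclusions() {
  local out="" pattern
  for pattern in "$@"; do
    out="${out:+$out,}-$pattern"
  done
  printf '%s\n' "$out"
}

# List the snapshots stored in the repository, one per row.
list_snapshots() {
  curl --silent --show-error "$ES_URL/_cat/snapshots/$REPO?v"
}

# Restore a snapshot while excluding the given index patterns.
# Usage: restore_snapshot <snapshot-name> <pattern> [<pattern> ...]
restore_snapshot() {
  if [ "$#" -lt 2 ]; then
    echo "usage: restore_snapshot <snapshot-name> <pattern>..." >&2
    return 1
  fi
  local name="$1"
  shift
  curl --silent --show-error --request POST \
    "$ES_URL/_snapshot/$REPO/$name/_restore?pretty" \
    --header 'Content-Type: application/json' \
    --data "{\"indices\": \"$(build_exclusions "$@")\"}"
}
```

Quote any pattern that contains a wildcard when calling these functions (for example '*_triaged_job') so that the shell does not expand it before it reaches Elasticsearch.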

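Also, before you restore an ES snapshot, you need to close all indices, because Elasticsearch can only restore over an existing index if that index is closed. Do this before sending the restore request. https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-close.html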

Code Block
POST /*/_close
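
The request above is written in Kibana console syntax. A curl equivalent, in the same style as the other commands on this page, might look like the sketch below. ES_URL, the function names, and the DRY_RUN switch are hypothetical conveniences, not part of HySDS or Elasticsearch. Depending on the Elasticsearch version and the action.destructive_requires_name setting, a wildcard close may be rejected, in which case the indices have to be named explicitly.

```shell
#!/usr/bin/env bash
# Assumed default; override via the environment if your setup differs.
ES_URL="${ES_URL:-http://localhost:9200}"

# Send a POST to Elasticsearch, or only print the command when DRY_RUN=1.
es_post() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'curl -X POST %s\n' "$ES_URL$1"
  else
    curl --silent --show-error --request POST "$ES_URL$1"
  fi
}

# Close every index so that existing indices can be overwritten by a restore.
close_all_indices() { es_post '/*/_close'; }

# Reopen indices afterwards; indices that were excluded from the restore
# would otherwise stay closed.
open_all_indices() { es_post '/*/_open'; }
```

Running close_all_indices with DRY_RUN=1 first prints the command without touching the cluster, which is a cheap way to check the target URL before closing anything.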

...