Customizing Rollover of old jobs in Mozart

Overview

HySDS Core release version 5 added the capability to roll over old jobs from Mozart’s Elasticsearch. The associated Jira ticket can be found here:

An Elasticsearch Index Lifecycle Management (ILM) policy is used to determine when to roll over old indices:

Previously, all jobs were stored under the job_status-current index. With this new feature, jobs are partitioned by date under indices named job_status-YYYY.MM.DD.
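As an illustration of the naming pattern above, the following minimal sketch builds the index name for a given date. The function name job_status_index is hypothetical, and using the UTC date for "today" is an assumption; this page does not say which timestamp HySDS uses to choose the partition.

```python
from datetime import date, datetime, timezone


def job_status_index(day: date) -> str:
    """Build an index name following the job_status-YYYY.MM.DD pattern."""
    return f"job_status-{day:%Y.%m.%d}"


# Assumption: "today" is taken in UTC.
today_index = job_status_index(datetime.now(timezone.utc).date())

print(job_status_index(date(2021, 3, 7)))  # job_status-2021.03.07
```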

This wiki will describe the default behavior of the system and how to customize it to adapt to project needs.

ILM (Index Lifecycle Management) Policy

A default ILM policy comes with HySDS Core version 5. It is found in the sdscli repository, along with the index templates it depends on:

NOTE: at the time of this writing, so that we don’t break other projects using hysds_release=develop, we will merge this feature into a develop-v5 branch until we cut an official v5 of HySDS Core. At that time, we will fully merge this feature into the develop branch.

 

The default ILM policy is as follows:

{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "90d",
        "actions": {
          "migrate": { "enabled": false },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "97d",
        "actions": {
          "set_priority": { "priority": 0 },
          "migrate": { "enabled": false },
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "104d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

 

At a high level, this will do the following:

  • Upon creation, an index goes into the hot phase.

  • After 90 days, it moves into the warm phase, where it can still be written to. This allows status updates for jobs that were still running when their index crossed over from hot to warm.

  • 7 days after that (97 days after creation), it moves into the cold phase, where it is frozen (the freeze action in the policy). Thus, its jobs will not appear under Figaro at that point.

  • Finally, 7 days after that (104 days after creation), it is deleted from Elasticsearch.
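The timeline above can be sketched in code. This is a minimal illustration, not part of HySDS: the helper names to_days and phase_for_age are hypothetical, only a subset of Elasticsearch time units is handled, and the real transitions happen when ILM next polls the index, so actual timing is approximate.

```python
import re

# min_age values copied from the default policy on this page.
DEFAULT_MIN_AGES = {"hot": "0ms", "warm": "90d", "cold": "97d", "delete": "104d"}

# Subset of Elasticsearch time units, expressed in seconds.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def to_days(min_age: str) -> float:
    """Convert a min_age string such as '90d' or '0ms' into days."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", min_age)
    if match is None:
        raise ValueError(f"unsupported min_age value: {min_age!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit] / 86400


def phase_for_age(age_days: float, min_ages: dict = DEFAULT_MIN_AGES) -> str:
    """Return the last phase whose min_age the index has already reached."""
    current = None
    for phase, min_age in sorted(min_ages.items(), key=lambda kv: to_days(kv[1])):
        if age_days >= to_days(min_age):
            current = phase
    return current


for age in (0, 89, 90, 97, 104):
    print(age, phase_for_age(age))
```

Note that each min_age is measured from the same starting point rather than from the previous phase, which is why the 7-day gaps appear as 90d, 97d, and 104d in the policy.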

See the comments section in HC-447, which contains details on how this was tested and what you can expect to see at the different phases when looking at the HySDS UI:

 

Visualization of ILM policy

 

Override Default Behavior

In order to override the default ILM behavior, do the following:

 

Get the following files from the sdscli repo and put them into the ~/.sds/files area on your Mozart:

  • es_ilm_policy_mozart.json

  • event_status.template

  • job_status.template

  • task_status.template

  • worker_status.template

 

At this point, you can update your ILM policy to fit your project needs. The Elasticsearch documentation referenced in the Overview section has more details on how else you can customize it.

The templates should be included here as well, mainly because each of them contains the number of shards setting:
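As one possible way to adjust the policy, the sketch below changes the min_age of selected phases on a copy of the default policy shown earlier on this page. The helper with_min_ages and the 30-day retention values are hypothetical examples, not recommendations or part of sdscli.

```python
import copy
import json

# Default policy as shown on this page.
DEFAULT_POLICY = {
    "policy": {"phases": {
        "hot": {"min_age": "0ms", "actions": {"set_priority": {"priority": 100}}},
        "warm": {"min_age": "90d", "actions": {
            "migrate": {"enabled": False}, "set_priority": {"priority": 50}}},
        "cold": {"min_age": "97d", "actions": {
            "set_priority": {"priority": 0}, "migrate": {"enabled": False},
            "freeze": {}}},
        "delete": {"min_age": "104d", "actions": {"delete": {}}},
    }}
}


def with_min_ages(policy: dict, min_ages: dict) -> dict:
    """Return a copy of `policy` with the given phases' min_age replaced."""
    updated = copy.deepcopy(policy)
    phases = updated["policy"]["phases"]
    for phase, min_age in min_ages.items():
        if phase not in phases:
            raise KeyError(f"policy has no phase named {phase!r}")
        phases[phase]["min_age"] = min_age
    return updated


# Hypothetical example: delete indices after about 30 days instead of 104.
custom = with_min_ages(DEFAULT_POLICY, {"warm": "16d", "cold": "23d", "delete": "30d"})
print(json.dumps(custom, indent=2))
```

The printed JSON could then be saved as ~/.sds/files/es_ilm_policy_mozart.json in place of the default copy.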

"settings": {
  "number_of_shards": 8,
  "index": {
    "refresh_interval": "5s"
  }
}

 

It is best to verify that the shards setting here is consistent with your deployment.
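A quick way to compare the shard settings across the copied templates is sketched below. It assumes each template file parses as plain JSON with a top-level "settings" object like the fragment above; if the real templates contain placeholders or nest the setting differently, this would need adjusting. The function names are hypothetical.

```python
import json
from pathlib import Path

# Template file names listed on this page.
TEMPLATE_NAMES = [
    "event_status.template",
    "job_status.template",
    "task_status.template",
    "worker_status.template",
]


def shards_in(template: dict):
    """Return settings.number_of_shards from a parsed template, or None."""
    return template.get("settings", {}).get("number_of_shards")


def shard_counts(files_dir) -> dict:
    """Map each template name to its shard count (None if missing or unset)."""
    counts = {}
    for name in TEMPLATE_NAMES:
        path = Path(files_dir) / name
        if not path.exists():
            counts[name] = None
            continue
        counts[name] = shards_in(json.loads(path.read_text()))
    return counts


# Usage on Mozart (assumes the files were copied as described above):
#   print(shard_counts(Path.home() / ".sds" / "files"))
```

If the reported values differ from one another, or from what your cluster is sized for, update the templates before pushing them out.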

 

Testing

At the time of this writing, HySDS Core is in the middle of transitioning from v4 to v5. While that is occurring, the rollover feature will temporarily reside under a develop-v5 branch for the following repos that were updated to properly add this rollover:

  • hysds:

  • sdscli:

 

For NISAR, we have temporarily updated Terraform so that we can bring this feature into our development clusters for further vetting and testing. It is done like so before we call the sds update commands to push the HySDS Core code out to the cluster:

provisioner "remote-exec" {
  inline = [
    "set -ex",
    "source ~/.bash_profile",
    # NOTE THAT THIS WILL BE REMOVED ONCE WE MERGE THE develop-v5 changes into the develop branch
    # get v5 develop versions of sdscli and hysds repo
    "if [ \"${var.hysds_release}\" = \"develop\" ]; then",
    "  cd ~/mozart/ops/hysds",
    "  git checkout develop-v5",
    "  pip install -e .",
    "  cd ~/mozart/ops/sdscli",
    "  git checkout develop-v5",
    "  pip install -e .",
    "fi",
  ]
}

 

 
