Upgrade

Gerald Manipon edited this page on Aug 30, 2018 · 6 revisions

Confidence Level TBD  This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.


For definitions of terminology used, please refer to our terminology reference.

Latest Releases

The latest releases are here: https://github.com/hysds/hysds-framework/releases. Each release was cut from the latest HEAD state of all HySDS repositories at the time of release.
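If you prefer to pick a release tag from the command line instead of the releases page, a small sketch like the following lists the tags of the hysds-framework repository. It relies only on standard `git ls-remote` and GNU `sort -V`; the `release_tags` function name is hypothetical.

```shell
# Hypothetical helper: read `git ls-remote --tags --refs <repo>` output on stdin
# and print just the tag names, highest version first (GNU sort -V ordering).
release_tags() {
  sed -n 's#^[0-9a-f]*[[:space:]]*refs/tags/##p' | sort -rV
}

# Usage (requires network access to GitHub):
#   git ls-remote --tags --refs https://github.com/hysds/hysds-framework.git | release_tags | head
```

GNU version sort ranks a pre-release tag such as v2.1.0-rc.3 higher than v2.1.0, so the first line printed is not necessarily the newest stable release; compare against the releases page before choosing.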

Prerequisite - Graceful Shutdown

To preserve the state of queued and running HySDS jobs on mozart, bring the HySDS cluster down gracefully as follows:

Turn off all timer-based job submission scripts (e.g. crontab on factotum)

  1. Log into your factotum instance

    ssh -i <PEM file> ops@<factotum IP>
  2. Back up your crontab and remove it to prevent jobs from being submitted during the upgrade process

    mkdir -p ~/crontabs
    crontab -l > ~/crontabs/crontab.$(date -u -Iseconds)
    crontab -r
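The backup-then-remove steps can be wrapped so that the live crontab is removed only if the backup was actually written. This is a minimal sketch, assuming bash and GNU `date` (for `-Iseconds`); the `backup_and_clear_crontab` name is hypothetical.

```shell
# Hypothetical helper: back up the current user's crontab to DIR
# (default ~/crontabs) and clear it, but only if the backup succeeded.
backup_and_clear_crontab() {
  local dir="${1:-$HOME/crontabs}"
  local file
  mkdir -p "$dir" || return 1
  file="$dir/crontab.$(date -u -Iseconds)"
  if crontab -l > "$file"; then
    crontab -r
  else
    rm -f "$file"                      # don't leave an empty backup behind
    echo "could not back up crontab; leaving it installed" >&2
    return 1
  fi
}
```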

Gracefully shut down the workers

  1. Log into your mozart instance

    ssh -i <PEM file> ops@<mozart IP>
  2. Shut down the syncer processes that update job queue metrics (i.e. in AWS CloudWatch), which trigger autoscaling of verdi workers:

  3. Cancel consumption of tasks:

  4. Log into the RabbitMQ admin interface and ensure that all queues show 0 in the Unacked column of the Queues tab. To show only the job/task queues, enter ^(?!celery) in the filter text box and check the Regex checkbox. If jobs/tasks are currently running, either wait for them to complete or kill them manually and retry them after the upgrade. An Unacked column of all zeros signifies that no jobs/tasks are currently running.

  5. Log into figaro and ensure that the job states it reports agree with what you see in the RabbitMQ admin interface

  6. Your cluster is now ready for the upgrade
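The Unacked check in step 4 can also be done from a shell on mozart. This sketch assumes RabbitMQ runs on the mozart host and that you can run `rabbitmqctl` there (typically via `sudo`). It reads the output of `rabbitmqctl list_queues name messages_unacknowledged`, skips queues whose names start with `celery` (the same exclusion as the `^(?!celery)` filter above), and exits non-zero if any remaining queue still has unacknowledged messages. The `check_unacked` name is hypothetical.

```shell
# Hypothetical helper: read "<queue name>\t<unacked count>" lines on stdin and
# report job/task queues that still have unacknowledged (i.e. running) messages.
# Exit status: 0 if every non-celery queue shows 0 unacked, 1 otherwise.
check_unacked() {
  awk -F'\t' '
    $1 ~ /^celery/     { next }   # same exclusion as the ^(?!celery) UI filter
    $2 !~ /^[0-9]+$/   { next }   # skip banner or header lines
    $2 > 0             { print $1 ": " $2 " unacked"; busy = 1 }
    END                { exit busy }
  '
}

# Usage on mozart:
#   sudo rabbitmqctl -q list_queues name messages_unacknowledged | check_unacked \
#     && echo "no jobs/tasks running"
```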

Upgrade

Update HySDS core using hysds-framework and sdscli

  1. Log into your mozart instance

  2. Stop the cluster

  3. Back up your mozart directory

  4. If you have it, remove the old hysds-framework clone

  5. Clone the HySDS framework repository and enter it

  6. Select the HySDS framework release tag you'd like to install for mozart

  7. Install the selected HySDS release (e.g. v2.1.0-rc.3) for the mozart component

    Alternatively, you can install the development version, which pulls the master branch of each HySDS repo:

  8. Restore the non-core repositories from the directory backup under ~/mozart.orig/ops

  9. Update all HySDS components:

    Address any errors before proceeding.

  10. (Optional) Run any adaptation-specific fabric updates (e.g. update_aria_packages)

  11. Build and ship out updated code/config bundles

  12. Start up the grq component and validate that all services come up fine

  13. Start up the mozart component and validate that all services come up fine

  14. Start up the metrics component and validate that all services come up fine

  15. During installation, the latest versions of the lightweight-jobs core HySDS package and the verdi docker image were downloaded. If the lightweight-jobs version has incremented, import the lightweight-jobs package:

  16. Copy the verdi docker image to the code bucket (CODE_BUCKET as specified during sds configure) and ensure that the VERDI_PRIMER_IMAGE URL points to that copy:

  17. Start up the factotum component and validate that all services come up fine

  18. View status of HySDS components and services:
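Steps 3 and 8 can be sketched as two shell functions. This assumes the ~/mozart and ~/mozart.orig layout named in step 8, and it treats any repository directory present under ~/mozart.orig/ops but missing from ~/mozart/ops after the install as non-core; that definition is an assumption, so review what the restore helper prints. Both function names are hypothetical, and each takes an optional base directory (default `$HOME`) so it can be tried somewhere harmless first.

```shell
# Hypothetical helper for step 3: copy <base>/mozart to <base>/mozart.orig,
# refusing to overwrite a backup left over from an earlier upgrade.
backup_mozart() {
  local base="${1:-$HOME}"
  if [ ! -d "$base/mozart" ]; then
    echo "$base/mozart not found" >&2
    return 1
  fi
  if [ -e "$base/mozart.orig" ]; then
    echo "$base/mozart.orig already exists; move it aside first" >&2
    return 1
  fi
  cp -rp "$base/mozart" "$base/mozart.orig"
}

# Hypothetical helper for step 8: copy back each repository under
# mozart.orig/ops that does not exist under mozart/ops after the install.
# Repositories the installer recreated are left untouched.
restore_noncore_repos() {
  local base="${1:-$HOME}"
  local src="$base/mozart.orig/ops" dst="$base/mozart/ops"
  local repo name
  for repo in "$src"/*/; do
    [ -d "$repo" ] || continue            # glob matched nothing
    name=$(basename "$repo")
    if [ ! -e "$dst/$name" ]; then
      echo "restoring $name"
      cp -rp "$repo" "$dst/$name" || return 1
    fi
  done
}
```

Whether the install in step 7 replaces ~/mozart is not stated here; because the restore helper only copies repositories that are missing from ~/mozart/ops, running it is harmless either way.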

Restore all timer-based job submission scripts (e.g. crontab on factotum)

  1. Log back into factotum and restore the crontab
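Restoring the most recent backup can be scripted as well. This sketch assumes the backups were saved as ~/crontabs/crontab.<UTC ISO 8601 timestamp>, so that a plain byte-wise sort puts the newest file last; `crontab FILE` then installs that file as the current crontab. The function names are hypothetical.

```shell
# Hypothetical helper: print the newest crontab.* backup in DIR (default ~/crontabs).
# UTC ISO 8601 timestamps sort chronologically as plain strings, so the last
# name in a C-locale sort is the most recent backup.
latest_crontab_backup() {
  local dir="${1:-$HOME/crontabs}"
  local latest
  latest=$(printf '%s\n' "$dir"/crontab.* | LC_ALL=C sort | tail -n 1)
  if [ ! -f "$latest" ]; then          # glob matched nothing
    echo "no crontab backups in $dir" >&2
    return 1
  fi
  printf '%s\n' "$latest"
}

# Hypothetical helper: reinstall the newest backup as the current crontab.
restore_crontab() {
  local backup
  backup=$(latest_crontab_backup "${1:-}") || return 1
  crontab "$backup"
}
```

Running `restore_crontab` with no argument reinstalls the newest file in ~/crontabs; `crontab -l` afterwards shows what was installed.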

Note: JPL employees can also get answers to HySDS questions at Stack Overflow Enterprise: