
Gerald Manipon edited this page on Aug 30, 2018 · 6 revisions

Upgrade

For definitions of terminology used, please refer to our terminology reference.

Latest Releases

The latest releases are here: https://github.com/hysds/hysds-framework/releases. Each release captures the head state of all HySDS repos at the time of release.

Prerequisite - Graceful Shutdown

To preserve the state of queued/running HySDS jobs in mozart, the HySDS cluster should be brought down gracefully as follows:

Turn off all timer-based job submission scripts (e.g. crontab on factotum)

  1. Log into your factotum instance
    ssh -i <PEM file> ops@<factotum IP>
    
  2. Back up your crontab into ~/crontabs, then remove it to prevent jobs from being submitted during the upgrade process
    mkdir -p ~/crontabs
    crontab -l > ~/crontabs/crontab.$(date -u -Iseconds)
    crontab -r
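Before moving on, it is worth confirming that the crontab is actually gone. A minimal sketch, assuming the standard `crontab` CLI (which exits non-zero when no crontab is installed for the user):

```shell
# Confirm no crontab remains for the current user before proceeding;
# `crontab -l` exits non-zero when no crontab is installed.
if crontab -l >/dev/null 2>&1; then
    echo "WARNING: a crontab is still installed" >&2
else
    echo "crontab cleared; safe to proceed"
fi
```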
    

Gracefully shutdown the workers

  1. Log into your mozart instance
    ssh -i <PEM file> ops@<mozart IP>
    
  2. Shut down the syncer processes that update job-queue metrics (e.g. in AWS CloudWatch), which trigger autoscaling of verdi workers:
    supervisorctl status  | grep ^sync_ | awk '{print $1}' | xargs -i -t supervisorctl stop {}
    
  3. Cancel consumption of tasks:
    sudo rabbitmqctl list_queues 2>&1 | grep -v '^celery' | tail -n +2 | awk '{print $1}' | xargs -i -t celery -A hysds control cancel_consumer {}
    
  4. Log into the RabbitMQ admin interface and ensure that all queues show 0 in the Unacked column of the Queues tab. To show only the job/task queues, enter ^(?!celery) in the filter text box and check the Regex checkbox. If any jobs/tasks are still running, either wait for them to complete or kill them manually and retry them after the upgrade. (Screenshot: the Unacked column showing all zeros, signifying that no jobs/tasks are currently running.)
  5. Log into figaro and ensure that it matches what you see in the RabbitMQ admin interface
  6. Your cluster is now ready for the upgrade
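The web-UI check in step 4 can also be done from the mozart command line. A sketch, assuming `rabbitmqctl list_queues name messages_unacknowledged` emits one "queue_name unacked_count" line per queue; the `check_unacked` helper name is ours for illustration:

```shell
# Flag any non-celery queue that still has unacknowledged messages.
# Reads "queue_name unacked_count" lines on stdin; exits non-zero if any are found.
check_unacked() {
    grep -v '^celery' | awk '$2 > 0 { print; found = 1 } END { exit found }'
}

# On mozart you would feed it real queue data:
#   sudo rabbitmqctl list_queues name messages_unacknowledged | tail -n +2 | check_unacked \
#       && echo "all queues drained" || echo "jobs/tasks still running"

# Illustrated here with sample data:
printf 'jobs_processed 0\njob_worker-small 2\ncelery 5\n' | check_unacked \
    && echo "all queues drained" || echo "jobs/tasks still running"
```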

Upgrade

Update HySDS core using hysds-framework and sdscli

  1. Log into your mozart instance
    ssh -i <PEM file> ops@<mozart IP>
    
  2. Stop the cluster
    sds stop all -f
    
  3. Backup your mozart directory
    mv ~/mozart ~/mozart.orig
    
  4. If you have it, remove the old hysds-framework clone
    rm -rf ~/hysds-framework
    
  5. Clone the HySDS framework repository and enter it
    cd ~
    git clone https://github.com/hysds/hysds-framework.git
    cd hysds-framework
    
  6. List the available HySDS framework release tags for mozart by running the installer without a release argument
    ./install.sh mozart
    HySDS install directory set to /home/ops/mozart
    New python executable in /home/ops/mozart/bin/python
    Installing Setuptools............................................done.
    Installing Pip...................................................done.
    Created virtualenv at /home/ops/mozart.
    [2017-08-09 19:25:37,789: INFO/main] Github repo URL: https://xxxxxxxx@github.com/api/v3/repos/hysds/hysds-framework/releases
    [2017-08-09 19:25:37,798: INFO/_new_conn] Starting new HTTPS connection (1): github.com
    No release specified. Use -r RELEASE | --release=RELEASE to install a specific release. Listing available releases:
    v2.0.0-beta.3
    v2.1.0-beta.0
    v2.1.0-beta.1
    v2.1.0-beta.2
    v2.1.0-beta.3
    v2.1.0-beta.4
    v2.1.0-beta.5
    v2.1.0-beta.6
    v2.1.0-beta.7
    v2.1.0-beta.8
    v2.1.0-rc.0
    v2.1.0-rc.1
    v2.1.0-rc.2
    v2.1.0-rc.3
    
  7. Install the latest HySDS release (e.g. v2.1.0-rc.3) for the mozart component
    ./install.sh mozart -r <release>
    
    e.g.
    
    ./install.sh mozart -r v2.1.0-rc.3
    
    You could also install the development version, which pulls the master branch of each HySDS repo:
    ./install.sh mozart -d
    
  8. Restore the non-core repositories from the directory backup under ~/mozart.orig/ops
    cd ~/mozart.orig/ops
    for i in *; do new=~/mozart/ops/$i; if [ ! -e "$new" ]; then cp -rp "$i" "$new"; fi; done
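To preview what step 8 would copy before running it, the same test can be exercised as a dry run. A sketch against scratch directories (on mozart the real paths are ~/mozart.orig/ops and ~/mozart/ops):

```shell
# Build scratch "old" and "new" ops directories to illustrate the copy logic:
# only entries missing from the fresh install get copied over.
old="$(mktemp -d)"; new="$(mktemp -d)"
mkdir -p "$old/hysds_core_repo" "$old/my_adaptation" "$new/hysds_core_repo"

cd "$old"
for i in *; do
    if [ ! -e "$new/$i" ]; then
        echo "would copy: $i"    # the real run does: cp -rp "$i" "$new"
    fi
done
```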
    
  9. Update all HySDS components:
    sds update all
    
    Address any errors before continuing.
  10. (Optional) Run any adaptation-specific fabric updates (e.g. update_aria_packages)
    fab -f ~/.sds/cluster.py -R factotum,verdi update_aria_packages
    
  11. Build and ship out updated code/config bundles
    sds ship
    
  12. Start up the grq component and validate that all services come up fine
    sds start grq
    sds status grq
    
  13. Start up the mozart component and validate that all services come up fine
    sds start mozart
    sds status mozart
    
  14. Start up the metrics component and validate that all services come up fine
    sds start metrics
    sds status metrics
    
  15. During installation, the latest versions of the lightweight-jobs core HySDS package and the verdi docker image were downloaded. If the version has incremented, import the lightweight-jobs package:
    cd ~/mozart/pkgs
    sds pkg import container-hysds_lightweight-jobs.*.sdspkg.tar
    
  16. Copy the verdi docker image to the code bucket (CODE_BUCKET, as specified during sds configure). Ensure the VERDI_PRIMER_IMAGE URL is consistent:
    aws s3 cp hysds-verdi-latest.tar.gz s3://<CODE_BUCKET>/hysds-verdi-latest.tar.gz
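To check that consistency, compare the S3 URL you just uploaded to against the VERDI_PRIMER_IMAGE entry in your sds configuration. A sketch against a scratch config file; the path ~/.sds/config and the "KEY: value" line format are assumptions, so adjust for your cluster:

```shell
# Scratch config standing in for ~/.sds/config on mozart (assumed location)
config="$(mktemp)"
cat > "$config" <<'EOF'
VERDI_PRIMER_IMAGE: s3://my-code-bucket/hysds-verdi-latest.tar.gz
EOF

CODE_BUCKET="my-code-bucket"   # the bucket you copied the image to
expected="s3://$CODE_BUCKET/hysds-verdi-latest.tar.gz"

if grep -q "VERDI_PRIMER_IMAGE: $expected" "$config"; then
    echo "VERDI_PRIMER_IMAGE matches the uploaded image"
else
    echo "MISMATCH: update VERDI_PRIMER_IMAGE in $config" >&2
fi
```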
    
  17. Start up the factotum component and validate that all services come up fine
    sds start factotum
    sds status factotum
    
  18. View status of HySDS components and services:
    sds status all
    

Restore all timer-based job submission scripts (e.g. crontab on factotum)

  1. Log back into factotum and restore the crontab
    ssh -i <PEM file> ops@<factotum IP>
    cd ~/crontabs
    crontab crontab.<your_last_backup>
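If you made more than one backup, the ISO-8601 timestamps embedded in the filenames sort lexicographically in chronological order, so the newest backup can be picked automatically. A sketch against a scratch directory; on factotum the directory is ~/crontabs and the final step would be `crontab "$latest"`:

```shell
# Scratch directory standing in for ~/crontabs on factotum
backups="$(mktemp -d)"
touch "$backups/crontab.2018-08-29T23:00:00+00:00" \
      "$backups/crontab.2018-08-30T01:30:00+00:00"

# ISO-8601 names sort chronologically, so the last one is the newest
latest="$(ls "$backups"/crontab.* | sort | tail -n 1)"
echo "would restore: $(basename "$latest")"
# real run: crontab "$latest"
```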