Gerald Manipon edited this page on Aug 30, 2018 · 6 revisions


Confidence Level TBD: This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.


Upgrade

For definitions of the terminology used, please refer to our terminology reference.

Latest Releases

The latest releases are here: https://github.com/hysds/hysds-framework/releases. Each was taken from the latest head state of all repos at time of release.

Prerequisite - Graceful Shutdown

To preserve the state of queued and running HySDS jobs on mozart, bring the HySDS cluster down gracefully as follows:

Turn off all timer-based job submission scripts (e.g. crontab on factotum)

  1. Log into your factotum instance

    Code Block
    ssh -i <PEM file> ops@<factotum IP>
    
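  2. Back up your crontab, then remove it to prevent jobs from being submitted during the upgrade process

    Code Block
    mkdir -p ~/crontabs
    # Write the backup into ~/crontabs so it can be restored from there after the upgrade
    crontab -l > ~/crontabs/crontab.$(date -u -Iseconds)
    crontab -r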
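
The backup-then-remove step above can be made a little safer by removing the crontab only once a non-empty backup has been written. The following is a minimal sketch, not part of HySDS: the function name `backup_and_clear_crontab` is hypothetical, and it assumes backups are kept in `~/crontabs` unless another directory is passed.

```shell
# Back up the current crontab, and remove it only if the backup succeeded.
# Assumption: backups live in ~/crontabs unless a directory is passed as $1.
# The function name is hypothetical and not part of HySDS.
backup_and_clear_crontab() {
  dir="${1:-$HOME/crontabs}"
  mkdir -p "$dir" || return 1
  out="$dir/crontab.$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # crontab -l exits non-zero when the user has no crontab installed
  if crontab -l > "$out" && [ -s "$out" ]; then
    # Backup exists and is non-empty: remove the crontab, print the backup path
    crontab -r && echo "$out"
  else
    rm -f "$out"
    echo "no crontab saved; nothing removed" >&2
    return 1
  fi
}
```

Calling `backup_and_clear_crontab` on factotum prints the path of the backup file, which is the file to restore after the upgrade.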
    

Gracefully shut down the workers

  1. Log into your mozart instance

    Code Block
    ssh -i <PEM file> ops@<mozart IP>
    
  2. Shut down the syncer processes that update the job queue metrics (i.e. in AWS CloudWatch) that trigger autoscaling of verdi workers:

    Code Block
    supervisorctl status | grep '^sync_' | awk '{print $1}' | xargs -I {} -t supervisorctl stop {}
    
  3. Cancel consumption of tasks:

    Code Block
    sudo rabbitmqctl list_queues 2>&1 | grep -v '^celery' | tail -n +2 | awk '{print $1}' | xargs -I {} -t celery -A hysds control cancel_consumer {}
    
  4. Log into the RabbitMQ admin interface to ensure that all queues show 0 in the Unacked column of the Queues tab. To show only the job/task queues, enter ^(?!celery) in the filter text box and check the Regex checkbox. If there are jobs/tasks currently running, you can either wait for them to complete or kill them manually and retry them after the upgrade. The screenshot below shows the Unacked column with all zeros, signifying that no jobs/tasks are currently running.

    [Image: rabbitmq]
  5. Log into figaro and ensure that what it shows matches what you see in the RabbitMQ admin interface

    [Image: head]
  6. Your cluster is now ready for the upgrade.
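
The Unacked check described above can also be approximated from the command line. The sketch below is a hedged example rather than part of the documented procedure: the helper name `busy_queues` is hypothetical, and it assumes that `rabbitmqctl list_queues name messages_unacknowledged` prints one queue per line as a name followed by a count.

```shell
# Print non-celery queues that still have unacknowledged messages.
# stdin: whitespace-separated "<queue name> <unacked count>" lines.
# Header and status lines (non-numeric second field) are skipped.
# The function name is hypothetical and not part of HySDS.
busy_queues() {
  awk '$1 !~ /^celery/ && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 {print $1}'
}

# Usage on mozart:
#   sudo rabbitmqctl list_queues name messages_unacknowledged 2>&1 | busy_queues
```

Empty output corresponds to an Unacked column of all zeros for the job/task queues. Any queue names printed still have jobs/tasks in flight.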

Upgrade

Update HySDS core using hysds-framework and sdscli

  1. Log into your mozart instance

    Code Block
    ssh -i <PEM file> ops@<mozart IP>
    
  2. Stop the cluster

    Code Block
    sds stop all -f
    
  3. Back up your mozart directory

    Code Block
    mv ~/mozart ~/mozart.orig
    
  4. If you have it, remove the old hysds-framework clone

    Code Block
    rm -rf ~/hysds-framework
    
  5. Clone the HySDS framework repository and enter it

    Code Block
    cd ~
    git clone https://github.com/hysds/hysds-framework.git
    cd hysds-framework
    
  6. Select the HySDS framework release tag you'd like to install for mozart. Running install.sh without specifying a release lists the available releases:

    Code Block
    ./install.sh mozart
    HySDS install directory set to /home/ops/mozart
    New python executable in /home/ops/mozart/bin/python
    Installing Setuptools............................................done.
    Installing Pip...................................................done.
    Created virtualenv at /home/ops/mozart.
    [2017-08-09 19:25:37,789: INFO/main] Github repo URL: https://xxxxxxxx@github.com/api/v3/repos/hysds/hysds-framework/releases
    [2017-08-09 19:25:37,798: INFO/_new_conn] Starting new HTTPS connection (1): github.com
    No release specified. Use -r RELEASE | --release=RELEASE to install a specific release. Listing available releases:
    v2.0.0-beta.3
    v2.1.0-beta.0
    v2.1.0-beta.1
    v2.1.0-beta.2
    v2.1.0-beta.3
    v2.1.0-beta.4
    v2.1.0-beta.5
    v2.1.0-beta.6
    v2.1.0-beta.7
    v2.1.0-beta.8
    v2.1.0-rc.0
    v2.1.0-rc.1
    v2.1.0-rc.2
    v2.1.0-rc.3
    
  7. Install the latest HySDS release (e.g. v2.1.0-rc.3) for the mozart component

    Code Block
    ./install.sh mozart -r <release>
    
    e.g.
    
    ./install.sh mozart -r v2.1.0-rc.3
    

    Alternatively, install the development version, which pulls the master branch of each HySDS repo:

    Code Block
    ./install.sh mozart -d
    
  8. Restore the non-core repositories from the backup directory under ~/mozart.orig/ops

    Code Block
    cd ~/mozart.orig/ops
    # Copy back only the repositories that the fresh install did not recreate
    for i in *; do new=~/mozart/ops/$i; if [ ! -e "$new" ]; then cp -rp "$i" "$new"; fi; done
    
  9. Update all HySDS components:

    Code Block
    sds update all
    

    If you receive any errors, address them before continuing.

  10. (Optional) Run any adaptation-specific fabric updates (e.g. update_aria_packages)

    Code Block
    fab -f ~/.sds/cluster.py -R factotum,verdi update_aria_packages
    
  11. Build and ship out updated code/config bundles

    Code Block
    sds ship
    
  12. Start up the grq component and validate that all services come up fine

    Code Block
    sds start grq
    sds status grq
    
  13. Start up the mozart component and validate that all services come up fine

    Code Block
    sds start mozart
    sds status mozart
    
  14. Start up the metrics component and validate that all services come up fine

    Code Block
    sds start metrics
    sds status metrics
    
  15. During installation, the latest versions of the lightweight-jobs core HySDS package and the verdi docker image were downloaded. If the version has incremented, import the lightweight-jobs package:

    Code Block
    cd ~/mozart/pkgs
    sds pkg import container-hysds_lightweight-jobs.*.sdspkg.tar
    
  16. Copy the verdi docker image to the code bucket (CODE_BUCKET as specified during sds configure). Ensure that the VERDI_PRIMER_IMAGE URL is consistent with this location:

    Code Block
    aws s3 cp hysds-verdi-latest.tar.gz s3://<CODE_BUCKET>/hysds-verdi-latest.tar.gz
    
  17. Start up the factotum component and validate that all services come up fine

    Code Block
    sds start factotum
    sds status factotum
    
  18. View status of HySDS components and services:

    Code Block
    sds status all
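
The "validate that all services come up fine" steps above rely on reading the status output by eye. A small filter can help spot problems. This is a sketch under assumptions: it assumes the `sds status` output includes supervisor-style lines of the form `<process> <STATE> <details>`, which this page does not confirm, and the helper name `not_running` is hypothetical. The state names themselves are the standard Supervisor process states.

```shell
# Print supervisor-style status lines whose state is not RUNNING.
# Assumption: stdin contains lines shaped like "<process> <STATE> <details>",
# as printed by `supervisorctl status`; all other lines are ignored.
# The function name is hypothetical and not part of HySDS.
not_running() {
  awk '$2 ~ /^(STOPPED|STARTING|BACKOFF|STOPPING|EXITED|FATAL|UNKNOWN)$/ {print}'
}

# Possible usage on mozart:
#   sds status all 2>&1 | not_running
```

Empty output suggests that every process reported in that format is RUNNING. If the status output is colorized or formatted differently, the pattern would need adjusting.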
    

Restore all timer-based job submission scripts (e.g. crontab on factotum)

  1. Log back into factotum and restore the crontab

    Code Block
    ssh -i <PEM file> ops@<factotum IP>
    cd ~/crontabs
    crontab crontab.<your_last_backup>
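
To fill in `<your_last_backup>` without listing the directory by hand, the newest backup can be selected by name. This is a minimal sketch, assuming the backups are named `crontab.<ISO 8601 UTC timestamp>` so that lexicographic order matches chronological order. The helper name `latest_crontab_backup` is hypothetical.

```shell
# Print the most recent crontab backup in a directory (default: ~/crontabs).
# Assumption: files are named crontab.<ISO 8601 UTC timestamp>, so sorting
# the names also sorts them by time.
# The function name is hypothetical and not part of HySDS.
latest_crontab_backup() {
  dir="${1:-$HOME/crontabs}"
  for f in "$dir"/crontab.*; do
    # Skip the unexpanded glob when no backup exists
    [ -e "$f" ] && printf '%s\n' "$f"
  done | sort | tail -n 1
}

# Possible usage on factotum:
#   crontab "$(latest_crontab_backup)"
```

The function prints nothing when no backup is found, so check the output before passing it to `crontab`.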