Gerald Manipon edited this page on Mar 21 · 36 revisions

Initial Cluster Setup

For definitions of terminology used, please refer to our terminology reference.

When installation of the HySDS framework is complete on your mozart instance (see Installation), we must configure the rest of the cluster instances so that they can talk to each other. We do this using the sdscli command on the mozart instance. All code and configuration is centralized on the mozart instance, so updates during the development cycle or operational upgrades can be pushed to the rest of the cluster from a single location.

  1. Configure your cluster parameters using sdscli: The sdscli repo was installed on your mozart instance during Installation. Configure your cluster by running:
    cd ~
    source ~/mozart/bin/activate
    sds configure
    
  2. The sds configure command will prompt you for your cluster parameters. A description of the parameters with examples is provided below.

*** LOOK OVER THE FIELDS AND HAVE THE VALUES READY BEFOREHAND ***

| field | description | example |
| --- | --- | --- |
| MOZART_PVT_IP | private IP address of mozart instance | 100.64.134.201 |
| MOZART_PUB_IP | publicly accessible IP address of mozart instance, e.g. Elastic IP; can be the same as MOZART_PVT_IP | 64.34.21.123 |
| MOZART_FQDN | publicly resolvable FQDN of mozart instance; can be the same as MOZART_PVT_IP | gman-jobs.hysds.net |
| MOZART_RABBIT_PVT_IP | private IP address of mozart rabbitMQ instance (if running rabbitMQ on a different instance); otherwise replicate value from MOZART_PVT_IP | 100.64.134.201 |
| MOZART_RABBIT_PUB_IP | publicly accessible IP address of mozart rabbitMQ instance (if running rabbitMQ on a different instance); otherwise replicate value from MOZART_PUB_IP | 64.34.21.123 |
| MOZART_RABBIT_FQDN | publicly resolvable FQDN of mozart rabbitMQ instance (if running rabbitMQ on a different instance); otherwise replicate value from MOZART_FQDN | gman-jobs.hysds.net |
| MOZART_RABBIT_USER | rabbitMQ user account; default installed by rabbitMQ is guest | guest |
| MOZART_RABBIT_PASSWORD | rabbitMQ user password; default installed by rabbitMQ is guest | guest |
| MOZART_REDIS_PVT_IP | private IP address of mozart redis instance (if running redis on a different instance); otherwise replicate value from MOZART_PVT_IP | 100.64.134.201 |
| MOZART_REDIS_PUB_IP | publicly accessible IP address of mozart redis instance (if running redis on a different instance); otherwise replicate value from MOZART_PUB_IP | 64.34.21.123 |
| MOZART_REDIS_FQDN | publicly resolvable FQDN of mozart redis instance (if running redis on a different instance); otherwise replicate value from MOZART_FQDN | gman-jobs.hysds.net |
| MOZART_REDIS_PASSWORD | redis password (if AUTH is configured) | empty string or <redis password> |
| MOZART_ES_PVT_IP | private IP address of mozart elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from MOZART_PVT_IP | 100.64.134.201 |
| MOZART_ES_PUB_IP | publicly accessible IP address of mozart elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from MOZART_PUB_IP | 64.34.21.123 |
| MOZART_ES_FQDN | publicly resolvable FQDN of mozart elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from MOZART_FQDN | gman-jobs.hysds.net |
| OPS_USER | ops account on HySDS cluster instances | ops or hysdsops or swotops |
| OPS_HOME | ops account home directory on HySDS cluster instances | /home/ops or /data/home/hysdsops |
| OPS_PASSWORD_HASH | sha224sum password hash for ops user account login to HySDS web interfaces | output of echo -n mypassword \| sha224sum |
| LDAP_GROUPS | comma-separated list of LDAP groups to use for user authentication into HySDS web interfaces | hysds-v2,aria.users,ariamh |
| KEY_FILENAME | private ssh key to use for logging into other cluster instances; used for deployment via fabric | /home/ops/.ssh/my_cloud_keypair.pem |
| JENKINS_USER | account on ci instance that owns and runs the Jenkins CI server | jenkins |
| JENKINS_DIR | location of the Jenkins HOME directory (where jobs/ directory is located) | /var/lib/jenkins |
| METRICS_PVT_IP | private IP address of metrics instance | 100.64.134.153 |
| METRICS_PUB_IP | publicly accessible IP address of metrics instance, e.g. Elastic IP; can be the same as METRICS_PVT_IP | 64.34.21.124 |
| METRICS_FQDN | publicly resolvable FQDN of metrics instance; can be the same as METRICS_PVT_IP | gman-metrics.hysds.net |
| METRICS_REDIS_PVT_IP | private IP address of metrics redis instance (if running redis on a different instance); otherwise replicate value from METRICS_PVT_IP | 100.64.134.153 |
| METRICS_REDIS_PUB_IP | publicly accessible IP address of metrics redis instance (if running redis on a different instance); otherwise replicate value from METRICS_PUB_IP | 64.34.21.124 |
| METRICS_REDIS_FQDN | publicly resolvable FQDN of metrics redis instance (if running redis on a different instance); otherwise replicate value from METRICS_FQDN | gman-metrics.hysds.net |
| METRICS_REDIS_PASSWORD | redis password (if AUTH is configured) | empty string or <redis password> |
| METRICS_ES_PVT_IP | private IP address of metrics elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from METRICS_PVT_IP | 100.64.134.153 |
| METRICS_ES_PUB_IP | publicly accessible IP address of metrics elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from METRICS_PUB_IP | 64.34.21.124 |
| METRICS_ES_FQDN | publicly resolvable FQDN of metrics elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from METRICS_FQDN | gman-metrics.hysds.net |
| GRQ_PVT_IP | private IP address of grq instance | 100.64.134.71 |
| GRQ_PUB_IP | publicly accessible IP address of grq instance, e.g. Elastic IP; can be the same as GRQ_PVT_IP | 64.34.21.125 |
| GRQ_FQDN | publicly resolvable FQDN of grq instance; can be the same as GRQ_PVT_IP | gman-grq.hysds.net |
| GRQ_PORT | port to use for the grq2 REST API | 8878 |
| GRQ_ES_PVT_IP | private IP address of grq elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from GRQ_PVT_IP | 100.64.134.71 |
| GRQ_ES_PUB_IP | publicly accessible IP address of grq elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from GRQ_PUB_IP | 64.34.21.125 |
| GRQ_ES_FQDN | publicly resolvable FQDN of grq elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from GRQ_FQDN | gman-grq.hysds.net |
| FACTOTUM_PVT_IP | private IP address of factotum instance | 100.64.134.184 |
| FACTOTUM_PUB_IP | publicly accessible IP address of factotum instance, e.g. Elastic IP; can be the same as FACTOTUM_PVT_IP | 64.34.21.126 |
| FACTOTUM_FQDN | publicly resolvable FQDN of factotum instance; can be the same as FACTOTUM_PVT_IP | gman-factotum.hysds.net |
| CI_PVT_IP | private IP address of ci instance | 100.64.134.179 |
| CI_PUB_IP | publicly accessible IP address of ci instance, e.g. Elastic IP; can be the same as CI_PVT_IP | 64.34.21.127 |
| CI_FQDN | publicly resolvable FQDN of ci instance; can be the same as CI_PVT_IP | gman-ci.hysds.net |
| VERDI_PVT_IP | private IP address of verdi instance; if no verdi instance, use ci instance value for CI_PVT_IP | 100.64.134.179 |
| VERDI_PUB_IP | publicly accessible IP address of verdi instance, e.g. Elastic IP; if no verdi instance, use ci instance value for CI_PUB_IP | 64.34.21.127 |
| VERDI_FQDN | publicly resolvable FQDN of verdi instance; if no verdi instance, use ci instance value for CI_FQDN | gman-ci.hysds.net |
| JENKINS_API_USER | Jenkins user account to use for access to Jenkins API | gmanipon |
| JENKINS_API_KEY | Jenkins user API key to use for access to Jenkins API. Go to an already set up Jenkins web page and click on “People”, your username, then “Configure”. Click on “Show API Token”. Use that token here and your username for JENKINS_API_USER. | <api key> |
| DAV_SERVER | WebDAV server for dataset publication (optional); leave blank if using S3 | aria-dav.jpl.nasa.gov |
| DAV_USER | WebDAV server account with R/W access | ops |
| DAV_PASSWORD | DAV_USER account password | <password> |
| DATASET_AWS_ACCESS_KEY | AWS access key for account or role with R/W access to S3 bucket for dataset repository | <access key> |
| DATASET_AWS_SECRET_KEY | AWS secret key for DATASET_AWS_ACCESS_KEY | <secret key> |
| DATASET_AWS_REGION | AWS region for S3 bucket for dataset repository | us-west-2 |
| DATASET_S3_ENDPOINT | S3 endpoint for the DATASET_AWS_REGION | s3-us-west-2.amazonaws.com |
| DATASET_S3_WEBSITE_ENDPOINT | S3 website endpoint for the DATASET_AWS_REGION | s3-website-us-west-2.amazonaws.com |
| DATASET_BUCKET | bucket name for dataset repository | ops-product-bucket |
| AWS_ACCESS_KEY | AWS access key for account or role with R/W access to S3 bucket for code/config bundle and docker image repository; can be the same as DATASET_AWS_ACCESS_KEY | <access key> |
| AWS_SECRET_KEY | AWS secret key for AWS_ACCESS_KEY; can be the same as DATASET_AWS_SECRET_KEY | <secret key> |
| AWS_REGION | AWS region for S3 bucket for code/config bundle and docker image repository | us-west-2 |
| S3_ENDPOINT | S3 endpoint for the AWS_REGION | s3-us-west-2.amazonaws.com |
| CODE_BUCKET | bucket name for code/config bundle and docker image repository | ops-code-bucket |
| VERDI_PRIMER_IMAGE | S3 url to verdi docker image in CODE_BUCKET | s3://ops-code-bucket/hysds-verdi-latest.tar.gz |
| VERDI_TAG | docker tag for verdi docker image | latest |
| VERDI_UID | UID of ops user on ci instance; used to sync UID upon docker image creation | 1001 |
| VERDI_GID | GID of ops user on ci instance; used to sync GID upon docker image creation | 1001 |
| QUEUES | space-delimited list of queues to create autoscaling code/config bundles for | "dumby-job_worker-small dumby-job_worker-large" |
| INSTANCE_TYPES | space-delimited list of instance types to use for the corresponding queue as defined in QUEUES | "t2.micro t2.micro" |
| VENUE | unique tag name to differentiate this HySDS cluster from others | e.g. ops or dev or oasis or test |
| PROVES_URL | url to PROV-ES server (optional) | https://prov-es.jpl.nasa.gov/beta |
| PROVES_IMPORT_URL | PROV-ES API url for import of PROV-ES documents (optional) | https://prov-es.jpl.nasa.gov/beta/api/v0.1/prov_es/import/json |
| DATASETS_CFG | location of datasets configuration on workers | /home/ops/verdi/etc/datasets.json |
| SYSTEM_JOBS_QUEUE | name of queue to use for system jobs | system-jobs-queue |
| GIT_OAUTH_TOKEN | GitHub OAuth token to use on ci instance when checking out code for continuous integration (optional) | <token> |
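
The OPS_PASSWORD_HASH value can be prepared ahead of time. The following is a minimal sketch, assuming GNU coreutils (sha224sum) is available and that only the hex digest, without the trailing file marker that sha224sum prints, is what sds configure expects. The function name hash_password is hypothetical and not part of sdscli.

```shell
#!/usr/bin/env bash
# Print the SHA-224 hex digest of a password for use as OPS_PASSWORD_HASH.
# printf '%s' emits the password without a trailing newline (like `echo -n`),
# and awk keeps only the digest column of the sha224sum output.
# Assumption: only the hex digest is wanted, not the trailing "  -" marker.
hash_password() {
    printf '%s' "$1" | sha224sum | awk '{print $1}'
}

# Example with the placeholder password from the table; use your own.
hash_password 'mypassword'
```

Hashing the password without a trailing newline matters: a plain echo would append one and produce a different digest.
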
  1. Make sure elasticsearch is up on the mozart and grq instances. You can run the following command to check:

    curl 'http://<mozart/grq ip>:9200/?pretty'
    

    You should get a response back from Elasticsearch similar to this:

    {
     "status" : 200,
     "name" : "Dweller-in-Darkness",
     "cluster_name" : "resource_cluster",
     "version" : {
       "number" : "1.7.3",
       "build_hash" : "05d4530971ef0ea46d0f4fa6ee64dbc8df659682",
       "build_timestamp" : "2015-10-15T09:14:17Z",
       "build_snapshot" : false,
       "lucene_version" : "4.10.4"
     },
     "tagline" : "You Know, for Search"
    }
    
    

    If you cannot connect to Elasticsearch, start it on the mozart and grq instances:

    sudo systemctl start elasticsearch
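
The Elasticsearch check above can be repeated for several hosts in one pass. This is a rough sketch, assuming curl is installed and Elasticsearch listens on port 9200 as in the command above. The function names es_url and check_es_hosts are hypothetical helpers, not part of sdscli.

```shell
#!/usr/bin/env bash
# Build the Elasticsearch root URL for a host (port 9200, as in the check above).
es_url() {
    printf 'http://%s:9200/?pretty' "$1"
}

# Report whether Elasticsearch answers on each host given as an argument.
# Returns non-zero if any host fails to respond within 5 seconds.
check_es_hosts() {
    local host rc=0
    for host in "$@"; do
        if curl --silent --fail --max-time 5 "$(es_url "$host")" >/dev/null; then
            echo "elasticsearch is up on ${host}"
        else
            echo "elasticsearch is NOT responding on ${host}" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage (substitute the mozart and grq addresses for your cluster):
#   check_es_hosts <mozart ip> <grq ip>
```
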
    
  2. Ensure the mozart component can connect to the other components over ssh using the configured KEY_FILENAME. If correctly configured, the sds status all command should show that it was able to ssh into each component and found that the supervisord daemon is not yet running, as shown below:

    sds status all
    ########################################
    grq
    ########################################
    ----------------------------------------
    third-party services
    ----------------------------------------
    [100.64.134.135] Executing task 'systemctl'
    elasticsearch: ACTIVE
    ----------------------------------------
    supervised services
    ----------------------------------------
    [100.64.134.135] Executing task 'status'
    Supervisord is not running on grq.
    ########################################
    mozart
    ########################################
    ----------------------------------------
    third-party services
    ----------------------------------------
    [100.64.134.172] Executing task 'systemctl'
    rabbitmq-server: ACTIVE
    [100.64.134.172] Executing task 'systemctl'
    redis: ACTIVE
    [100.64.134.172] Executing task 'systemctl'
    elasticsearch: ACTIVE
    ----------------------------------------
    supervised services
    ----------------------------------------
    [100.64.134.172] Executing task 'status'
    Supervisord is not running on mozart.
    ########################################
    metrics
    ########################################
    ----------------------------------------
    third-party services
    ----------------------------------------
    [100.64.134.51] Executing task 'systemctl'
    redis: ACTIVE
    [100.64.134.51] Executing task 'systemctl'
    elasticsearch: ACTIVE
    ----------------------------------------
    supervised services
    ----------------------------------------
    [100.64.134.51] Executing task 'status'
    Supervisord is not running on metrics.
    ########################################
    factotum
    ########################################
    ----------------------------------------
    supervised services
    ----------------------------------------
    [100.64.134.157] Executing task 'status'
    Supervisord is not running on factotum.
    ########################################
    ci
    ########################################
    ----------------------------------------
    third-party services
    ----------------------------------------
    [100.64.134.108] Executing task 'systemctl'
    jenkins: ACTIVE
    ----------------------------------------
    supervised services
    ----------------------------------------
    [100.64.134.108] Executing task 'status'
    Supervisord is not running on ci.
    ########################################
    verdi
    ########################################
    ----------------------------------------
    supervised services
    ----------------------------------------
    [100.64.134.108] Executing task 'status'
    Supervisord is not running on verdi.
    

    Otherwise, if any component shows an error like the following (shown here for the grq component):

    ########################################
    grq
    ########################################
    [100.64.106.214] Executing task 'status'
    
    Fatal error: Needed to prompt for a connection or sudo password (host: 100.64.106.214), but abort-on-prompts was set to True
    
    Aborting.
    Needed to prompt for a connection or sudo password (host: 100.64.106.214), but abort-on-prompts was set to True
    
    Fatal error: One or more hosts failed while executing task 'status'
    
    Aborting.
    One or more hosts failed while executing task 'status'
    

    then there is an issue with the configured KEY_FILENAME on mozart or with the authorized_keys file under the component's ~/.ssh directory for user OPS_USER. Resolve this issue before continuing.
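
When tracking down such a failure, two quick checks usually narrow it down: whether the private key file has permissions ssh will accept, and whether a non-interactive login succeeds. The sketch below assumes a Linux host with GNU stat. The function names key_perms_ok and try_ssh are hypothetical helpers, not part of sdscli.

```shell
#!/usr/bin/env bash
# Succeed only if the key file exists and is accessible by its owner alone
# (mode 600 or 400); ssh refuses private keys that others can read.
# Assumption: GNU stat (Linux), where -c '%a' prints the octal mode.
key_perms_ok() {
    local mode
    [ -f "$1" ] || return 1
    mode=$(stat -c '%a' "$1") || return 1
    [ "$mode" = "600" ] || [ "$mode" = "400" ]
}

# Attempt a non-interactive login and run a no-op command.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# which mirrors the abort-on-prompts behavior seen in the error above.
# usage: try_ssh <key file> <user> <host>
try_ssh() {
    ssh -i "$1" -o BatchMode=yes -o ConnectTimeout=5 "$2@$3" true
}

# Usage (substitute your KEY_FILENAME, OPS_USER, and the failing host):
#   key_perms_ok /home/ops/.ssh/my_cloud_keypair.pem || echo "fix key permissions"
#   try_ssh /home/ops/.ssh/my_cloud_keypair.pem ops <grq ip> && echo "ssh ok"
```
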

  3. Update all HySDS components:

    sds update all
    

    If this command reports any errors, resolve them before continuing.

  4. Start up all HySDS components:

    sds start all
    
  5. View status of HySDS components and services:

    sds status all
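
The update, start, and status steps can be chained so that a failure stops the sequence. This is only a sketch: it assumes sds exits with a non-zero status when a step fails, which this page does not confirm. The function bring_up_cluster and the SDS_CMD override are hypothetical.

```shell
#!/usr/bin/env bash
# Command used to invoke sds; overridable so the sequence can be dry-run,
# for example with SDS_CMD=echo.
SDS_CMD="${SDS_CMD:-sds}"

# Run "update all", "start all", and "status all" in order,
# stopping at the first step that fails.
# Assumption: sds returns a non-zero exit status on failure.
bring_up_cluster() {
    local step
    for step in update start status; do
        echo ">>> sds ${step} all"
        "$SDS_CMD" "$step" all || {
            echo "sds ${step} all failed; fix the error before continuing" >&2
            return 1
        }
    done
}

# Usage, from the mozart instance with the virtualenv activated:
#   bring_up_cluster
```
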
    
  6. During installation, the latest versions of the lightweight-jobs core HySDS package and the verdi docker image were downloaded. Next we import the lightweight-jobs package:

    cd ~/mozart/pkgs
    sds pkg import container-hysds_lightweight-jobs.*.sdspkg.tar
    
  7. Finally, we copy the verdi docker image to the code bucket (CODE_BUCKET as specified during sds configure). Ensure that the destination matches the configured VERDI_PRIMER_IMAGE url:

    aws s3 cp hysds-verdi-latest.tar.gz s3://<CODE_BUCKET>/hysds-verdi-latest.tar.gz
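
One way to check that VERDI_PRIMER_IMAGE is consistent with CODE_BUCKET before copying is to compare the bucket portion of the url against the bucket name. A minimal sketch follows; the function names s3_bucket_of and primer_image_matches_bucket are hypothetical, and the values are the sample ones from the parameter table.

```shell
#!/usr/bin/env bash
# Extract the bucket name from an s3:// url.
s3_bucket_of() {
    local rest="${1#s3://}"
    printf '%s\n' "${rest%%/*}"
}

# Succeed if the primer image url is an s3:// url inside the given bucket.
# usage: primer_image_matches_bucket <VERDI_PRIMER_IMAGE> <CODE_BUCKET>
primer_image_matches_bucket() {
    case "$1" in
        s3://*) [ "$(s3_bucket_of "$1")" = "$2" ] ;;
        *) return 1 ;;
    esac
}

# Example using the sample values from the parameter table:
primer_image_matches_bucket \
    's3://ops-code-bucket/hysds-verdi-latest.tar.gz' 'ops-code-bucket' \
    && echo 'VERDI_PRIMER_IMAGE is consistent with CODE_BUCKET'
```

After the copy completes, listing the object with aws s3 ls and the same url is a simple way to confirm that the upload landed where VERDI_PRIMER_IMAGE points.
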
    

Next Steps

Now that you have your HySDS cluster configured, continue on to Hello World.

To configure Autoscaling groups for your HySDS cluster, continue on to Create-AWS-Autoscaling-Group-for-Verdi.

To configure a staging area for your HySDS cluster, continue on to Create-AWS-Resources-for-Staging-Area.

