
When installation of the HySDS framework is complete on your mozart instance (see Installation), we must configure the rest of the cluster instances so that they can talk to each other. We do this using the sdscli command on the mozart instance. The idea is that all code and configuration is centralized on the mozart instance and when ready to deploy updates during the development cycle or when upgrading operations, we can push them easily from a single location.

  1. Configure your cluster parameters using sdscli: The sdscli repo was installed on your mozart instance during Installation. Configure your cluster by running:

    cd ~
    source ~/mozart/bin/activate
    sds configure
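Your answers are saved so that configuration can be reviewed or re-run later; in sdscli the settings typically land in `~/.sds/config` (treat that path as an assumption if your install differs). A minimal check:

```shell
# Inspect the saved cluster configuration, if it exists yet.
# ~/.sds/config is where sdscli typically writes it; the path is an assumption.
[ -f ~/.sds/config ] && cat ~/.sds/config || echo "no cluster config found yet"
```

Re-running `sds configure` prompts again with the saved values as defaults, so you can correct a single field without retyping everything.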
  2. The sds configure command will prompt you for your cluster parameters. A description of the parameters, with examples, is provided below.

*** LOOK OVER THE FIELDS AND HAVE THE VALUES READY BEFOREHAND ***

Each field is listed below with its description and an example value.

MOZART_PVT_IP: private IP address of the mozart instance. Example: 100.64.134.201
MOZART_PUB_IP: publicly accessible IP address of the mozart instance, e.g. Elastic IP; can be the same as MOZART_PVT_IP. Example: 64.34.21.123
MOZART_FQDN: publicly resolvable FQDN of the mozart instance; can be the same as MOZART_PVT_IP. Example: gman-jobs.hysds.net
MOZART_RABBIT_PVT_IP: private IP address of the mozart rabbitMQ instance (if running rabbitMQ on a different instance); otherwise replicate the value from MOZART_PVT_IP. Example: 100.64.134.201
MOZART_RABBIT_PUB_IP: publicly accessible IP address of the mozart rabbitMQ instance (if running rabbitMQ on a different instance); otherwise replicate the value from MOZART_PUB_IP. Example: 64.34.21.123
MOZART_RABBIT_FQDN: publicly resolvable FQDN of the mozart rabbitMQ instance (if running rabbitMQ on a different instance); otherwise replicate the value from MOZART_FQDN. Example: gman-jobs.hysds.net
MOZART_RABBIT_USER: rabbitMQ user account; the default installed by rabbitMQ is guest. Example: guest
MOZART_RABBIT_PASSWORD: rabbitMQ user password; the default installed by rabbitMQ is guest. Example: guest
MOZART_REDIS_PVT_IP: private IP address of the mozart redis instance (if running redis on a different instance); otherwise replicate the value from MOZART_PVT_IP. Example: 100.64.134.201
MOZART_REDIS_PUB_IP: publicly accessible IP address of the mozart redis instance (if running redis on a different instance); otherwise replicate the value from MOZART_PUB_IP. Example: 64.34.21.123
MOZART_REDIS_FQDN: publicly resolvable FQDN of the mozart redis instance (if running redis on a different instance); otherwise replicate the value from MOZART_FQDN. Example: gman-jobs.hysds.net
MOZART_REDIS_PASSWORD: redis password (if AUTH is configured). Example: empty string or <redis password>
MOZART_ES_PVT_IP: private IP address of the mozart elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate the value from MOZART_PVT_IP. Example: 100.64.134.201
MOZART_ES_PUB_IP: publicly accessible IP address of the mozart elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate the value from MOZART_PUB_IP. Example: 64.34.21.123
MOZART_ES_FQDN: publicly resolvable FQDN of the mozart elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate the value from MOZART_FQDN. Example: gman-jobs.hysds.net
OPS_USER: ops account on HySDS cluster instances. Example: ops, hysdsops, or swotops
OPS_HOME: ops account home directory on HySDS cluster instances. Example: /home/ops or /data/home/hysdsops
OPS_PASSWORD_HASH: sha224sum password hash for ops user account login to HySDS web interfaces. Example: output of echo -n mypassword | sha224sum
LDAP_GROUPS: comma-separated list of LDAP groups to use for user authentication into HySDS web interfaces. Example: hysds-v2,aria.users,ariamh
KEY_FILENAME: private ssh key to use for logging into other cluster instances; used for deployment via fabric. Example: /home/ops/.ssh/my_cloud_keypair.pem
JENKINS_USER: account on the ci instance that owns and runs the Jenkins CI server. Example: jenkins
JENKINS_DIR: location of the Jenkins HOME directory (where the jobs/ directory is located). Example: /var/lib/jenkins
METRICS_PVT_IP: private IP address of the metrics instance. Example: 100.64.134.153
METRICS_PUB_IP: publicly accessible IP address of the metrics instance, e.g. Elastic IP; can be the same as METRICS_PVT_IP. Example: 64.34.21.124
METRICS_FQDN: publicly resolvable FQDN of the metrics instance; can be the same as METRICS_PVT_IP. Example: gman-metrics.hysds.net
METRICS_REDIS_PVT_IP: private IP address of the metrics redis instance (if running redis on a different instance); otherwise replicate the value from METRICS_PVT_IP. Example: 100.64.134.153
METRICS_REDIS_PUB_IP: publicly accessible IP address of the metrics redis instance (if running redis on a different instance); otherwise replicate the value from METRICS_PUB_IP. Example: 64.34.21.124
METRICS_REDIS_FQDN: publicly resolvable FQDN of the metrics redis instance (if running redis on a different instance); otherwise replicate the value from METRICS_FQDN. Example: gman-metrics.hysds.net
METRICS_REDIS_PASSWORD: redis password (if AUTH is configured). Example: empty string or <redis password>
METRICS_ES_PVT_IP: private IP address of the metrics elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate the value from METRICS_PVT_IP. Example: 100.64.134.153
METRICS_ES_PUB_IP: publicly accessible IP address of the metrics elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate the value from METRICS_PUB_IP. Example: 64.34.21.124
METRICS_ES_FQDN: publicly resolvable FQDN of the metrics elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate the value from METRICS_FQDN. Example: gman-metrics.hysds.net
GRQ_PVT_IP: private IP address of the grq instance. Example: 100.64.134.71
GRQ_PUB_IP: publicly accessible IP address of the grq instance, e.g. Elastic IP; can be the same as GRQ_PVT_IP. Example: 64.34.21.125
GRQ_FQDN: publicly resolvable FQDN of the grq instance; can be the same as GRQ_PVT_IP. Example: gman-grq.hysds.net
GRQ_PORT: port to use for the grq2 REST API. Example: 8878
GRQ_ES_PVT_IP: private IP address of the grq elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate the value from GRQ_PVT_IP. Example: 100.64.134.71
GRQ_ES_PUB_IP: publicly accessible IP address of the grq elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate the value from GRQ_PUB_IP. Example: 64.34.21.125
GRQ_ES_FQDN: publicly resolvable FQDN of the grq elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate the value from GRQ_FQDN. Example: gman-grq.hysds.net
FACTOTUM_PVT_IP: private IP address of the factotum instance. Example: 100.64.134.184
FACTOTUM_PUB_IP: publicly accessible IP address of the factotum instance, e.g. Elastic IP; can be the same as FACTOTUM_PVT_IP. Example: 64.34.21.126
FACTOTUM_FQDN: publicly resolvable FQDN of the factotum instance; can be the same as FACTOTUM_PVT_IP. Example: gman-factotum.hysds.net
CI_PVT_IP: private IP address of the ci instance. Example: 100.64.134.179
CI_PUB_IP: publicly accessible IP address of the ci instance, e.g. Elastic IP; can be the same as CI_PVT_IP. Example: 64.34.21.127
CI_FQDN: publicly resolvable FQDN of the ci instance; can be the same as CI_PVT_IP. Example: gman-ci.hysds.net
VERDI_PVT_IP: private IP address of the verdi instance; if there is no verdi instance, use the ci instance value for CI_PVT_IP. Example: 100.64.134.179
VERDI_PUB_IP: publicly accessible IP address of the verdi instance, e.g. Elastic IP; if there is no verdi instance, use the ci instance value for CI_PUB_IP. Example: 64.34.21.127
VERDI_FQDN: publicly resolvable FQDN of the verdi instance; if there is no verdi instance, use the ci instance value for CI_FQDN. Example: gman-ci.hysds.net
JENKINS_API_USER: Jenkins user account to use for access to the Jenkins API. Example: gmanipon
JENKINS_API_KEY: Jenkins user API key to use for access to the Jenkins API. Go to an already set up Jenkins web page and click on "People", your username, then "Configure". Click on "Show API Token". Use that token here, and use your username for JENKINS_API_USER. Example: <api key>
DAV_SERVER: WebDAV server for dataset publication (optional); leave blank if using S3. Example: aria-dav.jpl.nasa.gov
DAV_USER: WebDAV server account with R/W access. Example: ops
DAV_PASSWORD: DAV_USER account password. Example: <password>
DATASET_AWS_ACCESS_KEY: AWS access key for an account or role with R/W access to the S3 bucket for the dataset repository. Example: <access key>
DATASET_AWS_SECRET_KEY: AWS secret key for DATASET_AWS_ACCESS_KEY. Example: <secret key>
DATASET_AWS_REGION: AWS region of the S3 bucket for the dataset repository. Example: us-west-2
DATASET_S3_ENDPOINT: S3 endpoint for the DATASET_AWS_REGION. Example: s3-us-west-2.amazonaws.com
DATASET_S3_WEBSITE_ENDPOINT: S3 website endpoint for the DATASET_AWS_REGION. Example: s3-website-us-west-2.amazonaws.com
DATASET_BUCKET: bucket name for the dataset repository. Example: ops-product-bucket
AWS_ACCESS_KEY: AWS access key for an account or role with R/W access to the S3 bucket for the code/config bundle and docker image repository; can be the same as DATASET_AWS_ACCESS_KEY. Example: <access key>
AWS_SECRET_KEY: AWS secret key for AWS_ACCESS_KEY; can be the same as DATASET_AWS_SECRET_KEY. Example: <secret key>
AWS_REGION: AWS region of the S3 bucket for the code/config bundle and docker image repository. Example: us-west-2
S3_ENDPOINT: S3 endpoint for the AWS_REGION. Example: s3-us-west-2.amazonaws.com
CODE_BUCKET: bucket name for the code/config bundle and docker image repository. Example: ops-code-bucket
VERDI_PRIMER_IMAGE: S3 url to the verdi docker image in CODE_BUCKET. Example: s3://ops-code-bucket/hysds-verdi-latest.tar.gz
VERDI_TAG: docker tag for the verdi docker image. Example: latest
VERDI_UID: UID of the ops user on the ci instance; used to sync the UID upon docker image creation. Example: 1001
VERDI_GID: GID of the ops user on the ci instance; used to sync the GID upon docker image creation. Example: 1001
QUEUES (v2.* and earlier): space-delimited list of queues to create autoscaling code/config bundles for. Example: "dumby-job_worker-small dumby-job_worker-large"
INSTANCE_TYPES (v2.* and earlier): space-delimited list of instance types to use for the corresponding queues as defined in QUEUES. Example: "t2.micro t2.micro"
QUEUES (v3.* and later): list of queue configurations specifying the queue name and the list of instance types to configure for the autoscaling fleet of workers that will pull from the queue; for each queue configuration, a code/config bundle will be generated. Example:

    QUEUES:
      - QUEUE_NAME: dumby-job_worker-small
        INSTANCE_TYPES:
          - t2.medium
          - t3a.medium
          - t3.medium
      - QUEUE_NAME: dumby-job_worker-large
        INSTANCE_TYPES:
          - t2.medium
          - t3a.medium
          - t3.medium

VENUE: unique tag name to differentiate this HySDS cluster from others. Example: ops, dev, oasis, or test
PROVES_URL: url to the PROV-ES server (optional). Example: https://prov-es.jpl.nasa.gov/beta
PROVES_IMPORT_URL: PROV-ES API url for import of PROV-ES documents (optional). Example: https://prov-es.jpl.nasa.gov/beta/api/v0.1/prov_es/import/json
DATASETS_CFG: location of the datasets configuration on workers. Example: /home/ops/verdi/etc/datasets.json
SYSTEM_JOBS_QUEUE: name of the queue to use for system jobs. Example: system-jobs-queue
GIT_OAUTH_TOKEN: optional GitHub OAuth token to use on the ci instance when checking out code for continuous integration. Example: <token>
CONTAINER_REGISTRY: if using the container registry feature, the URL location of the container registry (optional). Example: localhost:5050
CONTAINER_REGISTRY_BUCKET: if using the container registry feature, the bucket that will be used for the docker registry's storage backend. Example: ops-code-bucket
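The OPS_PASSWORD_HASH value above can be generated ahead of time on the command line; a minimal sketch ("mypassword" is a placeholder):

```shell
# Generate the sha224 hash expected by OPS_PASSWORD_HASH.
# "mypassword" is a placeholder; -n keeps echo from appending a newline,
# which would change the hash. awk strips the trailing "-" filename field.
echo -n "mypassword" | sha224sum | awk '{print $1}'
```

Paste the resulting 56-character hex string when sds configure prompts for OPS_PASSWORD_HASH.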
  1. Make sure elasticsearch is up on the mozart and grq instances. You can run the following command to check:

    curl 'http://<mozart/grq ip>:9200/?pretty'


    you should get a response back from ES, something like this:


    {
     "status" : 200,
     "name" : "Dweller-in-Darkness",
     "cluster_name" : "resource_cluster",
     "version" : {
       "number" : "1.7.3",
       "build_hash" : "05d4530971ef0ea46d0f4fa6ee64dbc8df659682",
       "build_timestamp" : "2015-10-15T09:14:17Z",
       "build_snapshot" : false,
       "lucene_version" : "4.10.4"
     },
     "tagline" : "You Know, for Search"
    }
    
    
    
    

    If you can not connect to elastic search, you need to start ElasticSearch in mozart and grq instances:


    sudo systemctl start elasticsearch
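If the instances were only just started, Elasticsearch can take a little while to begin answering on port 9200. A small polling sketch (the host IP is a placeholder from the examples above):

```shell
# Poll Elasticsearch until it answers on port 9200.
# The host argument is a placeholder; pass your mozart or grq IP.
wait_for_es() {
  local host="$1"
  until curl -s "http://${host}:9200/" > /dev/null; do
    echo "waiting for elasticsearch on ${host}..."
    sleep 5
  done
}

# usage: wait_for_es 100.64.134.201
```

This is just a convenience; a single manual curl as shown above is equally valid.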
    
    
  2. Ensure the mozart component can connect to the other components over ssh using the configured KEY_FILENAME. If correctly configured, the sds status all command should show that it was able to ssh into each component and confirm that the supervisord daemon is not yet running, as below:

    sds status all
    ########################################
    grq
    ########################################
    [100.64.106.214] Executing task 'status'
    Supervisord is not running on grq.
    ########################################
    mozart
    ########################################
    [100.64.106.38] Executing task 'status'
    Supervisord is not running on mozart.
    ########################################
    metrics
    ########################################
    [100.64.106.140] Executing task 'status'
    Supervisord is not running on metrics.
    ########################################
    factotum
    ########################################
    [100.64.106.64] Executing task 'status'
    Supervisord is not running on factotum.
    ########################################
    ci
    ########################################
    [100.64.106.220] Executing task 'status'
    Supervisord is not running on ci.
    ########################################
    verdi
    ########################################
    [100.64.106.220] Executing task 'status'
    Supervisord is not running on verdi.

    Otherwise, if any of the components shows the following error (for example, the grq component):

    ########################################
    grq
    ########################################
    [100.64.106.214] Executing task 'status'
    
    Fatal error: Needed to prompt for a connection or sudo password (host: 100.64.106.214), but abort-on-prompts was set to True
    
    Aborting.
    Needed to prompt for a connection or sudo password (host: 100.64.106.214), but abort-on-prompts was set to True
    
    Fatal error: One or more hosts failed while executing task 'status'
    
    Aborting.
    One or more hosts failed while executing task 'status'
    
    

    then there is an issue with the configured KEY_FILENAME on mozart, or with the authorized_keys file under the component's ~/.ssh directory for user OPS_USER. Resolve this issue before continuing on.
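A common cause of this failure is wrong permissions on the component's ~/.ssh directory, which sshd silently rejects. A quick sketch for checking both sides (the key path and IP below are placeholders taken from the examples above):

```shell
# Tighten the component's ssh files to permissions sshd will accept;
# sshd ignores authorized_keys files that are group- or world-accessible.
check_ssh_perms() {
  local home_dir="$1"
  chmod 700 "${home_dir}/.ssh"
  chmod 600 "${home_dir}/.ssh/authorized_keys"
}

# From mozart, a manual connectivity test would look like this
# (key path, user, and IP are placeholders):
#   ssh -i /home/ops/.ssh/my_cloud_keypair.pem -o BatchMode=yes ops@100.64.106.214 'hostname'
```

BatchMode=yes makes ssh fail immediately instead of prompting, which mirrors the abort-on-prompts behavior in the fabric error above.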

  3. Update all HySDS components:

    sds update all
    
    

    If you receive any errors, they will need to be addressed.

  4. Start up all HySDS components:

    sds start all
    
    
  5. View status of HySDS components and services:

    sds status all
    
    
  6. During installation, the latest versions of the lightweight-jobs core HySDS package and the verdi docker image were downloaded. Next, we import the lightweight-jobs package:

    cd ~/mozart/pkgs


    sds pkg import container-hysds_lightweight-jobs.*.sdspkg.tar
  7. Finally, we copy the verdi docker image to the code bucket (CODE_BUCKET as specified during sds configure). Ensure the VERDI_PRIMER_IMAGE url is consistent:

    aws s3 cp hysds-verdi-latest.tar.gz s3://<CODE_BUCKET>/hysds-verdi-latest.tar.gz

Next Step

Now that you have your HySDS cluster configured, continue on to Step 5: Running your First "Hello World" Job
