Confidence Level: TBD. This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.
When installation of the HySDS framework is complete on your mozart instance (see Installation), we must configure the rest of the cluster instances so that they can talk to each other. We do this using the sdscli command on the mozart instance. The idea is that all code and configuration is centralized on the mozart instance, so that when we are ready to deploy updates during the development cycle or when upgrading operations, we can push them easily from a single location.
Configure your cluster parameters using sdscli:
The sdscli repo was installed on your mozart instance during Installation. Configure your cluster by running:
Code Block
cd ~
source ~/mozart/bin/activate
sds configure
The sds configure command will prompt you for your cluster parameters. A description of the parameters with examples is provided below.
*** LOOK OVER THE FIELDS AND HAVE THE VALUES READY BEFOREHAND ***
field | description | example |
---|---|---|
MOZART_PVT_IP | private IP address of ... | 100.64.134.201 |
MOZART_PUB_IP | publicly accessible IP address of ... instance, e.g. Elastic IP; can be the same as MOZART_PVT_IP | 64.34.21.123 |
MOZART_FQDN | publicly resolvable FQDN of ... instance; can be the same as MOZART_PVT_IP | |
MOZART_RABBIT_PVT_IP | private IP address of ... | 100.64.134.201 |
MOZART_RABBIT_PUB_IP | publicly accessible IP address of ... | 64.34.21.123 |
MOZART_RABBIT_FQDN | publicly resolvable FQDN of rabbitMQ instance (if running rabbitMQ on a different instance); otherwise replicate value from MOZART_FQDN | |
MOZART_RABBIT_USER | rabbitMQ user account; default installed by rabbitMQ is ... | guest |
MOZART_RABBIT_PASSWORD | rabbitMQ user password; default installed by rabbitMQ is ... | guest |
MOZART_REDIS_PVT_IP | private IP address of ... | 100.64.134.201 |
MOZART_REDIS_PUB_IP | publicly accessible IP address of redis instance (if running redis on a different instance); otherwise replicate value from MOZART_PUB_IP | 64.34.21.123 |
MOZART_REDIS_FQDN | publicly resolvable FQDN of redis instance (if running redis on a different instance); otherwise replicate value from MOZART_FQDN | |
MOZART_REDIS_PASSWORD | redis password (if AUTH is configured) | empty string or <redis password> |
MOZART_ES_PVT_IP | private IP address of ... | 100.64.134.201 |
MOZART_ES_PUB_IP | publicly accessible IP address of ... | 64.34.21.123 |
MOZART_ES_FQDN | publicly resolvable FQDN of elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from MOZART_FQDN | |
OPS_USER | ops account on HySDS cluster instances | ops or hysdsops or swotops |
OPS_HOME | ops account home directory on HySDS cluster instances | /home/ops or /data/home/hysdsops |
OPS_PASSWORD_HASH | sha224sum password hash for ops user account login to HySDS web interfaces | output of ... |
LDAP_GROUPS | comma-separated list of LDAP groups to use for user authentication into HySDS web interfaces | hysds-v2,aria.users,ariamh |
KEY_FILENAME | private ssh key to use for logging into other cluster instances; used for deployment via fabric | /home/ops/.ssh/my_cloud_keypair.pem |
JENKINS_USER | account on ... instance that owns and runs the Jenkins CI server | jenkins |
JENKINS_DIR | location of the Jenkins HOME directory (where jobs/ directory is located) | /var/lib/jenkins |
METRICS_PVT_IP | private IP address of ... instance | 100.64.134.153 |
METRICS_PUB_IP | publicly accessible IP address of ... | 64.34.21.124 |
METRICS_FQDN | publicly resolvable FQDN of ... instance; can be the same as METRICS_PVT_IP | |
METRICS_REDIS_PVT_IP | private IP address of redis instance (if running redis on a different instance); otherwise replicate value from METRICS_PVT_IP | 100.64.134.153 |
METRICS_REDIS_PUB_IP | publicly accessible IP address of redis instance (if running redis on a different instance); otherwise replicate value from METRICS_PUB_IP | 64.34.21.123 |
METRICS_REDIS_FQDN | publicly resolvable FQDN of redis instance (if running redis on a different instance); otherwise replicate value from METRICS_FQDN | |
METRICS_REDIS_PASSWORD | redis password (if AUTH is configured) | empty string or <redis password> |
METRICS_ES_PVT_IP | private IP address of elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from METRICS_PVT_IP | 100.64.134.153 |
METRICS_ES_PUB_IP | publicly accessible IP address of ... | 64.34.21.124 |
METRICS_ES_FQDN | publicly resolvable FQDN of elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from METRICS_FQDN | |
GRQ_PVT_IP | private IP address of ... | 100.64.134.71 |
GRQ_PUB_IP | publicly accessible IP address of ... instance, e.g. Elastic IP; can be the same as GRQ_PVT_IP | 64.34.21.125 |
GRQ_FQDN | publicly resolvable FQDN of ... instance; can be the same as GRQ_PVT_IP | |
GRQ_PORT | port to use for the grq2 REST API | 8878 |
GRQ_ES_PVT_IP | private IP address of ... | 100.64.134.71 |
GRQ_ES_PUB_IP | publicly accessible IP address of ... | 64.34.21.125 |
GRQ_ES_FQDN | publicly resolvable FQDN of elasticsearch instance (if running elasticsearch on a different instance); otherwise replicate value from GRQ_FQDN | |
FACTOTUM_PVT_IP | private IP address of ... instance | 100.64.134.184 |
FACTOTUM_PUB_IP | publicly accessible IP address of ... | 64.34.21.126 |
FACTOTUM_FQDN | publicly resolvable FQDN of ... instance; can be the same as FACTOTUM_PVT_IP | |
CI_PVT_IP | private IP address of ... | 100.64.134.179 |
CI_PUB_IP | publicly accessible IP address of ... instance, e.g. Elastic IP; can be the same as CI_PVT_IP | 64.34.21.127 |
CI_FQDN | publicly resolvable FQDN of ... instance; can be the same as CI_PVT_IP | |
VERDI_PVT_IP | private IP address of ... no ... use ... instance value for CI_PVT_IP | 100.64.134.179 |
VERDI_PUB_IP | publicly accessible IP address of ... instance, e.g. Elastic IP; if no ... use ... instance value for CI_PUB_IP | 64.34.21.127 |
VERDI_FQDN | publicly resolvable FQDN of ... instance; if no ... instance, use ... | |
JENKINS_API_USER | Jenkins user account to use for access to Jenkins API | gmanipon |
JENKINS_API_KEY | Jenkins user API key to use for access to Jenkins API. Go to an already set up Jenkins web page and click on “People”, your username, then “Configure”. Click on “Show API Token”. Use that token here and your username for JENKINS_API_USER. | <api key> |
DAV_SERVER | WebDAV server for dataset publication (optional); leave blank if using S3 | |
DAV_USER | WebDAV server account with R/W access | ops |
DAV_PASSWORD | DAV_USER account password | <password> |
DATASET_AWS_ACCESS_KEY | AWS access key for account or role with R/W access to S3 bucket for dataset repository | <access key> |
DATASET_AWS_SECRET_KEY | AWS secret key for DATASET_AWS_ACCESS_KEY | <secret key> |
DATASET_AWS_REGION | AWS region for S3 bucket for dataset repository | us-west-2 |
DATASET_S3_ENDPOINT | S3 endpoint for the DATASET_AWS_REGION | |
DATASET_S3_WEBSITE_ENDPOINT | S3 website endpoint for the DATASET_AWS_REGION | |
DATASET_BUCKET | bucket name for dataset repository | ops-product-bucket |
AWS_ACCESS_KEY | AWS access key for account or role with R/W access to S3 bucket for code/config bundle and docker image repository; can be the same as DATASET_AWS_ACCESS_KEY | <access key> |
AWS_SECRET_KEY | AWS secret key for AWS_ACCESS_KEY; can be the same as DATASET_AWS_SECRET_KEY | <secret key> |
AWS_REGION | AWS region for S3 bucket for code/config bundle and docker image repository | us-west-2 |
S3_ENDPOINT | S3 endpoint for the AWS_REGION | |
CODE_BUCKET | bucket name for code/config bundle and docker image repository | ops-code-bucket |
VERDI_PRIMER_IMAGE | S3 url to ... docker image in CODE_BUCKET | |
VERDI_TAG | docker tag for ... docker image | latest |
VERDI_UID | UID of ops user on ... | 1001 |
VERDI_GID | GID of ops user on ... | 1001 |
QUEUES (v2.* and earlier) | space-delimited list of queues to create autoscaling code/config bundles for | "dumby-job_worker-small dumby-job_worker-large" |
INSTANCE_TYPES (v2.* and earlier) | space-delimited list of instance types to use for the corresponding queue as defined in ... | "t2.micro t2.micro" |
| list of queue configurations specifying the queue name and the list of instance types to configure for the autoscaling fleet of workers that will pull from the queue; for each queue configuration, a code/config bundle will be generated | |
VENUE | unique tag name to differentiate this HySDS cluster from others | e.g. ops or dev or oasis or test |
PROVES_URL | url to PROV-ES server (optional) | |
PROVES_IMPORT_URL | PROV-ES API url for import of PROV-ES documents (optional) | https://prov-es.jpl.nasa.gov/beta/api/v0.1/prov_es/import/json |
DATASETS_CFG | location ... on workers | /home/ops/verdi/etc/datasets.json |
SYSTEM_JOBS_QUEUE | name of queue to use for system jobs | system-jobs-queue |
GIT_OAUTH_TOKEN | optional Github OAuth token to use on ... instance when checking out code for continuous integration (optional) | <token> |
CONTAINER_REGISTRY | if using the container registry feature, the URL location of the container registry (optional) | localhost:5050 |
CONTAINER_REGISTRY_BUCKET | if using the container registry feature, the bucket that will be used for the docker registry's storage backend | ops-code-bucket |
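The OPS_PASSWORD_HASH field in the table above expects a sha224sum hash, but the exact command is not shown here. The following is a minimal sketch of one way to produce such a hash. It assumes the expected value is the SHA-224 hex digest of the plain password with no trailing newline; the helper name `ops_password_hash` is hypothetical and not part of sdscli.

```shell
# Hypothetical helper (not part of sdscli): print the SHA-224 hex digest
# of the password given as the first argument.
# Assumption: the digest is computed over the password alone, with no
# trailing newline.
ops_password_hash() {
    # printf '%s' writes the password without the newline echo would append;
    # awk keeps only the digest and drops the trailing "-" file marker.
    printf '%s' "$1" | sha224sum | awk '{print $1}'
}

# Example usage in bash, reading the password without echoing it:
#   read -r -s -p "ops password: " pw; echo
#   ops_password_hash "$pw"
```

Passing the password through a variable rather than typing it on the command line keeps it out of the shell history. If the web interface login fails later, the newline assumption is the first thing to check.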
Make sure elasticsearch is up on the mozart and grq instances. You can run the following command to check:
Code Block curl 'http://<mozart/grq ip>:9200/?pretty'
You should get an answer back from ES, something like this:
Code Block
{
  "status" : 200,
  "name" : "Dweller-in-Darkness",
  "cluster_name" : "resource_cluster",
  "version" : {
    "number" : "1.7.3",
    "build_hash" : "05d4530971ef0ea46d0f4fa6ee64dbc8df659682",
    "build_timestamp" : "2015-10-15T09:14:17Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
If you cannot connect to Elasticsearch, start it on the mozart and grq instances:
Code Block sudo systemctl start elasticsearch
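Since the same curl check has to be repeated for both the mozart and grq instances, it can be wrapped in a small loop. This is only a sketch: the function names `es_is_up` and `check_es_hosts` are hypothetical, and the addresses in the usage comment are the example values from the parameter table, not real hosts.

```shell
# Hypothetical helper: succeed if Elasticsearch answers on port 9200 of
# the given host.
es_is_up() {
    # -s: no progress output, -f: treat HTTP errors as failure,
    # --max-time: give up after 5 seconds instead of hanging on a dead host
    curl -sf --max-time 5 "http://$1:9200/?pretty" > /dev/null
}

# Report the Elasticsearch status for every host passed as an argument.
check_es_hosts() {
    for host in "$@"; do
        if es_is_up "$host"; then
            echo "elasticsearch is up on $host"
        else
            echo "elasticsearch is NOT reachable on $host"
        fi
    done
}

# Usage (placeholder addresses; substitute your mozart and grq IPs):
#   check_es_hosts 100.64.134.201 100.64.134.71
```

Any host reported as not reachable is a candidate for the `sudo systemctl start elasticsearch` command above.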
Ensure the mozart component can connect to other components over ssh using the configured KEY_FILENAME. If correctly configured, the sds status all command should show that it was able to ssh into each component to check that the supervisord daemon was not running, like below:
Code Block
sds status all
########################################
grq
########################################
[100.64.106.214] Executing task 'status'
Supervisord is not running on grq.
########################################
mozart
########################################
[100.64.106.38] Executing task 'status'
Supervisord is not running on mozart.
########################################
metrics
########################################
[100.64.106.140] Executing task 'status'
Supervisord is not running on metrics.
########################################
factotum
########################################
[100.64.106.64] Executing task 'status'
Supervisord is not running on factotum.
########################################
ci
########################################
[100.64.106.220] Executing task 'status'
Supervisord is not running on ci.
########################################
verdi
########################################
[100.64.106.220] Executing task 'status'
Supervisord is not running on verdi.
Otherwise if any of the components show the following error, for example for the grq component:
Code Block
########################################
grq
########################################
[100.64.106.214] Executing task 'status'
Fatal error: Needed to prompt for a connection or sudo password (host: 100.64.106.214), but abort-on-prompts was set to True
Aborting.
Needed to prompt for a connection or sudo password (host: 100.64.106.214), but abort-on-prompts was set to True
Fatal error: One or more hosts failed while executing task 'status'
Aborting.
One or more hosts failed while executing task 'status'
then there is an issue with the configured KEY_FILENAME on mozart or the authorized_keys file under the component's ~/.ssh directory for user OPS_USER. Resolve this issue before continuing.
Update all HySDS components:
Code Block sds update all
If you receive any errors, they will need to be addressed.
Start up all HySDS components:
Code Block sds start all
View status of HySDS components and services:
Code Block sds status all
During installation, the latest versions of the lightweight-jobs core HySDS package and the verdi docker image were downloaded. Next we import the lightweight-jobs package:
Code Block
cd ~/mozart/pkgs
sds pkg import container-hysds_lightweight-jobs.*.sdspkg.tar
Finally we copy the verdi docker image to the code bucket (CODE_BUCKET as specified during sds configure). Ensure the VERDI_PRIMER_IMAGE url is consistent:
Code Block aws s3 cp hysds-verdi-latest.tar.gz s3://<CODE_BUCKET>/hysds-verdi-latest.tar.gz
Next Step
Now that you have your HySDS cluster configured, continue on
...