...
Workers
What are HySDS jobs?
HySDS jobs are essentially Celery tasks. More specifically, they are Celery tasks that encapsulate the execution of an executable within a Docker image. The Celery task callable (hysds.job_worker.run_job) is responsible for setup, execution, and teardown of the job's work environment. Specifically, it ensures:
- there is enough free space on the root work directory (threshold defaults to 10% free)
  - if there isn't, it cleans out old work directories until the threshold is met
- the job has a unique work directory to execute in
- job state is propagated to mozart
- job metrics are propagated to metrics
- pre-processing steps are executed
  - default built-in pre-processing step is hysds.utils.localize_urls, which downloads input data
- docker parameters such as volume mounts and UID/GID are set according to job specifications (job-spec)
- executable is run via docker
- post-processing steps are executed
  - default built-in post-processing step is hysds.utils.publish_datasets, which searches for and publishes HySDS datasets generated by the executable
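The free-space check in the first item of the list above can be illustrated with a minimal sketch. This is not the code in hysds.job_worker; the function names (free_fraction, clean_until_threshold) and the oldest-first ordering by modification time are assumptions made for illustration. Only the 10% default threshold comes from the text.

```python
import os
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def clean_until_threshold(root, threshold=0.10, free_fn=free_fraction):
    """Delete the oldest subdirectories of `root` until enough space is free.

    Hypothetical helper: assumes every subdirectory of `root` is a finished
    job work directory that is safe to delete. Returns the removed names.
    """
    removed = []
    # Oldest work directories first, judged by modification time (assumption).
    candidates = sorted(
        (e for e in os.scandir(root) if e.is_dir(follow_symlinks=False)),
        key=lambda e: e.stat().st_mtime,
    )
    for entry in candidates:
        if free_fn(root) >= threshold:
            break  # threshold met, stop deleting
        shutil.rmtree(entry.path)
        removed.append(entry.name)
    return removed
```

The free_fn parameter exists only so the logic can be exercised without actually filling a disk; a real worker would also need to skip directories that belong to jobs still running.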
How do you define a HySDS job?
You define a HySDS job by defining a job-spec and a hysds-io. See Job and HySDS IO Specifications. For a step-by-step example, see Hello World.
What are HySDS Workers?
Workers are Celery-level workers that run tasks. Since HySDS jobs are Celery tasks, workers also run jobs. Each job is invoked from its own unique working directory on the worker node.
Worker Events
See http://celery.readthedocs.org/en/latest/userguide/monitoring.html#worker-events
worker-online
signature: worker-online(hostname,timestamp,freq,sw_ident,sw_ver,sw_sys)
The worker has connected to the broker and is online.
- hostname: Hostname of the worker.
- timestamp: Event timestamp.
- freq: Heartbeat frequency in seconds (float).
- sw_ident: Name of worker software (e.g. py-celery).
- sw_ver: Software version (e.g. 2.2.0).
- sw_sys: Operating System (e.g. Linux, Windows, Darwin).
worker-heartbeat
signature: worker-heartbeat(hostname,timestamp,freq,sw_ident,sw_ver,sw_sys,active,processed)
Sent every minute. If the worker has not sent a heartbeat in 2 minutes, it is considered to be offline.
- hostname: Hostname of the worker.
- timestamp: Event timestamp.
- freq: Heartbeat frequency in seconds (float).
- sw_ident: Name of worker software (e.g. py-celery).
- sw_ver: Software version (e.g. 2.2.0).
- sw_sys: Operating System (e.g. Linux, Windows, Darwin).
- active: Number of currently executing tasks.
- processed: Total number of tasks processed by this worker.
worker-offline
signature: worker-offline(hostname,timestamp,freq,sw_ident,sw_ver,sw_sys)
The worker has disconnected from the broker.
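The three events above are enough to track worker liveness. The following is a minimal sketch, assuming each event arrives as a dict with type, hostname, and timestamp keys (the field names used in the signatures above). The function names and the hostnames verdi-1 and verdi-2 are hypothetical, and the 120-second timeout is the 2-minute rule stated for worker-heartbeat.

```python
import time

HEARTBEAT_TIMEOUT_SECONDS = 120  # the 2-minute rule described for worker-heartbeat


def apply_event(last_seen, event):
    """Update a hostname -> last-seen timestamp map from one worker event."""
    host = event["hostname"]
    if event["type"] in ("worker-online", "worker-heartbeat"):
        last_seen[host] = event["timestamp"]
    elif event["type"] == "worker-offline":
        # The worker announced its own disconnect, so stop tracking it.
        last_seen.pop(host, None)
    return last_seen


def offline_workers(last_seen, now=None, timeout=HEARTBEAT_TIMEOUT_SECONDS):
    """Return hostnames whose last heartbeat is older than `timeout` seconds."""
    now = time.time() if now is None else now
    return sorted(h for h, ts in last_seen.items() if now - ts > timeout)


seen = {}
apply_event(seen, {"type": "worker-online", "hostname": "verdi-1", "timestamp": 1000.0})
apply_event(seen, {"type": "worker-heartbeat", "hostname": "verdi-2", "timestamp": 1100.0})
print(offline_workers(seen, now=1130.0))  # prints ['verdi-1']
```

Here verdi-1 was last seen 130 seconds before the check and is reported as offline, while verdi-2 was seen 30 seconds before and is not.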
Celery Worker Naming Convention
The name of the worker is important because it is parsed for display in mozart's faceted search.
Transport
Job events are shipped out to mozart via redis, serialized with msgpack.
msgpack
It's fast, small, and has first-class language support. See http://msgpack.org/
PGE handling
Work dir scrubbers
POSIX signal handling for verdi worker
Verdi has Python handlers that capture kill signals sent to the celery worker. It then emits them as events to mozart via redis.
Supported POSIX signal handling and event emitting from verdi:
- 1 SIGHUP: Hangup.
- 2 SIGINT: Terminal interrupt signal.
- 3 SIGQUIT: Terminal quit signal.
- 6 SIGABRT: Process abort signal.
- 9 SIGKILL: Kill (cannot be caught or ignored).
- 15 SIGTERM: Termination signal.
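Trapping the signals listed above and turning them into events can be sketched with Python's standard signal module. This is an illustration, not verdi's actual handler code. The emitted_events list is a hypothetical stand-in for the redis transport, and the event fields are invented for the example. SIGKILL is left out because the operating system does not allow a process to install a handler for it. The sketch assumes a POSIX system and Python 3.8 or later (for signal.raise_signal).

```python
import signal

# Signals from the list above that a process is allowed to handle.
# SIGKILL (9) is omitted: it cannot be caught or ignored.
CATCHABLE_SIGNALS = [
    signal.SIGHUP,
    signal.SIGINT,
    signal.SIGQUIT,
    signal.SIGABRT,
    signal.SIGTERM,
]

emitted_events = []  # hypothetical stand-in for events sent to mozart via redis


def emit_signal_event(signum, frame):
    """Record the received signal as an event instead of terminating."""
    emitted_events.append(
        {"type": "worker-signal", "signum": int(signum), "name": signal.Signals(signum).name}
    )


def install_handlers():
    """Register the same handler for every catchable signal (main thread only)."""
    for sig in CATCHABLE_SIGNALS:
        signal.signal(sig, emit_signal_event)


install_handlers()
signal.raise_signal(signal.SIGTERM)  # simulate a kill request sent to this process
print(emitted_events)
```

A real worker would normally clean up and exit after emitting the event; the sketch only records it so the behavior is easy to observe.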
Localize and Publish Data Products
Run in stand-alone test mode
Create the ./work directory and run the following command:
HYSDS_DATASETS_CFG=~/verdi/ops/hysds/configs/datasets/datasets.json HYSDS_WORKER_CFG=job_worker.json ~/verdi/ops/hysds/scripts/run_job.py test_job.json