Gerald Manipon edited this page on May 17, 2018 · 8 revisions

Workers

What are HySDS jobs?

HySDS jobs are essentially Celery tasks. More specifically, they are Celery tasks that encapsulate the execution of an executable within a docker image. The Celery task callable (hysds.job_worker.run_job) is responsible for the setup, execution, and teardown of the job's work environment. Specifically, it ensures:

  • there is enough free space on the root work directory (threshold defaults to 10% free)
    • if there isn't, it cleans out old work directories until the threshold is met
  • the job has a unique work directory to execute in
  • job state is propagated to mozart
  • job metrics are propagated to metrics
  • pre-processing steps are executed
    • default built-in pre-processing step is hysds.utils.localize_urls which downloads input data
  • docker parameters such as volume mounts and UID/GID are set according to job specifications (job-spec)
  • executable is run via docker
  • post-processing steps are executed
    • default built-in post-processing step is hysds.utils.publish_datasets which searches for and publishes HySDS datasets generated by the executable
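The setup/execute/teardown flow above can be sketched in plain Python. This is a simplified, hypothetical sketch, not the real hysds.job_worker.run_job, which also handles docker invocation, state and metrics propagation, and cleanup of old work directories:

```python
import shutil
import tempfile

FREE_THRESHOLD = 0.10  # default: require at least 10% free space


def enough_free_space(root):
    """Check that the root work directory has at least 10% free space."""
    usage = shutil.disk_usage(root)
    return usage.free / usage.total >= FREE_THRESHOLD


def run_job(root, pre_steps, execute, post_steps):
    """Hypothetical sketch of the worker's job lifecycle."""
    if not enough_free_space(root):
        # real worker: cleans out old work dirs until the threshold is met
        raise RuntimeError("not enough free space under %s" % root)
    work_dir = tempfile.mkdtemp(dir=root)  # unique work directory per job
    for step in pre_steps:       # e.g. localize_urls downloads input data
        step(work_dir)
    result = execute(work_dir)   # real worker: runs the executable via docker
    for step in post_steps:      # e.g. publish_datasets publishes outputs
        step(work_dir)
    return result
```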

How do you define a HySDS job?

You define a HySDS job by defining a job-spec and a hysds-io. See Job and HySDS IO Specifications. For a step-by-step example, see Hello World.
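As a rough illustration (the field names and values below are abbreviated and hypothetical; consult the Job and HySDS IO Specifications page for the authoritative schema), a minimal job-spec might look like:

```json
{
  "command": "/path/to/run_job.sh",
  "disk_usage": "10MB",
  "soft_time_limit": 300,
  "time_limit": 600,
  "params": [
    {
      "name": "input_url",
      "destination": "positional"
    }
  ]
}
```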

What are HySDS Workers?

HySDS workers are Celery-level workers that run tasks. Since HySDS jobs are Celery tasks, workers also run jobs, and each job is invoked from a unique working directory on the worker node.

Worker Events

See http://celery.readthedocs.org/en/latest/userguide/monitoring.html#worker-events

worker-online

signature: worker-online(hostname,timestamp,freq,sw_ident,sw_ver,sw_sys)

The worker has connected to the broker and is online.

  • hostname: Hostname of the worker.
  • timestamp: Event timestamp.
  • freq: Heartbeat frequency in seconds (float).
  • sw_ident: Name of worker software (e.g. py-celery).
  • sw_ver: Software version (e.g. 2.2.0).
  • sw_sys: Operating System (e.g. Linux, Windows, Darwin).

worker-heartbeat

signature: worker-heartbeat(hostname,timestamp,freq,sw_ident,sw_ver,sw_sys,active,processed)

Sent every minute; if the worker has not sent a heartbeat in 2 minutes, it is considered to be offline.

  • hostname: Hostname of the worker.
  • timestamp: Event timestamp.
  • freq: Heartbeat frequency in seconds (float).
  • sw_ident: Name of worker software (e.g. py-celery).
  • sw_ver: Software version (e.g. 2.2.0).
  • sw_sys: Operating System (e.g. Linux, Windows, Darwin).
  • active: Number of currently executing tasks.
  • processed: Total number of tasks processed by this worker.

worker-offline

signature: worker-offline(hostname,timestamp,freq,sw_ident,sw_ver,sw_sys)

The worker has disconnected from the broker.
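The event payloads above can be modeled as plain dictionaries. The helper names below (worker_heartbeat, is_offline) are illustrative, not part of Celery's API; they just show the fields and the 2-minute offline rule concretely:

```python
import time


def worker_heartbeat(hostname, freq, sw_ident, sw_ver, sw_sys, active, processed):
    """Build a worker-heartbeat payload with the fields listed above."""
    return {
        "type": "worker-heartbeat",
        "hostname": hostname,
        "timestamp": time.time(),
        "freq": freq,            # heartbeat frequency in seconds (float)
        "sw_ident": sw_ident,    # e.g. "py-celery"
        "sw_ver": sw_ver,        # e.g. "2.2.0"
        "sw_sys": sw_sys,        # e.g. "Linux"
        "active": active,        # currently executing tasks
        "processed": processed,  # total tasks processed by this worker
    }


def is_offline(event, now=None, window=120.0):
    """Heartbeats arrive every minute; none for 2 minutes means offline."""
    now = time.time() if now is None else now
    return now - event["timestamp"] > window
```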

Celery Worker Naming Convention

The worker name follows a convention that mozart parses so the name's components can be displayed in mozart's faceted search.

Transport

Job events are shipped to mozart via redis, serialized with msgpack.

msgpack

It's fast, small, and has first-class support in many languages. See http://msgpack.org/.
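A minimal round-trip with the msgpack Python package shows the serialization step; the event payload here is illustrative, not an actual HySDS event schema:

```python
import msgpack  # third-party package: pip install msgpack

# An illustrative job-status event like those verdi ships to mozart.
event = {"type": "job-started", "hostname": "worker-1", "timestamp": 1526515200.0}

packed = msgpack.packb(event)                   # compact binary encoding
unpacked = msgpack.unpackb(packed, raw=False)   # decode back to a dict
```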

PGE handling

Work dir scrubbers

POSIX signal handling for verdi worker

Verdi has Python handlers that capture kill signals sent to the Celery worker; verdi then emits them as events to mozart via redis.

Supported POSIX signal handling and event emitting from verdi:

  • 1 SIGHUP: Hangup.
  • 2 SIGINT: Terminal interrupt signal.
  • 3 SIGQUIT: Terminal quit signal.
  • 6 SIGABRT: Process abort signal.
  • 9 SIGKILL: Kill (cannot be caught or ignored).
  • 15 SIGTERM: Termination signal.
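A sketch of how such handlers might be installed with Python's standard signal module. The function names here are hypothetical, and SIGKILL is omitted because, as noted above, it cannot be caught:

```python
import signal

# Catchable signals from the list above (SIGKILL cannot be caught or ignored).
SIGNAL_EVENTS = {
    signal.SIGHUP: "SIGHUP",
    signal.SIGINT: "SIGINT",
    signal.SIGQUIT: "SIGQUIT",
    signal.SIGABRT: "SIGABRT",
    signal.SIGTERM: "SIGTERM",
}


def make_handler(emit):
    """Return a handler that emits a signal event (real verdi: redis to mozart)."""
    def handler(signum, frame):
        emit({"type": "signal", "signal": SIGNAL_EVENTS.get(signum, str(signum))})
    return handler


def install_handlers(emit):
    """Install the emitting handler for every catchable signal above."""
    handler = make_handler(emit)
    for signum in SIGNAL_EVENTS:
        signal.signal(signum, handler)
```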

Localize and Publish Data Products

Run in stand-alone test mode

Create the ./work directory and run the following command:

HYSDS_DATASETS_CFG=~/verdi/ops/hysds/configs/datasets/datasets.json HYSDS_WORKER_CFG=job_worker.json ~/verdi/ops/hysds/scripts/run_job.py test_job.json