Gerald Manipon edited this page on May 16 · 7 revisions

Confidence Level TBD: This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.


HySDS provides a plugin facility to hook in arbitrary function calls before (pre-processor functions) and after (post-processor functions) the execution of a docker command in a job. HySDS provides builtin pre-processor functions that run by default for each job (hysds.utils.localize_urls, hysds.utils.mark_localized_datasets, hysds.utils.validate_checksum_files) to localize any inputs specified in the job payload. It also provides a builtin post-processor function that runs by default (hysds.utils.publish_datasets) to search the work directory for any HySDS datasets to publish.

Function Definition

Pre-processor and post-processor functions must take two arguments: the job dict and the context dict. These functions can do whatever they need using the information stored in those dicts. Each function must return a boolean result, either True or False. For pre-processor functions, the docker command for the job is executed only if all pre-processor functions return True. For post-processor functions, any that return False are logged, but the job is not considered a failure. In both cases, if the function raises an exception, the job is a failure.
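As a rough illustration of the contract described above, the following sketch shows one custom pre-processor and a small gate that applies the "all must return True" rule. The names require_uri and run_pre_processors are hypothetical, as is the assumption that the context dict carries a top-level "uri" key; this is not how HySDS itself is implemented.

```python
import logging

logger = logging.getLogger(__name__)


def require_uri(job, context):
    """Hypothetical pre-processor: let the command run only if a 'uri' is set.

    ASSUMPTION: the context dict has a top-level 'uri' key. This layout is
    used for illustration only; the real dicts may be organized differently.
    """
    return bool(context.get("uri"))


def run_pre_processors(functions, job, context):
    """Illustrative gate: return True only if every pre-processor returns True.

    An exception raised by any function propagates to the caller, mirroring
    the rule that an exception makes the job a failure.
    """
    for func in functions:
        if not func(job, context):
            logger.warning("pre-processor %s returned False", func.__name__)
            return False
    return True
```

A function like require_uri would be referenced from the job-spec by its dotted import path, in the same way the builtin functions are.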

Pre-processor Functions

By default, job-specs have an implicit pre-processor function defined, hysds.utils.localize_urls. Thus the following job-spec:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}

implicitly provides the "pre" parameter as follows:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "pre": [ "hysds.utils.localize_urls" ],
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}

To disable this builtin feature, specify the "disable_pre_builtins" parameter as follows:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "disable_pre_builtins": true,
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}

In this case, even though there exists a parameter "uri" with destination "localize", the hysds.utils.localize_urls pre-processor will not run. To specify additional pre-processor functions, define them using the "pre" parameter like so:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "pre": [ "my.custom.preprocessor_function" ],
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}

Because the default builtin pre-processor functions aren't disabled, the effect of the above job-spec is this:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "disable_pre_builtins": true,
  "pre": [ "hysds.utils.localize_urls", "my.custom.preprocessor_function" ],
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}

Post-processor Functions

By default, job-specs have an implicit post-processor function defined, hysds.utils.publish_datasets. Thus the following job-spec:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}

implicitly provides the "post" parameter as follows:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "post": [ "hysds.utils.publish_datasets" ],
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}

To disable this builtin feature, specify the "disable_post_builtins" parameter as follows:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "disable_post_builtins": true,
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}

In this case, even though the docker command may create a HySDS dataset in the job's working directory, the hysds.utils.publish_datasets post-processor will not run. To specify additional post-processor functions, define them using the "post" parameter like so:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "post": [ "my.custom.postprocessor_function" ],
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}

Because the default builtin post-processor functions aren't disabled, the effect of the above job-spec is this:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "disable_post_builtins": true,
  "post": [ "hysds.utils.publish_datasets", "my.custom.postprocessor_function" ],
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}
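The way builtin and custom functions combine in the pre-processor and post-processor sections can be sketched as a small helper. This is a hypothetical illustration, not HySDS code: effective_processors and the BUILTINS table are invented names, the builtin lists are the ones named in the job-spec examples, and skipping a builtin that the job-spec already names explicitly is an assumption.

```python
# Builtin defaults as named in the job-spec sections (assumed lists).
BUILTINS = {
    "pre": ["hysds.utils.localize_urls"],
    "post": ["hysds.utils.publish_datasets"],
}


def effective_processors(job_spec, kind):
    """Return the function names that would run for kind 'pre' or 'post'.

    Hypothetical helper: builtins come first unless 'disable_<kind>_builtins'
    is true, followed by the functions listed in the job-spec itself.
    """
    custom = list(job_spec.get(kind, []))
    if job_spec.get("disable_%s_builtins" % kind, False):
        return custom
    # ASSUMPTION: a builtin named explicitly in the job-spec is not run twice.
    extras = [name for name in custom if name not in BUILTINS[kind]]
    return BUILTINS[kind] + extras
```

For example, a job-spec containing only "post": [ "my.custom.postprocessor_function" ] would resolve to the builtin publish function followed by the custom one.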

triage

HySDS provides an additional builtin post-processor function for providing triage of failed jobs: hysds.utils.triage. It is not enabled by default and must be explicitly set:

{
  "command":"/home/ops/ariamh/ariaml/extractFeatures.sh",
  "disk_usage":"1GB",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws",
    "/home/ops/ariamh/conf/settings.conf": "/home/ops/ariamh/conf/settings.conf"
  },
  "post": [ "hysds.utils.triage" ],
  "params" : [
    {
      "name": "uri",
      "destination": "positional"
    },
    {
      "name": "uri",
      "destination": "localize"
    }
  ]
}

The above job-spec results in two post-processor functions being called after the docker command for the job executes: hysds.utils.publish_datasets and hysds.utils.triage. Both functions perform their checks up front before doing any work. For example, hysds.utils.publish_datasets checks the exit code of the docker command by inspecting the job dict and proceeds with searching for HySDS datasets only if the exit code was 0. Conversely, hysds.utils.triage checks that the exit code was not 0 and, if so, creates and publishes the triage dataset. If the job was triaged, a _triaged.json file is left in the job's work directory; it contains the JSON result returned by the GRQ ingest REST call.

What gets triaged?

By default, all _* and *.log files in the root of the job work directory are triaged.

Can we triage other files as well?

The behavior of the triage function can be modified by updating certain fields in the _context.json file in the job's work directory. The process run by the docker command can open _context.json and add a top-level parameter named _triage_additional_globs, a list of glob patterns for files to triage in addition to the defaults. For example, a Python script that is called as the docker command for the job can add these files for triage:

import json

ctx_file = "_context.json"  # assumes the script runs in the job's work directory

with open(ctx_file) as f:
    ctx = json.load(f)

ctx['_triage_additional_globs'] = [ 'S1-IFG*', 'AOI_*', 'celeryconfig.py', 'datasets.json' ]

with open(ctx_file, 'w') as f:
    json.dump(ctx, f, sort_keys=True, indent=2)
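To make the file selection concrete, here is a minimal sketch of how the default patterns and _triage_additional_globs might be combined. The helper files_to_triage is hypothetical and only approximates the described behavior (matching in the root of the work directory); the actual triage function may differ in details.

```python
import glob
import os

DEFAULT_TRIAGE_GLOBS = ["_*", "*.log"]  # the defaults described above


def files_to_triage(work_dir, context):
    """Hypothetical helper: names of files in the root of work_dir that match
    the default patterns or any '_triage_additional_globs' in the context."""
    patterns = DEFAULT_TRIAGE_GLOBS + list(context.get("_triage_additional_globs", []))
    matched = set()
    for pattern in patterns:
        # glob only looks at the directory root here; no recursion
        matched.update(glob.glob(os.path.join(work_dir, pattern)))
    return sorted(os.path.basename(path) for path in matched)
```

Using a set avoids listing a file twice when it matches more than one pattern.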

Triage is enabled in the job-spec. Can we disable triage at runtime?

Yes. Similar to the addition of other files to triage, you can add a top-level parameter named _triage_disabled to disable triage:

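import json

ctx_file = "_context.json"  # assumes the script runs in the job's work directory

with open(ctx_file) as f:
    ctx = json.load(f)

ctx['_triage_additional_globs'] = [ 'S1-IFG*', 'AOI_*', 'celeryconfig.py', 'datasets.json' ]

# some_condition is a placeholder for your own runtime check
if some_condition:
    ctx['_triage_disabled'] = True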

with open(ctx_file, 'w') as f:
    json.dump(ctx, f, sort_keys=True, indent=2)
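Putting the exit-code check and the runtime switch together, the decision to triage can be sketched as below. The helper should_triage is a hypothetical name; it takes the exit code as a plain argument because where that value lives in the job dict is not specified on this page.

```python
def should_triage(exit_code, context):
    """Hypothetical decision helper: triage only failed jobs that have not
    opted out via '_triage_disabled' in the context dict."""
    if exit_code == 0:
        return False  # successful jobs are never triaged
    return not context.get("_triage_disabled", False)
```

With this logic, a job that fails after setting _triage_disabled to True leaves no triage dataset behind, while a failing job that leaves the flag unset is triaged as usual.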

Can we customize the job ID format for a triage job?

Yes. By default, the job ID for triage jobs is built from the following Python format string:

triaged_job-{job[job_info][id]}-{job[task_id]}

At format time, this format string is passed both the job payload (e.g. the contents of _job.json) and the context payload (e.g. the contents of _context.json).
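The following sketch shows how such a format string resolves against a job payload using Python's str.format. The helper render_triage_id and the sample values are made up for illustration. Only the job field is supplied, because the keyword under which the context payload is exposed to the format string is not stated here.

```python
DEFAULT_TRIAGE_ID_FORMAT = "triaged_job-{job[job_info][id]}-{job[task_id]}"


def render_triage_id(fmt, job):
    """Hypothetical helper: fill a triage ID format string from the job payload.

    Inside a format string, {job[job_info][id]} looks up job['job_info']['id'].
    """
    return fmt.format(job=job)


# Sample payload with made-up values, shaped after the fields in the format.
sample_job = {"job_info": {"id": "job-0001"}, "task_id": "task-42"}
print(render_triage_id(DEFAULT_TRIAGE_ID_FORMAT, sample_job))
# prints: triaged_job-job-0001-task-42
```

A format string that references a key missing from the payload raises KeyError, so a custom format should use only fields known to be present.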

Similar to the way you can modify the behavior of triage in the previous 2 questions, you can add a top-level parameter named _triage_id_format to override the default format string:

 
import json

ctx_file = "_context.json"  # assumes the script runs in the job's work directory

with open(ctx_file) as f:
    ctx = json.load(f)

ctx['_triage_id_format'] = "my_custom_triage_job-{job[job_info][id]}-{job[job_info][time_start]}"

with open(ctx_file, 'w') as f:
    json.dump(ctx, f, sort_keys=True, indent=2)
 