


Operators can identify failed jobs, including jobs that failed on a precondition, in the Resource Management interface (Figaro) by filtering on job status and error type. The matching jobs can then be processed in bulk using any of the lightweight job management functions found in the On-Demand window.

Instructions

Update: precondition failure “is when a PGE fails not because of a runtime error but due to missing inputs, corrupt inputs, etc. These show up as failed jobs… Not something easily visible. These aren’t necessarily distinguished from a failed job”

TODO: Update step 1 to click the red failed-job tile along the top bar, which is more reliable and user-friendly. --per Andrew

1. Inside Figaro, select the “job-failed” facet (see #1 in image) in the left-hand menu column labeled “status”. This narrows the jobs displayed in Figaro to only those with that status.

Note: the left-hand menu column updates dynamically according to the chosen facets. If an option is not visible, confirm that no undesired facets are selected by mistake.

2. Next, select the type of error. In this example, the results are faceted on the error “SoftTimeLimitExceeded()” (#2).

3. Each facet appears across the top of the updated job results in a blue box (#3). After the desired job-failed facet is selected, the total number of matching jobs (#4) can be seen under the facet tags. Next, click the “On-Demand” button (#5) to select the desired lightweight job action to perform.

4. In the On-Demand window, add a unique user-defined tag (#6) and select the lightweight job to perform from the drop-down menu (#7).

5. When selecting the job from the Action drop-down menu, (TODO: confirm accuracy and wording of following text) it is recommended to use the latest release of the desired job type. The release date is noted in square brackets following the job type name. In this example (#8), the “Purge jobs” action is chosen, and the latest release is tagged [release-20180529].

6. Select “Process Now” to complete the selected task on the targeted failed jobs.
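
The selection logic in steps 1, 2, and 5 can be sketched in Python as a rough illustration. This is not Figaro's implementation or API: the job records, their field names, the "MissingInput()" error, and the older "[release-20180312]" tag are all hypothetical, made up only to show how stacking facets narrows the result set and how the newest release can be read from the bracketed date.

```python
import re
from datetime import datetime

# Hypothetical job records; the field names are illustrative, not Figaro's schema.
jobs = [
    {"id": "job-1", "status": "job-failed", "error": "SoftTimeLimitExceeded()"},
    {"id": "job-2", "status": "job-completed", "error": None},
    {"id": "job-3", "status": "job-failed", "error": "SoftTimeLimitExceeded()"},
    {"id": "job-4", "status": "job-failed", "error": "MissingInput()"},  # made-up error
]

def apply_facets(records, **facets):
    """Keep only the records whose fields match every selected facet."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]

# Matches a bracketed release tag such as [release-20180529].
RELEASE_TAG = re.compile(r"\[release-(\d{8})\]")

def latest_release(action_names):
    """Return the action name carrying the most recent [release-YYYYMMDD] tag."""
    dated = []
    for name in action_names:
        match = RELEASE_TAG.search(name)
        if match:
            dated.append((datetime.strptime(match.group(1), "%Y%m%d"), name))
    if not dated:
        raise ValueError("no action name carries a release tag")
    return max(dated, key=lambda pair: pair[0])[1]

# Step 1: facet on status. Step 2: facet on the error type.
failed = apply_facets(jobs, status="job-failed")
timed_out = apply_facets(failed, error="SoftTimeLimitExceeded()")

# Step 5: choose the newest release of the desired action (older tag is hypothetical).
actions = ["Purge jobs [release-20180312]", "Purge jobs [release-20180529]"]
chosen = latest_release(actions)

print(len(failed), len(timed_out), chosen)
```

Each added facet can only shrink the matching set, which is why the job count under the facet tags drops as more facets are selected, and why a stray facet can hide options in the left-hand column.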
