(TODO: Note - JIRA #597: How to facet on all failed jobs with the same failure message, purge them and then re-submit them. Related to JIRA #601)
Question: What are all of the possible precondition failures? (This is mentioned in JIRA #601.)
Rough Draft outline for JIRA #597: How to facet on all failed jobs with the same failure message, purge them and then re-submit them:
(v2 revisions, 4/16/20, begin below)
1. Navigate to the Resource Manager (Figaro).
2. Inside Figaro, facet on “job-failed” in the left-hand column under the status menu (see #1 in image).
3. Narrow the scope to a specific, unique job failure type by selecting the targeted message in the left-hand column under the error column (#2). {{INSERT UPDATED SCREENSHOTS}}
Note: Multiple job types can share an error message. To confirm that the selected error message belongs to only one job type, check the type menu in the left-hand column.
4. After the similar failed jobs have been faceted on, click the “On Demand” button (#3).
5. Purge these jobs by selecting “purge” from the drop-down menu. (Unclear if it’s beneficial to add a unique tag here for later steps.)
6. Leave the other settings unchanged and click “Process Now”.
(From here on, I’m unclear how to retry these jobs; this is my best-guess understanding.)
7. Remove the “job-failed” facet in the Resource Manager. Then search for the unique tag that was created when purging the jobs, enclosed in quotation marks (“purge_tag_test” in this example). This shows all of the similar jobs that failed and have now been purged.
8. Click the “On Demand” button again, add another unique tag, and select “Retry Jobs/Tasks” from the drop-down action menu.
9. This retries the failed and purged jobs. To see the newly retried jobs, remove the “purge_tag_test” facet and facet on the unique tag created in step 8.
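
The note about error messages shared across job types can be illustrated with a small sketch. This is not Figaro's API or data model. The job records, the field names (status, type, error), the example error strings, and the helper functions job_types_by_error and select_failed are all hypothetical, made up only to show the logic of faceting on one failure message and confirming that it maps to a single job type before purging.

```python
from collections import defaultdict

# Hypothetical job records. The field names and values are illustrative
# only and are not taken from Figaro or its underlying data store.
jobs = [
    {"id": "job-1", "status": "job-failed", "type": "ingest", "error": "disk quota exceeded"},
    {"id": "job-2", "status": "job-failed", "type": "ingest", "error": "disk quota exceeded"},
    {"id": "job-3", "status": "job-failed", "type": "publish", "error": "disk quota exceeded"},
    {"id": "job-4", "status": "job-failed", "type": "ingest", "error": "missing input file"},
    {"id": "job-5", "status": "job-completed", "type": "ingest", "error": None},
]


def job_types_by_error(records):
    """Map each failure message to the set of job types that produced it."""
    types = defaultdict(set)
    for record in records:
        if record["status"] == "job-failed":
            types[record["error"]].add(record["type"])
    return dict(types)


def select_failed(records, error, job_type=None):
    """Return IDs of failed jobs matching an error message and, optionally, a job type."""
    return [
        r["id"]
        for r in records
        if r["status"] == "job-failed"
        and r["error"] == error
        and (job_type is None or r["type"] == job_type)
    ]


# Error messages that appear under more than one job type need the extra
# type facet before purging, as described in the note above.
shared = {e: t for e, t in job_types_by_error(jobs).items() if len(t) > 1}
print(shared)
print(select_failed(jobs, "disk quota exceeded", job_type="ingest"))
```

In this made-up data, one error message appears under two job types, so selecting on the message alone would also pick up the job of the other type. Adding the type filter narrows the selection, which mirrors checking the type menu in the left-hand column.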
Instructions