Managing Job Workers on Pleiades
Confidence Level: TBD. This article has not been reviewed for accuracy, timeliness, or completeness. Check that this information is valid before acting on it.
See background: NASA HECC Pleiades
Setting up Tunnel on Pleiades head node back to PCM in AWS
. The ssh tunnel to Pleiades goes through the Mamba cluster factotum. To log in to factotum:
. ssh -i ~/.ssh/int-aria.pem hysdsops@<IP of mamba factotum>
. The ssh tunnel configuration is in ~/.ssh/config. To ssh to tpfe2, a Pleiades front-end node:
. ssh tpfe2-tunnel (or use the alias pleiades='ssh tpfe2-tunnel')
. Two-factor authentication (an RSA token and a password) is required to log on.
Check that Tunnel Ports Are Live
. Run hysds_pcm_check_port_forwarded_tunnel_services.sh
. 8 tests should pass; the last one may fail (email service setup on factotum).
. The script is in the GitHub repository hysds/hysds-hec-utils (HySDS HEC Utilities).
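As a rough illustration of what checking that forwarded ports are live involves, the sketch below probes TCP ports using bash's /dev/tcp pseudo-device. It is not the contents of hysds_pcm_check_port_forwarded_tunnel_services.sh; the function names check_port and report_ports are hypothetical, and the ports to test must be supplied by the caller.

```shell
#!/usr/bin/env bash
# Sketch only: not the HySDS check script. Function names are hypothetical.

# Return 0 if something accepts a TCP connection on host:port, non-zero otherwise.
check_port() {
  local host="$1" port="$2"
  # The subshell opens the connection on descriptor 3 and closes it on exit.
  # A refused or failed connection makes the subshell exit non-zero.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Print one PASS/FAIL line per port; return 0 only if every port is reachable.
report_ports() {
  local host="$1"
  shift
  local failures=0 port
  for port in "$@"; do
    if check_port "$host" "$port"; then
      echo "PASS ${host}:${port}"
    else
      echo "FAIL ${host}:${port}"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

A call such as report_ports localhost <port> <port> prints one line per port, which makes it easy to see which forwarded service is down before looking at the tunnel itself.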
Start “Auto-Scaling” to PBS
. Run pbs_auto_scale_up.sh
. Adjust the settings in the script's "# input settings" section if necessary.
. The script is in the GitHub repository hysds/hysds-hec-utils (HySDS HEC Utilities).
Job Worker Singularity on Pleiades
Location of Work Dirs
. Different users may be assigned different Lustre file systems; for example, user lpan is on /nobackupp12/lpan. To get Lustre quota information, run: lfs quota -u <userid> /nobackupp??
. Work dirs are under /nobackup??/<userid>/worker/$year/$month/$day/
. The work dir for a PBS job is cleaned up when the PBS job finishes.
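The dated layout above can be sketched as a small helper that builds the work dir path for a given day. This is an illustration only: the function name worker_dir_for_date is hypothetical, and the root directory (for example /nobackupp12/lpan, taken from the example above) has to be supplied by the caller.

```shell
# Sketch only: hypothetical helper, not part of HySDS.
# Print the work dir path for a given root and day.
# Arguments: <root, e.g. /nobackupp12/lpan> [day as YYYY-MM-DD, default today]
worker_dir_for_date() {
  local root="$1"
  local day="${2:-$(date +%Y-%m-%d)}"
  # Turn YYYY-MM-DD into YYYY/MM/DD to match the $year/$month/$day layout.
  printf '%s/worker/%s\n' "$root" "$(printf '%s' "$day" | tr '-' '/')"
}
```

Called with only the root, the helper uses today's date, so ls "$(worker_dir_for_date /nobackupp12/lpan)" would list the work dirs created today for that user.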
Location of Job Worker logs
. Log files are under /nobackup??/<userid>/worker/logs/$year/$month/$day/
. The log files are kept even after the corresponding PBS jobs finish. The log files must be cleaned up manually to stay within the Lustre quota.
. On exit, each job worker cleans up its own work dirs.
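Because the log files are not removed automatically, a periodic cleanup along the following lines may help. This is a minimal sketch, assuming GNU find is available; the function name prune_worker_logs is hypothetical, and the retention period is a value to choose, not a HySDS setting.

```shell
# Sketch only: hypothetical helper, not part of HySDS.
# Delete log files older than a retention period, then remove the dated
# directories that are left empty.
# Arguments: <log root> <retention in days>
prune_worker_logs() {
  local log_root="$1" keep_days="$2"
  # Refuse to run on an empty or missing path so a typo cannot reach find.
  [ -n "$log_root" ] && [ -d "$log_root" ] || return 1
  # -mtime +N matches files whose age in whole days is greater than N.
  find "$log_root" -type f -mtime +"$keep_days" -delete
  # -delete works depth-first, so nested empty date directories go too.
  find "$log_root" -mindepth 1 -type d -empty -delete
}
```

For instance, prune_worker_logs /nobackup??/<userid>/worker/logs 30 would keep roughly the last month of logs. Running find with -print in place of -delete first is a cautious way to preview what would be removed.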
Debugging
Page Information:
. Subject Matter Expert: @Lei Pan @Hook Hua @Marjorie Lucas
. Find an error? Is this document outdated or inaccurate? Please contact the assigned Page Maintainer: @Hook Hua