Updated 2022-08-08
# PBS Scripting Guide
## What is PBS and what is it used for?
- PBS stands for Portable Batch System.
- A PBS script is used to submit jobs (computations the user wants to run) to the scheduler. Once submitted, jobs are handled by the scheduler and require no further input from the user, who often simply logs out after successful submission.
- The PBS script tells the scheduler what resources the job will need (number of processors, memory, time, etc.) and defines the computation (the application/software the user wants to run).
- In short, the PBS script tells the cluster what you want to run and how to run it.
## PBS Directives
- Anything that starts with `#PBS` in the PBS script (job submission script) is a PBS directive.
- Writing PBS directives in the PBS script saves you from writing a long line of options for `qsub` when submitting the job.
- More info on PBS directives can be found by looking at the options in the `qsub` manual, either by running `man qsub` or by visiting the online manuals.
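As a quick illustration, the two submissions below are equivalent. This is only a sketch: `GT-gburdell3`, `myJob`, and `myScript.pbs` are placeholder names.

```bash
# Option 1: pass all options to qsub on the command line
qsub -A GT-gburdell3 -q inferno -N myJob -l nodes=1:ppn=2,walltime=1:00:00 myScript.pbs

# Option 2: put the same options inside myScript.pbs as #PBS directives
# (#PBS -A GT-gburdell3, #PBS -q inferno, ...) and submit with a bare qsub
qsub myScript.pbs
```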
## Common Options
- `-A`: account. Required for Phoenix cluster job submissions; defines the account to charge for the job. If you don't know which charge accounts you have access to, run the command `pace-quota`.
- `-l`: resource list. One of the more important options; defines the resources the job requires. Examples:
    - `#PBS -l walltime=10:00:00`: the job requires 10 hours to run and will be stopped at 10 hours
    - `#PBS -l nodes=2:ppn=4`: the job requires two nodes and 4 processors per node, for a total of 8 processors
    - `#PBS -l mem=2gb`: the job requires 2 GB of memory across all nodes
- `-q`: the queue the job is going to be submitted to. Ex: `#PBS -q inferno`
- `-N`: job name (the name that shows up in the queue). Ex: `#PBS -N mpiScript`
- `-o`: names the output file. Ex: `#PBS -o results.out`
- `-j oe`: combines output and error into one file. Ex: `#PBS -j oe`
- `-m <a,b,e>`: email notifications. Sends a status email based on any combination of `a`, `b`, and `e`: `b` is when the job begins, `e` is when the job ends, and `a` is if the job is aborted. Ex: `#PBS -m abe` sends an email in all three scenarios.
- `-M`: email addresses to send status emails to. Ex: `#PBS -M user1@gatech.edu user2@gatech.edu` will send emails to user1 and user2.
- More options can be found in the manual. When logged in to the cluster, run `man qsub`.
- Here is an example script that does not utilize any software or load any modules. It simply prints the name of the node it was started on:

```bash
#PBS -N exampleScript          # name of job
#PBS -A [Account]              # account to which job is charged, ex: GT-gburdell3
#PBS -l nodes=1:ppn=2          # resources allocated, 1 node with 2 processors
#PBS -l pmem=2gb               # memory per core
#PBS -l walltime=15:00         # job will run at most 15 min
#PBS -q inferno                # job is submitted to the inferno queue
#PBS -j oe                     # output and error are combined into the same file
#PBS -o gettingStarted.out     # output file is named gettingStarted.out

# computation starts here
cd $PBS_O_WORKDIR              # change into the directory the job was submitted from
echo "Started on `/bin/hostname`"   # print the name of the compute node the job started on
```
Caution

It is good practice to add a blank line at the end of the PBS script. Sometimes, without the trailing newline, the PBS script will fail to execute.
- Notice the structure of the script: the first part is PBS directives, and the second part is the computation. This is how most, if not all, PBS scripts are set up.
- Most PBS scripts use roughly the same 6-10 directives. Because of this, the PBS scripts we provide can be used as templates to hopefully reduce your workload: simply include your own computation and tweak the parameters of the directives, as in the sketch below.
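For instance, a template filled in with real computation might look like the following. This is only a sketch: the module names (`gcc`, `mvapich2`) and the executable `./my_mpi_program` are placeholders, so substitute whatever your own software actually needs (run `module avail` on the cluster to see what is installed).

```bash
#PBS -N mpiExample             # job name
#PBS -A [Account]              # charge account, ex: GT-gburdell3
#PBS -l nodes=2:ppn=4          # 2 nodes, 4 processors each (8 total)
#PBS -l pmem=2gb               # memory per core
#PBS -l walltime=1:00:00       # job will run at most 1 hour
#PBS -q inferno                # queue to submit to
#PBS -j oe                     # combine output and error into one file

cd $PBS_O_WORKDIR              # run from the submission directory
module load gcc mvapich2       # placeholder modules -- load the ones your software needs
mpirun -np 8 ./my_mpi_program  # placeholder executable, run with 8 processes
```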
Tip

- To specify a processor type, add one of the following lines:
    - Intel: `#PBS -l feature=intel`
    - AMD: `#PBS -l feature=amd`
Tip

- In your PBS script, you can request a specific core count or reserve specific nodes.
- To request a specific core count, add a `cores<x>` property (where `<x>` is the number of cores) to the directive in your PBS script requesting nodes. Example: `#PBS -l nodes=1:ppn=1:cores24` requests a node with 24 cores.
- To reserve a job on specific nodes, add `-l nodes=<names>` to the directives part of your PBS script. Example: `#PBS -l nodes=rich133-p32-33-l.pace.gatech.edu+rich133-p32-33-r.pace.gatech.edu` would request those 2 specific nodes.
## Creating PBS Scripts and getting them onto the cluster
- To follow along with this guide, you can download the script above and send it to your account on the cluster. You can also create and edit the PBS script on your personal computer, then transfer it over to the cluster. Refer to the file transfer and storage guides if you are unsure how to do that.
- Alternatively, while logged in to the cluster, you can create, edit, and save your own PBS script and type in the commands from above. Here's how:
    - Create and open a PBS script file with a text editor such as `vim`. To use `vim`, type the command `vim gettingStarted.pbs` (you can replace "gettingStarted" with any name). Wherever you open the editor is where the script will be saved.
    - Then type in the PBS script as it is above.
    - To save and exit in vim, type `:wq`. You now have the PBS script ready to run.
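If you'd rather not use an editor, you can also write the whole file from the shell with a heredoc. This is plain bash, nothing PACE-specific:

```bash
# the quoted 'EOF' keeps the shell from expanding anything inside the heredoc,
# so the script is written to disk exactly as typed
cat > gettingStarted.pbs <<'EOF'
#PBS -N exampleScript
#PBS -A [Account]
#PBS -l nodes=1:ppn=2
#PBS -l pmem=2gb
#PBS -l walltime=15:00
#PBS -q inferno
#PBS -j oe
#PBS -o gettingStarted.out

cd $PBS_O_WORKDIR
echo "Started on `/bin/hostname`"
EOF
```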
## Submitting the PBS script to the scheduler
- Navigate to where the PBS script is located.
- Submit the job by running `qsub gettingStarted.pbs` (replace "gettingStarted.pbs" with whatever you called the file).
- If successful, something like `21558450.shared-sched.pace.gatech.edu` will be printed out. That number is the `jobID` and is useful for checking the status of, or deleting, the job.
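Because that identifier is needed later, it can be handy to capture `qsub`'s output in a shell variable at submission time, as in this small sketch:

```bash
# qsub prints the full job identifier on stdout; save it for later use
JOBID=$(qsub gettingStarted.pbs)
echo "Submitted as $JOBID"

# the saved identifier works with the status and delete commands below
qstat "$JOBID"
# qdel "$JOBID"    # uncomment to delete the job
```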
## How to check the status of my job and/or delete my job?
- Once the job is submitted, it might wait in the queue anywhere from minutes to hours, depending on the resources requested, how many jobs are currently running, and how many jobs are already in the queue.
- An easy way to check status is with `qstat`:

```bash
qstat <jobid>               # ex: qstat 21558450
qstat -u <gtusername3> -n   # another way of checking job status
```

- This displays a grid with info about the job. Look at the "S" (status) column to see the current state of the job: Q means the job is queued, R means it is running, and C means it is completed. Note that C can mean the job failed, was deleted by the user, or finished running.
- To estimate the queue time (after the job has been in the queue for roughly 30 seconds), you can use `showstart <jobid>`.
- To delete a job, use `qdel <jobid>`.
- For more ways to check job status and other helpful commands, check out the helpful commands cheatsheet; a small monitoring sketch follows below.
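If you want to keep an eye on a job without retyping the command, standard shell tools are enough. This sketch assumes `watch` is available on the login node:

```bash
# re-run qstat every 30 seconds until interrupted with Ctrl-C
watch -n 30 "qstat -u $USER -n"
```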
## Where are all the results of my job?
- Everything the job outputs will be printed to the output file (which was named in the PBS script). In this case, it will be in the `gettingStarted.out` file. The .out file will be in the same directory the PBS script was run from. Use `ls` to see the contents of a directory.
- Any files created by the job will also show up in whatever folder the PBS script was submitted from. If you are generating many files, see the advanced section.
- To view the contents of the output file, open it in a text editor such as vim with `vim gettingStarted.out`.
- The output will be in the middle of the file, and in this case should be something like "Started on iw-k41-38-r.pace.gatech.edu".
- To move the output file or any other files created off of the cluster, refer to the guide on file transfer and storage.
- Congratulations! You have successfully used a PBS script to submit and run a job on the cluster.
## Exported Batch Environment Variables
- Many environment variables are created when a batch job starts; they can be used as needed in the batch script with `$VARIABLE_NAME`. `$PBS_O_WORKDIR` is one that you already use, but there are many others, as shown in the sketch and table below.
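As a small illustration, a job script could use a few of these variables like this (the scheduler sets them only inside a running job, so this does nothing useful on a login node):

```bash
# identify the job and where it came from
echo "Job $PBS_JOBNAME ($PBS_JOBID) in queue $PBS_QUEUE"
echo "Submitted from $PBS_O_WORKDIR"

# $PBS_NODEFILE has one line per execution slot, so count distinct hosts
echo "Running on $(sort -u "$PBS_NODEFILE" | wc -l) distinct node(s):"
sort -u "$PBS_NODEFILE"
```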
| Variable | Description |
|---|---|
| PBS_ARRAYID | Zero-based value of the job array index for this job (in version 2.2.0 and later) |
| PBS_ENVIRONMENT | Set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job (see the -I option) |
| PBS_GPUFILE | Line-delimited list of GPUs allocated to the job, located in TORQUE_HOME/aux/jobidgpu |
| PBS_JOBCOOKIE | Job cookie |
| PBS_JOBID | Unique PBS job id |
| PBS_JOBNAME | User-specified job name |
| PBS_MOMPORT | Active port for the MOM daemon |
| PBS_NODEFILE | File containing a line-delimited list of nodes allocated to the job |
| PBS_NODENUM | Node offset number |
| PBS_NP | Number of execution slots (cores) for the job |
| PBS_NUM_NODES | Number of nodes allocated to the job |
| PBS_NUM_PPN | Number of procs per node allocated to the job |
| PBS_O_HOME | Home directory of the submitting user |
| PBS_O_HOST | Host on which the job script is currently running |
| PBS_O_LANG | Language variable for the job |
| PBS_O_LOGNAME | Name of the submitting user |
| PBS_O_PATH | Path variable used to locate executables within the job script |
| PBS_O_SHELL | Script shell |
| PBS_O_WORKDIR | Job's submission directory |
| PBS_QUEUE | Job queue |
| PBS_TASKNUM | Number of tasks requested |
- This table comes from Adaptive Computing's guide
## Advanced: Generating many files
- If you are generating large amounts of files, use a tmp directory to prevent system slowdown, as in the sketch below.
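One possible pattern is sketched here. It assumes the scheduler provides a node-local `$TMPDIR` for the job (with a fallback to `/tmp`), and the `results*.out` name is a placeholder; check PACE's storage documentation for the scratch locations actually available to you.

```bash
# do the heavy file churn in node-local temporary space
WORKDIR="${TMPDIR:-/tmp}/$PBS_JOBID"    # fall back to /tmp if TMPDIR is unset
mkdir -p "$WORKDIR"
cd "$WORKDIR"

# ... computation that generates many intermediate files goes here ...

# copy back only the results you need, then clean up
cp results*.out "$PBS_O_WORKDIR/"
rm -rf "$WORKDIR"
```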