Updated 2022-08-08

PBS Scripting Guide

What is PBS and what is it used for?

  • PBS stands for Portable Batch System
  • A PBS script is used to submit jobs (the computation the user wants done) to the scheduler. The scheduler then handles these jobs without further input from the user, who often simply logs out after a successful submission.
  • The PBS script tells the scheduler what resources the job will need (number of processors, memory, time, etc.) and defines the computation (the application or software the user wants to run)
  • The PBS script tells the cluster what you want to run and how to run it

PBS Directives

  • Anything that starts with #PBS in the PBS script (job submission script) is a PBS directive
  • Writing PBS directives in the PBS script saves you from typing a long line of options for qsub each time you submit a job
  • More info on PBS directives can be found by looking at the options in the qsub manual either by running man qsub or by visiting the online manuals
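
To illustrate the point about directives standing in for qsub options, the following sketch pulls the #PBS lines out of a small script and prints the command line they correspond to. The file name demo.pbs and the temporary file are hypothetical and used only for this illustration; the command is printed, not run.

```shell
#!/bin/bash
# Sketch: list the qsub options that the #PBS directives in a script stand for.
# "demo.pbs" is a hypothetical file name used only for this illustration.
script=$(mktemp)
cat > "$script" <<'EOF'
#PBS -N exampleScript
#PBS -l nodes=1:ppn=2
#PBS -l walltime=15:00
#PBS -q inferno
cd $PBS_O_WORKDIR
echo "hello"
EOF

# Keep only lines that begin with "#PBS", strip that prefix, and join them.
options=$(sed -n 's/^#PBS[[:space:]]\{1,\}//p' "$script" | tr '\n' ' ')
options=${options% }   # drop the trailing space left by tr

# Print (do not run) the equivalent command line.
equivalent="qsub $options demo.pbs"
echo "$equivalent"
rm -f "$script"
```

With the directives in the file, the same submission is simply qsub demo.pbs.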

Common Options

  • -A: Account. Required for Phoenix cluster job submissions, defines the account to charge for the job. If you don't know the charge accounts you have access to, run the command pace-quota
  • -l: resource-list. One of the more important options, defines resources that the job requires. Examples are:

    • #PBS -l walltime=10:00:00 (job requires up to 10 hours to run and will be stopped at 10 hours)
    • #PBS -l nodes=2:ppn=4 (job requires two nodes and 4 processors per node, for a total of 8 processors)
    • #PBS -l mem=2gb (job requires 2 GB of memory in total across all nodes)
  • -q: the queue the job is going to be submitted to. Ex: #PBS -q inferno

  • -N : job name (name that shows up in queue). Ex: #PBS -N mpiScript
  • -o : names output file. Ex: #PBS -o results.out
  • -j oe: combines output and error into one file. Ex: #PBS -j oe
  • -m <a,b,e>: email. Sends a status email based on any combination of a, b, and e: b is when the job begins, e is when the job ends, and a is if the job is aborted. Ex: #PBS -m abe sends an email for all three scenarios
  • -M: email addresses to send status emails to, separated by commas. Ex: #PBS -M user1@gatech.edu,user2@gatech.edu will send emails to user1 and user2
  • more can be found in the manual. When logged in to the cluster, run man qsub
  • Here is an example script that does not utilize any software or load any modules. It simply prints the name of the node it was started on
#PBS -N exampleScript               # name of job
#PBS -A [Account]                   # account to which job is charged, ex: GT-gburdell3
#PBS -l nodes=1:ppn=2               # resources allocated, 1 node 2 processors
#PBS -l pmem=2gb                    # memory per core
#PBS -l walltime=15:00              # job will run at most 15 min
#PBS -q inferno                     # job is submitted to inferno queue
#PBS -j oe                          # output and error is combined into the same file
#PBS -o gettingStarted.out          # output file is named gettingStarted.out

                                    # computation starts here
cd $PBS_O_WORKDIR                   # changes into directory where script was submitted from

echo "Started on `/bin/hostname`"   # prints name of compute node job was started on

Caution

It is good practice to end the PBS script with a blank line. Without that final newline, the PBS script will sometimes fail to execute.
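
One way to check for the final newline is sketched below. The function name ensure_trailing_newline and the temporary demo file are made up for this example; in practice you would pass it the path of your own PBS script.

```shell
#!/bin/bash
# ensure_trailing_newline FILE: append a newline if the file's last byte is not one.
# (Hypothetical helper written for this guide, not a PACE or PBS tool.)
ensure_trailing_newline() {
    local file=$1
    # $(...) strips trailing newlines, so the result is empty
    # exactly when the last byte of the file is a newline.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
}

# Demo on a temporary file that deliberately lacks a final newline.
demo=$(mktemp)
printf 'echo "no newline at end"' > "$demo"
ensure_trailing_newline "$demo"
size_after_first=$(wc -c < "$demo")

# Running it again changes nothing, because the newline is now present.
ensure_trailing_newline "$demo"
size_after_second=$(wc -c < "$demo")
```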

  • Notice the structure of the script. The first part is PBS directives, while the second part is computation. This is how most if not all PBS scripts will be set up.
  • Most PBS scripts use roughly the same 6 to 10 directives. Because of this, the PBS scripts we provide can be used as templates to reduce your workload. Simply include your own computation and adjust the parameters of the directives.
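
When adjusting the resource directives in a template, a little arithmetic helps to see what is actually being requested. The sketch below uses the values from the example script (nodes=1:ppn=2, pmem=2gb, walltime=15:00). The helper to_seconds is hypothetical, and the total-memory figure assumes pmem is applied once per requested processor, as the "memory per core" comment in the example suggests.

```shell
#!/bin/bash
# Values copied from the example directives.
nodes=1
ppn=2
pmem_gb=2

total_cores=$(( nodes * ppn ))              # processors across all nodes
total_mem_gb=$(( total_cores * pmem_gb ))   # assumes pmem counts once per core

# to_seconds WALLTIME: convert SS, MM:SS, or HH:MM:SS to seconds.
to_seconds() {
    local h=0 m=0 s=0
    local IFS=:
    set -- $1                # split the argument on ":"
    case $# in
        1) s=$1 ;;
        2) m=$1; s=$2 ;;
        3) h=$1; m=$2; s=$3 ;;
    esac
    # 10# forces base 10 so values such as "08" are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

echo "cores=$total_cores mem=${total_mem_gb}gb walltime=$(to_seconds 15:00)s"
```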

Tip

  • To specify a processor type add one of the following lines:
  • Intel: #PBS -l feature=intel
  • AMD: #PBS -l feature=amd

Tip

  • In your PBS script, you can request a specific core count or specific nodes
  • To request a specific core count:
    • Add :coresX (where X is the core count of the node) to the directive in your PBS script that requests nodes. Example: #PBS -l nodes=1:ppn=1:cores24
  • To reserve a job on specific nodes:
    • Add the -l nodes=<names> to the directives part of your PBS script. Example: #PBS -l nodes=rich133-p32-33-l.pace.gatech.edu+rich133-p32-33-r.pace.gatech.edu would request 2 specific nodes.
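
If the list of node names is long, the nodes= value can be assembled rather than typed by hand. This is a small sketch using the two host names from the example above; the variable names are arbitrary.

```shell
#!/bin/bash
# Host names taken from the example in this guide.
nodes=(rich133-p32-33-l.pace.gatech.edu rich133-p32-33-r.pace.gatech.edu)

# "${nodes[*]}" joins the array elements with the first character of IFS.
node_spec=$(IFS=+; echo "${nodes[*]}")

directive="#PBS -l nodes=$node_spec"
echo "$directive"
```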

Creating PBS Scripts and getting them onto the cluster

  • To follow along with this guide, you can download the script above and send it to your account on the cluster. You can also create and edit the PBS script on your personal computer, then transfer it over to the cluster. Refer to file transfer and storage if you are unsure how to do that
  • Or, while logged in to the cluster, you can create, edit, and save your own PBS script and type in the commands from above. Here's how:
  • Create and open a PBS script file with a text editor such as vim. To use vim, type the command vim gettingStarted.pbs (you can replace "gettingStarted" with any name). Wherever you open the editor is where the script will be saved.
  • Then type the PBS script as it is above
  • To save and exit in vim, press Esc and type :wq. You now have the PBS script ready to run
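
As an alternative to typing the script in an editor, a here-document can write the file in one step. This is only a sketch: it writes into a temporary directory so the demo does not overwrite anything, the trailing comments of the example are left out for brevity, and [Account] remains a placeholder you must replace with your own charge account.

```shell
#!/bin/bash
# Sketch: write the example script with a here-document instead of an editor.
workdir=$(mktemp -d)
script="$workdir/gettingStarted.pbs"

# The quoted 'EOF' stops the shell from expanding $PBS_O_WORKDIR
# and the backticks while the file is being written.
cat > "$script" <<'EOF'
#PBS -N exampleScript
#PBS -A [Account]
#PBS -l nodes=1:ppn=2
#PBS -l pmem=2gb
#PBS -l walltime=15:00
#PBS -q inferno
#PBS -j oe
#PBS -o gettingStarted.out

cd $PBS_O_WORKDIR
echo "Started on `/bin/hostname`"
EOF

directive_count=$(grep -c '^#PBS' "$script")
echo "wrote $script with $directive_count directives"
```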

Submitting PBS Script to scheduler

  • Navigate to where the PBS script is located
  • Submit the job by running qsub gettingStarted.pbs, replacing "gettingStarted.pbs" with whatever you called the file
  • If successful, something like 21558450.shared-sched.pace.gatech.edu will be printed out. The number at the start is the job ID, which is useful for checking the status of a job or deleting it.
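
If you want the numeric job ID on its own, for example to pass to later commands, it can be cut from the text that qsub prints. The sketch below uses the sample string from this guide; on the cluster the value would instead be captured with something like submitted=$(qsub gettingStarted.pbs).

```shell
#!/bin/bash
# Sample output copied from the guide, standing in for real qsub output.
submitted="21558450.shared-sched.pace.gatech.edu"

# Remove everything from the first "." onward, leaving only the number.
jobid=${submitted%%.*}
echo "$jobid"
```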

How to check the status of my job and/or delete my job?

  • Once the job is submitted, based on the resources requested and how many jobs are currently being run, as well as how many jobs are already in the queue, your job might wait in the queue anywhere from minutes to hours.
  • An easy way to check the status is with qstat
qstat <jobid>  #ex: qstat 21558450
qstat -u <gtusername3> -n  #another way of checking job status
  • This will display a grid with info about the job. Look for the "S" column (status) to see the current state of the job. Q is in queue, R is running, and C is completed. C can mean that the job failed, was deleted by the user, or finished running.
  • To estimate queue time (after the job has been in the queue for roughly 30 seconds) you can use showstart <jobid>
  • To delete a job, use qdel <jobid>
  • For more ways to check job status and more helpful commands, check out the helpful commands cheatsheet
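
The status letters described above can be turned into readable messages inside a wrapper script. The function describe_state below is a hypothetical helper that covers only the three states named in this guide. Pulling the letter out of real qstat output is deliberately left out, since the exact column layout depends on the installation.

```shell
#!/bin/bash
# describe_state LETTER: translate a qstat "S" column value into words.
# (Hypothetical helper; covers only the states listed in this guide.)
describe_state() {
    case "$1" in
        Q) echo "queued" ;;
        R) echo "running" ;;
        C) echo "completed (finished, failed, or deleted)" ;;
        *) echo "state not covered in this guide: $1" ;;
    esac
}

for s in Q R C; do
    echo "$s: $(describe_state "$s")"
done
```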

Where are all the results of my job?

  • Everything the job prints will be written to the output file (which was named in the PBS script). In this case, it will be in the gettingStarted.out file. The output file will be in the same directory the PBS script was submitted from. Use ls to see the contents of a directory
  • Any files created by the job will also show up in whatever folder the PBS script was submitted from. If you are generating many files, see the advanced section.
  • To view the contents of the output file, open it in a text editor such as vim with vim gettingStarted.out
  • The output will be in the middle, and in this case should be something like "Started on iw-k41-38-r.pace.gatech.edu"
  • To move the output file / any files created off of the cluster, refer to the guide on file transfer and storage
  • Congratulations! You have successfully used a PBS script to submit and run a job on the cluster.

Exported Batch Environment Variables

  • When a batch job starts, many environment variables are created. They can be used as needed in the batch script with $VARIABLE_NAME
  • $PBS_O_WORKDIR is one that you already use, but there are many others
Variable          Description
PBS_ARRAYID       Zero-based value of job array index for this job (in version 2.2.0 and later)
PBS_ENVIRONMENT   Set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job (see -I option)
PBS_GPUFILE       Line-delimited list of GPUs allocated to the job, located in TORQUE_HOME/aux/jobidgpu. Each line has the format <host>-gpu<number>, for example myhost-gpu1
PBS_JOBCOOKIE     Job cookie
PBS_JOBID         Unique PBS job ID
PBS_JOBNAME       User-specified job name
PBS_MOMPORT       Active port for MOM daemon
PBS_NODEFILE      File containing line-delimited list of nodes allocated to the job
PBS_NODENUM       Node offset number
PBS_NP            Number of execution slots (cores) for the job
PBS_NUM_NODES     Number of nodes allocated to the job
PBS_NUM_PPN       Number of procs per node allocated to the job
PBS_O_HOME        Home directory of submitting user
PBS_O_HOST        Host on which job script is currently running
PBS_O_LANG        Language variable for job
PBS_O_LOGNAME     Name of submitting user
PBS_O_PATH        Path variable used to locate executables within job script
PBS_O_SHELL       Script shell
PBS_O_WORKDIR     Job's submission directory
PBS_QUEUE         Job queue
PBS_TASKNUM       Number of tasks requested
  • This table comes from Adaptive Computing's guide
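
A few of these variables can be combined into a short header at the top of a job's output. The sketch below is written so it also runs outside a job: the fallback values after ":-" are placeholders invented for this example, and the node count of 1 is an assumption used only when PBS_NODEFILE is not available.

```shell
#!/bin/bash
# Outside a batch job these variables are unset, so fall back to placeholders.
jobid=${PBS_JOBID:-not-in-a-job}
jobname=${PBS_JOBNAME:-interactive-test}
workdir=${PBS_O_WORKDIR:-$PWD}

# Count distinct hosts listed in the node file, if the scheduler provided one.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    node_count=$(( $(sort -u "$PBS_NODEFILE" | wc -l) ))
else
    node_count=1   # assumption for runs outside the scheduler
fi

summary="job=$jobid name=$jobname nodes=$node_count dir=$workdir"
echo "$summary"
```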

Advanced: Generating many files

  • If you are generating a large number of files, use a tmp directory to prevent system slowdown.
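
A minimal sketch of that pattern follows. It is built on assumptions rather than a PACE recommendation: the temporary location comes from $TMPDIR (or /tmp if unset), the 100 small files stand in for real job output, and the archive name results.tar.gz is made up. Check which scratch location is appropriate on your cluster before using it.

```shell
#!/bin/bash
# Where the final archive should end up (the submission directory inside a job).
submit_dir=${PBS_O_WORKDIR:-$PWD}

# Create a private temporary directory for the many small files.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")

# Stand-in for the real computation: write 100 small files into scratch.
(
    cd "$scratch" || exit 1
    for i in $(seq 1 100); do
        echo "result $i" > "out_$i.txt"
    done
)

# Bundle everything into one archive in the submission directory, then clean up.
tar -czf "$submit_dir/results.tar.gz" -C "$scratch" .
file_count=$(tar -tzf "$submit_dir/results.tar.gz" | grep -c 'out_')
rm -rf "$scratch"

echo "archived $file_count files to $submit_dir/results.tar.gz"
```

Writing a single archive back keeps the submission directory to one new file instead of hundreds.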