Updated 2022-08-16

Convert PBS Scripts to Slurm Scripts

Basic Info

SLURM is a resource manager with scheduling logic integrated into it. In comparison to Moab/Torque, SLURM eliminates the need for dedicated queues. In addition to allocating resources at the job level, a job can spawn steps (srun instances) that are allocated resources from within the job's allocation; these steps can execute sequentially or concurrently.
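
For instance, a single job allocation can run two steps side by side. Below is a minimal sketch, assuming a hypothetical executable my_program and the hive partition and example account used later on this page:

    #!/bin/bash
    #SBATCH -J step-demo            # job name (hypothetical)
    #SBATCH -A hive-gburdell3       # tracking account (example)
    #SBATCH -p hive                 # partition
    #SBATCH -n 4                    # 4 tasks in the job allocation
    #SBATCH -t 10                   # 10-minute walltime

    # Two job steps, each using half of the allocation, run concurrently
    srun -n 2 ./my_program input1 &
    srun -n 2 ./my_program input2 &
    wait                            # wait for both steps to finish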

Visit our Slurm on Hive guide for more information about using Slurm. To learn more about the move to Slurm, visit our Hive Slurm Conversion page.

Preparation Steps

  1. Be sure to recompile software you have written or installed, particularly if it uses MPI. The Slurm cluster contains updated libraries.
  2. Update module load commands to the current software offerings on Hive.
  3. Look up your tracking account with pace-quota, as shown in the example after this list.
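
A minimal sketch of these preparation steps (the module names below are placeholders; check module avail for the current offerings on Hive):

    # See what is currently installed, then update the module load lines in your scripts
    module avail
    module load gcc mvapich2    # placeholder modules; load the versions offered on Hive

    # Find the tracking account name to use with the -A option in job scripts
    pace-quota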

How is Slurm usage different from Torque/Moab?

  • What Moab called queues, Slurm calls partitions. There is no default partition, so users must specify one (a minimal example script follows this list).
  • Resources are assigned per task/process. One core is given per task by default.
  • Environment variables from the submitting process are passed to the job by default. Use --export=NONE to have a clean environment when running jobs. The default means that variables like $HOSTNAME will be cloned from the login node when jobs are submitted from it.
  • Jobs can be submitted to multiple partitions so that they run on the first one with availability; provide a comma-separated list of partitions in the submission script.
  • The first line of a Slurm job script must be #!<shell> (e.g., #!/bin/bash); see the Conversion Examples section below.
  • Slurm jobs start in the submission directory rather than $HOME.
  • Slurm combines stdout and stderr into a single output log file by default. To write stderr to a separate file, provide the --error or -e option. In Moab, stdout and stderr went to separate files by default and were merged with the -j oe option.
  • Slurm can send email when your job reaches a certain percentage of its walltime limit. For example: sbatch --mail-type=TIME_LIMIT_90 myjob.txt
  • The default memory request on Slurm is 1 GB/core. To request all the memory on a node, include --mem=0.
  • Requesting a number of nodes or cores is structured differently. To request an exact number of nodes, use -N. To request an exact number of cores per node, use --ntasks-per-node. To request a total number of cores, use -n or --ntasks.
  • The commands used to submit and manage jobs on the cluster are different in Slurm than they were in Moab. To submit jobs, you will now use the sbatch and srun commands; to check job status, you will most commonly use the squeue command.
  • Array jobs are given a SLURM_ARRAY_JOB_ID for the parent job, and each child task gets its own SLURM_JOB_ID (and a SLURM_ARRAY_TASK_ID index). Moab assigned the same PBS_JOBID to every array task, distinguished by a different index (PBS_ARRAYID). For more options and guidelines on using arrays in Slurm, please visit the Array Jobs section of our Slurm on Hive guide.
  • To include job information in output file names in Slurm, use filename patterns: %x (job name), %j (job ID), %A (array job ID), %a (array task index), %u (username), %N (hostname).
  • srun is the standard SLURM command to start an MPI program. It automatically uses the allocated job resources: nodelist, tasks, logical cores per task. Do not use mpirun.
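
Taken together, a minimal Hive job script reflecting these defaults might look like the sketch below (the job, module, and program names are placeholders):

    #!/bin/bash                           # the shell line is required in Slurm
    #SBATCH -J myjob                      # job name
    #SBATCH -A hive-gburdell3             # tracking account (from pace-quota)
    #SBATCH -p hive                       # partition; there is no default
    #SBATCH -N 1 --ntasks-per-node=4      # 1 node with 4 tasks
    #SBATCH --mem-per-cpu=2G              # the default is 1 GB per core
    #SBATCH -t 30                         # walltime in minutes
    #SBATCH -o %x-%j.out                  # output file named with job name and job id
    #SBATCH --mail-type=TIME_LIMIT_90     # email at 90% of the walltime limit
    #SBATCH --mail-user=gburdell3@gatech.edu

    cd $SLURM_SUBMIT_DIR                  # optional; jobs already start here
    module load gcc                       # placeholder module
    srun ./my_program                     # srun uses the job's allocated resources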

Warning

Do not use mpirun or mpiexec with Slurm. Use srun instead.

Cheat Sheet

This table lists the most common commands, environment variables, and job specification options used by the major workload management systems. Users can refer to this cheat sheet when converting their PBS scripts and user commands to Slurm. A full list of SLURM commands can be found here. Further guidelines for more advanced scripts are in the user documentation on this page.
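
As a quick sketch of the most common command conversions (not the complete table linked above):

    qsub myjob.pbs      ->  sbatch myjob.sbatch           # submit a job script
    qstat / showq       ->  squeue                        # check job and queue status
    qdel <jobid>        ->  scancel <jobid>               # cancel a job
    checkjob <jobid>    ->  scontrol show job <jobid>     # detailed job information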

Conversion Examples

The table below shows how common PBS script directives and commands are rewritten for Slurm. Each row lists the PBS form and its Slurm equivalent.

Shell
  PBS:    #!/bin/bash (optional)
  SLURM:  #!/bin/bash

Job Name
  PBS:    #PBS -N jobname
  SLURM:  #SBATCH -J jobname

Account Name
  PBS:    #PBS -A accountname (not used on Hive)
  SLURM:  #SBATCH -A accountname (required on Hive-Slurm)

Job Resources
  PBS:    #PBS -l nodes=2:ppn=4
          #PBS -l nodes=100 (any 100 cores)
          #PBS -l pmem=2gb
          #PBS -l walltime=10:00
          #PBS -l nodes=2:ppn=4:gpus=1,pmem=3gb (8 cores, 2 GPUs, and 24 GB memory across 1 or 2 nodes)
  SLURM:  #SBATCH -N 2 --ntasks-per-node=4
          #SBATCH -n 100 (any 100 cores)
          #SBATCH --mem-per-cpu=2G
          #SBATCH -t 10
          #SBATCH -N 2 --gres=gpu:1 --gres-flags=enforce-binding --mem-per-gpu=12G (exactly 2 nodes, each with 6 cores, 1 GPU, and 12 GB memory)

Queue Name
  PBS:    #PBS -q hive
  SLURM:  #SBATCH -p hive

Output/Error Reports
  PBS:    #PBS -j oe
          #PBS -o Report-$PBS_JOBID.out
  SLURM:  #SBATCH -o Report-%j.out

Email Notification
  PBS:    #PBS -m abe
          #PBS -M gburdell3@gatech.edu
  SLURM:  #SBATCH --mail-type=BEGIN,END,FAIL
          #SBATCH --mail-user=gburdell3@gatech.edu

Work Directory
  PBS:    cd $PBS_O_WORKDIR
  SLURM:  cd $SLURM_SUBMIT_DIR (default/optional)

Run Process
  PBS:    hello_world.out
  SLURM:  srun hello_world.out

MPI Process
  PBS:    mpicc -O2 mpi_program.c -o mpi_program
          mpiexec -n 4 mpi_program program_arguments
  SLURM:  mpicc -O2 mpi_program.c -o mpi_program
          srun mpi_program program_arguments

Array Job
  PBS:    #PBS -t 1-10
          python myscript.py dataset${PBS_ARRAYID}
  SLURM:  #SBATCH --array=1-10
          #SBATCH -o %A_%a.out
          python myscript.py dataset${SLURM_ARRAY_TASK_ID}
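
For instance, the Array Job row combines with the directives above into a complete script along these lines (the account, partition, and script names are placeholders taken from the table and earlier examples):

    #!/bin/bash
    #SBATCH -J array-example              # job name (placeholder)
    #SBATCH -A hive-gburdell3             # tracking account
    #SBATCH -p hive                       # partition
    #SBATCH -n 1                          # one task per array element
    #SBATCH -t 10                         # walltime in minutes
    #SBATCH --array=1-10                  # 10 array tasks
    #SBATCH -o %A_%a.out                  # parent array job id and task index

    cd $SLURM_SUBMIT_DIR                  # optional
    python myscript.py dataset${SLURM_ARRAY_TASK_ID}

Each task receives its own SLURM_ARRAY_TASK_ID (1 through 10 here) and writes to its own output file.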

Job Submission Examples

Script Submission
  Moab/Torque:  qsub myjobscript.pbs
  SLURM:        sbatch myjobscript.sbatch

Command-line Submission
  Moab/Torque:  qsub -l walltime=02:00:00 -l nodes=2:ppn=4 -l pmem=1gb -q hive -o job1.out -e job1.err myscript.py -f myfile.txt
  SLURM:        sbatch -A hive-gburdell3 -t 2:00:00 -N 2 --ntasks-per-node=4 --mem-per-cpu=1G -p hive -o job1.out -e job1.err myscript.py -f myfile.txt

Interactive Session
  Moab/Torque:  qsub -l nodes=1:ppn=4 -l walltime=02:00:00 -l pmem=128gb -q hive -I
                mpiexec -n 4 python my_mpi_script.py
                exit
  SLURM:        salloc -p hive -A hive-gburdell3 -N 1 --ntasks-per-node=4 --time=02:00:00 --mem=128G
                srun -n 2 python my_mpi_script.py &
                srun -n 2 <other_commands> &
                (multiple srun steps can execute in parallel using &)
                exit


This material is based upon work supported by the National Science Foundation under grant number 1828187. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.