Updated 2022-11-08
Run Processes in Parallel with srun
On the Slurm scheduler, you can run multiple processes in parallel natively with srun. This can be an alternative to Pylauncher, GNU Parallel, or job arrays for running many small tasks at once within a single job, enabling HTC-style workflows on HPC systems such as PACE.
Your Slurm script will contain multiple srun lines. There are several key requirements for them to run simultaneously:
- Ensure that each srun command requests only a fraction of the full job's CPU and memory, so that the lines meant to run simultaneously together request no more than the job's total. Each task starts, in order, as soon as sufficient resources become available for it.
- Include -c1 if using 1 CPU per task, which is standard.
- Include & at the end of each line to have the commands run simultaneously in the background.
- Include wait at the end of the sequence of srun commands, to avoid having the job end while the processes are running in the background.
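The roles of & and wait can be seen in isolation with a minimal sketch that runs outside Slurm. It assumes a short shell function as a stand-in for each srun line; run_task and outdir are hypothetical names used only for illustration.

```bash
#!/bin/bash
# Stand-in for a real srun line: a short task that writes a file when it finishes.
# run_task and outdir are hypothetical names used only for this sketch.
outdir=$(mktemp -d)

run_task() {
    sleep 0.2
    echo "task $1 done" > "$outdir/task$1.out"
}

run_task 1 &   # '&' returns control immediately, so task 2 starts right away
run_task 2 &

wait           # block here until every background task has exited

# Count the output files that the tasks produced.
files=("$outdir"/*.out)
finished=${#files[@]}
echo "$finished tasks finished"
```

Without the wait line, the script could reach its end while the tasks are still sleeping. In a batch job, that is the point at which the job would end.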
In the example script below, six tasks run two at a time. Each task is allocated half (12 cores and 84 GB of memory) of the job's resources (24 cores and 168 GB of memory). The third task can start as soon as either of the first two ends, and so on.
#!/bin/bash
#SBATCH -JSlurmParallelSrunExample # Job name
#SBATCH --account=gts-gburdell3 # charge or tracking account
#SBATCH -N1 --ntasks-per-node=24 # Number of nodes and cores per node required
#SBATCH --mem-per-cpu=7G # Memory per core
#SBATCH -t1:00:00 # Duration of the job (Ex: 1 hour)
#SBATCH -qinferno # QOS Name (on Hive, use -p<partition> instead)
#SBATCH -oReport-%j.out # Combined output and error messages file
#SBATCH --mail-type=BEGIN,END,FAIL # Mail preferences
#SBATCH --mail-user=gburdell3@gatech.edu # E-mail address for notifications
srun --quiet -n12 -c1 --mem=84G ./executable1 &
srun --quiet -n12 -c1 --mem=84G ./executable2 &
srun --quiet -n12 -c1 --mem=84G ./executable3 &
srun --quiet -n12 -c1 --mem=84G ./executable4 &
srun --quiet -n12 -c1 --mem=84G ./executable5 &
srun --quiet -n12 -c1 --mem=84G ./executable6 &
wait
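
The per-step values in the script above follow from simple arithmetic on the job request. The sketch below recomputes them and prints (rather than runs) the matching srun lines. The variable names are hypothetical, and the values are copied by hand from the #SBATCH header, not read from Slurm.

```bash
#!/bin/bash
# Hypothetical variables for illustration; set them to match the #SBATCH header.
total_cores=24       # -N1 --ntasks-per-node=24
mem_per_cpu_gb=7     # --mem-per-cpu=7G
concurrent_steps=2   # number of srun lines that should run at the same time
num_tasks=6          # total number of executables to run

# Each concurrent step gets an equal share of the cores,
# and its memory is that share times the per-core memory.
cores_per_step=$(( total_cores / concurrent_steps ))
mem_per_step_gb=$(( cores_per_step * mem_per_cpu_gb ))

# Print the lines that would go into the batch script; nothing is launched here.
for i in $(seq 1 "$num_tasks"); do
    echo "srun --quiet -n${cores_per_step} -c1 --mem=${mem_per_step_gb}G ./executable${i} &"
done
echo "wait"
```

With the values shown, each step gets 12 cores and 84 GB, which matches the srun lines in the example. Changing concurrent_steps to 4, for instance, would give 6 cores and 42 GB per step.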