Updated 2023-07-05
Using Slurm on Hive¶
Tip
Visit our conversion guide to convert PBS scripts written prior to the August 2022 migration into Slurm scripts. To learn more about the move to Slurm, visit our Hive Slurm Conversion page.
View this very useful guide from SchedMD for additional Slurm commands and options beyond those listed below. Further guidelines on more advanced scripts are in the user documentation on this page. The sections below are covered in detail on this page; click a link to navigate:
Informational Commands
Interactive Jobs
Batch Jobs - Basic Python Example
MPI Jobs
Array Jobs
GPU Jobs
Informational Commands ¶
squeue¶
Use squeue to check job status for pending (PD) and running (R) jobs. Many options are available to include with the command, including these:
- Add -j <job number> to show information about specific jobs. Separate multiple job numbers with a comma.
- Add -u <username> to show jobs belonging to a specific user, e.g., -u gburdell3.
- Add -A <tracking account> to see jobs belonging to a specific tracking account, e.g., -A hive-gburdell3.
- Add -p <partition> to see jobs submitted to a specific partition, e.g., -p hive-gpu.
- Run man squeue or visit the squeue documentation page for more options.
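As a quick sketch combining several of these options (the job numbers shown are illustrative):
# List gburdell3's pending and running jobs on the hive-gpu partition
squeue -u gburdell3 -p hive-gpu
# Show details for two specific jobs
squeue -j 1440,1441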
sacct¶
After a job has completed, use sacct to find information about it. Many of the same options available for squeue also apply here.
- Add -j <job number> to find information about specific jobs.
- Add -u <username> to see all jobs belonging to a specific user.
- Add -A <tracking account> to see jobs belonging to a specific tracking account, e.g., -A hive-gburdell3.
- Add -X to show information only about the allocation, rather than the steps inside it.
- Add -S <time> to list only jobs after a specified time. Multiple time formats are accepted, including YYYY-MM-DD[THH:MM[:SS]], e.g., 2022-08-01T19:05:23.
- Add -o <fields> to specify which columns of data should appear in the output. Run sacct --helpformat to see a list of available fields.
- Run man sacct or visit the sacct documentation page for more options.
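For example, a sketch that summarizes your own recent allocations (the start date and field list are illustrative):
# Allocation-level records (no steps) for user gburdell3 since August 1, 2022
sacct -X -u gburdell3 -S 2022-08-01 -o JobID,JobName,Partition,State,Elapsed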
scancel¶
To cancel a job, run scancel <job number>, e.g., scancel 1440 to cancel job 1440. You can use squeue to find the job number first.
pace-check-queue¶
The pace-check-queue utility provides an overview of the current utilization of each partition's nodes. Use the name of a specific partition as the input, e.g., pace-check-queue hive. On Slurm clusters, utilized and allocated local disk (including percent utilization) are not available.
- Add -s to see all features of each node in the partition.
- Add -c to color-code the "Accepting Jobs?" column.
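For example, a possible invocation combining both flags (the exact flag ordering is an assumption):
# Show node features and color-code the "Accepting Jobs?" column for the hive partition
pace-check-queue -s -c hive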
pace-job-summary¶
The pace-job-summary utility provides a high-level overview of a job processed on the cluster. Usage is as follows:
[gburdell3@login-hive-slurm ~]$ pace-job-summary
Usage: `pace-job-summary <JobID>`
Output example:
[gburdell3@login-hive-slurm ~]$ pace-job-summary 2836
---------------------------------------
Begin Slurm Job Summary for 2836
Query Executed on 2022-08-17 at 18:21:33
---------------------------------------
Job ID: 2836
User ID: gburdell3
Account: hive-gburdell3
Job name: SlurmPythonExample
Resources: cpu=4,mem=4G,node=1
Rsrc Used: cput=00:00:08,vmem=0.8M,walltime=00:00:02,mem=0.0M,energy_used=0
Exit Code: 0:0
Partition: hive
Nodes: atl1-1-01-011-4-2
---------------------------------------
Batch Script for 2836
---------------------------------------
#!/bin/bash
#SBATCH -JSlurmPythonExample # Job name
#SBATCH --account=hive-gburdell3 # Tracking account
#SBATCH -N1 -n4 # Number of nodes and cores required
#SBATCH --mem-per-cpu=1G # Memory per core
#SBATCH -t15 # Duration of the job (Ex: 15 mins)
#SBATCH -phive # Queue name (where job is submitted)
#SBATCH -oReport-%j.out # Combined output and error messages file
#SBATCH --mail-type=BEGIN,END,FAIL # Mail preferences
#SBATCH --mail-user=gburdell3@gatech.edu # E-mail address for notifications
cd $SLURM_SUBMIT_DIR # Change to working directory
module load anaconda3/2022.05 # Load module dependencies
srun python test.py # Example Process
---------------------------------------
pace-quota¶
To find your Hive-Slurm tracking accounts, run pace-quota while logged into the Hive-Slurm cluster. Tracking accounts will be of the form hive-<PI username>, e.g., hive-gburdell3 for researchers in Prof. Burdell's group. Researchers working with more than one faculty member may have multiple tracking accounts and should choose the one that best fits the project supervisor for each job run.
Running pace-quota will also report on utilization of your storage allocations.
Interactive Jobs¶
A Slurm interactive job reserves resources on compute nodes to use interactively.
We recommend using the salloc command to allocate resources. At minimum, a tracking account (--account or -A) and a queue name (where the job is submitted, --partition or -p) are required to start an interactive job. Additionally, the number of nodes (--nodes or -N), CPU cores (--ntasks-per-node for cores per node or -n for total cores), and requested wall time (--time or -t, using the format D-HH:MM:SS for days, hours, minutes, and seconds) may also be designated. Run man salloc or visit the salloc documentation page for more options.
In this example, use salloc to allocate 1 node with 4 cores for an interactive job using the hive-gburdell3 account on the hive queue:
[gburdell3@login-hive-slurm ~]$ salloc -A hive-gburdell3 -phive -N1 --ntasks-per-node=4 -t1:00:00
salloc: Pending job allocation 1187
salloc: job 1187 queued and waiting for resources
After pending, your job will start once resources are granted, showing the following prompt:
salloc: job 1187 has been allocated resources
salloc: Granted job allocation 1187
salloc: Waiting for resource configuration
salloc: Nodes atl1-1-03-002-35 are ready for job
---------------------------------------
Begin Slurm Prolog: Aug-03-2022 08:01:10
Job ID: 1187
User ID: gburdell3
Account: hive-gburdell3
Job name: interactive
Partition: hive
---------------------------------------
[gburdell3@atl1-1-03-002-35 ~]$
Once resources are available on the queue, you will automatically be logged in from the login node to an interactive job on a compute node with the requested resources. In this interactive session, use srun with hostname:
[gburdell3@atl1-1-03-002-35 ~]$ srun hostname
atl1-1-03-002-35.pace.gatech.edu
atl1-1-03-002-35.pace.gatech.edu
atl1-1-03-002-35.pace.gatech.edu
atl1-1-03-002-35.pace.gatech.edu
Note that there are 4 instances of the compute node's hostname because we requested 1 node with 4 cores. To exit the interactive job, you can wait for the allotted time of your session to expire (in this example, 1 hour), or you can exit manually using exit:
[gburdell3@atl1-1-03-002-35 ~]$ exit
exit
salloc: Relinquishing job allocation 1187
salloc: Job allocation 1187 has been revoked.
Batch Jobs¶
Write a Slurm script as a plain text file, then submit it with the sbatch command. Any computationally-intensive command should be prefixed with srun for best performance using Slurm.
- (Required) Start the script with #!/bin/bash.
- (Required) Include a tracking account with #SBATCH -A <account>. To find your Hive-Slurm tracking accounts, run pace-quota while logged into the Hive-Slurm cluster. Tracking accounts will be of the form hive-<PI username>, e.g., hive-gburdell3 for researchers in Prof. Burdell's group. Researchers working with multiple faculty may have multiple tracking accounts and should choose the one that best fits the project supervisor for each job run.
- (Required) Select a partition with #SBATCH -p <partition>.
- Name a job with #SBATCH -J <job name>.
- Include resource requests:
    - For requesting cores, we recommend 1 of 2 options:
        - #SBATCH -n or #SBATCH --ntasks specifies the number of cores for the entire job. The default is 1 core.
        - #SBATCH -N specifies the number of nodes, combined with #SBATCH --ntasks-per-node, which specifies the number of cores per node. For GPU jobs, #SBATCH --ntasks-per-node does not need to be specified because the default is 6 cores per GPU.
    - For requesting memory, we recommend 1 of 2 options:
        - For CPU-only jobs, use #SBATCH --mem-per-cpu=<request with units>, which specifies the amount of memory per core. To request all the memory on a node, include #SBATCH --mem=0. The default is 4 GB/core.
        - For GPU jobs, use #SBATCH --mem-per-gpu=<request with units>, which specifies the amount of memory per GPU.
- Request walltime with #SBATCH -t. Job walltime requests should use the format D-HH:MM:SS for days, hours, minutes, and seconds. Alternatively, include just an integer that represents minutes. The default is 1 hour.
- Name your output file, which will include both STDOUT and STDERR, with #SBATCH -o <file name>.
- If you would like to receive email notifications, include #SBATCH --mail-type=<conditions> with a comma-separated list of only the conditions you prefer (NONE, BEGIN, END, FAIL, ARRAY_TASKS, ALL).
    - If you wish to use a non-default email address, add #SBATCH --mail-user=<preferred email>.
- When listing commands to run inside the job, any computationally-intensive command should be prefixed with srun for best performance.
- Run man sbatch or visit the sbatch documentation page for more options. A minimal template combining these directives is sketched just after this list.
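The sketch below is a minimal template assuming the hive-gburdell3 account and hive partition used in the examples on this page; replace the account, partition, resources, and email address to match your own job:
#!/bin/bash
#SBATCH -JMinimalTemplate                   # Job name
#SBATCH -Ahive-gburdell3                    # Tracking account (required)
#SBATCH -phive                              # Partition (required)
#SBATCH -N1 --ntasks-per-node=4             # 1 node, 4 cores on that node
#SBATCH --mem-per-cpu=1G                    # Memory per core
#SBATCH -t1:00:00                           # Walltime of 1 hour (D-HH:MM:SS or minutes)
#SBATCH -oReport-%j.out                     # Combined output and error messages file
#SBATCH --mail-type=END,FAIL                # Mail preferences
#SBATCH --mail-user=gburdell3@gatech.edu    # E-mail address for notifications
cd $SLURM_SUBMIT_DIR                        # Change to the submission directory
srun hostname                               # Replace with your computationally-intensive command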
Basic Python Example¶
- This guide provides a full runthrough of loading software and submitting a job on the Hive-Slurm cluster.
- In this guide, we'll load anaconda3 and run a simple Python script, submitting to the hive queue.
While logged into Hive, use a text editor such as nano, vi, or emacs to create the following Python script, and call it test.py:
#simple test script
result = 2 ** 2
print("Result of 2 ^ 2: {}".format(result))
Now, create a job submission script SlurmPythonExample.sbatch with the commands below:
#!/bin/bash
#SBATCH -JSlurmPythonExample # Job name
#SBATCH --account=hive-gburdell3 # Tracking account
#SBATCH -n4 # Number of cores required
#SBATCH --mem-per-cpu=1G # Memory per core
#SBATCH -t15 # Duration of the job (Ex: 15 mins)
#SBATCH -phive # Queue name (where job is submitted)
#SBATCH -oReport-%j.out # Combined output and error messages file
#SBATCH --mail-type=BEGIN,END,FAIL # Mail preferences
#SBATCH --mail-user=gburdell3@gatech.edu # E-mail address for notifications
cd $SLURM_SUBMIT_DIR # Change to working directory
module load anaconda3/2022.05 # Load module dependencies
srun python test.py # Example Process
- Use the pace-quota command to find the name of your tracking account(s).
- Make sure that test.py and SlurmPythonExample.sbatch are in the same folder. It is important that you submit the job from this directory. $SLURM_SUBMIT_DIR is a variable that contains the path of the directory from which the job is submitted.
- An account name is required to submit a job on the Hive-Slurm cluster. This is only for tracking usage on Hive; there is no charge.
- module load anaconda3/2022.05 loads anaconda3, which includes Python.
- srun python test.py runs the Python script. srun runs the program as many times as specified by the -n or --ntasks option. With just python test.py, the program would run only once.
You can submit the script by running sbatch SlurmPythonExample.sbatch from the command line. To check job status, use squeue -u gburdell3. To delete a job, use scancel <jobid>. Once the job has completed, you'll see a Report-<jobid>.out file, which contains the results of the job. It will look something like this:
#Output file
---------------------------------------
Begin Slurm Prolog: Aug-16-2022 12:08:54
Job ID: 1831
User ID: gburdell3
Account: hive-gburdell3
Job name: SlurmPythonExample
Partition: hive
---------------------------------------
Result of 2 ^ 2: 4
Result of 2 ^ 2: 4
Result of 2 ^ 2: 4
Result of 2 ^ 2: 4
---------------------------------------
Begin Slurm Epilog: Aug-16-2022 12:08:58
Job ID: 1831
User ID: gburdell3
Account: hive-gburdell3
Job name: SlurmPythonExample
Resources: cpu=4,mem=4G,node=2
Rsrc Used: cput=00:00:16,vmem=20624K,walltime=00:00:04,mem=0,energy_used=0
Partition: hive
Nodes: atl1-1-01-011-5-[1-2]
---------------------------------------
MPI Jobs ¶
Warning
Do not use mpirun or mpiexec with Slurm. Use srun instead.
You may want to run Message Passing Interface (MPI) jobs, which utilize a message-passing standard designed for parallel computing on the cluster.
In this set of examples, we will compile "hello world" MPI code from MPI Tutorial and run the program using srun.
To set up our environment for both MPI job examples, follow these steps to create a new directory and download the MPI code:
[gburdell3@login-hive-slurm ~]$ mkdir slurm_mpi_example
[gburdell3@login-hive-slurm ~]$ cd slurm_mpi_example
[gburdell3@login-hive-slurm slurm_mpi_example]$ wget https://raw.githubusercontent.com/mpitutorial/mpitutorial/gh-pages/tutorials/mpi-hello-world/code/mpi_hello_world.c
Interactive MPI Example¶
For running MPI in Slurm using an interactive job, follow the steps for Interactive Jobs to enter an interactive session:
- First, as in the interactive job example, use salloc, this time allocating 2 nodes with 4 cores each (8 cores total) for an interactive job using the hive-gburdell3 account on the hive queue:
[gburdell3@login-hive-slurm ~]$ salloc -A hive-gburdell3 -phive -N2 --ntasks-per-node=4 -t1:00:00
salloc: Pending job allocation 1902
salloc: job 1902 queued and waiting for resources
- Next, after pending, your job will start once resources are granted, showing the following prompt:
salloc: job 1902 has been allocated resources
salloc: Granted job allocation 1902
salloc: Waiting for resource configuration
salloc: Nodes atl1-1-01-011-4-2,atl1-1-01-011-5-1 are ready for job
---------------------------------------
Begin Slurm Prolog: Aug-16-2022 13:08:10
Job ID: 1902
User ID: gburdell3
Account: hive-gburdell3
Job name: interactive
Partition: hive
---------------------------------------
[gburdell3@atl1-1-01-011-4-2 ~]$
- Next, within your interactive session and in the slurm_mpi_example directory created earlier with the mpi_hello_world.c example code, load the relevant modules and compile the MPI code using mpicc:
[gburdell3@atl1-1-01-011-4-2 ~]$ cd slurm_mpi_example
[gburdell3@atl1-1-01-011-4-2 slurm_mpi_example]$ module load gcc/10.3.0 mvapich2/2.3.6
[gburdell3@atl1-1-01-011-4-2 slurm_mpi_example]$ mpicc mpi_hello_world.c -o mpi_hello_world
- Next, run the MPI job using srun:
[gburdell3@atl1-1-01-011-4-2 slurm_mpi_example]$ srun mpi_hello_world
- Finally, the following should be output from this interactive MPI example:
Hello world from processor atl1-1-01-011-5-1.pace.gatech.edu, rank 5 out of 8 processors
Hello world from processor atl1-1-01-011-4-2.pace.gatech.edu, rank 0 out of 8 processors
Hello world from processor atl1-1-01-011-5-1.pace.gatech.edu, rank 6 out of 8 processors
Hello world from processor atl1-1-01-011-5-1.pace.gatech.edu, rank 7 out of 8 processors
Hello world from processor atl1-1-01-011-4-2.pace.gatech.edu, rank 1 out of 8 processors
Hello world from processor atl1-1-01-011-4-2.pace.gatech.edu, rank 3 out of 8 processors
Hello world from processor atl1-1-01-011-5-1.pace.gatech.edu, rank 4 out of 8 processors
Hello world from processor atl1-1-01-011-4-2.pace.gatech.edu, rank 2 out of 8 processors
Batch MPI Example¶
For running MPI in Slurm using a batch job, follow the steps in Batch Jobs and Basic Python Example to set up and run a batch job.
- First, in the slurm_mpi_example directory created earlier with the mpi_hello_world.c example code, create a file named SlurmBatchMPIExample.sbatch with the following content:
#!/bin/bash
#SBATCH -JSlurmBatchMPIExample # Job name
#SBATCH --account=hive-gburdell3 # Tracking account
#SBATCH -N2 --ntasks-per-node=4 # Number of nodes and cores per node required
#SBATCH --mem-per-cpu=1G # Memory per core
#SBATCH -t1:00:00 # Duration of the job (Ex: 1 hour)
#SBATCH -phive # Queue name (where job is submitted)
#SBATCH -oReport-%j.out # Combined output and error messages file
#SBATCH --mail-type=BEGIN,END,FAIL # Mail preferences
#SBATCH --mail-user=gburdell3@gatech.edu # E-mail address for notifications
cd $HOME/slurm_mpi_example # Change to working directory created in $HOME
# Compile MPI Code
module load gcc/10.3.0 mvapich2/2.3.6
mpicc mpi_hello_world.c -o mpi_hello_world
# Run MPI Code
srun mpi_hello_world
- This batch file combines the configuration for the Slurm batch job submission, the compilation of the MPI code, and running the MPI job using srun.
- Next, run the MPI batch job using sbatch in the slurm_mpi_example directory:
[gburdell3@login-hive-slurm ~]$ cd slurm_mpi_example
[gburdell3@login-hive-slurm slurm_mpi_example]$ sbatch SlurmBatchMPIExample.sbatch
Submitted batch job 1904
- This example should not take long, but it may take time to run depending on how busy the Slurm queue is.
- Finally, after the batch MPI job example has run, the following output should appear in a file named Report-<job id>.out created in the same directory:
---------------------------------------
Begin Slurm Prolog: Aug-16-2022 13:08:07
Job ID: 1904
User ID: gburdell3
Account: hive-gburdell3
Job name: SlurmBatchMPIExample
Partition: hive
---------------------------------------
Hello world from processor atl1-1-01-011-5-1.pace.gatech.edu, rank 7 out of 8 processors
Hello world from processor atl1-1-01-011-4-2.pace.gatech.edu, rank 1 out of 8 processors
Hello world from processor atl1-1-01-011-4-2.pace.gatech.edu, rank 0 out of 8 processors
Hello world from processor atl1-1-01-011-4-2.pace.gatech.edu, rank 3 out of 8 processors
Hello world from processor atl1-1-01-011-4-2.pace.gatech.edu, rank 2 out of 8 processors
Hello world from processor atl1-1-01-011-5-1.pace.gatech.edu, rank 4 out of 8 processors
Hello world from processor atl1-1-01-011-5-1.pace.gatech.edu, rank 6 out of 8 processors
Hello world from processor atl1-1-01-011-5-1.pace.gatech.edu, rank 5 out of 8 processors
---------------------------------------
Begin Slurm Epilog: Aug-16-2022 13:08:08
Job ID: 1904
User ID: gburdell3
Account: hive-gburdell3
Job name: SlurmBatchMPIExample
Resources: cpu=8,mem=8G,node=2
Rsrc Used: cput=00:00:08,vmem=376K,walltime=00:00:01,mem=0,energy_used=0
Partition: hive
Nodes: atl1-1-01-011-4-2,atl1-1-01-011-5-1
---------------------------------------
Array Jobs ¶
To submit a number of identical jobs without having to drive the submission with an external script, use Slurm's job array feature.
:bulb: There is a maximum limit of 500 jobs (queued plus running) per user on Hive.
- A job array can be submitted simply by adding #SBATCH --array=x-y to the job script, where x and y are the array bounds. A job array can also be specified at the command line with sbatch --array=x-y job_script.sbatch.
- A job array will then be created with a number of independent jobs, a.k.a. array tasks, corresponding to the defined array.
- Slurm's job array handling is very versatile. Instead of providing a task range, a comma-separated list of task numbers can be provided, for example to quickly rerun a few failed tasks from a previously completed job array, as in sbatch --array=4,8,15,16,23,42 job_script.sbatch. Command-line options override options in the script, so the script itself can be left unchanged.
Limiting the number of tasks that run at once¶
To throttle a job array, keeping only a certain number of tasks active at a time, use the %N suffix, where N is the number of active tasks. For example, #SBATCH -a 1-200%5 will produce a 200-task job array with only 5 tasks active at any given time.
Note that although the % symbol is used, N is the actual number of tasks that may run at once, not a percentage.
Using scontrol to modify throttling of running array jobs¶
If you want to change the number of simultaneous tasks of an active job array, you can use scontrol: scontrol update ArrayTaskThrottle=<count> JobId=<jobID>, e.g., scontrol update ArrayTaskThrottle=50 JobId=123456.
Set ArrayTaskThrottle=0 to eliminate any limit.
:bulb: Reducing the "ArrayTaskThrottle" count on a running job array will not affect the tasks that have already entered the "RUNNING" state. It will only prevent new tasks from starting until the number of running tasks drops below the new, lower threshold.
Naming output and error files¶
Slurm uses the %A and %a replacement strings for the master job ID and task ID, respectively.
For example:
#SBATCH --output=Array_test.%A_%a.out
#SBATCH --error=Array_test.%A_%a.error
The error log is optional, as both types of logs can be written to the 'output' log:
#SBATCH --output=Array_test.%A_%a.log
:bulb: If you use only %A in the log file name, all array tasks will try to write to a single file, and the performance of the run will approach zero asymptotically. Make sure to use both %A and %a in the log file name specification.
Using the array ID Index¶
Slurm provides a $SLURM_ARRAY_TASK_ID variable to each task. It can be used inside the job script to handle input and output files for that task.
One common application of array jobs is to process many input files. While this is easy if the files are numbered as in the example above, such numbering is not required. If, for example, you have a folder of 100 files that end in .txt, you can use the following approach to get the name of the file for each task automatically:
file=$(ls *.txt | sed -n ${SLURM_ARRAY_TASK_ID}p)
myscript -in $file
If, alternatively, you use an input file (e.g., 'input.list') with a list of samples/datasets to process (one per line), you can pick an item from the list as follows:
SAMPLE_LIST=($(<input.list))
SAMPLE=${SAMPLE_LIST[${SLURM_ARRAY_TASK_ID}]}
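Putting these pieces together, a minimal sketch of an array job script that processes one line of a hypothetical input.list (100 entries) per task might look like the following; process_sample is a placeholder for your actual command:
#!/bin/bash
#SBATCH -JArrayFromList                  # Job name
#SBATCH -Ahive-gburdell3                 # Tracking account
#SBATCH -phive                           # Partition
#SBATCH -N1 --ntasks=1                   # One core per array task
#SBATCH -t30                             # 30 minutes per task
#SBATCH -oArray_test.%A_%a.out           # Separate log per task (%A = job ID, %a = task ID)
#SBATCH --array=0-99%10                  # 100 tasks, at most 10 running at once
cd $SLURM_SUBMIT_DIR                             # Run from the submission directory
SAMPLE_LIST=($(<input.list))                     # Read the list of samples into a bash array
SAMPLE=${SAMPLE_LIST[${SLURM_ARRAY_TASK_ID}]}    # Task IDs 0-99 map directly to array indices
srun process_sample "$SAMPLE"                    # Placeholder for your actual command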
Running many short tasks¶
While SLURM array jobs make it easy to run many similar tasks, if each task is short (seconds or even a few minutes), array jobs quickly bog down the scheduler and more time is spent managing jobs than actually doing any work for you. This also negatively impacts other users.
If you have hundreds or thousands of tasks, it is unlikely that a simple array job is the best solution. That does not mean that array jobs are not helpful in these cases, but that a little more thought needs to go into them for efficient use of the resources.
As an example, imagine you have 500 runs of a program to do, with each run taking about 30 seconds to complete. Rather than running an array job with 500 tasks, it would be much more efficient to run 5 tasks where each completes 100 runs. Here is a sample script that accomplishes this by combining array jobs with bash loops. Create a submission script slurm_array_example/SlurmArrayExample.sbatch with the following content:
#!/bin/bash
#SBATCH --job-name=SlurmArrayExample # Job name
#SBATCH --account=hive-gburdell3 # Tracking account
#SBATCH --mail-type=ALL # Mail events (NONE, BEGIN, END, FAIL, ARRAY_TASKS, ALL)
#SBATCH --mail-user=user@gatech.edu # Where to send mail
#SBATCH --nodes=1 # Use one node
#SBATCH --ntasks=1 # Run a single task
#SBATCH --mem-per-cpu=1gb # Memory per processor
#SBATCH --time=00:10:00 # Time limit hrs:min:sec
#SBATCH --output=Report_%A-%a.out # Standard output and error log
#SBATCH --array=1-5 # Array range
# This is an example script that combines array tasks with
# bash loops to process many short runs. Array jobs are convenient
# for running lots of tasks, but if each task is short, they
# quickly become inefficient, taking more time to schedule than
# they spend doing any work and bogging down the scheduler for
# all users.
#Set the number of runs that each SLURM task should do
PER_TASK=100
# Calculate the starting and ending values for this task based
# on the SLURM task and the number of runs per task.
START_NUM=$(( ($SLURM_ARRAY_TASK_ID - 1) * $PER_TASK + 1 ))
END_NUM=$(( $SLURM_ARRAY_TASK_ID * $PER_TASK ))
# Print the task and run range
echo This is task $SLURM_ARRAY_TASK_ID, which will do runs $START_NUM to $END_NUM
# Run the loop of runs for this task.
for (( run=$START_NUM; run<=END_NUM; run++ )); do
echo This is SLURM task $SLURM_ARRAY_TASK_ID, run number $run
#Do your stuff here
#e.g. run test.py as array job from the Basic Python Example section above
srun python test.py
done
date
- When the submission script is ready, run the array job using sbatch in the slurm_array_example directory:
[gburdell3@login-hive-slurm ~]$ cd slurm_array_example
[gburdell3@login-hive-slurm slurm_array_example]$ sbatch SlurmArrayExample.sbatch
Submitted batch job 2037
- After the array job example has run, the following should be output in the file created in the same directory named Report_<job id>-<array id>.out:
[gburdell3@login-hive-slurm slurm_array_example]$ cat Report_2037-1.out
---------------------------------------
Begin Slurm Prolog: Aug-16-2022 14:08:27
Job ID: 2038
User ID: gburdell3
Account: hive-gburdell3
Job name: SlurmArrayExample
Partition: hive
---------------------------------------
This is task 1, which will do runs 1 to 100
Result of 2 ^ 2: 4
This is SLURM task 1, run number 2
Result of 2 ^ 2: 4
This is SLURM task 1, run number 3
Result of 2 ^ 2: 4
This is SLURM task 1, run number 4
...
...
---------------------------------------
Begin Slurm Epilog: Aug-16-2022 14:08:31
Array Job ID: 2037_1
User ID: gburdell3
Account: hive-gburdell3
Job name: SlurmArrayExample
Resources: cpu=1,mem=1G,node=1
Rsrc Used: cput=00:00:05,vmem=8K,walltime=00:00:05,mem=0,energy_used=0
Partition: hive
Nodes: atl1-1-01-011-4-2
---------------------------------------
Warning
Each array task is mapped to a unique job ID, but both forms shown in the prolog and epilog, Job ID and Array Job ID, are valid IDs when querying the results.
Deleting job arrays and tasks¶
To delete all of the tasks of an array job, use scancel with the job ID:
scancel 123456
To delete a single task, add the task ID:
scancel 123456_1
GPU Jobs ¶
Let's take a look at running a TensorFlow example on a GPU resource. We have a test example in the $TENSORFLOWGPUROOT directory.
Interactive GPU Example¶
For running GPUs in Slurm using an interactive job, follow the steps for Interactive Jobs to enter an interactive session:
- First, start a Slurm interactive session with GPUs using the following command, allocating 1 node with a GPU on the hive-gpu-short partition, which is the best partition for interactive sessions with GPUs. Note that you do not need to specify --ntasks-per-node because 6 cores are assigned per GPU by default:
[gburdell3@login-hive-slurm ~]$ salloc -A hive-gburdell3 -N1 --mem-per-gpu=12G -phive-gpu-short -t0:15:00 --gres=gpu:1 --gres-flags=enforce-binding
salloc: Pending job allocation 2027
salloc: job 2027 queued and waiting for resources
- Next, after pending, your job will start once resources are granted, showing the following prompt:
salloc: job 2027 has been allocated resources
salloc: Granted job allocation 2027
salloc: Waiting for resource configuration
salloc: Nodes atl1-1-01-018-25-0 are ready for job
---------------------------------------
Begin Slurm Prolog: Aug-16-2022 14:08:27
Job ID: 2027
User ID: gburdell3
Account: hive-gburdell3
Job name: interactive
Partition: hive-gpu-short
---------------------------------------
[gburdell3@atl1-1-01-018-25-0 slurm_mpi_example]$
- Next, within your interactive session, load the tensorflow-gpu module and run the testgpu.py example:
[gburdell3@atl1-1-01-018-25-0 ~]$ cd slurm_gpu_example
[gburdell3@atl1-1-01-018-25-0 slurm_gpu_example ]$ module load tensorflow-gpu/2.9.0
(/usr/local/pace-apps/manual/packages/tensorflow-gpu/2.9.0) [gburdell3@atl1-1-01-018-25-0 slurm_mpi_example]$ srun python $TENSORFLOWGPUROOT/testgpu.py gpu 1000
- Finally, the sample output from the interactive session should be:
(/usr/local/pace-apps/manual/packages/tensorflow-gpu/2.9.0) [gburdell3@atl1-1-01-018-25-0 slurm_mpi_example]$ srun python $TENSORFLOWGPUROOT/testgpu.py gpu 1000
2022-08-16 14:05:13.536540: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-08-16 14:05:20.650763: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-16 14:05:21.263220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14637 MB memory: -> device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:3b:00.0, compute capability: 7.0
Num GPUs Available: 1
tf.Tensor(249909400.0, shape=(), dtype=float32)
Shape: (1000, 1000) Device: /gpu:0
Time taken: 0:00:01.147587
(/usr/local/pace-apps/manual/packages/tensorflow-gpu/2.9.0) [gburdell3@atl1-1-01-018-25-0 slurm_mpi_example]$
Batch GPU Example¶
For running GPUs in Slurm using a batch job, follow the steps in Batch Jobs and Basic Python Example to set up and run a batch job:
- First, create a directory named slurm_gpu_example:
[gburdell3@login-hive-slurm ~]$ mkdir slurm_gpu_example
- Next, create a batch script named SlurmBatchGPUExample.sbatch with the following content:
#!/bin/bash
#SBATCH -JGPUExample # Job name
#SBATCH -Ahive-gburdell3 # Charge account
#SBATCH -N1 --gres=gpu:1 # Number of nodes and GPUs required
#SBATCH --gres-flags=enforce-binding # Map CPUs to GPUs
#SBATCH --mem-per-gpu=12G # Memory per gpu
#SBATCH -t15 # Duration of the job (Ex: 15 mins)
#SBATCH -phive-gpu # Partition name (where job is submitted)
#SBATCH -oReport-%j.out # Combined output and error messages file
#SBATCH --mail-type=BEGIN,END,FAIL # Mail preferences
#SBATCH --mail-user=gburdell3@gatech.edu # e-mail address for notifications
cd $HOME/slurm_gpu_example # Change to working directory created in $HOME
module load tensorflow-gpu/2.9.0 # Load module dependencies
srun python $TENSORFLOWGPUROOT/testgpu.py gpu 1000 # Run test example
- Note that the GPU resource is requested with the specification --gres=gpu:<gpu type>:<number of gpus per node>. This specifies GPUs per node. The <gpu type> is optional, as Hive has only a single GPU type, the Nvidia Tesla V100 16GB GPU.
    - The scheduler is configured to assign 6 cores per GPU by default, so there is no need to specify --ntasks-per-node in your request.
    - However, we strongly recommend using the specification --gres-flags=enforce-binding to bind the default 6 cores per GPU.
Warning
Although the -G or --gpus flag can be used to request GPUs per job, it can result in undesired CPU assignments. We strongly recommend the use of the --gres=gpu:<number of gpus per node> flag instead.
- We strongly recommend the use of the specification --mem-per-gpu=<memory allocated per GPU> to allocate memory per GPU.
- In general, use the hive-gpu partition for batch jobs. You can also use the hive-gpu-short partition for interactive jobs on GPUs.
- The sample output for the above example is provided below.
- Next, run the GPU batch job using sbatch in the slurm_gpu_example directory:
[gburdell3@login-hive-slurm ~]$ cd slurm_gpu_example
[gburdell3@login-hive-slurm slurm_gpu_example]$ sbatch SlurmBatchGPUExample.sbatch
Submitted batch job 2033
- Finally, after the batch GPU job example has run, the following output should appear in a file named Report-<job id>.out created in the same directory:
---------------------------------------
Begin Slurm Prolog: Aug-16-2022 14:08:41
Job ID: 2033
User ID: gburdell3
Account: hive-gburdell3
Job name: GPUExample
Partition: hive-gpu
---------------------------------------
2022-08-16 14:08:46.639489: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-08-16 14:08:53.872929: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-16 14:08:54.492034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14637 MB memory: -> device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:af:00.0, compute capability: 7.0
Num GPUs Available: 1
tf.Tensor(250249200.0, shape=(), dtype=float32)
Shape: (1000, 1000) Device: /gpu:0
Time taken: 0:00:01.154084
---------------------------------------
Begin Slurm Epilog: Aug-16-2022 14:08:55
Job ID: 2033
User ID: gburdell3
Account: hive-gburdell3
Job name: GPUExample
Resources: cpu=6,gres/gpu:v100=1,mem=12G,node=1
Rsrc Used: cput=00:01:24,vmem=768K,walltime=00:00:14,mem=0,energy_used=0
Partition: hive-gpu
Nodes: atl1-1-01-018-25-0
---------------------------------------
This material is based upon work supported by the National Science Foundation under grant number 1828187. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.