Updated 2023-03-31
Run MPI Programs on the Cluster¶
Overview¶
- Workflow is similar to running any other program on the cluster
    * First, write an SBATCH script
    * Load the MPI environment and execute the program
- For help and info on SBATCH Scripts, visit the SBATCH script guide
- Example C program file for this guide can be found here (prints "Hello World", name, and rank of each processor being used)
- Example SBATCH submission script used in this guide can be found here
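In case the linked file is not handy, a minimal MPI "Hello World" program along the lines described above might look like the following sketch. The file name mpi_hello_world.c matches the SBATCH script in the walkthrough below; the exact wording of the printed message is an assumption, not necessarily identical to the linked example.

// mpi_hello_world.c -- minimal sketch of an MPI "Hello World" program
// (illustrative only; the linked example may differ in details)
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                       // initialize the MPI environment

    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);   // total number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);   // rank of this process

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);  // name of the node this rank runs on

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();                               // clean up the MPI environment
    return 0;
}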
Walkthrough: Run an Example MPI Script¶
Step 1: The SBATCH Script¶
#!/bin/bash
# This is an example MPI SBATCH script
#SBATCH -Jmpi_example_script # job name
#SBATCH -A [Account] # account to which job is charged, ex: GT-gburdell3
#SBATCH -N 2 --ntasks-per-node=4 # number of nodes and cores per node
#SBATCH --mem-per-cpu=2G # memory per core
#SBATCH -t10 # duration of the job (ex: 10 min)
#SBATCH -qinferno # queue name
#SBATCH -oReport-%j.out # output file name, %j is replaced by the job ID
#SBATCH --mail-type=BEGIN,END,FAIL # event notification, set to email on job start, end, or fail
#SBATCH --mail-user=shollister7@gatech.edu # your gatech email
# Computation starts here
cd ~/test_directory # change into the directory from which the job will be executed (where the data / script is located)
echo "Started on `/bin/hostname`" # prints the name of the node the job started on
module load gcc/4.9.0 mvapich2/2.1 # load necessary modules, gcc is a compiler, mvapich2 is an implementation of mpi
mpicc mpi_hello_world.c -o mpi_hello_world # compiles the C program to be run
srun ./mpi_hello_world # runs the parallel C program with MPI
- The #SBATCH lines are directives, covered in depth in the Using Slurm on Phoenix Guide
- The first step is to tell the cluster to enter the directory where the parallel C program is located (in this case, ~/test_directory)
- Here, I have made a directory on my system called test_directory. It is recommended that you create some sort of working or tmp directory to avoid running scripts out of the home directory
- Make sure any dependencies (data, etc.) are in the same directory
- The echo line prints to the output file the name of the node the job started on
- Then, the preferred MPI implementation, mvapich2, is loaded, along with the C compiler gcc. Finding and loading modules is covered in the next section
- Next, the C program is compiled with mpicc and run
Danger
If you launch the program with mpirun instead of srun, the number of processes specified after -np in the mpirun command MUST equal the number of processors requested in the SBATCH script
- In this example, that means -np 8, since 2 nodes x 4 tasks per node = 8 processors were requested; srun picks this count up from the allocation automatically (a sketch of the mpirun form follows this note)
- Using fewer processors than requested leaves idle cores that could be used by others
- Using more causes overload and slows down the system
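As a hedged illustration (not part of the linked example script), the srun line above could be replaced with an mpirun launch; the hard-coded count of 8 is an assumption tied to the 2 x 4 request in this example.

# Illustrative alternative to the srun line: launching with mpirun
# The -np value (8) must equal the SBATCH request: 2 nodes x 4 tasks per node
mpirun -np 8 ./mpi_hello_world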
Step 2: Loading preferred MPI Implementation¶
- Run
module avail
to see a list of all modules available on the cluster
    * Adding in a program name will return all available versions of that program
module avail <program name> #ex: module avail gcc
- Then, in the SBATCH script, load your preferred environment with the line
module load <module name(s)> #ex: module load gcc/4.9.0 mvapich2/2.1
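As a quick usage example (the module names and versions shown are simply the ones used in the example script above), you might check for and load modules like this:

module avail mvapich2                # list available mvapich2 versions
module load gcc/4.9.0 mvapich2/2.1   # load the compiler and MPI implementation
module list                          # confirm which modules are now loaded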
Step 3: Submitting job and collecting results¶
- Make sure you are in the folder where the SBATCH script is located, and run
sbatch <scriptName.sbatch> #ex: sbatch example_mpi_script.sbatch
- If successful, this will print something like Submitted batch job 2180446
- The number at the end of that line is the job ID, useful for checking the predicted wait time in the queue or the job status
- After a couple of seconds, find the estimated wait time in the queue with
squeue --start --job <jobID>
- check job status with
squeue --job <jobID>
- For more ways to check status, how to cancel a job, and other useful commands, check out the Informational Commands section
- Any files created by the script will show up in the folder from which the script was run (unless otherwise programmed)
- The output file can be found by typing ls and looking for the file named in the SBATCH script, in this case
Report-<jobID>.out
- To see the contents of the out file, run
cat <output file name> #ex: cat Report-2180446.out
- Output for the example script should print "Hello World", the name, and the rank of each processor being used:
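Assuming the printf format from the C sketch earlier in this guide, the contents would resemble the following (node names are placeholders and the ordering of lines will vary):

Hello world from processor <node 1 name>, rank 0 out of 8 processors
Hello world from processor <node 1 name>, rank 1 out of 8 processors
...
Hello world from processor <node 2 name>, rank 7 out of 8 processors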
- To move output files off the cluster, see the storage and moving files guide
- Congratulations! You have successfully run a parallel C program using MPI on the cluster