Updated 2021-05-17
Run MPI Programs on the Cluster
Overview
- The workflow is similar to running any other program on the cluster
- First, write a PBS script
- Load the MPI environment and execute the program
- For help and info on PBS scripts, visit this PBS script guide
- The example C program for this guide can be found here (it prints "Hello World" along with the name and rank of each processor being used); a sketch of such a program is shown after this list
- The example PBS submission script used in this guide can be found here
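The linked file is not reproduced here, but a minimal sketch of such a program, assuming the standard MPI C API (the actual linked example may differ in details), looks like this:

// mpi_hello_world.c - illustrative sketch of the kind of program this guide runs;
// the actual linked example may differ in details
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                             // start the MPI runtime
    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);         // total number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);         // rank of this process
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);  // name of the node this rank runs on
    printf("Hello World from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);
    MPI_Finalize();                                     // shut down the MPI runtime
    return 0;
}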
Walkthrough: Run an Example MPI Script
Step 1: The PBS Script
# This is an example MPI PBS script
#PBS -N mpi_example_script # job name
#PBS -A [Account] # account to which job is charged, ex: GT-gburdell3
#PBS -l nodes=2:ppn=4 # number of nodes and cores per node
#PBS -l pmem=2gb # memory per core
#PBS -l walltime=15:00 # duration of the job (ex: 15 min)
#PBS -q inferno # queue name
#PBS -j oe # combine output and error messages in 1 file
#PBS -o mpi_script.out # output file name
#PBS -m abe # event notification, set to email on job start, end, or fail
#PBS -M shollister7@gatech.edu # your gatech email
# Computation starts here
cd ~/test_directory # change into directory from where job to be executed (where data is / script is located)
echo "Started on `/bin/hostname`" # prints the name of the node job started on
module load gcc/4.9.0 mvapich2/2.1 # load necessary modules, gcc is a compiler, mvapich2 is an implementation of mpi
mpicc mpi_hello_world.c -o mpi_hello_world # compiles the C program to be run
mpirun -np 8 ./mpi_hello_world # runs the parallel C program with MPI; must specify 8 processors (2 nodes x 4 cores)
- The #PBS lines are directives, covered in depth in the PBS guide
- The first step is to tell the cluster to enter the directory where the parallel C program is located (in this case, ~/test_directory)
- Here, I have made a directory called test_directory. It is recommended that you make some sort of working or temporary directory to avoid running scripts out of the home directory
- Make sure any dependencies (data files, etc.) are in the same directory
- The echo line prints to the output file the name of the node the job started on
- Then the preferred MPI implementation, mvapich2, is loaded, along with the C compiler gcc. Finding and loading modules is covered in the next section
- Next, the C program is compiled with mpicc and run with mpirun
Danger
The number of processors specified after -np in the mpirun command MUST equal the number of processors requested in the PBS script
- In this case, -np 8 matches the 2 nodes x 4 processors per node = 8 processors requested in the PBS script
- Using fewer processors than requested leaves idle cores that could be used by others
- Using more causes overload and slows down the system (an optional way to keep the two counts in sync is sketched below)
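One optional way to avoid hard-coding the 8 is to derive the process count inside the script, assuming the scheduler provides the standard Torque/PBS $PBS_NODEFILE variable (one line per assigned core), as it normally does in PBS jobs:

# Optional alternative to "mpirun -np 8": count the core slots PBS assigned
# ($PBS_NODEFILE lists one line per core) and pass that count to mpirun
NPROCS=$(wc -l < "$PBS_NODEFILE")
mpirun -np "$NPROCS" ./mpi_hello_world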
Step 2: Loading the Preferred MPI Implementation
- Run module avail to see a list of all modules available on the cluster
- Adding a program name will return all available versions of that program
module avail <program name> #ex: module avail gcc
- Then, in the PBS script, load your preferred environment with the line
module load <module name(s)> #ex: module load gcc/4.9.0 mvapich2/2.1
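After loading, it can help to sanity-check the environment before compiling. The commands below are an optional check; mpicc -show assumes an MPICH-style compiler wrapper such as the one MVAPICH2 provides:

module list     # show currently loaded modules; gcc/4.9.0 and mvapich2/2.1 should appear
which mpicc     # confirm the MPI C compiler wrapper is on the PATH
mpicc -show     # print the underlying compiler command the wrapper will invoke (MPICH-style wrappers)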
Step 3: Submitting the Job and Collecting Results
- Make sure you're in the folder where the PBS script is located, and run
qsub <scriptName.pbs> #ex: qsub example_mpi_script.pbs
- If successful, this will print something like
2180446.shared-sched-pace.gatech.edu
- The number at the beginning is the job ID, which is useful for checking the predicted wait time in the queue or the job status
- After a couple of seconds, find the estimated wait time in the queue with
showstart <jobID>
- Check the job status with
qstat <jobID>
- For more ways to check status, how to cancel a job, and other useful commands, check out the command cheatsheet; a combined example session is shown below
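Putting these commands together, a typical submit-and-monitor session from the login node might look like the following; the job ID is the example value from above, and yours will differ:

cd ~/test_directory                  # directory containing the PBS script
qsub example_mpi_script.pbs          # submit; prints something like 2180446.shared-sched-pace.gatech.edu
showstart 2180446                    # estimated start time (replace with your own job ID)
qstat 2180446                        # current status of the job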
- Any files created by the script will show up in the folder where the script was run (unless otherwise programmed)
- The output file can be found by typing ls and looking for the file you named in the PBS script, in this case mpi_script.out
- To see the contents of the output file, run
cat <output file name> #ex: cat mpi_script.out
- The output for the example script should print "Hello World" along with the name and rank of each processor being used, as illustrated below
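With 8 ranks across 2 nodes, the output will look something like the lines below; the processor names and rank ordering here are placeholders and depend on which nodes the scheduler assigns and on the exact print statement in the program:

Hello World from processor node1, rank 0 out of 8 processors
Hello World from processor node1, rank 1 out of 8 processors
...
Hello World from processor node2, rank 7 out of 8 processors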
- To move output files off the cluster, see the storage and moving files guide (a rough scp example follows below)
- Congratulations! You have successfully run a parallel C program using MPI on the cluster
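As a quick illustration only (the linked guide is the authoritative reference, and the hostname below is a placeholder, not a confirmed login node), output files can typically be copied to a local machine with scp:

# Run from your local machine; replace the placeholder hostname with the login
# node given in the storage and moving files guide
scp gburdell3@login.pace.gatech.edu:~/test_directory/mpi_script.out .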