Updated 2021-05-17

Run MPI Programs on the Cluster

Overview

  • The workflow is similar to running any other program on the cluster:
    • First, write a PBS script
    • Then load the MPI environment and execute the program
  • For help and more information on PBS scripts, visit this PBS script guide
  • The example C program for this guide can be found here (it prints "Hello World", the name, and the rank of each processor being used; a minimal sketch is also included below)
  • The example PBS submission script used in this guide can be found here
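
The linked file is the one used in this guide; a minimal MPI "Hello World" program matching that description might look something like the sketch below (the exact contents of the linked file may differ):

// mpi_hello_world.c -- minimal sketch of an MPI "Hello World" program
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);                              // start the MPI runtime

    int world_size, world_rank, name_len;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Comm_size(MPI_COMM_WORLD, &world_size);          // total number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);          // rank of this process
    MPI_Get_processor_name(processor_name, &name_len);   // name of the node running this process

    printf("Hello World from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();                                      // shut down the MPI runtime
    return 0;
}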

Walkthrough: Run an Example MPI Script

Step 1: The PBS Script

# This is an example MPI PBS script
#PBS -N mpi_example_script         # job name
#PBS -A [Account]                  # account to which job is charged, ex: GT-gburdell3
#PBS -l nodes=2:ppn=4              # number of nodes and cores per node
#PBS -l pmem=2gb                   # memory per core
#PBS -l walltime=15:00             # duration of the job (ex: 15 min)
#PBS -q inferno                    # queue name
#PBS -j oe                         # combine output and error messages in 1 file
#PBS -o mpi_hello_world.out        # output file name
#PBS -m abe                        # event notification, set to email on job start, end, or fail
#PBS -M shollister7@gatech.edu     # your gatech email

                                                # Computation starts here
cd ~/test_directory                             # change into the directory where the job will be executed (where the data/script is located)
echo "Started on `/bin/hostname`"               # prints the name of the node the job started on
module load gcc/4.9.0 mvapich2/2.1              # load necessary modules: gcc is a compiler, mvapich2 is an implementation of MPI
mpicc mpi_hello_world.c -o mpi_hello_world      # compiles the C program to be run
mpirun -np 8 ./mpi_hello_world                  # runs the parallel C program with MPI; 8 processes = 2 nodes x 4 cores requested above

  • The #PBS lines are directives, covered in depth in the PBS guide
  • The first step is to tell the cluster to enter the directory where the parallel C program is located (in this case, ~/test_directory)
  • Here, I have made a directory called test_directory; it is recommended that you create some kind of working or temporary directory rather than running scripts out of your home directory
  • Make sure any dependencies (data files, etc.) are in the same directory
  • The second line prints the name of the node the job started on to the output file
  • Then, the preferred MPI implementation, mvapich2, is loaded, along with the C compiler gcc. Finding and loading modules is covered in the next section
  • Next, the C program is compiled with mpicc and run with mpirun

Danger

The number of processes specified after -np in the mpirun command MUST equal the number of processors requested in the PBS script

  • The number of processes specified after -np in the mpirun command (in this case, 8) MUST equal the number of processors requested in the PBS script (in this case, 2 nodes x 4 processors per node = 8 processors); see the sketch below for one way to keep the two in sync
  • Requesting more processors than you use leaves idle cores that could have been used by others
  • Launching more processes than you requested overloads the assigned cores and slows down the system
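
One way to avoid a mismatch, assuming the scheduler provides the standard $PBS_NODEFILE (a file with one line per allocated core), is to derive the process count at run time instead of hard-coding it:

NPROCS=$(wc -l < $PBS_NODEFILE)        # one line per allocated core, here 2 nodes x 4 ppn = 8
mpirun -np $NPROCS ./mpi_hello_world   # the process count now always matches the PBS request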

Step 2: Loading the Preferred MPI Implementation

  • Run module avail to see a list of all modules available on the cluster
  • Adding a program name will return all available versions of that program:
module avail <program name>  #ex: module avail gcc
  • Then, in the PBS script, load your preferred environment with the line:
module load <module name(s)>   #ex: module load gcc/4.9.0 mvapich2/2.1
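
For example, a typical sequence with the modules used in this guide might look like the following (module list is just a check that prints what is currently loaded):

module avail mvapich2                # list the available versions of mvapich2
module load gcc/4.9.0 mvapich2/2.1   # load the compiler and the MPI implementation
module list                          # confirm which modules are now loaded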

Step 3: Submitting the Job and Collecting Results

  • Make sure you're in the folder where the PBS script is located, and run:
qsub <scriptName.pbs>  #ex: qsub example_mpi_script.pbs
  • If successful, this will print something like 2180446.shared-sched-pace.gatech.edu
  • The number at the beginning is the job ID, which is useful for checking the predicted wait time in the queue or the job status
  • After a couple of seconds, find the estimated wait time in the queue with:
showstart <jobID>
  • Check the job status with:
qstat <jobID>
  • For more ways to check status, how to cancel a job, and other useful commands, check out the command cheatsheet
  • Any files created by the script will show up in the folder the script was run from (unless otherwise programmed)
  • The output file can be found by typing ls and looking for the output file you named in the PBS script, in this case mpi_hello_world.out
  • To see the contents of the output file, run:
cat <output file name>  #ex: cat mpi_hello_world.out
  • The output for the example script should print "Hello World", the name, and the rank of each processor being used:

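For example, assuming the sketch program shown in the overview and an 8-process run, the output should look something like this (the hostnames and the order of the lines will vary):

Hello World from processor node1, rank 0 out of 8 processors
Hello World from processor node1, rank 1 out of 8 processors
...
Hello World from processor node2, rank 7 out of 8 processors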

  • To move output files off the cluster, see the storage and moving files guide
  • Congratulations! You have successfully run a parallel C program using MPI on the cluster