Updated 2023-03-31

Run MPI Programs on the Cluster

Overview

  • Workflow is similar to running any other program on the cluster:
      • First, write an SBATCH script
      • Load the MPI environment and execute the program
  • For help and info on SBATCH Scripts, visit the SBATCH script guide
  • The example C program for this guide can be found here (it prints "Hello World" along with the name and rank of each processor being used); a minimal sketch of such a program appears after this list
  • The example SBATCH submission script used in this guide can be found here
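
The linked mpi_hello_world.c is not reproduced in this guide, but a minimal MPI "Hello World" program in C along those lines might look like the sketch below (the actual file may differ; the variable names are illustrative only):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                              /* initialize the MPI environment */

    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);          /* total number of MPI processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);          /* rank of this process */

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);   /* name of the node this rank runs on */

    /* print "Hello World", the processor name, and the rank of this process */
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();                                      /* clean up the MPI environment */
    return 0;
}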

Walkthrough: Run an Example MPI Script

Step 1: The SBATCH Script

#!/bin/bash
# This is an example MPI SBATCH script
#SBATCH -Jmpi_example_script                    # job name
#SBATCH -A [Account]                            # account to which job is charged, ex: GT-gburdell3
#SBATCH -N 2 --ntasks-per-node=4                # number of nodes and cores per node
#SBATCH --mem-per-cpu=2G                        # memory per core
#SBATCH -t10                                    # duration of the job (here, 10 minutes)
#SBATCH -qinferno                               # queue name
#SBATCH -o Report-%j.out                        # output file name
#SBATCH --mail-type=BEGIN,END,FAIL              # event notification, set to email on job start, end, or fail
#SBATCH --mail-user=shollister7@gatech.edu      # your gatech email
                                                # Computation starts here
cd ~/test_directory                             # change to the directory from which the job will be executed (where the data/script is located)
echo "Started on `/bin/hostname`"               # prints the name of the node job started on
module load gcc/4.9.0 mvapich2/2.1              # load necessary modules, gcc is a compiler, mvapich2 is an implementation of mpi
mpicc mpi_hello_world.c -o mpi_hello_world      # compiles the C program to be run
srun ./mpi_hello_world                          # runs the parallel C program with MPI
  • The #SBATCH lines are directives, covered in depth in the Using Slurm on Phoenix Guide
  • The first step is to tell the cluster to enter the directory where the parallel C program is located (in this case, ~/test_directory)
  • Here, I have made a directory on my system called test_directory. It is recommended that you create some sort of working or temporary directory rather than running scripts out of your home directory
  • Make sure any dependencies (data files, etc.) are in the same directory
  • The echo line prints the name of the node the job started on to the output file
  • Then the preferred MPI implementation, mvapich2, is loaded, along with the C compiler gcc. Finding and loading modules is covered in the next section
  • Finally, the C program is compiled with mpicc and launched with srun

Danger

The number of processes specified after -np in an mpirun command MUST equal the number of processors requested in the SBATCH script

  • If you launch the program with mpirun, the number of processes specified after -np (in this case, 8) MUST equal the number of processors requested in the SBATCH script (in this case, 2 nodes x 4 tasks per node = 8 processors); when launching with srun, as in the example script, the task count is taken from the SBATCH directives automatically
  • Using fewer processors than requested leaves idle cores that could be used by others
  • Using more than requested overloads the allocated cores and slows the job down

Step 2: Loading the Preferred MPI Implementation

  • Run module avail to see a list of all modules available on the cluster
      • Adding a program name will return all available versions of that program
module avail <program name>  #ex: module avail gcc
  • Then, in the SBATCH script, load your preferred environment with the line
module load <module name(s)>   #ex: module load gcc/4.9.0 mvapich2/2.1

Step 3: Submitting the Job and Collecting Results

  • Make sure you are in the folder where the SBATCH script is located, and run
sbatch <scriptName.sbatch>  #ex: sbatch example_mpi_script.sbatch
  • If successful, this will print something like Submitted batch job 2180446
  • The number at the end is the job ID, useful for checking the predicted wait time in the queue or the job status
  • After a couple of seconds, find the estimated wait time in the queue with
squeue --start  --job <jobID>
  • Check the job status with
squeue --job <jobID>
  • For more ways to check status, how to cancel a job, and other useful commands, check out the Informational Commands section
  • Any files created by the script will show up in the folder where the script was run (unless otherwise programmed)
  • The output file can be found by running ls and looking for the file named in the SBATCH script, in this case Report-<jobID>.out
  • To see the contents of the output file, run
cat <output file name>  #ex: cat Report-2180446.out
  • The output for the example script should contain "Hello World" along with the name and rank of each processor being used:

Screenshot of the example output

  • To move output files off the cluster, see the storage and moving files guide
  • Congratulations! You have successfully run a parallel C program using MPI on the cluster