Updated 2023-03-03

OSU Micro Benchmarks

Overview

The OSU Micro Benchmarks (OMB) are a widely used suite of benchmarks for measuring and evaluating the performance of MPI operations for point-to-point, multi-pair, and collective communications. These benchmarks are often used for comparing different MPI implementations and the underlying network interconnect.

We use OMB to show that PACE can deliver the same native MPI performance to containerized applications when using the native MVAPICH hook. As indicated in the documentation for the hook, the only two conditions required are:

The MPI installed in the container image must comply with the requirements of the MVAPICH ABI Compatibility Initiative. ABI compatibility and its implications are further discussed here.

The application in the container image must be dynamically linked with the MPI libraries; a quick way to check this is shown below.
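To confirm the second condition, ldd can be run inside the container to show whether a benchmark binary is dynamically linked against the MPI libraries. This is a minimal sketch; the location of osu_hello depends on how the image was built:

# Inside the container: list the shared libraries osu_hello links against
# and keep only the MPI ones. The binary being on PATH is an assumption.
ldd $(which osu_hello) | grep -i mpi

If the MPI libraries appear in the output, the binary is dynamically linked and the hook can substitute the host's native MPI at run time.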

Running the container

We run the container using the Slurm Workload Manager.
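Each example below is a batch script submitted with sbatch. As a brief illustration (the script filename here is hypothetical):

# Submit the batch script and check the job's state in the queue
sbatch osu_hello.sbatch
squeue -u gburdell3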

Verification of Initialization

The following batch script runs osu_hello to verify that MPI can be initialized. It runs on CPU partitions.

#!/bin/bash
#SBATCH -J osu_hello_example
#SBATCH -A gts-gburdell3
#SBATCH -N 1 -n 8
#SBATCH -q embers
#SBATCH -o Report-%j.out
#SBATCH -e Report-%j.err
#SBATCH -t 15

cd $SLURM_SUBMIT_DIR

module load gcc/10.3.0 mvapich2/2.3.6
module load osu-micro-benchmarks

srun -l osu_hello

A typical output will look something like this:

---------------------------------------
Begin Slurm Prolog: Feb-27-2023 13:26:08
Job ID:    821559
User ID:   gburdell3
Account:   phx-pace-staff
Job name:  osu-hello-Test.sbatch
Partition: cpu-small
QOS:       inferno
---------------------------------------
# OSU MPI Hello World Test v5.9
This is a test with 8 processes
---------------------------------------
Begin Slurm Epilog: Feb-27-2023 13:26:10
Job ID:        821559
Array Job ID:  _4294967294
User ID:       gburdell3
Account:       phx-pace-staff
Job name:      osu-hello-Test.sbatch
Resources:     cpu=8,mem=8G,node=1
Rsrc Used:     cput=00:00:24,vmem=1180K,walltime=00:00:03,mem=0,energy_used=0
Partition:     cpu-small
QOS:           inferno
Nodes:         atl1-1-02-018-16-1
---------------------------------------
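The same verification can also be done interactively, which is convenient when debugging the module environment. This is a sketch assuming the same account, QOS, and modules as the batch script above:

# Request an interactive allocation matching the batch script
salloc -J osu_hello_example -A gts-gburdell3 -q embers -N 1 -n 8 -t 15

# Then, inside the allocation:
module load gcc/10.3.0 mvapich2/2.3.6
module load osu-micro-benchmarks
srun -l osu_hello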

Latency

We can measure collective latency using the OSU allgather benchmark, osu_allgather.

#!/bin/bash
#SBATCH -J osu_allgather_example
#SBATCH -A gts-gburdell3
#SBATCH -N 1 -n 8
#SBATCH -q embers
#SBATCH -o Report-%j.out
#SBATCH -e Report-%j.err
#SBATCH -t 15

cd $SLURM_SUBMIT_DIR

module load gcc/10.3.0 mvapich2/2.3.6
module load osu-micro-benchmarks

srun -l osu_allgather

A typical output looks like:

---------------------------------------
Begin Slurm Prolog: Mar-02-2023 21:23:53
Job ID:    862868
User ID:   gburdell3
Account:   phx-pace-staff
Job name:  osu_allgather_example
Partition: cpu-small
QOS:       embers
---------------------------------------
0:
0: # OSU MPI Allgather Latency Test v5.9
0: # Size       Avg Latency(us)
0: 1                       1.65
0: 2                       1.60
0: 4                       1.64
0: 8                       1.66
0: 16                      1.71
0: 32                      1.85
0: 64                      1.81
0: 128                     1.96
0: 256                     2.26
0: 512                     2.86
0: 1024                    3.92
0: 2048                    7.46
0: 4096                   16.85
0: 8192                   32.77
0: 16384                  51.75
0: 32768                  89.30
0: 65536                 167.08
0: 131072                349.55
0: 262144                723.29
0: 524288               1690.57
0: 1048576              3598.45
---------------------------------------
Begin Slurm Epilog: Mar-02-2023 21:23:56
Job ID:        862868
Array Job ID:  _4294967294
User ID:       gburdell3
Account:       phx-pace-staff
Job name:      osu_allgather_example
Resources:     cpu=8,mem=8G,node=1
Rsrc Used:     cput=00:00:24,vmem=1128K,walltime=00:00:03,mem=0,energy_used=0
Partition:     cpu-small
QOS:           embers
Nodes:         atl1-1-03-004-9-2
--------------------------------------- 
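By default, osu_allgather sweeps message sizes from 1 byte to 1 MB, as in the output above. If only a subset of sizes is of interest, the sweep can be narrowed with the -m option (the same option used by the all-to-all example below); the range here is purely illustrative:

# Restrict the allgather sweep to message sizes between 1 KB and 64 KB
srun -l osu_allgather -m 1024:65536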

All-to-all

We can test the latency of an MPI all-to-all exchange on CPU partitions with the following batch script. The -f flag prints full statistics (minimum and maximum latencies plus the iteration count), and -m 16384:16384 restricts the run to a single 16384-byte message size:

#!/bin/bash
#SBATCH -J osu_alltoall_example
#SBATCH -A gts-gburdell3
#SBATCH -N 4 -n 8
#SBATCH -q embers
#SBATCH -o Report-%j.out
#SBATCH -e Report-%j.err
#SBATCH -t 30
#SBATCH --mem=64GB

cd $SLURM_SUBMIT_DIR

module load gcc/10.3.0 mvapich2/2.3.6
module load osu-micro-benchmarks

srun -l -n $SLURM_NTASKS osu_alltoall -f -m 16384:16384

A typical output will look something like this:

---------------------------------------
Begin Slurm Prolog: Feb-27-2023 13:52:23
Job ID:    822185
User ID:   gburdell3
Account:   phx-pace-staff
Job name:  osu_alltoall_example
Partition: cpu-large
QOS:       embers
---------------------------------------
0:
0: # OSU MPI All-to-All Personalized Exchange Latency Test v5.9
0: # Size       Avg Latency(us)   Min Latency(us)   Max Latency(us)  Iterations
0: 16384                  34.03             29.31             36.90         100
---------------------------------------
Begin Slurm Epilog: Feb-27-2023 13:52:26
Job ID:        822185
Array Job ID:  _4294967294
User ID:       gburdell3
Account:       phx-pace-staff
Job name:      osu_alltoall_example
Resources:     cpu=8,mem=256G,node=4
Rsrc Used:     cput=00:00:24,vmem=1180K,walltime=00:00:03,mem=0,energy_used=0
Partition:     cpu-large
QOS:           embers
Nodes:         atl1-1-03-004-13-2,atl1-1-03-004-15-1,atl1-1-03-004-16-2,atl1-1-03-004-19-1
---------------------------------------

Container images and Dockerfiles

We built the OSU benchmarks on top of several images containing MPI in order to demonstrate the effectiveness of the MPI hook regardless of the ABI-compatible MPI implementation present in the image:

MVAPICH

The container image ethcscs/mvapich:ub1804_cuda92_mpi22_osu (based on mvapich/2.3.6) used for this test case can be pulled from the CSCS DockerHub or rebuilt with this Dockerfile.
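For reference, the core of such a build reduces to compiling OMB against the MPI compiler wrappers already present in the base image. The following is a minimal sketch only, not the CSCS Dockerfile: the download URL follows the usual OMB release naming and the install prefix is an assumption; version 5.9 matches the outputs shown above.

# Download, build, and install OSU Micro Benchmarks 5.9 using the
# MPI compiler wrappers (mpicc/mpicxx) provided by the base image
wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.9.tar.gz
tar xzf osu-micro-benchmarks-5.9.tar.gz
cd osu-micro-benchmarks-5.9
./configure CC=mpicc CXX=mpicxx --prefix=/usr/local/osu
make -j$(nproc)
make install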