Updated 2023-01-30

Launcher

Introduction

Launcher is a utility for running simple, data-parallel, high-throughput computing (HTC) workflows on clusters, massively parallel processor (MPP) systems, workgroups of computers, and personal machines. It is a product of the Texas Advanced Computing Center (TACC) and was primarily written by Lucas A. Wilson.

Launcher is written in Python (version 3.9.12 is used on PACE). The source files are available on GitHub, and Launcher is MIT-licensed.

License

Launcher is MIT-licensed. The MIT license grants permission to deal in the software without restriction, provided that the copyright and permission notice is included in all copies or substantial portions of the software. The software is provided "as is", without warranty. Licensed works, modifications, and larger works may be distributed under different terms and without source code. This is not legal advice.

Why use Launcher

High-throughput computing (HTC) is a computing model in which many independent jobs must be executed quickly. HTC is also called bag-of-tasks (BOT) parallel, pleasantly parallel, or embarrassingly parallel, among other terms.

The key scenario in HTC is many tasks that must each be executed on a separate input set, each producing separate output. HTC is commonly used in applications such as:

  • Sequence alignment scoring of DNA/RNA sequences
  • Protein/ligand docking
  • Drug development
  • Statistical analysis on unknown variables
  • Parameter sweeps
  • Text analysis
  • Multi-start simulations
  • Monte Carlo methods

Impediments to HTC on HPC systems

Users running HTC workloads on HPC systems can face several problems:

  • Runtime limits
  • Job-in-queue limits
  • Dynamic job submission restrictions

Launcher helps researchers work around each of these problems by running many small jobs inside a single batch job.

General information regarding the use of Launcher

Components of Launcher

  • paramrun: top-level script that interfaces with the resource manager, sets up the environment, and starts multi-node execution
  • tskserver: dynamic scheduling service
  • init_launcher: script responsible for process management on multi-core and many-core processors
  • launcher: script responsible for executing the appropriate jobs from the job file, timing them, and redirecting their stdout/stderr

Control file

The control file (also called the job file) lists the commands that the user wants Launcher to execute, one job per line. Any of the variables listed in the Environment Variables section can be used in these commands. Here is an example of a control file:

echo 'Hello world' > hello.o$LAUNCHER_JID
./a.out $LAUNCHER_TSK_ID
./a.out $LAUNCHER_JID
echo $LAUNCHER_PPN
echo $LAUNCHER_NHOSTS
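For parameter sweeps, the control file is usually generated rather than written by hand. A minimal sketch, assuming a hypothetical application ./a.out that takes one numeric parameter and a control file named sweep_jobs:

```shell
#!/bin/bash
# Generate a Launcher control file for a parameter sweep (hypothetical names).
# make_control_file OUTFILE PARAM...
# Writes one command per parameter value; each line becomes one Launcher job.
make_control_file() {
  local outfile=$1 p
  shift
  : > "$outfile"                    # create or truncate the file
  for p in "$@"; do
    # The escaped \$ keeps $LAUNCHER_JID literal in the file, so it is
    # expanded when Launcher runs the line, not when the file is generated.
    printf '%s\n' "./a.out $p > out.$p.o\$LAUNCHER_JID" >> "$outfile"
  done
}

make_control_file sweep_jobs 0.1 0.2 0.5 1.0
cat sweep_jobs
```

In an SBATCH script, the generated file would then be selected with something like export LAUNCHER_JOB_FILE=$SLURM_SUBMIT_DIR/sweep_jobs.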

Environment Variables

Variable          Meaning/Usage
LAUNCHER_NHOSTS   Number of hosts on which the instance is executing
LAUNCHER_PPN      Number of processes per node/host
LAUNCHER_NPROCS   Total number of processes (NHOSTS*PPN)
LAUNCHER_TSK_ID   Launcher task number (0..NPROCS-1)
LAUNCHER_NJOBS    Number of jobs in this bundle (NJOBS = number of lines in the control file, i.e. wc -l < control_file)
LAUNCHER_JID      The Launcher job being executed; corresponds to a particular line in the control file (1..NJOBS)
LAUNCHER_SCHED    Type of scheduling Launcher uses: dynamic (default), interleaved, or block
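The derived quantities in the table can be checked with plain shell arithmetic. In this sketch the host and process counts are illustrative values chosen by hand; inside a real job Launcher provides them, and the file name demo_jobs is hypothetical.

```shell
#!/bin/bash
# Illustrative values; in a real Launcher job these come from Launcher itself.
nhosts=2
ppn=10
nprocs=$(( nhosts * ppn ))                # LAUNCHER_NPROCS = NHOSTS * PPN

printf 'echo job %s\n' 1 2 3 > demo_jobs  # a hypothetical 3-line control file
# LAUNCHER_NJOBS = number of lines; $(( )) strips any padding wc may add.
# Note: wc -l counts newlines, so a last line without one is not counted.
njobs=$(( $(wc -l < demo_jobs) ))

echo "NPROCS=$nprocs NJOBS=$njobs"
echo "task IDs: 0..$(( nprocs - 1 ))  job IDs: 1..$njobs"
```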

Scheduling models

Launcher can schedule jobs using three methods: dynamic scheduling and two types of static scheduling, interleaved and block. On PACE, you should not currently use dynamic scheduling for multiple-node jobs.

Dynamic scheduling

In dynamic scheduling (the default), Launcher uses a client-server model (the tskserver service) to assign jobs actively: whenever a task reports its current job complete, it is given the next unassigned job.

Dynamic scheduling is most efficient when the execution times of the component jobs vary, because it reduces processor idle time.

In order to explicitly request dynamic scheduling:

export LAUNCHER_SCHED='dynamic'

within your SBATCH submission script or interactive session.
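The benefit of handing out work as tasks become free can be seen with a single-node analogue built on the standard xargs -P option. This is not how Launcher works internally (Launcher uses its tskserver client-server model); NWORKERS and the file names uneven_jobs and results.txt are hypothetical.

```shell
#!/bin/bash
# Single-node analogue of dynamic scheduling (not Launcher's implementation).
# xargs keeps at most NWORKERS commands running and starts the next line as
# soon as any running command finishes, so long jobs do not hold up the rest.
NWORKERS=4    # assumption: plays the role of LAUNCHER_PPN

# Hypothetical control file with uneven run times (two long jobs first).
# Lines avoid quote characters, which xargs would otherwise interpret.
for t in 0.4 0.4 0.1 0.1 0.1 0.1 0.1 0.1; do
  echo "sleep $t; echo finished-$t"
done > uneven_jobs

# -I{} makes each input line one command; sh -c runs that line.
xargs -P "$NWORKERS" -I{} sh -c '{}' < uneven_jobs > results.txt
cat results.txt
```

With a static split, a task assigned a long job still receives its share of later jobs; here the workers that finish short jobs simply keep taking the next line.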

Static scheduling

Static scheduling is recommended when the execution time of each job is known and the jobs are expected to take nearly equal time. Because the schedule is calculated once at the beginning of the Launcher job, static scheduling does not require the client-server model used by dynamic scheduling.

There are two algorithms for job distribution.

Interleaved scheduling

Interleaved scheduling assigns the jobs (lines in the job file) to tasks (reserved processes on a node) in round-robin order. For example, when 1 node and 4 processors have been requested, the first task (task 1) gets jobs (lines) 1, 5, 9, ..., task 2 gets jobs 2, 6, 10, ..., and so on.

In order to explicitly request interleaved static scheduling:

export LAUNCHER_SCHED='interleaved'

within your SBATCH submission script or interactive session.
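The round-robin assignment described above can be computed directly. In this sketch the function name interleaved_jobs is hypothetical and the code is not taken from Launcher's source; it takes a 0-based task ID (like LAUNCHER_TSK_ID, so "task 1" in the text is ID 0) and prints the 1-based job numbers (like LAUNCHER_JID) that task would run.

```shell
#!/bin/bash
# interleaved_jobs TASK_ID NPROCS NJOBS
# Prints the job numbers a task gets under round-robin (interleaved) scheduling.
interleaved_jobs() {
  local tsk=$1 nprocs=$2 njobs=$3 jid out=""
  jid=$(( tsk + 1 ))
  while [ "$jid" -le "$njobs" ]; do
    out="$out${out:+ }$jid"   # append, space-separated
    jid=$(( jid + nprocs ))
  done
  echo "$out"
}

# 1 node x 4 processors, 12 jobs
for tsk in 0 1 2 3; do
  echo "task ID $tsk: $(interleaved_jobs "$tsk" 4 12)"
done
```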

Block scheduling

In block scheduling, each task receives a contiguous set of jobs of equal (or nearly equal) size. For example, task 1 might get jobs 1...10, task 2 jobs 11...20, and so on.

In order to explicitly request block static scheduling:

export LAUNCHER_SCHED='block'

within your SBATCH submission script or interactive session.
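A contiguous split like the one described above can be computed as follows. This is a sketch under the assumption that leftover jobs go to the lowest-numbered tasks; Launcher's own rounding may differ, and the function name block_jobs is hypothetical.

```shell
#!/bin/bash
# block_jobs TASK_ID NPROCS NJOBS   (TASK_ID is 0-based, like LAUNCHER_TSK_ID)
# Prints "FIRST LAST", the 1-based job range for that task.
# If FIRST > LAST, the task gets no jobs (more tasks than jobs).
block_jobs() {
  local tsk=$1 nprocs=$2 njobs=$3 base extra first last
  base=$(( njobs / nprocs ))    # jobs every task gets
  extra=$(( njobs % nprocs ))   # the first $extra tasks get one more
  if [ "$tsk" -lt "$extra" ]; then
    first=$(( tsk * (base + 1) + 1 ))
    last=$(( first + base ))
  else
    first=$(( extra * (base + 1) + (tsk - extra) * base + 1 ))
    last=$(( first + base - 1 ))
  fi
  echo "$first $last"
}

# 40 jobs over 4 tasks: 1-10, 11-20, 21-30, 31-40
for tsk in 0 1 2 3; do
  echo "task ID $tsk: jobs $(block_jobs "$tsk" 4 40)"
done
```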

Usage on PACE resources

Setting up Launcher on PACE

On PACE, simply load the module for Launcher: module load launcher

Examples on different types of clusters

Red Hat Enterprise Linux 6

SBATCH scripts

On Red Hat Enterprise Linux 6, the default version of Python is not sufficient for Launcher to function. Include module load python/3.9.12 in the SBATCH submission script or interactive session.

Because of a scheduling policy that may place multiple requested processes on a single node (e.g., a -N 2 --ntasks-per-node=8 request may effectively become a -N 1 --ntasks-per-node=16 request), request more than half of the available processors on each node (e.g., -N 2 --ntasks-per-node=10 if each node has 16 processors). Otherwise, Launcher may not be able to use all processors, because it assumes the same number of processes on each node.

Dynamic scheduling currently does not work correctly for multiple-node jobs because of communication limitations between nodes.
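Because Launcher assumes the same number of processes on each node, it can be worth checking the placement a job actually received. The hypothetical helper check_uniform_ppn below reads one hostname per process (for example, the output of srun hostname run inside the job) and reports whether every node has the same count.

```shell
#!/bin/bash
# check_uniform_ppn < hostlist   (one hostname per line, one line per process)
# Returns 0 if every host has the same number of processes, 1 otherwise.
check_uniform_ppn() {
  local counts ndistinct
  # Processes per host, then the distinct values of those counts.
  counts=$(sort | uniq -c | awk '{print $1}' | sort -nu)
  ndistinct=$(printf '%s\n' "$counts" | wc -l)
  if [ "$ndistinct" -eq 1 ]; then
    echo "uniform: $counts processes per node"
  else
    echo "not uniform: per-node counts are $(printf '%s\n' "$counts" | paste -sd' ' -)" >&2
    return 1
  fi
}

# Balanced placement: 10 processes on each of two (hypothetical) nodes.
for i in $(seq 10); do echo node1; echo node2; done | check_uniform_ppn
# Unbalanced placement: 12 processes on node1, 4 on node2.
{ for i in $(seq 12); do echo node1; done
  for i in $(seq 4); do echo node2; done; } | check_uniform_ppn \
  || echo "Launcher may not be able to use all processors"
```

Inside an allocation, srun hostname | check_uniform_ppn would apply the same check to the real placement.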

Basic script

#!/bin/bash
#
# Simple SBATCH script for submitting multiple serial
# jobs (e.g. parametric studies) using a script wrapper
# to launch the jobs.
#
# To use, build your serial application(s) and place
# them in your working directory (LAUNCHER_WORKDIR).
# Then edit the control file (LAUNCHER_JOB_FILE) so
# that each line holds one command to run.
#
#-------------------------------------------------------
# Setup Parameters
#SBATCH -J PACE_example
#SBATCH -N 1 --ntasks-per-node=4
# Output file name: %x = job name, %j = job ID
#SBATCH -o %x.o%j
#SBATCH -t 5
#SBATCH -p inferno
#------------------------------------------------------

module load python/3.9.12
module load launcher

cd $SLURM_SUBMIT_DIR # optional: change to directory that the job script is submitted from
export LAUNCHER_SCHED='interleaved'
export LAUNCHER_WORKDIR=$SLURM_SUBMIT_DIR
export LAUNCHER_JOB_FILE=$LAUNCHER_DIR/extras/examples/pacemulti

paramrun

Multiple-node script

#!/bin/bash
#
# Simple SBATCH script for submitting multiple serial
# jobs (e.g. parametric studies) using a script wrapper
# to launch the jobs.
#
# To use, build your serial application(s) and place
# them in your working directory (LAUNCHER_WORKDIR).
# Then edit the control file (LAUNCHER_JOB_FILE) so
# that each line holds one command to run.
#
#-------------------------------------------------------
#SBATCH -J PACE_example_mn
#SBATCH -N 2 --ntasks-per-node=10
# Output file name: %x = job name, %j = job ID
#SBATCH -o %x.o%j
#SBATCH -t 5
#SBATCH -p inferno
#------------------------------------------------------

module load python/3.9.12
module load launcher

cd $SLURM_SUBMIT_DIR # optional: change to directory that the job script is submitted from
export LAUNCHER_SCHED='block'
export LAUNCHER_WORKDIR=$SLURM_SUBMIT_DIR
# you can point this at your own control file; this one is an example
export LAUNCHER_JOB_FILE=$LAUNCHER_DIR/extras/examples/pacemulti

paramrun

Red Hat Enterprise Linux 7

SBATCH scripts

Because of a scheduling policy that may place multiple requested processes on a single node (e.g., a -N 2 --ntasks-per-node=8 request may effectively become a -N 1 --ntasks-per-node=16 request), request more than half of the available processors on each node (e.g., -N 2 --ntasks-per-node=10 if each node has 16 processors). Otherwise, Launcher may not be able to use all processors, because it assumes the same number of processes on each node.

Dynamic scheduling currently does not work correctly for multiple-node jobs because of communication limitations between nodes.

Single-node dynamic scheduling

#!/bin/bash
#
# Simple SBATCH script for submitting multiple serial
# jobs (e.g. parametric studies) using a script wrapper
# to launch the jobs.
#
# To use, build your serial application(s) and place
# them in your working directory (LAUNCHER_WORKDIR).
# Then edit the control file (LAUNCHER_JOB_FILE) so
# that each line holds one command to run.
#
#-------------------------------------------------------
#SBATCH -J testflight_singlenode
#SBATCH -A [Account]
#SBATCH -N 1 --ntasks-per-node=8
# Output file name: %x = job name, %j = job ID
#SBATCH -o %x.o%j
#SBATCH -t 5
#SBATCH -p testflight
#------------------------------------------------------

module load launcher

cd $SLURM_SUBMIT_DIR # optional: change to directory that the job script is submitted from
export LAUNCHER_SCHED='dynamic' # this is the default
export LAUNCHER_WORKDIR=$SLURM_SUBMIT_DIR
# you can point this at your own control file; this one is an example
export LAUNCHER_JOB_FILE=$LAUNCHER_DIR/extras/examples/pacemulti

paramrun

Multiple-node static scheduling

#!/bin/bash
#
# Simple SBATCH script for submitting multiple serial
# jobs (e.g. parametric studies) using a script wrapper
# to launch the jobs.
#
# To use, build your serial application(s) and place
# them in your working directory (LAUNCHER_WORKDIR).
# Then edit the control file (LAUNCHER_JOB_FILE) so
# that each line holds one command to run.
#
#-------------------------------------------------------
#SBATCH -J testflight-mn-interleaved
#SBATCH -N 2 --ntasks-per-node=20
# Output file name: %x = job name, %j = job ID
#SBATCH -o %x.o%j
#SBATCH -t 5
#SBATCH -p testflight
#------------------------------------------------------

module load launcher

cd $SLURM_SUBMIT_DIR # optional: change to directory that the job script is submitted from
export LAUNCHER_SCHED='interleaved'
export LAUNCHER_WORKDIR=$SLURM_SUBMIT_DIR
# you can point this at your own control file; this one is an example
export LAUNCHER_JOB_FILE=$LAUNCHER_DIR/extras/examples/pacemulti

echo "Unique nodes: $(scontrol show hostnames "$SLURM_JOB_NODELIST" | tr '\n' ' ')"

paramrun