Updated 2020-10-20

pylauncher

pylauncher is a Python-based parametric job launcher that runs many small jobs in parallel. It is a utility for performing HTC-style workflows on HPC systems such as PACE.

Suppose you need to run a large number of serial jobs, such as 5000. Your cluster may allow you to allocate only a certain number of cores at any given time, such as 100 cores. In addition, policy may limit how many jobs you can have on the system at any given time, say 2000 jobs. Under these conditions, it is not possible to simply enqueue all of the jobs individually. However, a tool such as pylauncher allows you to submit one parallel job that runs all 5000 calculations while taking advantage of the cores available to you. There is a trade-off, in that a 100-core job may take longer to start, but the work can be run in a way that makes good use of system resources without impacting other users.

In the simplest case, pylauncher takes a file containing a set of command lines and hands them out to the cores cyclically. This is not an optimal solution, since in most cases the individual command lines take widely varying amounts of time. For that reason, pylauncher has a dynamic manager that keeps track of resources and uses them as they become available.
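To see why cyclic distribution can waste time when task lengths vary, the following is a minimal sketch that compares the two strategies on a toy workload. It is a simplified model written for illustration, not pylauncher's actual scheduling code; the function names and the duration list are assumptions.

```python
import heapq

def cyclic_makespan(durations, ncores):
    """Static round-robin: task i always runs on core i % ncores."""
    busy = [0] * ncores
    for i, d in enumerate(durations):
        busy[i % ncores] += d
    return max(busy)

def dynamic_makespan(durations, ncores):
    """Dynamic: each task, in file order, goes to the core that frees up first."""
    free_at = [0] * ncores
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available core
        heapq.heappush(free_at, start + d)  # that core is busy until start + d
    return max(free_at)

# Hypothetical task lengths in seconds: two long tasks among short ones.
durations = [40, 5, 5, 5, 40, 5, 5, 5]
print("cyclic :", cyclic_makespan(durations, 2))   # 90
print("dynamic:", dynamic_makespan(durations, 2))  # 55
```

In this toy case the cyclic assignment puts both long tasks on the same core, while the dynamic assignment lets the other core absorb the short tasks in the meantime.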

In more ambitious use cases, the set of commands may not be known at the time of scheduling but can be created programmatically. For instance, after running a coarse search over a parameter space, you could run an analysis to find where to focus subsequent searches. This makes efficient use of the system, since it lets you reduce either the computational resources (spent on an overly fine mesh search) or the time spent waiting for an answer, after which the follow-up jobs would otherwise need to be constructed manually and resubmitted.
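As a rough illustration of creating commands programmatically, the sketch below takes the results of a coarse one-dimensional search and builds the command lines for a finer sweep around the best point. The executable name ./simulate, the --x flag, and the assumption that a lower score is better are all hypothetical; only the idea of generating command lines from earlier results comes from the text above.

```python
def refine_commands(coarse_scores, step, points=5, exe="./simulate"):
    """Return command lines for a finer sweep centred on the best coarse value.

    coarse_scores maps a parameter value to its score (lower is assumed better).
    step is the spacing of the coarse grid.
    """
    best = min(coarse_scores, key=coarse_scores.get)
    fine_step = step / points
    half = points // 2
    values = [best + (k - half) * fine_step for k in range(points)]
    # One command line per parameter value, ready to be written to a command file.
    return [f"{exe} --x {v:.3f}" for v in values]

# Hypothetical coarse results: parameter value -> score.
coarse = {0.0: 4.1, 1.0: 0.9, 2.0: 3.2}
for line in refine_commands(coarse, step=1.0):
    print(line)
```

The resulting lines could be written to a command line file and handed to a launcher call in the same job, rather than being assembled by hand and resubmitted.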

The pylauncher utility was developed at the Texas Advanced Computing Center by Dr. Victor Eijkhout. The official repository is on the TACC GitHub. A version for use with PBS-based schedulers is available on Dr. Christopher Blanton's GitHub.

Using pylauncher on PACE systems

Most common use case

pylauncher is installed as a module on the PACE systems. It can be loaded with

   $ module load anaconda3
   $ module load pylauncher/3.0

It is necessary to have a Python environment with the paramiko package installed.

More complicated cases

In some cases, it may be necessary to have a custom environment with other packages installed. The key is that the environment must have the paramiko package installed. In addition, there are both Python 2 and Python 3 versions of pylauncher. The version for Python 2 is named pylauncher2, and its usage is very similar. We focus on pylauncher3 here because Python 2 reached end of life on 1/1/2020.

pylauncher helper script

Example Cases

Simple Serial Case using pylauncher

Running a serial pylauncher job requires writing a simple Python script, which is

#!/usr/bin/env python
import pylauncher3

pylauncher3.ClassicLauncher("testfile_serial.in")

The driver function pylauncher3.ClassicLauncher needs only one argument for a job that uses a constant single processor per command (it discovers the available resources in the background, so they do not need to be specified). The argument is the name of the file that contains the command lines. If the file is not located in the same directory as the script, its location will need to be specified.

The command line file ("testfile_serial.in" in the above case) is simply a collection of the commands to be run within each process. These lines inherit the environment of the main process. Lines that start with # are ignored, as are blank lines.
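Before submitting, it can be handy to count how many commands a file actually contains. The sketch below applies the rule just described (skip blank lines and lines starting with #). It is an illustration of the rule, not pylauncher's own parser; the function name is hypothetical, and stripping leading whitespace before checking for # is an assumption.

```python
def read_commands(text):
    """Return the command lines that would be run, skipping blanks and # lines."""
    commands = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment: ignored
        commands.append(stripped)
    return commands

sample = """# warm-up tasks
echo 0 >> /dev/null 2>&1; sleep 21

echo 1 >> /dev/null 2>&1; sleep 30
"""
print(len(read_commands(sample)), "commands")
```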

There are debug options that can be set by means of an additional keyword debug='job+host+task', so the function becomes

pylauncher3.ClassicLauncher("testfile_serial.in",debug='job+host+task')

if needed.
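One possible way to combine the two forms of the call is a small wrapper that checks for the command line file before starting the launcher. This is a sketch under assumptions: the launch function is a hypothetical helper, and it assumes ClassicLauncher accepts an absolute path in place of a bare file name. The only pylauncher3 calls used are the two shown above.

```python
import os

def launch(cmdfile, debug=""):
    """Run cmdfile through ClassicLauncher; return False if the file is missing."""
    path = os.path.abspath(cmdfile)
    if not os.path.isfile(path):
        print(f"command file not found: {path}")
        return False
    # Imported late so the file check works even where the module is not loaded.
    import pylauncher3
    if debug:
        pylauncher3.ClassicLauncher(path, debug=debug)
    else:
        pylauncher3.ClassicLauncher(path)
    return True

# Example use inside a job script (not run here):
#   launch("testfile_serial.in", debug="job+host+task")
```

Failing early on a missing file avoids spending queue time on a job that cannot run any commands.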

Simple Serial Case pylauncher script

#!/usr/bin/env python
import pylauncher3

pylauncher3.ClassicLauncher("testfile_serial.in")

Simple Serial Case PBS Script

#!/bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=00:05:00
#PBS -q myqueue
#PBS -j oe

module load anaconda3
module load pylauncher/3.0

cd $PBS_O_WORKDIR
python ./test_serial.py

Simple testfile_serial.in

echo 0 >> /dev/null 2>&1; sleep 21
echo 1 >> /dev/null 2>&1; sleep 30
echo 2 >> /dev/null 2>&1; sleep 8
echo 3 >> /dev/null 2>&1; sleep 34
echo 4 >> /dev/null 2>&1; sleep 39
echo 5 >> /dev/null 2>&1; sleep 9
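For larger tests, a file in this format can be generated rather than typed. The following sketch produces lines of the same shape with random sleep times; the function name, the sleep range, and the fixed seed are assumptions chosen for illustration.

```python
import random

def make_serial_commands(ntasks, max_sleep=40, seed=0):
    """Return ntasks command lines of the form 'echo i >> /dev/null 2>&1; sleep s'."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    return [
        f"echo {i} >> /dev/null 2>&1; sleep {rng.randint(1, max_sleep)}"
        for i in range(ntasks)
    ]

# Print six lines; redirect the output to a file to use it with the launcher.
print("\n".join(make_serial_commands(6)))
```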

Non-Distributed Memory Parallel Workflow

It is also possible to run parallel tasks with the pylauncher3.ClassicLauncher function by adding the cores keyword. The function call becomes

pylauncher3.ClassicLauncher(myjob,cores=3)

which will run each command line with three cores.

Below, an example code is shown with the scripts needed to run it.

Example Executable Code

The code used is a simple hello world for pthreads. The executable takes the number of threads to use on the command line. The difference from the serial example is the inclusion

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

int thread_count;

void* Hello(void* rank);

int main(int argc, char* argv[])
{
  long thread;
  pthread_t* thread_handles;

  thread_count = strtol(argv[1], NULL, 10);

  thread_handles =  malloc ( thread_count*sizeof(pthread_t));

  for (thread=0; thread < thread_count; thread++)
    pthread_create(&thread_handles[thread], NULL, Hello, (void*) thread);

  printf("Hello from the main thread\n");

  for (thread=0; thread<thread_count; thread++)
    pthread_join(thread_handles[thread], NULL);

  free(thread_handles);
  return 0;

}

void* Hello(void* rank)
{
  long my_rank = (long) rank;

  printf("Hello from thread %ld of %d\n", my_rank, thread_count);

  return NULL;
}

An example Makefile for this executable is

CC=gcc
DEBUG= -g -Wall
PTHREADSLIB= -lpthread
#gcc -g -Wall -o pth_hello pth_hello.c -lpthread


pth_hello.o : pth_hello.c
        $(CC) $(DEBUG)  -c pth_hello.c
pth_hello : pth_hello.o
        $(CC) $(DEBUG) -o pth_hello pth_hello.o $(PTHREADSLIB)

clean:
        rm -f pth_hello.o pth_hello

Example PBS Script for constant number of processors

#!/bin/bash
#PBS -N constant_sn_job
#PBS -l nodes=1:ppn=20
#PBS -l walltime=00:10:00
#PBS -q testflight

module load anaconda3
module load pylauncher/3.0

cd $PBS_O_WORKDIR
echo "Starting job."
python ./constant_sn_launcher.py
echo "Ending job."

Example Python script for constant number of processors

#!/usr/bin/env python
import pylauncher3


myjob = 'constant_sn_job'
pylauncher3.ClassicLauncher(myjob,debug="",cores=3)

Variable number of processors for a non-MPI parallel job

It is also possible to use a changing number of processors for a non-MPI parallel job. The key change is that the cores keyword becomes cores="file" and the core counts are contained within the command line file.

The pylauncher3.ClassicLauncher function call becomes

pylauncher3.ClassicLauncher(myjob,cores="file")

and the command line file becomes something like

2,./pth_hello 2
5,./pth_hello 5
3,./pth_hello 3
4,./pth_hello 4
5,./pth_hello 5

where the first number is the number of processors to use for that command line.

Note

It is still up to you to tell your program how many processors to use in that case, so be careful to do that correctly.
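Because the core count and the program's own thread argument are written separately, a quick consistency check can catch mistakes before submission. The sketch below parses lines of the "N,command" form and reports two kinds of problems. It is an illustrative helper, not part of pylauncher; the function names are hypothetical, and the mismatch check assumes, as with pth_hello, that the last argument of the command is the thread count.

```python
def parse_core_lines(text):
    """Split 'N,command' lines into (cores, command) pairs."""
    tasks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cores, command = line.split(",", 1)  # split on the first comma only
        tasks.append((int(cores), command.strip()))
    return tasks

def oversized(tasks, available):
    """Tasks that request more cores than the job allocation provides."""
    return [t for t in tasks if t[0] > available]

def mismatched(tasks):
    """Tasks whose last argument (assumed thread count) differs from the cores requested."""
    return [t for t in tasks if int(t[1].split()[-1]) != t[0]]

sample = "2,./pth_hello 2\n5,./pth_hello 5\n4,./pth_hello 3\n"
tasks = parse_core_lines(sample)
print("too large for 4 cores:", oversized(tasks, 4))
print("cores and threads differ:", mismatched(tasks))
```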

Example PBS Script for variable number of processors

#!/bin/bash
#PBS -N variable_sn_job
#PBS -l nodes=1:ppn=20
#PBS -l walltime=00:10:00
#PBS -q <myqueue>

module load anaconda3
module load pylauncher/3.0

cd $PBS_O_WORKDIR
echo "Starting job."
python ./variable_sn_launcher.py
echo "Ending job."

Example Python script for variable number of processors

#!/usr/bin/env python
import pylauncher3


myjob = 'variable_sn_job'
pylauncher3.ClassicLauncher(myjob,debug="",cores="file")

Example command line file for variable number of processors

2,./pth_hello 2
5,./pth_hello 5
3,./pth_hello 3
4,./pth_hello 4
5,./pth_hello 5
3,./pth_hello 3
3,./pth_hello 3
3,./pth_hello 3
4,./pth_hello 4
4,./pth_hello 4
4,./pth_hello 4
2,./pth_hello 2
3,./pth_hello 3
4,./pth_hello 4
5,./pth_hello 5
4,./pth_hello 3

Running an MPI Workflow using Pylauncher

Distributed memory parallel programs form an important set of tools for many users. As more HPC programs seek to create large-scale computational data sets, it becomes important to support those users with this style of workflow.

The MPI workflow is a little more complicated than the other uses detailed above and requires a different launcher function to account for how the jobs are started. The function to be called is pylauncher3.MPILauncher; the call in the file can be

pylauncher3.MPILauncher(myjobname,cores="file")

in a similar form to the variable case above.

Example executable source code

#include <stdlib.h>
#include <stdio.h>
#include "unistd.h"
#include "mpi.h"

int main(int argc,char **argv) {
  int jobno,slp,mytid,ntids;
  char outfile[5+5+5+1];
  FILE *f;

  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD,&ntids);
  MPI_Comm_rank(MPI_COMM_WORLD,&mytid);
  if (argc<3) {
    /* both the job id and the sleep time are required */
    if (mytid==0) printf("Usage: parallel id slp\n");
    MPI_Finalize();
    return 1;
  }
  jobno = atoi(argv[1]);
  slp = atoi(argv[2]);

  MPI_Barrier(MPI_COMM_WORLD);
  sprintf(outfile,"pytmp-%04d-%04d",jobno,mytid);
  f = fopen(outfile,"w");
  fprintf(f,"%d/%d working\n",mytid,ntids);
  fclose(f);

  if (mytid==0) {
    printf("Job %d on %d processors\n",jobno,ntids);
  }
  sleep(slp);
  MPI_Finalize();
  return 0;
}

Example PBS script

#!/bin/bash
#PBS -N MPI_pylauncher_test
#PBS -l nodes=2:ppn=16
#PBS -l walltime=00:01:00
#PBS -q <myqueue>

module load anaconda3
module load pylauncher/3.0
module load gcc mvapich2

echo "Starting job."
cd $PBS_O_WORKDIR
python ./test_mpi3.py
echo "Ending job."

Example MPI pylauncher script

#!/usr/bin/env python
import pylauncher3

myjobname='mympijob'
pylauncher3.MPILauncher(myjobname,cores='file')

Example MPI command line file

4,./parallel 0 10
4,./parallel 1 10
8,./parallel 2 10
4,./parallel 3 10
4,./parallel 4 10
8,./parallel 5 10
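When choosing a walltime for a file like this, a rough lower bound on the run time can be computed from the core counts and task lengths. The sketch below is a back-of-the-envelope estimate only: it ignores launcher and MPI start-up overhead and any packing inefficiency, and the function name is hypothetical. The task lengths come from the second argument of ./parallel, which the example program uses as its sleep time in seconds.

```python
import math

def walltime_lower_bound(tasks, total_cores):
    """Lower bound in seconds for a list of (cores, seconds) tasks.

    No schedule can finish faster than the longest single task, or faster
    than the total core-seconds divided by the cores available.
    """
    if any(cores > total_cores for cores, _ in tasks):
        raise ValueError("a task requests more cores than are available")
    core_seconds = sum(cores * seconds for cores, seconds in tasks)
    longest = max(seconds for _, seconds in tasks)
    return max(longest, math.ceil(core_seconds / total_cores))

# (cores, sleep seconds) for the six lines of the command file above.
tasks = [(4, 10), (4, 10), (8, 10), (4, 10), (4, 10), (8, 10)]
print(walltime_lower_bound(tasks, 32))  # 2 nodes x 16 cores
```

The requested walltime should sit comfortably above this bound, since real runs also pay for job start-up and task launch.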

Conclusion

It is hoped that pylauncher can be a good replacement for HTC Launcher, GNU Parallel, and job arrays.