Updated 2019-11-01

Run Python Scripts on the Cluster

Overview

  • This guide covers how to run a Python script on the cluster using Anaconda and its scientific libraries
  • After loading the Anaconda module in your PBS script, add the commands that execute your Python script to the same PBS script

Summary

  • Add the following to your PBS script after the PBS directives (the lines that begin with #PBS)
  • To load Anaconda:
    • module load anaconda3/2019.10
  • To run your Python script:
    • python <yourScript.py>
  • Submit the job to the cluster with qsub <yourPbsScript.pbs>

Tip: Different Versions of Python

  • PACE users have access to many versions of Anaconda, built on either Python 2 or Python 3, so you can pick the one your code needs
  • You can also use plain Python instead of Anaconda: when loading modules, simply load the Python version you want
  • Run module avail anaconda and module avail python to see which versions of each are available
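Because the interpreter you get depends on which module you load, it can help to have the script itself report the Python it is running under. A minimal sketch using only the standard library; the function name describe_interpreter is hypothetical (not part of any PACE tool), and the code avoids f-strings so it also runs under Python 2:

```python
import platform
import sys


def describe_interpreter():
    """Return the version and path of the Python interpreter running this script."""
    return "Python {} at {}".format(platform.python_version(), sys.executable)


if __name__ == "__main__":
    # In a PBS job, anything printed here lands in the job's output file
    print(describe_interpreter())
```

If the reported path does not point inside the module you meant to load, check the module load line in your PBS script.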

Walkthrough: Run an Example Python Script with Anaconda

  • The example script is a simple parallel script that uses NumPy, a Python scientific package, to calculate the determinants of 8 random 500 x 500 matrices
  • Python Script: parallelPython.py
  • PBS Script: python_Test_Script.pbs
  • Anaconda will be used as the Python environment for this example
  • Anaconda is a scientific distribution of Python that comes with hundreds of popular data science packages, including:
    • NumPy
    • Pandas
    • scikit-learn
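This guide does not reproduce parallelPython.py itself. The block below is a hypothetical stand-in that matches the description above (determinants of 8 random 500 x 500 matrices computed with NumPy in parallel), assuming the parallelism comes from Python's standard multiprocessing module; the original script may be organized differently. Keep in mind that multiprocessing only uses the cores of the single node the script runs on.

```python
import multiprocessing as mp
import time

import numpy as np

N_MATRICES = 8  # number of random matrices, as in the walkthrough
SIZE = 500      # each matrix is SIZE x SIZE
N_WORKERS = 4   # assumption: one worker process per core requested with ppn=4


def random_determinant(seed):
    """Build one random SIZE x SIZE matrix from its own seed and return its determinant."""
    rng = np.random.RandomState(seed)  # separate, reproducible random stream per task
    matrix = rng.rand(SIZE, SIZE)      # entries drawn uniformly from [0, 1)
    return float(np.linalg.det(matrix))


def main():
    start = time.time()
    # Each worker process handles some of the seeds; map returns results in seed order
    with mp.Pool(processes=N_WORKERS) as pool:
        dets = pool.map(random_determinant, range(N_MATRICES))
    print("List of determinants for {} random matrices of size {} x {}: {}".format(
        N_MATRICES, SIZE, SIZE, dets))
    print("------ {} seconds ------".format(time.time() - start))


if __name__ == "__main__":
    main()
```

Run from the PBS script in Step 1, a script like this prints a determinant list and a timing line in the same format as the sample output in Step 3. Giving each task its own seed keeps forked worker processes from inheriting the same random state and drawing identical matrices, and it makes runs reproducible.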

Step 1: The PBS Script

#This is an example PBS script to run a parallel python program
#PBS -N pythonTest                      # job name
#PBS -l nodes=2:ppn=4                   # number of nodes and cores per node required
#PBS -l pmem=2gb                        # memory per core
#PBS -l walltime=2:00                   # duration of the job, [[HH:]MM:]SS format (here 2 min)
#PBS -q force-6                         # queue name (where job is submitted)
#PBS -j oe                              # combine output and error messages into 1 file
#PBS -o pythonTest.out                  # output file name
#PBS -m abe                             # event notification, set to email on start, end, or fail
#PBS -M shollister7@gatech.edu          # email to send notifications to

                                        # computations start here
cd $PBS_O_WORKDIR                       # enter directory where PBS Script is
echo "Started on `/bin/hostname`"       # prints name of node job is started on
module load anaconda3/2019.10            # loads python environment (anaconda)
python parallelPython.py                # runs parallel python script
  • The #PBS lines are directives: they request 2 minutes of walltime, 2 nodes with 4 cores per node, and 2 GB of memory per core. More on PBS directives can be found in the PBS guide
  • PBS_O_WORKDIR is an environment variable that holds the path of the directory you ran qsub from. Therefore, cd $PBS_O_WORKDIR moves the job into that directory, which is the folder containing the PBS script as long as you submit from there. Make sure the .py file you want to run, along with any data and other files your script needs, is in the same folder as the PBS script.
  • Output files will show up in the directory the PBS script was submitted from
  • The echo line prints the name of the node the job started on to the output file
  • module load anaconda3/2019.10 loads Anaconda
  • Then the script is run with python parallelPython.py
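The information the PBS script uses is also visible from inside Python through environment variables. A minimal sketch, assuming a Torque-style PBS setup in which PBS_O_WORKDIR holds the submission directory and PBS_NODEFILE names a file that lists each allocated node once per core; the helper names submit_dir and cores_on_this_node are hypothetical. Because a plain python command runs only on the first node of the job, a multiprocessing script can use only that node's cores (4 of the 8 requested with nodes=2:ppn=4), which is what cores_on_this_node counts.

```python
import os
import socket


def submit_dir():
    """Directory qsub was run from; falls back to the current directory outside a job."""
    return os.environ.get("PBS_O_WORKDIR", os.getcwd())


def cores_on_this_node(nodefile=None):
    """Count the cores PBS allocated on the node this script is running on."""
    nodefile = nodefile or os.environ.get("PBS_NODEFILE")
    if not nodefile:
        # Not inside a PBS job: fall back to every core the machine reports
        return os.cpu_count() or 1
    this_host = socket.gethostname().split(".")[0]
    with open(nodefile) as f:
        # Node names may be short or fully qualified, so compare short names only
        hosts = [line.strip().split(".")[0] for line in f if line.strip()]
    # Fall back to 1 if this host is not listed in the node file
    return hosts.count(this_host) or 1


if __name__ == "__main__":
    print("Running on", socket.gethostname())
    print("Submission directory:", submit_dir())
    print("Cores on this node:", cores_on_this_node())
```

A script could, for example, open its input files relative to submit_dir() and size its worker pool with cores_on_this_node() instead of hard-coding either value.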

Step 2: Submit Job and Check Status

  • Make sure you're in the directory that contains the PBS script and the .py file
  • Submit with qsub <pbs script name>; in this case, qsub python_Test_Script.pbs
  • If successful, this prints something like 2180446.shared-sched.pace.gatech.edu. The number at the beginning is the job ID, which is useful for checking the job's status or its predicted wait time in the queue
  • Check the job's status with qstat -u username3 -n, replacing "username3" with your GT username
  • You can delete the job with qdel 22182721, replacing the number with the job ID returned by qsub
  • A few seconds after submitting the job, you can find its estimated wait time in the queue with showstart <jobID>
  • For more ways to check job status, cancel jobs, and other useful commands, check out the command cheatsheet
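If you submit jobs from a Python script rather than by hand, the job ID can be pulled out of the qsub output the same way you read it off the screen. A minimal sketch with hypothetical function names; it assumes qsub prints only the job ID line on success, as in the example above, and submit only works where the qsub command is on your PATH (for example, when logged in to the cluster).

```python
import subprocess


def job_id_from_qsub_output(output):
    """Return the leading number from qsub output such as '23467216.shared-sched.pace.gatech.edu'."""
    return output.strip().split(".", 1)[0]


def submit(pbs_script):
    """Submit a PBS script with qsub and return its job ID; raises CalledProcessError on failure."""
    result = subprocess.run(
        ["qsub", pbs_script],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str
        check=True,
    )
    return job_id_from_qsub_output(result.stdout)
```

For example, submit("python_Test_Script.pbs") would return an ID such as "2180446" that you can then pass to qstat, qdel or showstart.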

Step 3: Collect Results

  • Make sure you're in the directory that contains the PBS script you submitted earlier
  • Any files created by the script will show up in the folder where the script was run
  • Open pythonTest.out in any text editor, for example with vim pythonTest.out
  • It should include the computed determinants for the 8 matrices:
List of determinants for 8 random matrices of size 500 x 500: [2.2869797724157593e+298, 9.2541457140979006e+297, 5.2653963977892149e+295, 3.1195110580758749e+298, -3.1366572315193188e+296, -9.4950384386155796e+295, 3.4255407298895087e+297, -1.439004202726796e+297]
------ 0.7035911083221436 seconds ------
---------------------------------------
Begin PBS Epilogue Mon Jan  7 15:03:47 EST 2019
Job ID:     23467216.shared-sched.pace.gatech.edu

  • Note: the results above are only a small part of the output file, which also includes other important information about the job, such as resources used, queue, and node(s) used
  • To move output files off the cluster, see the storage and moving files guide
  • Congratulations! You have successfully run a Python script on the cluster
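To reuse the results in pythonTest.out rather than read them by eye, the two result lines can be parsed back into Python values. A minimal sketch, assuming the output keeps exactly the format shown in Step 3 (a bracketed list of plain floats followed by a "------ N seconds ------" line); parse_results is a hypothetical helper.

```python
import ast
import re


def parse_results(text):
    """Return (determinants, seconds) parsed from the job's output text; None for anything missing."""
    dets_match = re.search(r"matrices of size \d+ x \d+: (\[[^\]]*\])", text)
    time_match = re.search(r"------ ([0-9.]+) seconds ------", text)
    # literal_eval safely turns the printed list of floats back into a Python list
    dets = ast.literal_eval(dets_match.group(1)) if dets_match else None
    seconds = float(time_match.group(1)) if time_match else None
    return dets, seconds
```

For example, reading pythonTest.out into a string and passing it to parse_results would return the 8 determinants as a list along with the elapsed time in seconds.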