Updated 2023-03-31

Run Python Scripts on Cluster

Overview

  • This guide covers how to run a Python script on the cluster using Anaconda and its scientific libraries
  • After loading the Anaconda module in your SBATCH script, add the command needed to execute your Python script

Summary

  • The following lines would be added to your SBATCH script after the SBATCH directives (lines that begin with #SBATCH)
  • To load Anaconda:
    • module load anaconda3/2022.05.0.1
  • To run your Python script:
    • python <yourScript.py>
  • Submit to the cluster with sbatch <yourSBATCHscript.sbatch>

Tip: Different Versions of Python

  • PACE users have access to many versions of Anaconda, built on various releases of Python 2 or Python 3, depending on what you're looking for
  • Users can also use plain Python if they wish; when loading modules, simply load the Python version you want
  • Run module avail anaconda and module avail python to see the versions of each that are available; the short snippet after this list shows one way to confirm which interpreter a loaded module provides
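  • A minimal check, assuming a Python module has already been loaded (the file name checkVersion.py is only an illustration, not part of this walkthrough):

# checkVersion.py - illustrative helper, not one of the walkthrough files
import sys

# Print the interpreter path and full version string so you can confirm
# that the module you loaded provides the Python you expect
print(sys.executable)
print(sys.version)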

Walkthrough: Run an Example Python Script with Anaconda

  • The example script is a simple parallel script that uses NumPy, a Python scientific computing package, to calculate the determinants of 8 random matrices of size 500 x 500 (a hypothetical sketch of such a script appears after this list)
  • Python Script: parallelPython.py
  • SBATCH Script: python.sbatch
  • Anaconda will be used as the Python environment for this example
  • Anaconda is a scientific distribution of Python that comes with hundreds of popular data packages, including:
    • NumPy
    • pandas
    • scikit-learn
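  • The contents of parallelPython.py are not reproduced in this guide; the sketch below is one way such a script might look, assuming multiprocessing plus NumPy and matching the output format shown in Step 3

# parallelPython.py - illustrative sketch only; the actual example script may differ
import time
from multiprocessing import Pool

import numpy as np

MATRIX_SIZE = 500    # each random matrix is 500 x 500
NUM_MATRICES = 8     # number of determinants to compute


def random_determinant(_):
    # Build one random matrix and return its determinant
    matrix = np.random.rand(MATRIX_SIZE, MATRIX_SIZE)
    return np.linalg.det(matrix)


if __name__ == "__main__":
    start = time.time()
    # Spread the determinant calculations across worker processes
    with Pool(processes=NUM_MATRICES) as pool:
        determinants = pool.map(random_determinant, range(NUM_MATRICES))
    print("List of determinants for %d random matrices of size %d x %d:"
          % (NUM_MATRICES, MATRIX_SIZE, MATRIX_SIZE), determinants)
    print("------ %s seconds ------" % (time.time() - start))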

Step 1: The SBATCH Script

#!/bin/bash
#SBATCH -JpythonTest
#SBATCH -A [Account]
#SBATCH -N2 --ntasks-per-node=4
#SBATCH --mem-per-cpu=2G
#SBATCH -t2
#SBATCH -qinferno
#SBATCH -oReport-%j.out

cd $SLURM_SUBMIT_DIR
module load anaconda3/2022.05.0.1
python parallelPython.py
  • The #SBATCH lines are directives, requesting 2 minutes of walltime, 2 nodes with 4 tasks per node, and 2 GB of memory per CPU. More on SBATCH directives can be found in the Using Slurm on Phoenix Guide
  • $SLURM_SUBMIT_DIR is a variable holding the path of the directory you submitted the SBATCH script from, so cd $SLURM_SUBMIT_DIR tells the cluster to enter the folder that contains the SBATCH script. Make sure the .py file you want to run, along with any data or other files your script needs, is in the same folder as the SBATCH script (a short example of reading Slurm variables like this one from Python appears after this list)
  • Output files will show up in the same directory where the SBATCH script was submitted
  • module load anaconda3/2022.05.0.1 loads the Anaconda module
  • The script is then run with python parallelPython.py
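  • If your Python code itself needs to know where the job was submitted from or what its job ID is, Slurm exposes these values as environment variables inside the job; the snippet below is a small illustration and is not part of parallelPython.py

# slurm_env_demo.py - illustrative only, not one of the walkthrough files
import os

# Slurm sets these variables for every batch job; fall back to sensible
# defaults so the snippet also runs outside of a job
submit_dir = os.environ.get("SLURM_SUBMIT_DIR", os.getcwd())
job_id = os.environ.get("SLURM_JOB_ID", "not running under Slurm")

print("Submitted from:", submit_dir)
print("Job ID:", job_id)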

Step 2: Submit Job and check status

  • Make sure you're in the directory that contains the SBATCH script as well as the .py file
  • Submit as normal, with sbatch <scriptName>. In this case: sbatch python.sbatch
  • Check job status with squeue --job <jobID>, replacing <jobID> with the job ID returned after running sbatch
  • You can delete the job with scancel <jobID>, replacing <jobID> with the job ID returned after running sbatch

Step 3: Collect Results

  • Make sure you're in the directory that contains the SBATCH script you submitted earlier
  • Any files created by the script will show up in the folder where the script was run
  • Take a look at the output file Report-<jobID>.out (the name set by the #SBATCH -oReport-%j.out directive) using any text editor, such as vim with vim Report-<jobID>.out
  • It should include the computed determinants for the 8 matrices:
---------------------------------------
Begin Slurm Prolog: Dec-29-2022 16:58:36
Job ID:    269730
User ID:   svangala3
Account:   phx-pace-staff
Job name:  pythonTest
Partition: cpu-small
QOS:       inferno
---------------------------------------
List of determinants for 8 random matrices of size 500 x 500: [-2.434403294827809e+297, 3.3373003787970883e+297, 1.9104324180989576e+295, -1.0040044568075118e+297, -4.368038540833462e+297, 1.1381429448661419e+297, 3.386642772196742e+296, 7.914278253733031e+295]
------ 3.2148385047912598 seconds ------
---------------------------------------
Begin Slurm Epilog: Dec-29-2022 16:58:46
Job ID:        269730
Array Job ID:  _4294967294
User ID:       svangala3
Account:       phx-pace-staff
Job name:      pythonTest
Resources:     cpu=8,mem=16G,node=2
Rsrc Used:     cput=00:01:28,vmem=8792K,walltime=00:00:11,mem=0,energy_used=0
Partition:     cpu-small
QOS:           inferno
Nodes:         atl1-1-02-007-1-2,atl1-1-02-007-5-1
---------------------------------------
  • Note: the results above are just a small part of the output file, which also includes other important information about the job such as resources used, queue, and node(s) used
  • To move output files off the cluster, see file transfer guide
  • Congratulations! You have successfully run a Python script on the cluster