Updated 2023-03-31

Run Python Scripts on Cluster


  • This guide covers how to run a Python script on the cluster using Anaconda and its scientific libraries
  • After loading the Anaconda module in your SBATCH script, add the commands needed to execute your Python script


  • The following would be added in your SBATCH script after the SBATCH directives (lines that begin with #SBATCH)
  • To load Anaconda:
    • module load anaconda3/2022.05.0.1
  • To run your Python script:
    • python <yourScript.py>
  • Submit to cluster with sbatch <yourSBATCHscript.sbatch>
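Putting the pieces above together, a minimal SBATCH script might look like the following sketch (the job name, resource requests, and script name here are placeholders, not values from this guide; replace [Account] with your charge account):

```shell
#!/bin/bash
#SBATCH -JmyPythonJob              # job name (placeholder)
#SBATCH -A [Account]               # replace with your charge account
#SBATCH -N1 --ntasks-per-node=1    # 1 node, 1 core
#SBATCH -t10                       # 10 minutes of walltime
#SBATCH -qinferno                  # queue
#SBATCH -oReport-%j.out            # output file; %j expands to the job ID

module load anaconda3/2022.05.0.1
python yourScript.py
```

Save it as, e.g., yourSBATCHscript.sbatch and submit it with sbatch yourSBATCHscript.sbatch as described above.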

Tip: Different Versions of Python

  • PACE users have access to many versions of Anaconda built on various releases of Python 2 or Python 3, depending on what you're looking for
  • Users can also use plain Python if they wish; when loading modules, simply load the Python version you want
  • Run module avail anaconda and module avail python to see the different versions of each available
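Whichever module you load, you can confirm from inside the interpreter which Python you actually got. This short snippet (a generic check, not part of the example script) works under any Anaconda or plain Python module:

```python
# Print which interpreter is running and its version; useful for
# verifying that the module you loaded is the one actually in use.
import sys

print(sys.executable)          # full path to the active python binary
print(sys.version.split()[0])  # version string, e.g. "3.9.12"
```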

Walkthrough: Run an Example Python Script with Anaconda

  • The example script is a simple parallel script that uses NumPy, a Python scientific package, to calculate the determinants of 8 random matrices of size 500 x 500
  • Python Script: parallelPython.py
  • SBATCH Script: python.sbatch
  • Anaconda will be used as the python environment for this example
  • Anaconda is a scientific distribution of Python that comes with hundreds of popular data packages, including:
    • NumPy
    • Pandas
    • scikit-learn

Step 1: The SBATCH Script

#!/bin/bash
#SBATCH -JpythonTest
#SBATCH -A [Account]
#SBATCH -N2 --ntasks-per-node=4
#SBATCH --mem-per-cpu=2G
#SBATCH -t2
#SBATCH -qinferno
#SBATCH -oReport-%j.out

cd $SLURM_SUBMIT_DIR
echo "Started on `/bin/hostname`"

module load anaconda3/2022.05.0.1
python parallelPython.py

  • The #SBATCH lines are directives, requesting 2 minutes of walltime along with 2 nodes with 4 cores per node. More on SBATCH directives can be found in the Using Slurm on Phoenix Guide
  • $SLURM_SUBMIT_DIR is a variable holding the path to the directory from which you submit the SBATCH script, so cd $SLURM_SUBMIT_DIR moves the job into the folder that contains the SBATCH script. Make sure the .py file you want to run is in that same folder, along with any data and other files your script needs
  • Output files will show up in the same directory where the SBATCH script was submitted
  • The echo line prints to the output file the name of the node which the job started on
  • module load loads anaconda
  • Then the script is run with python parallelPython.py

Step 2: Submit Job and check status

  • Make sure you're in the dir that contains the SBATCH Script as well as the .py file
  • Submit as normal, with sbatch <scriptName>. In this case, sbatch python.sbatch
  • Check job status with squeue --job <jobID>, replacing with the job ID returned after running sbatch
  • You can delete the job with scancel <jobID>, replacing with the job ID returned after running sbatch

Step 3: Collect Results

  • Make sure you're in the directory that contains the SBATCH script you submitted earlier
  • Any files created by the script will show up in the folder where the script was ran
  • Take a look at the output file, Report-<jobID>.out (named by the -oReport-%j.out directive), using any text editor, such as vim with vim Report-<jobID>.out
  • It should include computed determinants for the 8 matrices:
Begin Slurm Prolog: Dec-29-2022 16:58:36
Job ID:    269730
User ID:   svangala3
Account:   phx-pace-staff
Job name:  pythonTest
Partition: cpu-small
QOS:       inferno
List of determinants for 8 random matrices of size 500 x 500: [-2.434403294827809e+297, 3.3373003787970883e+297, 1.9104324180989576e+295, -1.0040044568075118e+297, -4.368038540833462e+297, 1.1381429448661419e+297, 3.386642772196742e+296, 7.914278253733031e+295]
------ 3.2148385047912598 seconds ------
Begin Slurm Epilog: Dec-29-2022 16:58:46
Job ID:        269730
Array Job ID:  _4294967294
User ID:       svangala3
Account:       phx-pace-staff
Job name:      pythonTest
Resources:     cpu=8,mem=16G,node=2
Rsrc Used:     cput=00:01:28,vmem=8792K,walltime=00:00:11,mem=0,energy_used=0
Partition:     cpu-small
QOS:           inferno
Nodes:         atl1-1-02-007-1-2,atl1-1-02-007-5-1
  • Note: the results above are just a small part of the output file, which also includes other important information about the job such as resources used, queue, and node(s) used
  • To move output files off the cluster, see the file transfer guide
  • Congratulations! You have successfully run a Python script on the cluster