Updated 2023-03-31
Run Mash on the Cluster¶
Overview¶
- This guide will cover how to load and use `mash/2.3`.
- `mash` requires multiple other modules to be loaded before it can itself be loaded.
Load Mash¶
- To use Mash, load these modules in your SBATCH script using `module load`:

```bash
module load gcc/10.3.0
module load capnproto/0.8.0
module load boost/1.79.0
module load zlib
module load autoconf
module load mash/2.3   # Mash itself, once its dependencies are loaded
```
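Before submitting a batch job, you can check that the same modules resolve in an interactive shell; a minimal sketch, assuming the standard `module` command on the cluster:

```bash
# Load the dependencies, then Mash itself
module load gcc/10.3.0 capnproto/0.8.0 boost/1.79.0 zlib autoconf
module load mash/2.3

# Confirm the modules are loaded and that the mash binary is on your PATH
module list
which mash
```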
Walkthrough: Run Mash on the Cluster¶
- This walkthrough will find the distance between two E. coli genomes.
- This example comes straight from the Mash documentation.
- Genome 1: `genome1.fna`
- Genome 2: `genome2.fna`
- SBATCH script: `mash.sbatch`
Part 1: The SBATCH Script¶
```bash
#!/bin/bash
#SBATCH -J mashTest                # job name
#SBATCH -A [Account]               # charge account
#SBATCH -N 2 --ntasks-per-node=4   # 2 nodes, 4 tasks per node
#SBATCH --mem-per-cpu=2G           # memory per core
#SBATCH -t 2                       # walltime: 2 minutes
#SBATCH -q inferno                 # QOS
#SBATCH -o Report-%j.out           # report file, named with the job ID

cd $SLURM_SUBMIT_DIR               # run from the directory the job was submitted from

# Load Mash and its dependencies
module load gcc/10.3.0
module load capnproto/0.8.0
module load boost/1.79.0
module load mash/2.3

# Sketch each genome, then estimate the distance between the sketches
mash sketch genome1.fna
mash sketch genome2.fna
mash dist genome1.fna.msh genome2.fna.msh
```
- The `#SBATCH` directives are standard, requesting just 2 minutes of walltime and 2 nodes with 4 cores each. More on `#SBATCH` directives can be found in the Using Slurm on Phoenix Guide.
- `$SLURM_SUBMIT_DIR` is simply a variable that represents the directory you submit the SBATCH script from.
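For example, a line like the following inside the script would record that directory in the report file (illustrative only; it is not part of the walkthrough script):

```bash
# $SLURM_SUBMIT_DIR is set by Slurm for every batch job
echo "This job was submitted from: $SLURM_SUBMIT_DIR"
```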
Warning

Make sure the `.fna` files you want to run are in the same directory where you put the SBATCH script. Output files will also show up in this directory.
- The `module load` lines load Mash as well as the programs it depends on.
- Lines that begin with `mash` execute the program (a couple of variations are sketched below).
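The walkthrough sketches each genome separately and then compares the saved sketches, but the Mash documentation also supports a couple of variations, sketched briefly here:

```bash
# Compare the FASTA files directly; mash dist sketches them on the fly
mash dist genome1.fna genome2.fna

# Or write both sketches to a single file (combined.msh) with -o,
# which is useful as a reference to compare other genomes against
mash sketch -o combined genome1.fna genome2.fna
```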
Part 2: Submit Job and Check Status¶
- Make sure you're in the directory that contains the SBATCH script and the `.fna` files.
- Submit as normal, with `sbatch <scriptname.sbatch>`. In this case: `sbatch mash.sbatch` (see the example after this list).
- Check job status with `squeue --job <jobID>`, replacing `<jobID>` with the job ID returned after running sbatch.
- You can cancel the job with `scancel <jobID>`, replacing `<jobID>` with the job ID returned after running sbatch.
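Put together, a submission might look like the following; the job ID here is taken from the sample report in Part 3, and yours will differ:

```bash
sbatch mash.sbatch    # prints "Submitted batch job <jobID>" on success
squeue --job 190098   # check status, using the job ID sbatch printed
scancel 190098        # only if you need to cancel the job
```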
Part 3: Collecting Results¶
- In the directory where you submitted the SBATCH script, you should see a couple of newly generated files, including `genome1.fna.msh`, `genome2.fna.msh`, and `Report-<jobID>.out`.
- `Report-<jobID>.out` should look something like this:
```
---------------------------------------
Begin Slurm Prolog: Dec-15-2022 18:01:58
Job ID: 190098
User ID: svangala3
Account: phx-pace-staff
Job name: mashTest
Partition: cpu-small
QOS: inferno
---------------------------------------
Sketching genome1.fna...
Writing to genome1.fna.msh...
Sketching genome2.fna...
Writing to genome2.fna.msh...
genome1.fna genome2.fna 0.0222766 0 456/1000
---------------------------------------
Begin Slurm Epilog: Dec-15-2022 18:02:00
Job ID: 190098
Array Job ID: _4294967294
User ID: svangala3
Account: phx-pace-staff
Job name: mashTest
Resources: cpu=8,mem=16G,node=2
Rsrc Used: cput=00:00:24,vmem=11684K,walltime=00:00:03,mem=0,energy_used=0
Partition: cpu-small
QOS: inferno
Nodes: atl1-1-01-004-26-[1-2]
---------------------------------------
```
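- The line `genome1.fna genome2.fna 0.0222766 0 456/1000` is the `mash dist` result. Per the Mash documentation, the fields are reference, query, Mash distance, p-value, and matching hashes: the two genomes are an estimated distance of about 0.022 apart, with 456 of 1000 sketch hashes shared.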
- After the result files are produced, you can move them off the cluster; refer to the file transfer guide for help (see the `scp` sketch below).
- Congratulations! You successfully ran a Mash program on the cluster.
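If you transfer with `scp`, the copy might look like the following, run from your local machine. The hostname and remote path are placeholders for illustration; substitute the login host and directory from your own setup:

```bash
# Hypothetical login host and remote directory; substitute your own
scp username@login-phoenix.pace.gatech.edu:/path/to/jobdir/Report-190098.out .
```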