Updated 2023-03-31

Mash

Run Mash on the Cluster

Overview

  • This guide will cover how to load and use mash/2.3
  • Mash requires several other modules to be loaded before it can be loaded.

Load Mash

  • To use Mash, load these modules in your SBATCH script using module load (a quick verification example follows this list):
    • module load gcc/10.3.0
    • module load capnproto/0.8.0
    • module load boost/1.79.0
    • module load zlib
    • module load autoconf
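
  • Once these modules are loaded (interactively or in a script), you can confirm that the mash executable is available. A minimal check, assuming the module versions above exist on your cluster:

module load gcc/10.3.0
module load capnproto/0.8.0
module load boost/1.79.0
module load zlib
module load autoconf
module load mash/2.3

# Verify the environment before submitting a job
module list    # lists the modules currently loaded in this session
which mash     # should print the path to the mash executable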

Walkthrough: Run Mash on the Cluster

Part 1: The SBATCH Script

#!/bin/bash
#SBATCH -JmashTest
#SBATCH -A [Account]
#SBATCH -N2 --ntasks-per-node=4
#SBATCH --mem-per-cpu=2G
#SBATCH -t2
#SBATCH -qinferno
#SBATCH -oReport-%j.out

cd $SLURM_SUBMIT_DIR
module load gcc/10.3.0
module load capnproto/0.8.0
module load boost/1.79.0
module load mash/2.3

mash sketch genome1.fna
mash sketch genome2.fna
mash dist genome1.fna.msh genome2.fna.msh
  • The #SBATCH directives are standard, requesting just 2 minutes of walltime and 2 nodes with 4 tasks per node. More on #SBATCH directives can be found in the Using Slurm on Phoenix Guide.
  • $SLURM_SUBMIT_DIR is an environment variable that holds the directory from which you submitted the SBATCH script.

Warning

Make sure the .fna files you want to run are in the same directory as the SBATCH script.

  • Output files will also be written to this directory
  • The module load lines load Mash's dependencies as well as Mash itself
  • Lines that begin with mash execute the program; a variant of these commands is sketched below
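
  • If you want to compare several genomes at once, Mash can also combine multiple sequences into a single sketch file. A minimal variant of the commands above (the -o output prefix and file names are illustrative):

# Combine both genomes into one sketch file named reference.msh,
# then compute distances between a query and everything in that sketch
mash sketch -o reference genome1.fna genome2.fna
mash dist reference.msh genome1.fna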

Part 2: Submit Job and Check Status

  • Make sure you're in the directory that contains the SBATCH Script and the .fna files
  • Submit the job as normal with sbatch <scriptname>.sbatch; in this case, sbatch mash.sbatch
  • Check job status with squeue --job <jobID>, replacing <jobID> with the job ID returned after running sbatch
  • You can cancel the job with scancel <jobID>, again replacing <jobID> with the job ID returned after running sbatch. An example session is sketched below.
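
  • Putting these steps together, a typical session might look like the following (the directory name and job ID are illustrative):

cd ~/mash_test        # directory containing mash.sbatch, genome1.fna, and genome2.fna
sbatch mash.sbatch    # prints something like: Submitted batch job 190098
squeue --job 190098   # PD = pending, R = running
scancel 190098        # only if you need to cancel the job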

Part 3: Collecting Results

  • In the directory where you submitted the SBATCH script, you should see several newly generated files, including genome1.fna.msh, genome2.fna.msh, and Report-<jobID>.out
  • Report-<jobID>.out should look something like this (the mash dist result line is explained after the report):
---------------------------------------
Begin Slurm Prolog: Dec-15-2022 18:01:58
Job ID:    190098
User ID:   svangala3
Account:   phx-pace-staff
Job name:  mashTest
Partition: cpu-small
QOS:       inferno
---------------------------------------
Sketching genome1.fna...
Writing to genome1.fna.msh...
Sketching genome2.fna...
Writing to genome2.fna.msh...
genome1.fna     genome2.fna     0.0222766       0       456/1000
---------------------------------------
Begin Slurm Epilog: Dec-15-2022 18:02:00
Job ID:        190098
Array Job ID:  _4294967294
User ID:       svangala3
Account:       phx-pace-staff
Job name:      mashTest
Resources:     cpu=8,mem=16G,node=2
Rsrc Used:     cput=00:00:24,vmem=11684K,walltime=00:00:03,mem=0,energy_used=0
Partition:     cpu-small
QOS:           inferno
Nodes:         atl1-1-01-004-26-[1-2]
---------------------------------------
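  • The single mash dist result line in the report is tab-separated: reference ID, query ID, Mash distance, p-value, and matching hashes. If you only need the distance value, a small illustrative extraction (the report file name depends on your job ID) could be:

# Print just the Mash distance (third column) from the job report
awk '$1 == "genome1.fna" && $2 == "genome2.fna" {print "Mash distance:", $3}' Report-190098.out
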
  • After the result files are produced, you can move them off the cluster; refer to the file transfer guide for help (a minimal scp sketch appears at the end of this section).
  • Congratulations! You successfully ran a Mash program on the cluster.
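
  • For example, from your local machine you could copy the report back with scp. The hostname and paths below are placeholders; see the file transfer guide for the correct values for your cluster:

# Run this on your local machine, not on the cluster.
# Replace <username>, <login-hostname>, and the remote path with your own values.
scp <username>@<login-hostname>:~/mash_test/Report-190098.out .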