Updated 2023-02-24
Run SAMtools on the Cluster
Overview
- SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM, and CRAM formats.
- This guide will cover how to run SAMtools on the Cluster.
- This is the link to the SAMtools Homepage.
Summary
- SAMtools provides utilities for manipulating BAM files. It can import from and export to SAM, and it supports operations such as sorting, merging, and indexing.
- The example used in this walkthrough involves converting a SAM file into a BAM file and comes from this link.
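As a rough illustration of the sorting and indexing operations mentioned above, the following sketch sorts a BAM file and then indexes the sorted copy. It assumes `samtools` is already on your `PATH` (for example after `module load samtools`) and that a file named `sample.bam`, like the one produced later in this walkthrough, exists in the current directory. The helper names `sorted_name` and `sort_and_index` are made up for this example and are not part of SAMtools.

```shell
# Hypothetical helper: derive the sorted file name, e.g. sample.bam -> sample.sorted.bam
sorted_name() {
    echo "${1%.bam}.sorted.bam"
}

# Hypothetical helper: coordinate-sort a BAM file, then index the sorted copy
sort_and_index() {
    in_bam="$1"
    sorted_bam="$(sorted_name "$in_bam")"
    samtools sort -o "$sorted_bam" "$in_bam"   # write the sorted alignments
    samtools index "$sorted_bam"               # write the index next to the sorted file
}

# Only run when samtools and the input file are actually available
if command -v samtools >/dev/null 2>&1 && [ -f sample.bam ]; then
    sort_and_index sample.bam
fi
```

Indexing generally expects a coordinate-sorted file, which is why the sort step comes first in this sketch.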
Walkthrough: Run SAMtools on the Cluster
- This walkthrough will cover how to convert a SAM file into a BAM file using SAMtools.
- `sample.sam.gz` can be found here
- The `SBATCH` script can be found here
- You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
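The download is named `sample.sam.gz`, while the job script in Part 1 reads `sample.sam`, so the file presumably needs to be decompressed before the job is submitted. A minimal sketch of that step is shown here. The helper name `decompress_if_needed` is hypothetical, and `gunzip -c` is used so that the original `.gz` file stays in place.

```shell
# Hypothetical helper: decompress <name>.gz to <name> unless <name> already exists
decompress_if_needed() {
    gz_file="$1"
    out_file="${gz_file%.gz}"
    if [ -f "$out_file" ]; then
        echo "$out_file already exists, skipping"
    elif [ -f "$gz_file" ]; then
        gunzip -c "$gz_file" > "$out_file"   # keep the .gz, write the plain file
        echo "wrote $out_file"
    else
        echo "$gz_file not found" >&2
        return 1
    fi
}

# Run this in the directory that holds the downloaded file
if [ -f sample.sam.gz ]; then
    decompress_if_needed sample.sam.gz
fi
```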
Part 1: The SBATCH Script
#!/bin/bash
#SBATCH -JsamtoolsTest
#SBATCH -A [Account]
#SBATCH -N1 --ntasks-per-node=2
#SBATCH --mem-per-cpu=8G
#SBATCH -t10
#SBATCH -qinferno
#SBATCH -oReport-%j.out
cd $SLURM_SUBMIT_DIR
module load samtools
# Convert SAM to its binary counterpart, BAM
samtools view -b -o sample.bam sample.sam
- The `#SBATCH` directives are standard, requesting 10 minutes of walltime and 1 node with 2 cores. More on `#SBATCH` directives can be found in the Using Slurm on Phoenix Guide.
- `$SLURM_SUBMIT_DIR` is a variable that holds the directory from which you submit the `SBATCH` script. Make sure the files you want to use are in the same directory as the `SBATCH` script.
- Output files will also show up in this directory.
- `module load samtools` loads the default version of SAMtools, 1.14. To see which SAMtools versions are available, run `module avail samtools` and load the one you want.
- `samtools view -b -o sample.bam sample.sam` converts the input SAM file `sample.sam` to an output BAM file `sample.bam`. The `-b` flag specifies that the output is BAM, and `-o` names the output file. The older form `samtools view -S -b sample.sam > sample.bam` produces the same result; the `-S` flag once marked the input as SAM, but current SAMtools versions detect the input format automatically and ignore it.
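The script requests 2 cores, but `samtools view` runs single-threaded unless it is given the `-@` option, which sets the number of threads used in addition to the main thread. The sketch below shows one possible way to pass the requested core count through. It assumes the job was submitted with `--ntasks-per-node`, in which case Slurm sets `SLURM_NTASKS_PER_NODE`; the helper name `extra_threads` is hypothetical.

```shell
# Hypothetical helper: convert a core count into a value for samtools' -@ option,
# which counts threads in addition to the main thread
extra_threads() {
    cores="$1"
    if [ "$cores" -gt 1 ]; then
        echo $((cores - 1))
    else
        echo 0
    fi
}

# Fall back to 1 core when the variable is not set (for example outside a job)
threads="$(extra_threads "${SLURM_NTASKS_PER_NODE:-1}")"

# Only run when samtools and the input file are actually available
if command -v samtools >/dev/null 2>&1 && [ -f sample.sam ]; then
    samtools view -@ "$threads" -b -o sample.bam sample.sam
fi
```

For a file as small as this example the difference is likely negligible; the option mainly matters for larger inputs.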
Part 2: Submit Job and Check Status
- Make sure you're in the directory that contains the `SBATCH` script as well as `sample.sam`.
- Submit as normal, with `sbatch <script name>`. In this case, `sbatch samtools.sbatch`.
- Check job status with `squeue --job <jobID>`, replacing `<jobID>` with the job ID returned after running `sbatch`.
- You can delete the job with `scancel <jobID>`, replacing `<jobID>` with the job ID returned after running `sbatch`.
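The submit and status steps can also be chained so that the job ID does not have to be copied by hand. This sketch relies on `sbatch --parsable`, which prints the job ID, optionally followed by `;` and a cluster name, instead of the usual confirmation sentence. The helper name `job_id_from_parsable` is hypothetical, and the script name `samtools.sbatch` is taken from the step above.

```shell
# Hypothetical helper: "sbatch --parsable" prints "jobid" or "jobid;cluster";
# keep only the job ID part
job_id_from_parsable() {
    printf '%s\n' "${1%%;*}"
}

# Only run on a system where sbatch and the script are available
if command -v sbatch >/dev/null 2>&1 && [ -f samtools.sbatch ]; then
    job_id="$(job_id_from_parsable "$(sbatch --parsable samtools.sbatch)")"
    echo "Submitted job $job_id"
    squeue --job "$job_id"      # check job status
    # scancel "$job_id"         # uncomment to cancel the job
fi
```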
Part 3: Collecting Results
- In the directory where you submitted the `SBATCH` script, you should see a `Report-<jobID>.out` file, which contains the results of the job, and a `sample.bam` file, which contains the BAM file created from the `sample.sam` file.
- `Report-<jobID>.out` should look like this:
---------------------------------------
Begin Slurm Prolog: Dec-28-2022 23:31:16
Job ID: 254450
User ID: svangala3
Account: phx-pace-staff
Job name: samtoolsTest
Partition: cpu-small
QOS: inferno
---------------------------------------
---------------------------------------
Begin Slurm Epilog: Dec-28-2022 23:31:18
Job ID: 254450
Array Job ID: _4294967294
User ID: svangala3
Account: phx-pace-staff
Job name: samtoolsTest
Resources: cpu=2,mem=16G,node=1
Rsrc Used: cput=00:00:04,vmem=4556K,walltime=00:00:02,mem=0,energy_used=0
Partition: cpu-small
QOS: inferno
Nodes: atl1-1-02-004-24-2
---------------------------------------
- `sample.bam` is greater than 400 MB, so it is too large for us to provide. If you wish to verify that you ran this example correctly, run the command `ls -al` in the directory where `sample.bam` was placed and verify that the file is 4XXXXXXXX bytes large.
- After the result files are produced, you can move the files off the cluster. Refer to the file transfer guide for help.
- Congratulations! You successfully ran SAMtools on the cluster.
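In addition to checking the size with `ls -al`, the output file can be given a quick sanity check with SAMtools itself. The sketch below prints the size in bytes, runs `samtools quickcheck` (which tests whether the file appears intact), and shows the first few header lines. It assumes `samtools` is loaded and that `sample.bam` is in the current directory; the helper name `file_size_bytes` is hypothetical.

```shell
# Hypothetical helper: print the size of a file in bytes
file_size_bytes() {
    wc -c < "$1" | tr -d ' '
}

if [ -f sample.bam ]; then
    echo "sample.bam is $(file_size_bytes sample.bam) bytes"
    if command -v samtools >/dev/null 2>&1; then
        # quickcheck exits non-zero if the file looks truncated or malformed
        samtools quickcheck sample.bam && echo "sample.bam looks intact"
        # Print the first few header lines of the BAM file
        samtools view -H sample.bam | head -n 5
    fi
fi
```

This only shows that the file is structurally intact; it does not compare the contents against a reference copy.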