Updated 2023-03-31

Run SAMtools on the Cluster

Overview

  • SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM, and CRAM formats.
  • This guide will cover how to run SAMtools on the Cluster.
  • See the SAMtools homepage for full documentation.

Summary

  • SAMtools provides a variety of utilities for manipulating BAM files. It can import from and export to SAM and performs operations such as sorting, merging, and indexing.
  • The example used in this walkthrough involves converting a SAM file into a BAM file and comes from this link.

Walkthrough: Run SAMtools on the Cluster

  • This walkthrough will cover how to convert a SAM file into a BAM file using SAMtools.
  • sample.sam.gz can be found here
  • SBATCH Script can be found here
  • You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.

Part 1: The SBATCH Script

#!/bin/bash
#SBATCH -JsamtoolsTest
#SBATCH -A [Account]
#SBATCH -N1 --ntasks-per-node=2
#SBATCH --mem-per-cpu=8G
#SBATCH -t10
#SBATCH -qinferno
#SBATCH -oReport-%j.out

cd $SLURM_SUBMIT_DIR
module load samtools

# Convert SAM to its binary counterpart, BAM
samtools view -b -o sample.bam sample.sam
  • The #SBATCH directives are standard, requesting 10 minutes of walltime and 1 node with 2 cores. More on #SBATCH directives can be found in the Using Slurm on Phoenix Guide
  • $SLURM_SUBMIT_DIR is a variable that represents the directory you submit the SBATCH script from. Make sure the files you want to use are in the same directory you put the SBATCH script.
  • Output files will also appear in this directory.
  • module load samtools loads the default 1.14 version of SAMtools. To see what SAMtools versions are available, run module avail samtools, and load the one you want.
  • samtools view -b -o sample.bam sample.sam converts the input SAM file sample.sam to an output BAM file sample.bam. The -b flag specifies that the output should be BAM, and -o names the output file. (Recent SAMtools versions detect the input format automatically, so the older -S flag is no longer required.)
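As a sketch, the resource directives can be adjusted for larger inputs, and the module version can be pinned explicitly rather than relying on the default; the version string below is an assumption — run module avail samtools to see what your cluster actually provides:

```shell
#SBATCH -N1 --ntasks-per-node=4   # one node, four cores
#SBATCH --mem-per-cpu=4G          # 4 GB of memory per core
#SBATCH -t30                      # 30 minutes of walltime

# Pin an explicit version instead of the default
# (samtools/1.14 is assumed here; confirm with `module avail samtools`).
module load samtools/1.14
```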

Part 2: Submit Job and Check Status

  • Make sure you're in the directory that contains the SBATCH script as well as sample.sam
  • Submit as normal, with sbatch <script name>. In this case, sbatch samtools.sbatch
  • Check job status with squeue --job <jobID>, replacing <jobID> with the job ID returned by sbatch
  • You can cancel the job with scancel <jobID>, again using the job ID returned by sbatch
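The submit-and-check flow above can also be scripted. As a small sketch (assuming the standard Slurm output line "Submitted batch job <jobID>" printed by sbatch), the job ID can be captured for use with squeue and scancel:

```shell
# Pull the numeric job ID out of sbatch's "Submitted batch job <ID>" line.
extract_job_id() {
    awk '/Submitted batch job/ {print $4}'
}

# On the cluster (sketch):
#   jobid=$(sbatch samtools.sbatch | extract_job_id)
#   squeue --job "$jobid"
#   scancel "$jobid"    # only if you need to cancel the job

# Offline demonstration with a sample sbatch message:
echo "Submitted batch job 254450" | extract_job_id
```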

Part 3: Collecting Results

  • In the directory where you submitted the SBATCH script, you should see a Report-<jobID>.out file which contains the results of the job and a sample.bam file which contains the BAM file created from the sample.sam file.
  • Report-<jobID>.out should look like this:
---------------------------------------
Begin Slurm Prolog: Dec-28-2022 23:31:16
Job ID:    254450
User ID:   svangala3
Account:   phx-pace-staff
Job name:  samtoolsTest
Partition: cpu-small
QOS:       inferno
---------------------------------------
---------------------------------------
Begin Slurm Epilog: Dec-28-2022 23:31:18
Job ID:        254450
Array Job ID:  _4294967294
User ID:       svangala3
Account:       phx-pace-staff
Job name:      samtoolsTest
Resources:     cpu=2,mem=16G,node=1
Rsrc Used:     cput=00:00:04,vmem=4556K,walltime=00:00:02,mem=0,energy_used=0
Partition:     cpu-small
QOS:           inferno
Nodes:         atl1-1-02-004-24-2
---------------------------------------
  • sample.bam is greater than 400 MB, so it is too large for us to provide. If you wish to verify that you ran this example correctly, run ls -al in the directory where sample.bam was placed and verify that the file is 4XXXXXXXX bytes in size.
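The size check above can be scripted as well. This is a minimal sketch; the file name and the ~400 MB threshold are taken from this walkthrough and should be adjusted for other inputs:

```shell
# Report whether an output file exists and meets a minimum size in bytes.
check_min_size() {
    local file="$1" min_bytes="$2"
    local size=0
    [ -f "$file" ] && size=$(wc -c < "$file")
    if [ "$size" -ge "$min_bytes" ]; then
        echo "OK: $file is $size bytes"
    else
        echo "MISSING/SMALL: $file is $size bytes (expected >= $min_bytes)"
    fi
}

# On the cluster, run this in the directory where you submitted the job:
check_min_size sample.bam 400000000
```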
  • After the result files are produced, you can move the files off the cluster; refer to the file transfer guide for help.
  • Congratulations! You successfully ran SAMtools on the cluster.