Updated 2021-05-17

Run SAMtools on the Cluster

Overview

  • SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM, and CRAM formats.
  • This guide will cover how to run SAMtools on the Cluster.
  • Further documentation is available on the SAMtools Homepage.

Summary

  • SAMtools provides utilities for manipulating alignments in BAM format. It can import from and export to SAM, and supports operations such as sorting, merging, and indexing.
  • The example used in this walkthrough involves converting a SAM file into a BAM file and comes from this link.

Walkthrough: Run SAMtools on the Cluster

  • This walkthrough will cover how to convert a SAM file into a BAM file using SAMtools.
  • sample.sam.gz can be downloaded from here.
    • To unzip the file, navigate to the directory where you saved sample.sam.gz and run the command gzip -d sample.sam.gz in the terminal.
  • The PBS script can be found here.
  • You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
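
The unzip step above can be sketched as a small shell helper. This is only an illustrative sketch: the function name unpack_sam is hypothetical, and unlike plain gzip -d it decompresses to standard output so the original .gz file is kept. It also counts the SAM header lines (lines beginning with @) as a quick check that the file decompressed to something that looks like SAM.

```shell
# Hypothetical helper: decompress a gzipped SAM file and report its header size.
# Usage: unpack_sam sample.sam.gz
unpack_sam() {
    gz="$1"
    sam="${gz%.gz}"                     # sample.sam.gz -> sample.sam
    [ -f "$gz" ] || { echo "error: $gz not found" >&2; return 1; }
    # Decompress to a new file and keep the compressed original.
    gzip -dc "$gz" > "$sam" || return 1
    # SAM header lines start with '@'; alignment records do not.
    headers=$(grep -c '^@' "$sam" || true)
    echo "$sam: $headers header lines"
}
```

Calling unpack_sam sample.sam.gz in the download directory would leave both sample.sam.gz and sample.sam in place.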

Part 1: The PBS Script

#PBS -N samtoolsTest
#PBS -A [Account]
#PBS -l nodes=1:ppn=2
#PBS -l pmem=8gb
#PBS -l walltime=10:00
#PBS -q inferno
#PBS -j oe
#PBS -o samtoolsTest.out

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
module load samtools

# Convert SAM to its binary counterpart, BAM
samtools view -S -b sample.sam > sample.bam

  • The #PBS directives are standard, requesting 10 minutes of walltime and 1 node with 2 cores. The -j oe directive merges standard error into standard output, so both end up in samtoolsTest.out. More on #PBS directives can be found in the PBS guide.
  • $PBS_O_WORKDIR is a variable that holds the directory from which you submit the PBS script. Make sure the files you want to use are in the same directory as the PBS script.
  • Output files will also appear in this directory.
  • module load samtools loads the default 0.1.18 version of SAMtools. To see what SAMtools versions are available, run module avail samtools, and load the one you want.
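  • samtools view -S -b sample.sam > sample.bam converts the input SAM file sample.sam to an output BAM file sample.bam. The -S flag specifies that the input is SAM and the -b flag specifies that the output is BAM. In SAMtools 1.x releases the input format is detected automatically, so -S is still accepted there but is no longer needed.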
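
The conversion command can also be wrapped with a couple of safety checks. The following is a minimal sketch, assuming samtools is already on the PATH (for example after module load samtools); the helper name convert_sam_to_bam is hypothetical and not part of SAMtools.

```shell
# Hypothetical wrapper around the conversion command used in the PBS script.
# Usage: convert_sam_to_bam sample.sam [sample.bam]
convert_sam_to_bam() {
    in_sam="$1"
    out_bam="${2:-${1%.sam}.bam}"       # default: sample.sam -> sample.bam
    # Refuse to run on a missing or empty input file.
    [ -s "$in_sam" ] || { echo "error: $in_sam is missing or empty" >&2; return 1; }
    # Same command as in the script: -S = SAM input, -b = BAM output.
    if ! samtools view -S -b "$in_sam" > "$out_bam"; then
        rm -f "$out_bam"                # do not leave a partial BAM behind
        echo "error: samtools view failed for $in_sam" >&2
        return 1
    fi
    echo "wrote $out_bam"
}
```

In a PBS script, the function definition would replace the bare samtools view line, followed by a call such as convert_sam_to_bam sample.sam. A non-zero return status then signals a missing input or a failed conversion.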

Part 2: Submit Job and Check Status

  • Make sure you're in the directory that contains the PBS script as well as sample.sam.
  • Submit the job with qsub <pbs script name>, in this case qsub samtools.pbs.
  • Check the job status with qstat -t 22182721, replacing the number with the job ID returned after running qsub.
  • You can delete the job with qdel 22182721, again replacing the number with the job ID returned after running qsub.
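
Rather than copying the job ID by hand, it can be captured in a shell variable. The sketch below assumes qsub prints an identifier in the form seen in this guide's sample output (a number followed by the scheduler host name); job_number is a hypothetical helper that keeps only the numeric part.

```shell
# Hypothetical helper: strip the server suffix from a full PBS job identifier.
job_number() {
    # Remove everything from the first '.' onward.
    echo "${1%%.*}"
}

# Example with the identifier format shown in this guide's sample output.
job_number "26304100.shared-sched.pace.gatech.edu"    # prints 26304100

# On the cluster, the identifier would come from qsub itself, for example:
#   full_id=$(qsub samtools.pbs)
#   qstat -t "$(job_number "$full_id")"
```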

Part 3: Collecting Results

  • In the directory where you submitted the PBS script, you should see a samtoolsTest.out file, which contains the job log, and a sample.bam file, which is the BAM file created from sample.sam. Use cat or a text editor to view samtoolsTest.out. sample.bam is a binary file and is not human-readable.
  • samtoolsTest.out should look like this:
---------------------------------------
Begin PBS Prologue Mon Jul  8 15:59:21 EDT 2019
Job ID:     26304100.shared-sched.pace.gatech.edu
User ID:    svemuri8
Job name:   samtoolsTest
Queue:      inferno
End PBS Prologue Mon Jul  8 15:59:21 EDT 2019
---------------------------------------
[samopen] SAM header is present: 87 sequences.
---------------------------------------
Begin PBS Epilogue Mon Jul  8 16:00:42 EDT 2019
Job ID:     26304100.shared-sched.pace.gatech.edu
User ID:    svemuri8
Job name:   samtoolsTest
Resources:  neednodes=1:ppn=2,nodes=1:ppn=2,pmem=8gb,walltime=00:10:00
Rsrc Used:  cput=00:00:43,energy_used=0,mem=3660kb,vmem=232360kb,walltime=00:01:21
Queue:      inferno
Nodes:
rich133-c36-18-r.pace.gatech.edu
End PBS Epilogue Mon Jul  8 16:00:42 EDT 2019
---------------------------------------
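
The epilogue also records how much of the requested walltime the job actually used, which helps when sizing future requests. As a rough sketch, assuming the log follows the layout shown above (a single line starting with "Rsrc Used:" containing comma-separated key=value pairs), the hypothetical helper used_walltime extracts that value:

```shell
# Hypothetical helper: print the walltime a job actually used, taken from
# the "Rsrc Used:" line of the PBS epilogue in the job's output file.
# Usage: used_walltime samtoolsTest.out
used_walltime() {
    # Split the line on commas and keep only the walltime= field.
    grep '^Rsrc Used:' "$1" | tr ',' '\n' | sed -n 's/^walltime=//p'
}
```

For the log above, this would report 00:01:21 against a request of 00:10:00.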
  • sample.bam is larger than 400 MB, which is too large for us to provide. To verify that you ran this example correctly, run ls -al in the directory containing sample.bam and check that the file size is 4XXXXXXXX bytes.
  • After the result files are produced, you can move them off the cluster. Refer to the file transfer guide for help.
  • Congratulations! You successfully ran SAMtools on the cluster.
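
As an optional extra check on sample.bam, the size check described above can be combined with a look at the file's signature. This is a sketch rather than a full validation: check_bam is a hypothetical helper, and it relies on the fact that BAM files are BGZF-compressed (readable by gzip) and begin, once decompressed, with the characters BAM. It assumes gzip and head -c are available.

```shell
# Hypothetical helper: basic sanity checks on a BAM file without SAMtools.
# Usage: check_bam sample.bam
check_bam() {
    bam="$1"
    [ -s "$bam" ] || { echo "error: $bam is missing or empty" >&2; return 1; }
    # BAM is BGZF-compressed, which gzip can read; the decompressed stream
    # starts with the three characters "BAM" followed by a 0x01 byte.
    magic=$(gzip -dc "$bam" 2>/dev/null | head -c 3)
    [ "$magic" = "BAM" ] || { echo "error: $bam does not look like BAM" >&2; return 1; }
    # Report the size in bytes for comparison with the expected value.
    size=$(wc -c < "$bam" | tr -d ' ')
    echo "$bam: $size bytes"
}
```

A passing check only shows that the file has a BAM signature and reports its size; it does not confirm that every record was converted correctly.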