Updated 2021-05-17
Run SAMtools on the Cluster¶
Overview¶
- SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM, and CRAM formats.
- This guide will cover how to run SAMtools on the Cluster.
- This is the link to the SAMtools Homepage.
Summary¶
- SAMtools provides utilities for manipulating BAM files. It can import from and export to the SAM format and perform operations such as sorting, merging, and indexing.
- The example used in this walkthrough involves converting a SAM file into a BAM file and comes from this link.
Walkthrough: Run SAMtools on the Cluster¶
- This walkthrough will cover how to convert a SAM file into a BAM file using SAMtools.
- sample.sam.gz can be downloaded from here
  - To unzip the file, run the command gzip -d sample.sam.gz in the terminal after navigating to the directory you saved sample.sam.gz to.
- PBS Script can be found here
- You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
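The unzip step above can be wrapped in a small shell helper. This is a minimal sketch and not part of the original walkthrough: the function name unzip_sam is hypothetical, and it assumes gzip and head are available where you run it. It decompresses the archive and previews the first lines so you can confirm the result is a plain-text SAM file.

```shell
# Hypothetical helper (not from the original guide): decompress a gzipped
# SAM file in place and preview its first lines.
unzip_sam() {
    gz_file="$1"
    if [ ! -f "$gz_file" ]; then
        echo "File not found: $gz_file" >&2
        return 1
    fi
    # gzip -d replaces sample.sam.gz with sample.sam in the same directory
    gzip -d "$gz_file" || return 1
    # SAM is plain text; header lines start with '@'
    head -n 5 "${gz_file%.gz}"
}

# Usage (run in the directory where sample.sam.gz was saved):
#   unzip_sam sample.sam.gz
```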
Part 1: The PBS Script¶
#PBS -N samtoolsTest
#PBS -A [Account]
#PBS -l nodes=1:ppn=2
#PBS -l pmem=8gb
#PBS -l walltime=10:00
#PBS -q inferno
#PBS -j oe
#PBS -o samtoolsTest.out
cd $PBS_O_WORKDIR
module load samtools
# Convert SAM to its binary counterpart, BAM
samtools view -S -b sample.sam > sample.bam
- The #PBS directives are standard, requesting 10 minutes of walltime and 1 node with 2 cores. More on #PBS directives can be found in the PBS guide
- $PBS_O_WORKDIR is a variable that represents the directory you submit the PBS script from. Make sure the files you want to use are in the same directory as the PBS script.
  - Output files will also show up in this directory
- module load samtools loads the default 0.1.18 version of SAMtools. To see what SAMtools versions are available, run module avail samtools, and load the one you want.
- samtools view -S -b sample.sam > sample.bam converts the input SAM file sample.sam to an output BAM file sample.bam. The -S flag specifies that the input is SAM and the -b flag specifies that the output is BAM.
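The conversion command explained above can also be run behind a guard that checks SAMtools is loaded. This is a hedged sketch rather than the guide's own script: the function name sam_to_bam is hypothetical, and it uses the -o option of samtools view to name the output file instead of shell redirection. Newer SAMtools releases (1.x) detect the input format automatically and accept -S only for backward compatibility, so the flag is kept here to match the default 0.1.18 module.

```shell
# Hypothetical wrapper (not from the original guide) around the same
# conversion used in the PBS script.
sam_to_bam() {
    in_sam="$1"
    out_bam="${in_sam%.sam}.bam"
    if ! command -v samtools >/dev/null 2>&1; then
        echo "samtools not found; run 'module load samtools' first" >&2
        return 127
    fi
    # -S: input is SAM, -b: output is BAM, -o: write to this file
    samtools view -S -b -o "$out_bam" "$in_sam" || return 1
    # Print the name of the BAM file that was written
    echo "$out_bam"
}

# Usage inside the PBS script, after 'module load samtools':
#   sam_to_bam sample.sam
```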
Part 2: Submit Job and Check Status¶
- Make sure you're in the directory that contains the PBS script as well as sample.sam.
- Submit as normal, with qsub <pbs script name>. In this case, qsub samtools.pbs
- Check job status with qstat -t 22182721, replacing the number with the job id returned after running qsub
- You can delete the job with qdel 22182721, again replacing the number with the job id returned after running qsub
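Rather than copying the job id by hand, the submit and status steps above can be chained. The sketch below is an illustration under assumptions: the function name submit_and_check is hypothetical, and it relies on qsub printing the new job id on standard output.

```shell
# Hypothetical helper (not from the original guide): submit a PBS script
# and immediately query the status of the job that was created.
submit_and_check() {
    pbs_script="$1"
    # qsub prints the id of the new job on stdout
    job_id=$(qsub "$pbs_script") || return 1
    echo "Submitted job: $job_id"
    qstat -t "$job_id"
}

# Usage:
#   submit_and_check samtools.pbs
# To cancel the job later, pass the printed id to qdel.
```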
Part 3: Collecting Results¶
- In the directory where you submitted the PBS script, you should see a samtoolsTest.out file, which contains the results of the job, and a sample.bam file, which contains the BAM file created from the sample.sam file. Use cat or open samtoolsTest.out in a text editor to take a look (sample.bam is a binary file, so it is not human-readable).
- samtoolsTest.out should look like this:
---------------------------------------
Begin PBS Prologue Mon Jul 8 15:59:21 EDT 2019
Job ID: 26304100.shared-sched.pace.gatech.edu
User ID: svemuri8
Job name: samtoolsTest
Queue: inferno
End PBS Prologue Mon Jul 8 15:59:21 EDT 2019
---------------------------------------
[samopen] SAM header is present: 87 sequences.
---------------------------------------
Begin PBS Epilogue Mon Jul 8 16:00:42 EDT 2019
Job ID: 26304100.shared-sched.pace.gatech.edu
User ID: svemuri8
Job name: samtoolsTest
Resources: neednodes=1:ppn=2,nodes=1:ppn=2,pmem=8gb,walltime=00:10:00
Rsrc Used: cput=00:00:43,energy_used=0,mem=3660kb,vmem=232360kb,walltime=00:01:21
Queue: inferno
Nodes:
rich133-c36-18-r.pace.gatech.edu
End PBS Epilogue Mon Jul 8 16:00:42 EDT 2019
---------------------------------------
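To see how much of the requested walltime a job used, the Rsrc Used line of the epilogue shown above can be parsed with sed. This is a small sketch: the function name job_walltime is hypothetical, and it assumes the output file follows the epilogue layout in the sample above.

```shell
# Hypothetical helper (not from the original guide): print the walltime
# reported on the "Rsrc Used" line of a PBS output file.
job_walltime() {
    sed -n 's/^[[:space:]]*Rsrc Used:.*walltime=\([0-9:]*\).*/\1/p' "$1"
}

# Usage:
#   job_walltime samtoolsTest.out
```

For the sample output above, this prints 00:01:21, well under the 10 minutes requested.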
- sample.bam is greater than 400 MB, so it is too large for us to provide. If you wish to verify that you ran this example correctly, run the command ls -al in the directory where sample.bam was placed and verify that the file is 4XXXXXXXX bytes large.
- After the result files are produced, you can move the files off the cluster. Refer to the file transfer guide for help.
- Congratulations! You successfully ran SAMtools on the cluster.
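The size check on sample.bam described above can also be scripted instead of reading the ls -al output by eye. This is a sketch under assumptions: the function name check_min_size is hypothetical, and the default threshold of 400000000 bytes is only an interpretation of "greater than 400 MB", not an exact expected size.

```shell
# Hypothetical helper (not from the original guide): report a file's size
# and succeed only if it is at least a given number of bytes.
check_min_size() {
    file="$1"
    min_bytes="${2:-400000000}"   # assumed threshold for "greater than 400 MB"
    if [ ! -f "$file" ]; then
        echo "File not found: $file" >&2
        return 1
    fi
    # wc -c counts bytes; tr strips padding that some wc versions add
    size=$(wc -c < "$file" | tr -d ' ')
    echo "$file: $size bytes"
    [ "$size" -ge "$min_bytes" ]
}

# Usage:
#   check_min_size sample.bam && echo "size looks plausible"
```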