Updated 2021-05-17
Run Trim Galore! on the Cluster
Overview
- Trim Galore! is a wrapper script that automates quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files (for directional, non-directional, or paired-end sequencing).
- This guide will cover how to run Trim Galore! on the Cluster.
- Here is a link to Trim Galore!'s Homepage.
Summary
- Running Trim Galore! is as simple as loading the required modules and running trim_galore file.fq, replacing file.fq with your FASTQ file.
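If you have several FASTQ files, one option is to loop over them in the PBS script in place of the single trim_galore line. This is a minimal sketch, assuming the inputs end in .fq. It skips files named *_trimmed.fq (the naming Trim Galore! uses for its output, for example SP1_trimmed.fq) so that a rerun does not trim already-trimmed output, and it uses a hypothetical TRIM_CMD variable so the loop can be dry-run with echo before a real job is submitted.

```shell
#!/bin/sh
# Minimal sketch: run Trim Galore! on every *.fq file in the current directory.
# TRIM_CMD is a hypothetical override; it defaults to trim_galore, and
# setting TRIM_CMD=echo only prints the files that would be processed.
TRIM_CMD=${TRIM_CMD:-trim_galore}

trim_all() {
  for fq in ./*.fq; do
    # If nothing matches, the pattern stays literal; skip it.
    [ -e "$fq" ] || continue
    # Skip Trim Galore! output from an earlier run (e.g. SP1_trimmed.fq).
    case $fq in
      *_trimmed.fq) continue ;;
    esac
    "$TRIM_CMD" "$fq"
  done
}
```

In a PBS script you would call trim_all after the module load lines. The files are processed one after another within the same job, so the walltime request has to cover all of them.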
Walkthrough: Run Trim Galore! on the Cluster
- This walkthrough will have Trim Galore! trim low-quality ends from reads using a Phred score cutoff of 20 (assuming Sanger encoding), trim adapter sequences, and remove sequences that became shorter than 20 bp from a FASTQ file.
- SP1.fq can be downloaded here; it is sourced from this webpage.
- The PBS script can be found here.
- You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
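Sanger encoding, listed as "ASCII+33" in the trimming report, stores each base's Phred score as the character whose ASCII code is the score plus 33. The sketch below decodes a quality line by hand so you can see which bases fall below the cutoff of 20. It assumes a POSIX shell and awk; phred33 is a hypothetical helper name.

```shell
#!/bin/sh
# Decode a Sanger (Phred+33, "ASCII+33") quality string into Phred scores.
# phred33 is a hypothetical helper; usage: phred33 'QUALITY_STRING'
phred33() {
  printf '%s\n' "$1" | awk '
    BEGIN {
      # Character -> ASCII code table for printable characters.
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
    }
    {
      out = ""; sep = ""
      for (j = 1; j <= length($0); j++) {
        q = ord[substr($0, j, 1)] - 33   # Phred score = ASCII code - 33
        out = out sep q
        sep = " "
      }
      print out
    }'
}
```

For example, phred33 'II5#' prints 40 40 20 2: I (ASCII 73) is Phred 40, 5 (ASCII 53) sits exactly at the cutoff of 20, and # (ASCII 35) is Phred 2.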
Part 1: The PBS Script
#PBS -N trimgaloreTest
#PBS -A [Account]
#PBS -l nodes=1:ppn=2
#PBS -l pmem=2gb
#PBS -l walltime=3:00
#PBS -q inferno
#PBS -j oe
#PBS -o trimgaloreTest.out
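# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Dependencies must be loaded before Trim Galore! itself
module load python/2.7
module load fastqc/0.10.1
module load trim_galore/0.3.7
# Trim SP1.fq with Trim Galore!'s default settings
trim_galore SP1.fq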
- The #PBS directives are standard, requesting just 3 minutes of walltime, 1 node with 2 cores, and 2 GB of memory per core. More on #PBS directives can be found in the PBS guide.
- $PBS_O_WORKDIR is a variable that represents the directory you submit the PBS script from. Make sure the files you want to use are in the same directory as the PBS script. Output files will also be written to this directory.
- module load trim_galore/0.3.7 loads version 0.3.7 of Trim Galore!. To see which versions of a software package are available, run module avail [Software] and load the one you want. The other modules are dependencies that must be loaded before Trim Galore! is loaded.
- trim_galore SP1.fq runs Trim Galore! with its default settings, which trim low-quality ends from reads using a Phred score cutoff of 20 (assuming Sanger encoding), trim the adapter sequence AGATCGGAAGAGC, and remove sequences that became shorter than 20 bp.
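Trim Galore!'s help text for -q/--quality says its quality trimming uses the same algorithm as BWA: subtract the cutoff from every quality, compute partial sums from each position to the 3' end, and cut where the sum is minimal. The sketch below mirrors that description for a single Phred+33 quality string (maximizing the sum of cutoff minus quality, which is the same as minimizing quality minus cutoff). qtrim_index is a hypothetical name; in this job the actual trimming is performed by cutadapt, which Trim Galore! calls with -q 20, not by this code.

```shell
#!/bin/sh
# Minimal sketch of BWA-style 3' quality trimming (Phred+33 assumed).
# qtrim_index is a hypothetical helper: it prints how many leading bases
# of the quality string would be kept for the given cutoff (default 20).
qtrim_index() {
  printf '%s\n' "$1" | awk -v cutoff="${2:-20}" '
    BEGIN {
      # Character -> ASCII code table for printable characters.
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
    }
    {
      n = length($0)
      s = 0; best = 0; keep = n
      # Scan from the last base toward the first, adding (cutoff - quality).
      for (i = n; i >= 1; i--) {
        s += cutoff - (ord[substr($0, i, 1)] - 33)
        if (s < 0) break        # good-quality bases outweigh the tail: stop
        if (s > best) { best = s; keep = i - 1 }
      }
      print keep
    }'
}
```

For example, qtrim_index 'IIIII#####' prints 5, so the five trailing Phred-2 bases are cut, while qtrim_index 'II#I#' prints 4: only the final # is removed, because the scan stops at the high-quality I before it reaches the interior #. The early stop when the running sum turns negative follows BWA's approach. A read whose kept length falls below the 20 bp minimum would then be discarded.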
Part 2: Submit Job and Check Status
- Make sure you're in the directory that contains the PBS script as well as the SP1.fq FASTQ file.
- Submit as normal with qsub <pbs script name>. In this case: qsub trim_galore.pbs
- Check the job status with qstat -t 22182721, replacing the number with the job ID returned after running qsub.
- You can delete the job with qdel 22182721, again replacing the number with the job ID returned after running qsub.
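Rather than copying the job ID by hand, you can store the output of qsub in a shell variable and reuse it. This is a minimal sketch, assuming Torque prints an ID of the form NUMBER.SERVER; job_number is a hypothetical helper for when you only want the numeric part, as in the qstat example above. The qsub, qstat, and qdel lines are left as comments because they only work on the cluster.

```shell
#!/bin/sh
# Minimal sketch: capture the job ID once and reuse it.
# These commands only work on the cluster, so they are shown as comments:
#   JOBID=$(qsub trim_galore.pbs)
#   qstat -t "$JOBID"
#   qdel "$JOBID"    # only if you need to cancel the job

# job_number is a hypothetical helper that keeps only the numeric part of a
# Torque-style ID such as 22182721.<server>, by deleting everything from
# the first dot onward.
job_number() {
  printf '%s\n' "${1%%.*}"
}
```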
Part 3: Collecting Results
- In the directory where you submitted the PBS script, you should see three files: trimgaloreTest.out, which contains the results of the job; SP1.fq_trimming_report.txt, which contains a report of the trimming performed on the FASTQ file; and SP1_trimmed.fq, which contains the trimmed version of SP1.fq. Use cat or open the files in a text editor to take a look.
- trimgaloreTest.out should look like SP1.fq_trimming_report.txt, except with the PBS prologue at the top and the PBS epilogue at the bottom.
- SP1.fq_trimming_report.txt should look like this:
SUMMARISING RUN PARAMETERS
==========================
Input filename: SP1.fq
Trimming mode: single-end
Trim Galore version: 0.3.7
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC'
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
This is cutadapt 1.8.1 with Python 2.7.9
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC SP1.fq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 0.01 s (40 us/read; 1.50 M reads/minute).
=== Summary ===
Total reads processed: 250
Reads with adapters: 102 (40.8%)
Reads written (passing filters): 250 (100.0%)
Total basepairs processed: 7,750 bp
Quality-trimmed: 549 bp (7.1%)
Total written (filtered): 6,625 bp (85.5%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 102 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 19.6%
C: 58.8%
G: 5.9%
T: 15.7%
none/other: 0.0%
Overview of removed sequences
length count expect max.err error counts
1 45 62.5 0 45
2 14 15.6 0 14
3 4 3.9 0 4
4 2 1.0 0 2
5 1 0.2 0 1
6 5 0.1 0 5
8 3 0.0 0 3
9 2 0.0 0 2
11 4 0.0 1 2 2
12 3 0.0 1 2 1
13 2 0.0 1 2
14 4 0.0 1 4
15 5 0.0 1 1 4
16 1 0.0 1 1
17 2 0.0 1 2
18 1 0.0 1 0 1
24 3 0.0 1 0 3
29 1 0.0 1 1
RUN STATISTICS FOR INPUT FILE: SP1.fq
=============================================
250 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 39 (15.6%)
- SP1_trimmed.fq should look like this.
- After the result files are produced, you can move them off the cluster; refer to the file transfer guide for help.
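The trimming report can be cross-checked against the output file. Each FASTQ record occupies four lines, so dividing the line count of SP1_trimmed.fq by four gives the number of reads; with 250 reads processed and 39 removed for being shorter than 20 bp, it should contain 250 - 39 = 211 reads. This is a minimal sketch, assuming standard four-line FASTQ records; count_reads and pct are hypothetical helper names.

```shell
#!/bin/sh
# Minimal sketch for cross-checking the trimming report (hypothetical helpers).

# Number of reads in a FASTQ file, assuming 4 lines per record.
count_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# PART as a percentage of TOTAL, rounded to one decimal place.
pct() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

# On the cluster, in the job directory:
#   count_reads SP1_trimmed.fq   # expect 211 if it matches the report
#   pct 39 250                   # share of reads removed as too short
```

For example, pct 39 250 prints 15.6 and pct 6625 7750 prints 85.5, the same figures the report gives for sequences removed and total basepairs written.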
- Congratulations! You successfully ran Trim Galore! on the cluster.