Updated 2023-03-31

Run Trim Galore! on the Cluster

Overview

  • Trim Galore! is a wrapper script to automate quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files (for directional, non-directional (or paired-end) sequencing).
  • This guide will cover how to run Trim Galore! on the Cluster.
  • Here is a link to Trim Galore!'s Homepage.

Summary

  • Running Trim Galore! is as simple as loading the required modules and running trim_galore file.fq, replacing file.fq with your FASTQ file.

Walkthrough: Run Trim Galore! on the Cluster

  • This walkthrough will have Trim Galore! trim low-quality bases (Phred score below 20, assuming Sanger encoding) from the 3' end of each read, trim adapter sequence, and remove sequences that became shorter than 20 bp from a FASTQ file.
  • SP1.fq can be found here which is sourced from this webpage.
  • SBATCH Script can be found here.
  • You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
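The Phred 20 cutoff is applied to read ends rather than to every base in a read. The Trim Galore! documentation describes its -q/--quality option as the BWA trimming algorithm: subtract the cutoff from every quality value, compute partial sums from each position to the end of the read, and cut where the sum is minimal. The bash sketch below re-implements that rule for a single Phred+33 (Sanger) quality string so you can see which bases would be trimmed. quality_trim_index is a hypothetical helper written for this guide, not part of Trim Galore! or Cutadapt; like BWA's implementation, its scan from the 3' end stops as soon as the running sum turns negative.

```shell
#!/usr/bin/env bash
# Sketch of BWA-style 3' quality trimming (the rule behind -q/--quality).
# Hypothetical helper for illustration only; not Cutadapt's actual code.
#
# quality_trim_index QUALS [CUTOFF] [BASE]
#   QUALS  : quality string of one read (e.g. Phred+33 / Sanger)
#   CUTOFF : Phred cutoff (Trim Galore! default: 20)
#   BASE   : ASCII offset of the encoding (33 for Sanger)
# Prints how many leading bases are kept after trimming the 3' end.
quality_trim_index() {
    local quals=$1 cutoff=${2:-20} base=${3:-33}
    local n=${#quals} stop=${#quals} max=0 s=0 i q
    for (( i = n - 1; i >= 0; i-- )); do
        # ASCII code of the quality character, minus the encoding offset
        printf -v q '%d' "'${quals:i:1}"
        q=$(( q - base ))
        # Bases below the cutoff push the sum up; good bases pull it down
        s=$(( s + cutoff - q ))
        if (( s < 0 )); then
            break           # good-quality region reached: stop scanning
        fi
        if (( s > max )); then
            max=$s
            stop=$i         # best cut point found so far
        fi
    done
    echo "$stop"
}
```

For example, quality_trim_index 'IIIII#####' prints 5: the five trailing # bases (Phred 2) are trimmed. A single low-quality base in the middle of a read, as in 'II#II', is kept because the 3' end is good, and a base of exactly Phred 20 (the character 5) does not trigger trimming on its own.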

Part 1: The SBATCH Script

#!/bin/bash
#SBATCH -JtrimgaloreTest
#SBATCH -A [Account]
#SBATCH -N1 --ntasks-per-node=2
#SBATCH --mem-per-cpu=2G
#SBATCH -t3
#SBATCH -qinferno
#SBATCH -oReport-%j.out

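cd "$SLURM_SUBMIT_DIR"        # run in the directory the job was submitted from
module load gcc/10.3.0        # dependency; load before Trim Galore!
module load trimgalore/0.6.6

trim_galore SP1.fq            # defaults: quality cutoff 20, minimum length 20 bp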
  • The #SBATCH directives are standard, requesting just 3 minutes of walltime and 1 node with 2 cores. More on #SBATCH directives can be found in the Using Slurm on Phoenix Guide.
  • $SLURM_SUBMIT_DIR is a variable that holds the directory you submitted the SBATCH script from. Make sure the files you want to use are in the same directory as the SBATCH script.
  • Output files will show up in this directory as well.
  • module load trimgalore/0.6.6 loads version 0.6.6 of Trim Galore!. To see which versions of a software are available, run module avail [Software] and load the one you want. The gcc/10.3.0 module is a dependency that must be loaded before Trim Galore! is loaded.
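  • trim_galore SP1.fq runs Trim Galore! with its default settings: it trims low-quality bases (Phred score below 20, assuming Sanger encoding) from the 3' end of each read, trims adapter sequence (the Illumina adapter here, as the report in Part 3 shows), and removes sequences that became shorter than 20 bp.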
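Because the job runs from $SLURM_SUBMIT_DIR, a missing input file only shows up as an error in Report-<jobID>.out after the job has started. One option is to check for the inputs before calling trim_galore. The function below is a minimal sketch; require_inputs is a hypothetical helper, not part of Slurm or Trim Galore!.

```shell
#!/usr/bin/env bash
# require_inputs FILE...
# Hypothetical pre-flight check: reports every input file that is missing
# or unreadable in the current directory and returns non-zero if any are.
require_inputs() {
    local f missing=0
    for f in "$@"; do
        if [[ ! -r $f ]]; then
            echo "missing or unreadable input: $PWD/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In the SBATCH script, you could add require_inputs SP1.fq || exit 1 after the cd line and before trim_galore SP1.fq, so the job ends immediately with a clear message if SP1.fq is not in the submission directory.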

Part 2: Submit Job and Check Status

  • Make sure you're in the directory that contains the SBATCH script as well as SP1.fq.
  • Submit as normal, with sbatch <script name>. In this case, run sbatch trimgalore.sbatch.
  • Check the job status with squeue --job <jobID>, replacing <jobID> with the job ID returned by sbatch.
  • You can delete the job with scancel <jobID>, again replacing <jobID> with the job ID returned by sbatch.
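If you submit from a shell script, you may want to capture the job ID instead of copying it by hand. By default, sbatch prints a line such as Submitted batch job 123456; the sketch below pulls the number out of that line. parse_jobid is a hypothetical helper. sbatch also has a --parsable option that prints only the job ID (plus the cluster name, if any), which avoids parsing altogether.

```shell
#!/usr/bin/env bash
# parse_jobid "MESSAGE"
# Hypothetical helper: extracts the numeric job ID from sbatch's default
# "Submitted batch job <ID>" message. Returns 1 if the message does not
# match (for example, when submission failed).
parse_jobid() {
    local re='^Submitted batch job ([0-9]+)'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}
```

Usage sketch: jobid=$(parse_jobid "$(sbatch trimgalore.sbatch)") && squeue --job "$jobid". The same "$jobid" can be passed to scancel if you need to delete the job.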

Part 3: Collecting Results

  • In the directory where you submitted the SBATCH script, you should see three files: Report-<jobID>.out, which contains the output of the job; SP1.fq_trimming_report.txt, which reports the trimming performed on the FASTQ file; and SP1_trimmed.fq, which contains the trimmed version of SP1.fq. Use cat or open the files in a text editor to take a look.
  • Report-<jobID>.out should look like SP1.fq_trimming_report.txt, except with the SBATCH prologue at the top and the SBATCH epilogue at the bottom.
  • SP1.fq_trimming_report.txt should look like this:
SUMMARISING RUN PARAMETERS
==========================
Input filename: SP1.fq
Trimming mode: single-end
Trim Galore version: 0.6.6
Cutadapt version: 2.10
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; default (inconclusive auto-detection))
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp


This is cutadapt 2.10 with Python 3.9.12
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC SP1.fq
Processing reads on 1 core in single-end mode ...
Finished in 0.01 s (68 us/read; 0.88 M reads/minute).

=== Summary ===

Total reads processed:                     250
Reads with adapters:                       102 (40.8%)
Reads written (passing filters):           250 (100.0%)

Total basepairs processed:         7,750 bp
Quality-trimmed:                     549 bp (7.1%)
Total written (filtered):          6,625 bp (85.5%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 102 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 19.6%
  C: 58.8%
  G: 5.9%
  T: 15.7%
  none/other: 0.0%

Overview of removed sequences
length  count   expect  max.err error counts
1       45      62.5    0       45
2       14      15.6    0       14
3       4       3.9     0       4
4       2       1.0     0       2
5       1       0.2     0       1
6       5       0.1     0       5
8       3       0.0     0       3
9       2       0.0     0       2
11      4       0.0     1       2 2
12      3       0.0     1       2 1
13      2       0.0     1       2
14      4       0.0     1       4
15      5       0.0     1       1 4
16      1       0.0     1       1
17      2       0.0     1       2
18      1       0.0     1       0 1
24      3       0.0     1       0 3
29      1       0.0     1       1


RUN STATISTICS FOR INPUT FILE: SP1.fq
=============================================
250 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp:  39 (15.6%)
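To cross-check these statistics against SP1_trimmed.fq, you can count the records and read lengths in the output file. The bash/awk sketch below assumes standard four-line FASTQ records (no wrapped sequence lines); fastq_stats is a hypothetical helper written for this guide.

```shell
#!/usr/bin/env bash
# fastq_stats FILE [MINLEN]
# Hypothetical helper. Prints three numbers:
#   <records> <shortest read length> <reads shorter than MINLEN>
# Assumes 4-line FASTQ records, so every 2nd line of 4 is a sequence.
fastq_stats() {
    local file=$1 minlen=${2:-20}
    awk -v minlen="$minlen" '
        NR % 4 == 2 {                    # sequence line of each record
            n++
            len = length($0)
            if (n == 1 || len < min) min = len
            if (len < minlen) short++
        }
        END { printf "%d %d %d\n", n, (n ? min : 0), short + 0 }
    ' "$file"
}
```

If the output matches the report above, fastq_stats SP1_trimmed.fq 20 would be expected to report 211 records (250 processed minus 39 removed) and 0 reads shorter than 20 bp; the middle number is the shortest remaining read length.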
  • SP1_trimmed.fq should look like this.
  • After the result files are produced, you can move them off the cluster; refer to the file transfer guide for help.
  • Congratulations! You successfully ran Trim Galore! on the cluster.