Run Trim Galore! on the Cluster
- Trim Galore! is a wrapper script to automate quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files (for directional, non-directional (or paired-end) sequencing).
- This guide will cover how to run Trim Galore! on the Cluster.
- Here is a link to Trim Galore!'s Homepage.
- Running Trim Galore! is as simple as loading the required modules and running `trim_galore file.fq`, replacing `file.fq` with your FASTQ file.
Walkthrough: Run Trim Galore! on the Cluster
- This walkthrough will have Trim Galore! remove base calls with a Phred score of 20 or lower (assuming Sanger encoding) and remove sequences that became shorter than 20 bp from a FASTQ file.
- `SP1.fq` can be found here, which is sourced from this webpage.
- The `SBATCH` script can be found here.
- You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
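To make the quality threshold concrete: under Sanger (Phred+33) encoding, the ASCII code of each quality character minus 33 gives its Phred score Q, and the estimated error probability is 10^(-Q/10), so Q = 20 corresponds to roughly a 1% chance that a base call is wrong. The sketch below decodes a single quality character. `phred33_score` is a hypothetical helper name used only for illustration; it is not part of Trim Galore!.

```shell
#!/bin/bash
# Hypothetical helper: decode one Phred+33 (Sanger) quality character.
# Passing printf an argument that starts with a quote makes %d print the
# ASCII code of the character that follows the quote.
phred33_score() {
    local ascii
    ascii=$(printf '%d' "'$1")
    echo $(( ascii - 33 ))
}

phred33_score 5   # '5' is ASCII 53, so this prints 20 (the default cutoff)
phred33_score I   # 'I' is ASCII 73, so this prints 40
```

Looking up a few characters from the fourth line of a FASTQ record this way shows which base calls fall at or below the cutoff.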
Part 1: The SBATCH Script
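```shell
#!/bin/bash
# Job name, charge account, resources (1 node, 2 cores, 2 GB per core),
# 3 minutes of walltime, QOS, and output file (%j expands to the job ID)
#SBATCH -JtrimgaloreTest
#SBATCH -A [Account]
#SBATCH -N1 --ntasks-per-node=2
#SBATCH --mem-per-cpu=2G
#SBATCH -t3
#SBATCH -qinferno
#SBATCH -oReport-%j.out

# Run from the directory the job was submitted from
cd $SLURM_SUBMIT_DIR

# Load the dependency first, then Trim Galore!
module load gcc/10.3.0
module load trimgalore/0.6.6

# Trim SP1.fq with the default settings
trim_galore SP1.fq
```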
- The `#SBATCH` directives are standard, requesting just 3 minutes of walltime and 1 node with 2 cores. More on `#SBATCH` directives can be found in the Using Slurm on Phoenix Guide.
- `$SLURM_SUBMIT_DIR` is a variable that represents the directory you submit the SBATCH script from. Make sure the files you want to use are in the same directory as the SBATCH script.
- Output files will also show up in this directory.
- `module load trimgalore/0.6.6` loads version 0.6.6 of Trim Galore!. To see which versions of a piece of software are available, run `module avail [Software]` and load the one you want. The other module, `gcc/10.3.0`, is a dependency that must be loaded before Trim Galore! is loaded.
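- `trim_galore SP1.fq` runs Trim Galore! with its default settings, which remove base calls with a Phred score of 20 or lower (assuming Sanger encoding) and then discard sequences that became shorter than 20 bp. Adapter sequence is trimmed as well, as the trimming report in Part 3 shows.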
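Because `trim_galore SP1.fq` relies on default settings, it can help to spell the two cutoffs out. The sketch below assembles an equivalent command using Trim Galore!'s `--quality` and `--length` options. The `build_trim_cmd` helper and its variable names are hypothetical, introduced only for illustration. It prints the command rather than running it, so it can be tried without the modules loaded.

```shell
#!/bin/bash
# Hypothetical helper: print a trim_galore command with the cutoffs
# written out explicitly, so they are easy to see and change.
build_trim_cmd() {
    local fastq=$1
    local quality=${2:-20}      # Phred cutoff (Trim Galore! default: 20)
    local min_length=${3:-20}   # discard reads shorter than this (default: 20 bp)
    echo "trim_galore --quality ${quality} --length ${min_length} ${fastq}"
}

build_trim_cmd SP1.fq           # the defaults, equivalent to: trim_galore SP1.fq
build_trim_cmd SP1.fq 30 36     # a stricter variant, purely illustrative
```

To use different cutoffs in the walkthrough, the last line of the SBATCH script could be replaced with the printed command.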
Part 2: Submit Job and Check Status
- Make sure you're in the directory that contains the `SBATCH` script as well as `SP1.fq`.
- Submit as normal, with `sbatch <script name>`, using the name you gave the SBATCH script.
- Check job status with `squeue --job <jobID>`, replacing `<jobID>` with the job ID returned after running `sbatch`.
- You can delete the job with `scancel <jobID>`, replacing `<jobID>` with the job ID returned after running `sbatch`.
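The submit-and-check steps above can be strung together by capturing the job ID from the line `sbatch` prints on success, which has the form `Submitted batch job <jobID>`. This is a minimal sketch: `job_id_from_sbatch` and the `SCRIPT` variable are hypothetical names, and the Slurm commands only run when `sbatch` is available and `SCRIPT` has been set to your script's name.

```shell
#!/bin/bash
# Hypothetical helper: pull the job ID out of sbatch's confirmation line,
# "Submitted batch job <jobID>". Prints nothing if the line does not match.
job_id_from_sbatch() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $NF }'
}

# SCRIPT is a placeholder; set it to the name of your SBATCH script.
SCRIPT=${SCRIPT:-}

# Only talk to Slurm when sbatch exists (i.e. on the cluster).
if command -v sbatch >/dev/null 2>&1 && [ -n "$SCRIPT" ]; then
    jobid=$(job_id_from_sbatch "$(sbatch "$SCRIPT")")
    squeue --job "$jobid"       # check the job's status
    # scancel "$jobid"          # uncomment to delete the job
fi
```

Slurm also offers `sbatch --parsable`, which prints the job ID (followed by a semicolon and the cluster name, if one is present) and avoids parsing the confirmation line.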
Part 3: Collecting Results
- In the directory where you submitted the `SBATCH` script, you should see a `Report-<jobID>.out` file, which contains the results of the job; `SP1.fq_trimming_report.txt`, which contains a report of the trimming performed on the FASTQ file; and `SP1_trimmed.fq`, which contains the trimmed version of `SP1.fq`. Use `cat` or open the file in a text editor to take a look.
- `Report-<jobID>.out` should look like `SP1.fq_trimming_report.txt`, except with the SBATCH prologue at the top and the SBATCH epilogue at the bottom.
- `SP1.fq_trimming_report.txt` should look like this:
```
SUMMARISING RUN PARAMETERS
==========================
Input filename: SP1.fq
Trimming mode: single-end
Trim Galore version: 0.6.6
Cutadapt version: 2.10
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; default (inconclusive auto-detection))
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp

This is cutadapt 2.10 with Python 3.9.12
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC SP1.fq
Processing reads on 1 core in single-end mode ...
Finished in 0.01 s (68 us/read; 0.88 M reads/minute).

=== Summary ===

Total reads processed:                 250
Reads with adapters:                   102 (40.8%)
Reads written (passing filters):       250 (100.0%)

Total basepairs processed:           7,750 bp
Quality-trimmed:                       549 bp (7.1%)
Total written (filtered):            6,625 bp (85.5%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 102 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 19.6%
  C: 58.8%
  G: 5.9%
  T: 15.7%
  none/other: 0.0%

Overview of removed sequences
length  count  expect  max.err  error counts
1       45     62.5    0        45
2       14     15.6    0        14
3       4      3.9     0        4
4       2      1.0     0        2
5       1      0.2     0        1
6       5      0.1     0        5
8       3      0.0     0        3
9       2      0.0     0        2
11      4      0.0     1        2 2
12      3      0.0     1        2 1
13      2      0.0     1        2
14      4      0.0     1        4
15      5      0.0     1        1 4
16      1      0.0     1        1
17      2      0.0     1        2
18      1      0.0     1        0 1
24      3      0.0     1        0 3
29      1      0.0     1        1

RUN STATISTICS FOR INPUT FILE: SP1.fq
=============================================
250 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 39 (15.6%)
```
- `SP1_trimmed.fq` should look like this.
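One quick way to sanity-check the results is to compare read counts with the report: in this single-end run, the reads missing from `SP1_trimmed.fq` should be the ones the report lists as removed by the length cutoff (for the report above, 250 - 39 = 211 reads would be expected to remain). The sketch below assumes the usual four lines per FASTQ record and the report wording shown above. `count_fastq_reads` and `removed_short_reads` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: count reads in a FASTQ file,
# assuming the usual 4 lines per record.
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Hypothetical helper: extract the number of length-filtered sequences from
# the report line "Sequences removed because they became shorter ...: N (x%)".
removed_short_reads() {
    sed -n 's/^Sequences removed because they became shorter.*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1"
}

# Run the comparison only when the walkthrough's files are present.
if [ -f SP1.fq ] && [ -f SP1_trimmed.fq ] && [ -f SP1.fq_trimming_report.txt ]; then
    before=$(count_fastq_reads SP1.fq)
    after=$(count_fastq_reads SP1_trimmed.fq)
    removed=$(removed_short_reads SP1.fq_trimming_report.txt)
    echo "reads before: $before, after: $after, removed by length filter: $removed"
    if [ $(( before - after )) -eq "$removed" ]; then
        echo "read counts are consistent with the report"
    else
        echo "read counts do not match the report; inspect the files"
    fi
fi
```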
- After the result files are produced, you can move them off the cluster. Refer to the file transfer guide for help.
- Congratulations! You successfully ran Trim Galore! on the cluster.