Updated 2021-05-17
FastQC
Run FastQC in Batch Mode on the Cluster
Overview
- This guide covers how to run FastQC in batch mode.
- Running FastQC in batch mode means you have an input file, such as a `.fastq` file, that you process through a PBS script. After you submit the PBS script, the job runs on its own without any need to watch it. FastQC creates a `.zip` file with images and reports, as well as an `.html` report.
- FastQC can also be run interactively (with a GUI).
Summary
- Run `module avail fastqc` to see all FastQC versions available on the cluster.
- In the PBS script:
    - Load FastQC with `module load fastqc`
    - Run FastQC with `fastqc <input file name>`
Warning
Users have reported that loading other Perl modules can prevent `fastqc` from working on RHEL6 systems. A workaround is to unload all Perl modules before using `fastqc`.
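The workaround above can be sketched in shell as follows. This is a minimal sketch, assuming that Perl modules on the cluster have names beginning with `perl` and that `module -t list` prints one loaded module per line (on common module systems it writes to stderr). The helper name `perl_modules` is hypothetical.

```shell
# Hypothetical helper: keep only module names that start with "perl"
# (assumption: Perl modules on your cluster are named this way).
perl_modules() {
    grep -i '^perl' || true
}

# Only act where the `module` command is actually available.
if command -v module >/dev/null 2>&1; then
    # Terse listing: one loaded module per line, usually on stderr.
    for m in $(module -t list 2>&1 | perl_modules); do
        module unload "$m"
    done
    module load fastqc
fi
```

If nothing else in your session is needed, unloading every module with `module purge` before `module load fastqc` is a blunter alternative.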
Walkthrough: Run FastQC in Batch Mode on the Cluster
- The data being used is from the 1000 Genomes Project
- Input file: SRR081241.filt.fastq
- PBS Script: fastqc.pbs
- To follow along, manually copy these files into the same directory in your account on the cluster, or transfer them to your account.
Part 1: The PBS Script
#PBS -N fastqcTest
#PBS -A [Account]
#PBS -l nodes=1:ppn=4
#PBS -l walltime=2:00
#PBS -q inferno
#PBS -j oe
#PBS -o fastqc.out
cd $PBS_O_WORKDIR
module load fastqc
fastqc SRR081241.filt.fastq
- The `#PBS` directives are standard, requesting just 2 minutes of walltime and 1 node with 4 cores. More on `#PBS` directives can be found in the PBS guide.
- `$PBS_O_WORKDIR` is a variable that holds the directory from which you submit the PBS script. Make sure the `.fastq` file you want to analyze is in the same directory as the PBS script. The `cd $PBS_O_WORKDIR` line tells the cluster to enter that directory, so the job has access to all the files it needs.
- Output files, such as the resulting `.html` report and `.zip` file of results, will also appear in the same directory as the PBS script.
- `module load fastqc` loads FastQC 0.11.2.
- `fastqc SRR081241.filt.fastq` runs FastQC on the input file.
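Because the job runs unattended, it can help to have the script report a missing input file clearly instead of letting FastQC fail on it. A minimal sketch of such a guard, which could sit just before the `fastqc` line in the PBS script. The function name `require_input` is hypothetical and not part of FastQC or PBS.

```shell
# Hypothetical guard: succeed only if the file exists and is readable.
require_input() {
    if [ ! -r "$1" ]; then
        echo "Input file not found or unreadable: $1" >&2
        return 1
    fi
}

INPUT="SRR081241.filt.fastq"

# Run FastQC only when the input is present and fastqc is on the PATH.
if require_input "$INPUT" && command -v fastqc >/dev/null 2>&1; then
    fastqc "$INPUT"
fi
```

In a real job script you might prefer to `exit 1` when the check fails, so that the scheduler records the job as failed.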
Part 2: Submit the Job and Check Status
- Make sure you're in the directory that contains the PBS script and the `.fastq` file.
- Submit as normal, with `qsub <pbs script name>`. In this case, `qsub fastqc.pbs`.
- Check job status with `qstat -u username3 -n`, replacing "username3" with your GT username.
- You can delete the job with `qdel 22182721`, replacing the number with the job ID returned by `qsub`.
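`qsub` prints the ID of the new job, so the submit, status, and delete steps above can be chained without copying the number by hand. A minimal sketch, assuming the ID has the form `<number>.<server>` and that `$USER` matches your GT username on the cluster. The helper name `job_number` is hypothetical.

```shell
# Hypothetical helper: strip the server suffix from a full PBS job ID,
# e.g. "12345.some-server" -> "12345".
job_number() {
    echo "${1%%.*}"
}

# Only submit where the PBS client tools are installed.
if command -v qsub >/dev/null 2>&1; then
    full_id=$(qsub fastqc.pbs)       # qsub prints the new job's ID
    id=$(job_number "$full_id")
    qstat -u "$USER" -n              # list your jobs and their nodes
    echo "To cancel this job, run: qdel $id"
fi
```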
Part 3: Collecting Results
- In the directory where you submitted the PBS script, you should see several newly generated files, including `fastqc.out`, `SRR081241.filt_fastqc.html`, and `SRR081241.filt_fastqc.zip`.
- `fastqc.out` contains the status of the job and how the analysis went. The `.html` file can be opened with any web browser and contains the full report. The `.zip` file contains all the images, graphs, and resources displayed in the `.html` report, as well as the report itself.
- Open `fastqc.out` using a text editor such as vim, with `vim fastqc.out`. The file should look something like this:
Job name: fastqcTest
Queue: inferno
End PBS Prologue Mon Nov 26 08:54:52 EST 2018
---------------------------------------
Started analysis of SRR081241.filt.fastq
Approx 5% complete for SRR081241.filt.fastq
Approx 10% complete for SRR081241.filt.fastq
Approx 15% complete for SRR081241.filt.fastq
Approx 20% complete for SRR081241.filt.fastq
Approx 25% complete for SRR081241.filt.fastq
Approx 30% complete for SRR081241.filt.fastq
Approx 35% complete for SRR081241.filt.fastq
Approx 40% complete for SRR081241.filt.fastq
Approx 45% complete for SRR081241.filt.fastq
Approx 50% complete for SRR081241.filt.fastq
Approx 55% complete for SRR081241.filt.fastq
Approx 60% complete for SRR081241.filt.fastq
Approx 65% complete for SRR081241.filt.fastq
Approx 70% complete for SRR081241.filt.fastq
Approx 75% complete for SRR081241.filt.fastq
Approx 80% complete for SRR081241.filt.fastq
Approx 85% complete for SRR081241.filt.fastq
Approx 90% complete for SRR081241.filt.fastq
Approx 95% complete for SRR081241.filt.fastq
Analysis complete for SRR081241.filt.fastq
---------------------------------------
Begin PBS Epilogue Mon Nov 26 08:55:00 EST 2018
Job ID: 22971857.shared-sched.pace.gatech.edu
User ID: shollister7
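For a quick check that does not require opening the file, you can search the log for the completion line that FastQC prints, as in the sample output above. A minimal sketch; the function name `analysis_complete` is hypothetical.

```shell
# Hypothetical helper: succeed if the log reports a finished analysis
# for the given input file.
analysis_complete() {
    # $1 = log file, $2 = input file name; -F matches the text literally
    grep -qF "Analysis complete for $2" "$1"
}

if [ -f fastqc.out ] && analysis_complete fastqc.out SRR081241.filt.fastq; then
    echo "FastQC finished"
else
    echo "FastQC has not finished (or fastqc.out is missing)"
fi
```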
- To view the full report, you can use `firefox` on the cluster to open the `.html` report.
- Run `firefox SRR081241.filt_fastqc.html`
Caution
You must be logged in with display forwarding enabled, meaning you have to pass `-X` or `-Y` to `ssh` when you log in; otherwise the display (Firefox window) cannot be opened.
- If the display won't open, log out and log back in with `ssh -X gtusername3@login-s.pace.gatech.edu`.
- The report will look something like this: [screenshot of the FastQC HTML report]
- After the result files are produced, you can move the `.zip` file, as well as any other files, off the cluster. Refer to the file transfer guide for help.
- Congratulations! You successfully ran FastQC in batch mode on the cluster.
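Before transferring files, the archive can also be inspected from the command line. A minimal sketch, assuming the archive contains a tab-separated `summary.txt` inside a folder named after the archive, with a PASS/WARN/FAIL status in the first column (the layout FastQC normally produces; check with `unzip -l` if your version differs). The helper name `not_passed` is hypothetical.

```shell
# Hypothetical helper: print summary rows whose status is not PASS.
not_passed() {
    awk -F '\t' '$1 != "PASS"'
}

ZIP="SRR081241.filt_fastqc.zip"
if [ -f "$ZIP" ] && command -v unzip >/dev/null 2>&1; then
    # -p writes the named archive member to stdout without extracting it.
    unzip -p "$ZIP" "${ZIP%.zip}/summary.txt" | not_passed
fi
```

Any rows printed are the report sections flagged WARN or FAIL, which are the ones worth opening in the HTML report first.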
Run FastQC Interactively on the Cluster
Overview
- Running FastQC interactively on the cluster follows the same general steps as any interactive job. These steps are:
    - Set up a VNC session
    - Load the `fastqc` module. Use `module avail fastqc` to see which versions of FastQC are available
    - Run FastQC with the command `fastqc`
Set up VNC Session
- Please see the VNC guide for instructions on how to set up the Interactive VNC session
Load FastQC
- Open a terminal in the VNC window by clicking `Applications` > `System Tools` in the top left, then scrolling down to `terminal`.
- All commands from here on are typed into the terminal in the VNC session.
- On the cluster, `fastqc/0.10.1` and `fastqc/0.11.2` are available.
- `module load fastqc/0.11.2` loads FastQC. Replace the number at the end with the version you want to load.
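To confirm which version ended up on your path after loading a module, you can ask FastQC itself. A minimal sketch; the variable names `FASTQC_VERSION` and `FASTQC_MODULE` are only illustrative.

```shell
FASTQC_VERSION="0.11.2"              # change to the version you want
FASTQC_MODULE="fastqc/${FASTQC_VERSION}"

# Only act where the `module` command is actually available.
if command -v module >/dev/null 2>&1; then
    module load "$FASTQC_MODULE"
    fastqc --version                 # prints the loaded FastQC version
fi
```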
Run FastQC
- Run with the command `fastqc`