Updated 2023-03-31
FastQC¶
Run FastQC in Batch Mode on the Cluster¶
Overview¶
- This guide will cover how to run FastQC in batch mode.
- Running FastQC in batch mode means you have an input file, such as a `.fastq` file, that you process through an `SBATCH` script. After you submit the `SBATCH` script, the job will run on its own without any need to watch it. FastQC will create a `.zip` file with images and reports, as well as an `.html` report.
- FastQC can also be run interactively (with a GUI).
Summary¶
- Run `module avail fastqc` to see all available FastQC versions on the cluster.
- In the `SBATCH` script:
    - Load FastQC with `module load fastqc`
    - Run FastQC with `fastqc <input file name>`
Warning

Users have reported that loading other Perl modules can disable the use of `fastqc` on RHEL6 systems. A workaround is to unload all Perl modules before using `fastqc`.
Walkthrough: Run FastQC in Batch Mode on the Cluster¶
- The data being used is from the 1000 Genomes Project.
    - Input file: `SRR081241.filt.fastq`
    - `SBATCH` script: `fastqc.sbatch`
- To follow along, manually copy these files into the same directory in your account on the cluster, or transfer them to your account.
Part 1: The SBATCH Script¶
```bash
#!/bin/bash
#SBATCH -J fastqcTest
#SBATCH -A [Account]
#SBATCH -N 1 --ntasks-per-node=4
#SBATCH -t 2
#SBATCH -q inferno
#SBATCH -o Report-%j.out

cd $SLURM_SUBMIT_DIR
module load fastqc
fastqc SRR081241.filt.fastq
```
- The `#SBATCH` directives are standard, requesting just 2 minutes of walltime and 1 node with 4 cores. More on `#SBATCH` directives can be found in the Using Slurm on Phoenix Guide.
- `$SLURM_SUBMIT_DIR` is simply a variable that represents the directory you submit the `SBATCH` script from. Make sure the `.fastq` file you want to run is in the same directory as the `SBATCH` script. The `cd` line tells the cluster to enter the directory where you have stored the files for the job, so it has access to all the files it needs.
    - Output files, such as the resulting `.html` report and `.zip` file of results, will also show up in the same directory as the `SBATCH` script.
- `module load fastqc` loads FastQC 0.11.2.
- `fastqc SRR081241.filt.fastq` executes FastQC on the input file.
Part 2: Submit Job and Check Status¶
- Make sure you're in the directory that contains the `SBATCH` script as well as the `.fastq` file.
- Submit as normal with `sbatch <script name>`, in this case `sbatch fastqc.sbatch`.
- Check job status with `squeue --job <jobID>`, replacing `<jobID>` with the job ID returned after running `sbatch`.
- You can delete the job with `scancel <jobID>`, replacing `<jobID>` with the job ID returned after running `sbatch`.
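The submit-and-check steps can also be scripted. `sbatch` prints `Submitted batch job <jobID>`, so the job ID can be captured with `awk`; since `sbatch`, `squeue`, and `scancel` only exist on the cluster, the sketch below shows the cluster commands in comments and demonstrates the parsing step on a literal copy of that message:

```shell
# On the cluster you would run:
#   jobid=$(sbatch fastqc.sbatch | awk '{print $NF}')
#   squeue --job "$jobid"
#   scancel "$jobid"      # only if you need to cancel the job
# awk '{print $NF}' keeps the last whitespace-separated field, i.e. the ID:
echo "Submitted batch job 469544" | awk '{print $NF}'   # prints 469544
```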
Part 3: Collecting Results¶
- In the directory where you submitted the `SBATCH` script, you should see a couple of newly generated files, including `Report-<jobID>.out`, `SRR081241.filt_fastqc.html`, and `SRR081241.filt_fastqc.zip`.
    - `Report-<jobID>.out` contains the status of the job and how the analysis went. The `.html` file can be opened with any web browser and contains the full report. The `.zip` file contains all the resulting images, graphs, and resources that are displayed in the `.html` report, as well as the report itself.
- Open `Report-<jobID>.out` using a text editor such as vim with `vim Report-<jobID>.out`. The file should look something like this:
---------------------------------------
Begin Slurm Prolog: Jan-13-2023 10:31:41
Job ID: 469544
User ID: svangala3
Account: phx-pace-staff
Job name: fastqcTest
Partition: cpu-small
QOS: inferno
---------------------------------------
Started analysis of SRR081241.filt.fastq
Approx 5% complete for SRR081241.filt.fastq
Approx 10% complete for SRR081241.filt.fastq
Approx 15% complete for SRR081241.filt.fastq
Approx 20% complete for SRR081241.filt.fastq
Approx 25% complete for SRR081241.filt.fastq
Approx 30% complete for SRR081241.filt.fastq
Approx 35% complete for SRR081241.filt.fastq
Approx 40% complete for SRR081241.filt.fastq
Approx 45% complete for SRR081241.filt.fastq
Approx 50% complete for SRR081241.filt.fastq
Approx 55% complete for SRR081241.filt.fastq
Approx 60% complete for SRR081241.filt.fastq
Approx 65% complete for SRR081241.filt.fastq
Approx 70% complete for SRR081241.filt.fastq
Approx 75% complete for SRR081241.filt.fastq
Approx 80% complete for SRR081241.filt.fastq
Approx 85% complete for SRR081241.filt.fastq
Approx 90% complete for SRR081241.filt.fastq
Approx 95% complete for SRR081241.filt.fastq
Analysis complete for SRR081241.filt.fastq
---------------------------------------
Begin Slurm Epilog: Jan-13-2023 10:31:52
Job ID: 469544
Array Job ID: _4294967294
User ID: svangala3
Account: phx-pace-staff
Job name: fastqcTest
Resources: cpu=4,mem=4G,node=1
Rsrc Used: cput=00:00:52,vmem=62840K,walltime=00:00:13,mem=0,energy_used=0
Partition: cpu-small
QOS: inferno
Nodes: atl1-1-02-006-30-2
---------------------------------------
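The `Rsrc Used` line in the Slurm epilog is handy for checking what the job actually consumed. As a small sketch, its comma-separated `key=value` fields can be split and filtered with standard shell tools (shown here on the line from the log above; on the cluster you would read it out of `Report-<jobID>.out` instead):

```shell
# Sample "Rsrc Used" line copied from the report above:
line='Rsrc Used: cput=00:00:52,vmem=62840K,walltime=00:00:13,mem=0,energy_used=0'
# Split on commas and keep the walltime field:
echo "$line" | tr ',' '\n' | grep walltime    # prints walltime=00:00:13
```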
- To view the full report, you can use `firefox` on the cluster to open the `.html` report.
- Run `firefox SRR081241.filt_fastqc.html`
Caution

You must be logged in with display enabled, meaning when you log in you have to use `-X` or `-Y`; otherwise the display (the Firefox window) cannot be opened.

- Log out and log back in with `ssh -X gtusername3@login-s.pace.gatech.edu` if the display won't open.
- The report will look something like this:
- After the result files are produced, you can move the `.zip` file, as well as any other files, off the cluster. Refer to the file transfer guide for help.
- Congratulations! You successfully ran FastQC in batch mode on the cluster.
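For instance, the results archive can be pulled down with `scp`, run from your local terminal (a sketch: the login host is the one used earlier in this guide, and `fastqc_demo/` is a hypothetical remote directory — substitute your own username and path):

```shell
# Copy the results archive from the cluster to the current local directory.
# gtusername3 and fastqc_demo/ are placeholders; replace them with your own.
scp gtusername3@login-s.pace.gatech.edu:fastqc_demo/SRR081241.filt_fastqc.zip .
```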
Run FastQC Interactively on the Cluster¶
Set up Interactive Desktop on Phoenix OnDemand¶
- In order to access Phoenix OnDemand, you must be connected to the GT VPN. If you do not already have the VPN set up, visit Configure the GT VPN.
- Once connected to the VPN, you can access the application at Phoenix OnDemand.
- Choose the `Slurm Interactive Apps` tab and select `Interactive Desktop` to set up and launch.
- Setting up Interactive Desktop:
    - Charge Account: `gts-<PI username>` ("phx-pace-staff" works for trial. Required.)
    - Quality of Service: `inferno`
    - Node Type: `GPU Tesla V100-16GB` or `GPU Tesla V100-32GB`
    - Nodes: `1`
    - Cores Per Node: `8` (Number of cores (CPUs) per node. Max: 24 cores per node.)
    - GPUs Per Node: `1`
    - Memory Per Core (GB): Leave blank if unsure. Total memory for the job is: nodes × cores × memory per core
    - Number of hours: `1`
- After the preceding details are entered, click the `Launch` button.
- Once `Launch Interactive Desktop` is an available option, click the button to open a Phoenix OnDemand Interactive Desktop to use for FastQC.
Load FastQC¶
- Open a terminal in the Interactive Desktop by clicking `Activities` in the top left, then searching for `terminal`.
- All commands from here on will be typed in the terminal.
- On the cluster, `fastqc/0.11.9` is available.
    - `module load fastqc/0.11.9` will load FastQC. Replace the number at the end with the version you want to load.
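For example (a sketch — the `module` command and the FastQC builds are only available on the cluster):

```shell
module avail fastqc          # list the FastQC versions installed
module load fastqc/0.11.9    # load a specific version
fastqc --version             # confirm which version is now on the PATH
```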
Run FastQC¶
- Run FastQC with the command `fastqc`. With no arguments, FastQC opens its graphical interface in the Interactive Desktop.
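A few common variants, as a sketch (assuming the module is loaded; `fastqc_results` is a hypothetical directory name, and `-o`/`-t` are FastQC's output-directory and threads options):

```shell
# No arguments: opens the interactive GUI (requires a display).
fastqc

# File arguments: runs non-interactively on each listed file.
mkdir -p fastqc_results
fastqc -t 4 -o fastqc_results *.fastq
```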