Updated 2023-03-31

FastQC

Run FastQC in Batch Mode on the Cluster

Overview

  • This guide will cover how to run FastQC in batch mode
  • Running FastQC in batch mode means you have an input file, such as a .fastq file, that you process through an SBATCH script. After you submit the SBATCH script, the job runs on its own with no need to watch it. FastQC creates a .zip file with images and reports, as well as a .html report
  • FastQC can also be run interactively (with a GUI)

Summary

  • Run module avail fastqc to see all available FastQC versions on the cluster
  • In the SBATCH script:
    • Load FastQC with module load fastqc
    • Run FastQC with fastqc <input file name>
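  • Putting these together, a minimal session might look like this (SRR081241.filt.fastq is the example input file used throughout this guide):

module avail fastqc
module load fastqc
fastqc SRR081241.filt.fastq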

Warning

Users have reported that loading other Perl modules can disable the use of fastqc on RHEL6 systems. A workaround is to unload all Perl modules before using fastqc.
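For example (the module name perl below is a placeholder; check module list for the exact Perl module names loaded on your system):

module list
module unload perl
module load fastqc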

Walkthrough: Run FastQC in Batch Mode on the Cluster

Part 1: The SBATCH Script

#!/bin/bash
#SBATCH -JfastqcTest
#SBATCH -A [Account]
#SBATCH -N1 --ntasks-per-node=4
#SBATCH -t2
#SBATCH -qinferno
#SBATCH -oReport-%j.out

cd $SLURM_SUBMIT_DIR
module load fastqc
fastqc SRR081241.filt.fastq
  • The #SBATCH directives are standard, requesting just 2 minutes of walltime and 1 node with 4 cores. More on #SBATCH directives can be found in the Using Slurm on Phoenix Guide
  • $SLURM_SUBMIT_DIR is simply a variable that holds the directory you submit the SBATCH script from. Make sure the .fastq file you want to run is in the same directory as the SBATCH script. This line tells the cluster to enter that directory, so the job has access to all the files it needs
  • Output files, such as the resulting .html report and .zip file of results, will also show up in the same directory as the SBATCH script.
  • module load fastqc loads the default FastQC version (run module avail fastqc to see which versions are installed)
  • fastqc SRR081241.filt.fastq executes FastQC on the input file
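  • Since the script requests 4 cores, FastQC's --threads option can also process several input files at once, one file per thread. A sketch, with placeholder file names:

fastqc -t 4 sample1.fastq sample2.fastq sample3.fastq sample4.fastq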

Part 2: Submit Job and Check Status

  • Make sure you're in the directory that contains the SBATCH script as well as the .fastq input file
  • Submit as normal, with sbatch <script name>. In this case, sbatch fastqc.sbatch
  • Check job status with squeue --job <jobID>, replacing <jobID> with the job ID returned by sbatch
  • You can delete the job with scancel <jobID>, again using the job ID returned by sbatch
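  • A typical session might look like the following (the job ID and squeue output here are illustrative):

$ sbatch fastqc.sbatch
Submitted batch job 469544
$ squeue --job 469544
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            469544 cpu-small fastqcTe svangala  R       0:05      1 atl1-1-02-006-30-2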

Part 3: Collecting Results

  • In the directory where you submitted the SBATCH script, you should see several newly generated files, including Report-<jobID>.out, SRR081241.filt_fastqc.html, and SRR081241.filt_fastqc.zip.
  • Report-<jobID>.out contains the status of the job and how the analysis went. The .html file can be opened with any web browser and contains the full report. The .zip file contains all the resulting images, graphs, and resources displayed in the HTML report, as well as the report itself.
  • Open Report-<jobID>.out using a text editor such as vim with vim Report-<jobID>.out. The file should look something like this:
---------------------------------------
Begin Slurm Prolog: Jan-13-2023 10:31:41
Job ID:    469544
User ID:   svangala3
Account:   phx-pace-staff
Job name:  fastqcTest
Partition: cpu-small
QOS:       inferno
---------------------------------------
Started analysis of SRR081241.filt.fastq
Approx 5% complete for SRR081241.filt.fastq
Approx 10% complete for SRR081241.filt.fastq
Approx 15% complete for SRR081241.filt.fastq
Approx 20% complete for SRR081241.filt.fastq
Approx 25% complete for SRR081241.filt.fastq
Approx 30% complete for SRR081241.filt.fastq
Approx 35% complete for SRR081241.filt.fastq
Approx 40% complete for SRR081241.filt.fastq
Approx 45% complete for SRR081241.filt.fastq
Approx 50% complete for SRR081241.filt.fastq
Approx 55% complete for SRR081241.filt.fastq
Approx 60% complete for SRR081241.filt.fastq
Approx 65% complete for SRR081241.filt.fastq
Approx 70% complete for SRR081241.filt.fastq
Approx 75% complete for SRR081241.filt.fastq
Approx 80% complete for SRR081241.filt.fastq
Approx 85% complete for SRR081241.filt.fastq
Approx 90% complete for SRR081241.filt.fastq
Approx 95% complete for SRR081241.filt.fastq
Analysis complete for SRR081241.filt.fastq
---------------------------------------
Begin Slurm Epilog: Jan-13-2023 10:31:52
Job ID:        469544
Array Job ID:  _4294967294
User ID:       svangala3
Account:       phx-pace-staff
Job name:      fastqcTest
Resources:     cpu=4,mem=4G,node=1
Rsrc Used:     cput=00:00:52,vmem=62840K,walltime=00:00:13,mem=0,energy_used=0
Partition:     cpu-small
QOS:           inferno
Nodes:         atl1-1-02-006-30-2
---------------------------------------
  • To view the full report, you can use Firefox on the cluster to open the HTML report
  • Run firefox SRR081241.filt_fastqc.html

Caution

You must be logged in with display forwarding enabled, meaning you have to connect with ssh -X or -Y; otherwise the display (the Firefox window) cannot be opened.

  • Log out and log back in with ssh -X gtusername3@login-s.pace.gatech.edu if the display won't open
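  • If a display is not an option, the .zip archive also contains a plain-text summary you can read directly in the terminal; summary.txt lists a PASS/WARN/FAIL result for each FastQC module:

unzip SRR081241.filt_fastqc.zip
cat SRR081241.filt_fastqc/summary.txt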
  • The HTML report will look something like this:

Screenshot

  • After the result files are produced, you can move the .zip file as well as any other files off the cluster. Refer to the file transfer guide for help.
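  • For example, from a terminal on your local machine, scp can copy the results down (replace the username and path with your own):

scp gtusername3@login-s.pace.gatech.edu:/path/to/SRR081241.filt_fastqc.zip .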
  • Congratulations! You successfully ran FastQC in batch mode on the cluster.

Run FastQC Interactively on the Cluster

Set up Interactive Desktop on Phoenix OnDemand

  • In order to access Phoenix OnDemand you must be connected to the GT VPN. If you do not already have the VPN set up, visit Configure the GT VPN
  • Once connected to the VPN you can access the application at Phoenix OnDemand.
  • Choose the Slurm Interactive Apps tab and select Interactive Desktop to set up and launch.
  • Setting up Interactive Desktop:
    • Charge Account: gts-<PI username> (required; "phx-pace-staff" works for trial)
    • Quality of Service: inferno
    • Node Type: GPU Tesla V100-16GB or GPU Tesla V100-32GB
    • Nodes: 1
    • Cores Per Node: 8 (Number of cores (CPUs) per node. Max: 24 cores per node.)
    • GPUs Per Node: 1
    • Memory Per Core (GB): Leave blank if unsure. Total memory for the job is: nodes × cores × memory per core
    • Number of hours: 1
  • After the preceding details are entered, click the Launch button
  • Once the Launch Interactive Desktop button becomes available, click it to open a Phoenix OnDemand Interactive Desktop to use for FastQC

Load FastQC

  • Open a terminal in the Interactive Desktop by clicking Activities (top left) and searching for "terminal"
  • All commands from here on are typed in the terminal
  • On the cluster, fastqc/0.11.9 is available
  • module load fastqc/0.11.9 will load FastQC; replace the number at the end with the version you want to load
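  • For example, to load a specific version and confirm it took effect (fastqc --version prints the loaded version):

module avail fastqc
module load fastqc/0.11.9
fastqc --version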

Run FastQC

  • Run with the command fastqc (with no arguments, FastQC opens its GUI)

Screenshot