Updated 2019-06-17

Run STAR on the Cluster

Overview

  • This guide focuses on running STAR (Spliced Transcripts Alignment to a Reference).
  • The basic STAR workflow consists of two steps:
    1. Generating genome index files
    2. Mapping reads to the genome
  • View this link to access the manual for STAR 2.5.3a.

Tips

  • Before running module load STAR/2.5.3a, load its gcc dependency with module load gcc/4.9.0.
  • Keep all required input files (reference genome, annotation file, and reads) in the same folder as the PBS script, because the job runs from that folder (cd $PBS_O_WORKDIR).
  • STAR command lines have the following format:
STAR --option1-name option1-value(s) --option2-name option2-value(s) ...
  • If an option accepts multiple values, they are separated by spaces or, in a few cases, by commas.
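To illustrate the format above, the sketch below only prints two STAR command lines; every file name in it is a hypothetical placeholder. Spaces separate the values of one option (here, one reads file per mate of a paired-end run), while commas join several files that belong to the same mate, for example reads from two sequencing lanes. Collecting the options in a bash array keeps each value a separate word.

```shell
#!/usr/bin/env bash
# Dry run: print STAR command lines instead of executing them.
# All file names are hypothetical placeholders.

paired=(
  # Space-separated values: one reads file per mate (paired-end data).
  --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz
  # Space-separated values: output type, then sort order.
  --outSAMtype BAM SortedByCoordinate
)

# Commas join several files for the SAME mate (e.g. two lanes);
# a space still separates mate 1 from mate 2.
lanes=(--readFilesIn lane1_R1.fastq.gz,lane2_R1.fastq.gz lane1_R2.fastq.gz,lane2_R2.fastq.gz)

echo STAR "${paired[@]}"
echo STAR "${lanes[@]}"
```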

Part 1: The PBS Script

#PBS -N testSTAR
#PBS -l nodes=1:ppn=2
#PBS -l pmem=2gb
#PBS -l walltime=1:00:00
#PBS -q force-6
#PBS -j oe
#PBS -o testSTAR.out

cd $PBS_O_WORKDIR
module load gcc/4.9.0
module load STAR/2.5.3a

# Step 1: build the genome index (needed once per genome and annotation)
mkdir -p <output dir>
STAR --runMode genomeGenerate --runThreadN 2 --genomeDir <output dir> \
--genomeFastaFiles <reference genome file> --sjdbGTFfile <genome annotation file> \
--sjdbGTFtagExonParentTranscript Parent --sjdbOverhang <reads_length - 1>

# Step 2: align the gzipped reads to the index built in step 1
STAR --genomeDir <output dir> --runThreadN 2 --readFilesIn <gzipped reads file(s)> \
--readFilesCommand zcat --outFileNamePrefix b_ --outFilterMultimapNmax 1 \
--outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate
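The --sjdbOverhang placeholder above asks for the read length minus 1. A minimal sketch for computing it, assuming the FASTQ file is gzip-compressed (as --readFilesCommand zcat implies) and that all reads share one length; the function name sjdb_overhang is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: compute --sjdbOverhang (read length - 1) from the first read
# of a gzip-compressed FASTQ file. Assumes all reads have the same length.

sjdb_overhang() {
  local fastq_gz="$1"
  # gzip -dc decompresses to stdout, like zcat on Linux.
  # Line 2 of a FASTQ file is the first read's sequence.
  gzip -dc "$fastq_gz" | awk 'NR == 2 { print length($0) - 1; exit }'
}
```

For a file of 100 bp reads, sjdb_overhang reads.fastq.gz (hypothetical file name) prints 99. Trimmed reads of varying length need a different choice, such as the maximum read length minus 1.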

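  • Here is what the parameters in the first step (genome index generation) mean:

    • --runMode: genomeGenerate builds the genome index; the default mode, alignReads, maps reads.
    • --runThreadN: number of threads (2 here, matching ppn=2 in the PBS script).
    • --genomeDir: directory where the genome index files are written. It must exist before STAR runs, hence the mkdir line.
    • --genomeFastaFiles: reference genome sequence in FASTA format.
    • --sjdbGTFfile: genome annotation file. STAR expects GTF format by default.
    • --sjdbGTFtagExonParentTranscript Parent: links exons to their transcripts through the Parent attribute, as needed for GFF3 annotations; omit it for a standard GTF file.
    • --sjdbOverhang: length of the genomic sequence around each annotated junction used to build the splice junctions database. The usual choice is read length minus 1 (for example, 99 for 100 bp reads).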
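  • Here is what the parameters in the second step (read alignment) mean:

    • --genomeDir: directory containing the genome index built in the first step.
    • --runThreadN: number of threads.
    • --readFilesIn: input reads file(s); for paired-end data, give the two mate files separated by a space.
    • --readFilesCommand zcat: the input files are gzip-compressed (.gz), and zcat decompresses them on the fly.
    • --outFileNamePrefix b_: prefix added to every output file name.
    • --outFilterMultimapNmax 1: only reads that map to a single locus are output; reads mapping to more loci are reported as unmapped.
    • --outReadsUnmapped: set to Fastx to write unmapped reads to separate FASTA/FASTQ files (here b_Unmapped.out.mate1); the default, None, does not write them.
    • --outSAMtype BAM SortedByCoordinate: write the alignments as a coordinate-sorted BAM file.
  • With the prefix b_, the alignment results are written to b_Aligned.sortedByCoord.out.bam. The file a_Aligned.sortedByCoord.out.bam comes from running the same alignment command on a second sample with --outFileNamePrefix a_.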

  • This example comes from here.
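The script aligns a single sample with the prefix b_; a_-prefixed results come from repeating the alignment for another sample. The sketch below does both in one job, assuming hypothetical read files a.fastq.gz and b.fastq.gz and a hypothetical index directory named genome_index. It uses --outReadsUnmapped Fastx because STAR accepts only None or Fastx for that option. The hypothetical run helper only prints the commands by default; set DRY_RUN=0 in the PBS script to execute them.

```shell
#!/usr/bin/env bash
# Sketch: align several samples against one genome index, one output
# prefix per sample. Sample names, read files and index directory are
# hypothetical placeholders.

GENOME_DIR="genome_index"   # hypothetical: the --genomeDir from step 1

# DRY_RUN=1 (the default) only prints each command;
# set DRY_RUN=0 inside the PBS job to actually call STAR.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

align_sample() {
  local sample="$1"
  run STAR --genomeDir "$GENOME_DIR" --runThreadN 2 \
    --readFilesIn "${sample}.fastq.gz" --readFilesCommand zcat \
    --outFileNamePrefix "${sample}_" --outFilterMultimapNmax 1 \
    --outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate
}

# With DRY_RUN=0 this writes a_Aligned.sortedByCoord.out.bam
# and b_Aligned.sortedByCoord.out.bam.
for sample in a b; do
  align_sample "$sample"
done
```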

Part 2: Submit Job and Check Status

  • Make sure you're in the directory that contains the PBS script and the input files. STAR itself comes from the module load lines, so it does not need to be in this directory.
  • Submit as normal with qsub <pbs script name>; in this case, qsub testSTAR.pbs.
  • Check job status with qstat -t 22182721, replacing the number with the job ID returned by qsub.
  • You can delete the job with qdel 22182721, again replacing the number with the job ID returned by qsub.
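Rather than copying the job ID by hand, a shell session can capture it from qsub. A sketch, assuming Torque/PBS, where qsub prints the new job ID on standard output; the job_number helper is hypothetical and only strips the server suffix for display:

```shell
#!/usr/bin/env bash
# Sketch: reuse the job ID printed by qsub instead of copying it by hand.
# Assumes a Torque/PBS qsub that prints the new ID (e.g. 22182721.<server>).

# Strip the server suffix, keeping only the numeric job number.
job_number() {
  printf '%s\n' "${1%%.*}"
}

# Only submit when qsub is available, i.e. when run on the cluster.
if command -v qsub >/dev/null 2>&1; then
  JOBID=$(qsub testSTAR.pbs)
  echo "Submitted job $(job_number "$JOBID")"
  qstat -t "$JOBID"        # check status
  # qdel "$JOBID"          # uncomment to cancel the job
fi
```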

Part 3: Collecting Results

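  • You should see the genome index directory you named in the PBS script, the b_-prefixed STAR output files (including b_Aligned.sortedByCoord.out.bam), and a file called testSTAR.out, which holds the job's combined standard output and error (#PBS -j oe).
  • Once the result files are produced, you can move them off the cluster; refer to the file transfer guide for help.
  • Congratulations! You successfully ran a STAR script.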
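Before transferring results, a quick check can confirm the alignment finished. A sketch, assuming the b_ prefix from Part 1: STAR writes its run summary to <prefix>Log.final.out, and the hypothetical star_summary function prints the input read count and unique-mapping rate from it, or fails if the BAM or the log is missing.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a finished alignment before copying results off
# the cluster. Relies on the file names STAR derives from --outFileNamePrefix.

star_summary() {
  local prefix="$1"                                  # e.g. b_
  local bam="${prefix}Aligned.sortedByCoord.out.bam"
  local log="${prefix}Log.final.out"                 # STAR's run summary
  if [ ! -s "$bam" ] || [ ! -f "$log" ]; then
    echo "missing output for prefix ${prefix}" >&2
    return 1
  fi
  # Two headline numbers from the summary table.
  grep -E 'Number of input reads|Uniquely mapped reads %' "$log"
}
```

Run star_summary b_ from the job's working directory.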