Updated 2022-10-14
Run STAR on the Cluster¶
Overview¶
- This guide covers running STAR (Spliced Transcripts Alignment to a Reference) on the cluster.
- The basic STAR workflow consists of two steps:
- Generating genome index files
- Mapping reads to the genome
- View this link to access the manual for STAR 2.5.3a.
Tips¶
- Before executing module load STAR/2.5.3a, you will have to load the gcc dependency with module load gcc/4.9.0.
- Make sure all files needed are in the same folder.
- STAR command line has the following format:
STAR --option1-name option1-value(s) --option2-name option2-value(s) ...
- If an option accepts multiple values, separate them with spaces or, in a few cases, with commas.
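The command-line format described above can be sketched as follows. This is only an illustration: the paths genome_index and reads.fastq.gz are hypothetical placeholders, and the command is printed for review rather than executed. The options themselves are the ones used in the PBS script in this guide.

```shell
#!/bin/sh
# Hypothetical paths; replace them with your own index directory and reads file.
genome_dir="genome_index"
reads="reads.fastq.gz"

# Each option name is followed by its value(s). --outSAMtype takes two
# space-separated values, so each value is passed as its own argument.
set -- STAR \
  --genomeDir "$genome_dir" \
  --runThreadN 2 \
  --readFilesIn "$reads" \
  --readFilesCommand zcat \
  --outSAMtype BAM SortedByCoordinate

# Keep the argument count and a printable copy of the command for review.
star_argc=$#
star_cmd_preview="$*"
echo "$star_cmd_preview"
```

Once the modules are loaded and the paths are real, replacing the final echo with "$@" would run the assembled command.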
Part 1: The PBS Script¶
#PBS -N testSTAR
#PBS -A [Account]
#PBS -l nodes=1:ppn=2
#PBS -l pmem=2gb
#PBS -l walltime=1:00
#PBS -q inferno
#PBS -j oe
#PBS -o testSTAR.out
cd $PBS_O_WORKDIR
module load gcc/4.9.0
module load STAR/2.5.3a
# Create the genome index directory if it does not already exist
mkdir -p <output dir>
# Step 1: generate the genome index
STAR --runMode genomeGenerate --runThreadN 2 --genomeDir <output dir> \
--genomeFastaFiles <reference genome file> --sjdbGTFfile <genome annotation file> \
--sjdbGTFtagExonParentTranscript Parent --sjdbOverhang <reads_length - 1>
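# Step 2: map the reads against the index built in step 1
# (--outReadsUnmapped accepts None or Fastx; Fastx writes unmapped reads to separate files)
STAR --genomeDir <output dir> --runThreadN 2 --readFilesIn <gzipped reads file> \
--readFilesCommand zcat --outFileNamePrefix b_ --outFilterMultimapNmax 1 \
--outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate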
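- Here is what the parameters in the first step mean:
- --runMode: run mode, either genomeGenerate (build the genome index) or alignReads (map reads); the default is alignReads
- --runThreadN: number of threads
- --genomeDir: output directory for the genome index files
- --genomeFastaFiles: reference genome file in FASTA format
- --sjdbGTFfile: genome annotation file, which should be in GTF format
- --sjdbGTFtagExonParentTranscript: annotation attribute that links each exon to its parent transcript; Parent is the value used for GFF3-style annotations, while the default (transcript_id) works for GTF files
- --sjdbOverhang: length of the genomic sequence around each annotated junction used to build the splice junctions database; normally you can use reads_length - 1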
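- Here is what the parameters in the second step mean:
- --genomeDir: reference genome index directory (the output directory of the first step)
- --runThreadN: number of threads
- --readFilesIn: input reads file
- --readFilesCommand zcat: the input file is gzip-compressed (.gz), and zcat decompresses it on the fly
- --outFileNamePrefix: prefix added to the names of all output files
- --outFilterMultimapNmax 1: keep only reads that map to a single location
- --outSAMtype BAM SortedByCoordinate: write the alignments as a coordinate-sorted BAM file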
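- With --outFileNamePrefix b_, the alignment results are written to b_Aligned.sortedByCoord.out.bam. Running the mapping step again with the prefix a_ for a second sample would likewise produce a_Aligned.sortedByCoord.out.bam.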
- This example comes from here.
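Two of the parameter values above depend on the input reads: --sjdbOverhang on the read length and --readFilesCommand on whether the file is compressed. The sketch below shows one way to derive both before filling in the PBS script. The helper names max_read_length and read_files_command and the tiny demo FASTQ are made up for illustration, and the approach assumes standard four-line FASTQ records.

```shell
#!/bin/sh
# Print the longest read length in a FASTQ stream read from stdin.
# Assumes 4-line FASTQ records (the sequence is line 2 of every 4).
max_read_length() {
  awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# Choose a --readFilesCommand value from the file extension:
# zcat for .gz files, "-" (STAR's default, meaning no command) otherwise.
read_files_command() {
  case "$1" in
    *.gz) echo "zcat" ;;
    *)    echo "-" ;;
  esac
}

# Demo with a tiny, made-up FASTQ file containing reads of length 8 and 5.
demo_fastq=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTA\n+\nIIIII\n' > "$demo_fastq"

read_len=$(max_read_length < "$demo_fastq")
overhang=$((read_len - 1))
echo "--sjdbOverhang $overhang"
echo "--readFilesCommand $(read_files_command reads.fastq.gz)"
rm -f "$demo_fastq"
```

For a gzip-compressed reads file, the same helper can be fed through zcat, for example zcat reads.fastq.gz | max_read_length, where reads.fastq.gz stands in for your own file.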
Part 2: Submit Job and Check Status¶
- Make sure you're in the directory that contains the PBS script as well as the files STAR needs.
- Submit as normal, with qsub <PBS script name>. In this case: qsub testSTAR.PBS
- Check job status with qstat -t 22182721, replacing the number with the job ID returned after running qsub.
- You can delete the job with qdel 22182721, again replacing the number with the job ID returned after running qsub.
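The submit-and-check steps above can be combined so the job ID does not have to be copied by hand. This is a minimal sketch, assuming qsub prints an identifier of the form number.server-name; the helper name numeric_job_id is hypothetical. The submission only runs where qsub is available.

```shell
#!/bin/sh
# Strip everything after the first dot, keeping the numeric part of the job ID.
# Assumes qsub prints something like "22182721.<server name>".
numeric_job_id() {
  echo "${1%%.*}"
}

# Submit only when qsub is available (that is, on a cluster login node).
if command -v qsub >/dev/null 2>&1; then
  full_id=$(qsub testSTAR.PBS)
  job_id=$(numeric_job_id "$full_id")
  echo "Submitted job $job_id"
  qstat -t "$job_id"
else
  echo "qsub not found; run this on a cluster login node"
fi
```

The captured job_id can be passed to qdel in the same way if the job needs to be cancelled.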
Part 3: Collecting Results¶
- You should see a directory with the name you added to the PBS script and a file called testSTAR.out.
- After the result files are produced, you can move them off the cluster. Refer to the file transfer guide for help.
- Congratulations! You successfully ran a STAR script.
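Before moving files off the cluster, it can help to confirm that the mapping step actually produced its outputs. The sketch below checks for the sorted BAM file and for Log.final.out, the summary log STAR writes alongside it, using the b_ prefix from the PBS script. The function name check_star_outputs is hypothetical, and the check assumes it is run from the job's working directory.

```shell
#!/bin/sh
# Report which expected STAR outputs exist (and are non-empty) for a prefix.
# Returns non-zero if any of them is missing.
check_star_outputs() {
  prefix=$1
  missing=0
  for suffix in Aligned.sortedByCoord.out.bam Log.final.out; do
    if [ -s "${prefix}${suffix}" ]; then
      echo "found:   ${prefix}${suffix}"
    else
      echo "missing: ${prefix}${suffix}"
      missing=1
    fi
  done
  return "$missing"
}

# b_ is the value given to --outFileNamePrefix in the PBS script.
check_star_outputs b_ || echo "Some outputs are missing; check testSTAR.out for errors"
```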