Updated 2021-05-17
Run IDBA on the Cluster
Summary
- IDBA is the basic iterative de Bruijn graph assembler for second-generation sequencing reads. More information can be found on the IDBA GitHub page.
- Use `module avail idba` to see all the available versions of IDBA on the cluster.
- To load IDBA in your PBS script:
  - Load with `module load idba/1.1.1`. Replace the version number with the version you want to load.
- To run IDBA:
  - In your PBS script, put all the lines that execute IDBA after the `module load` lines that load IDBA.
Warning
You must set the `--num_threads` option to the number of processors you requested in the `#PBS` directives portion of your PBS script. Example: if you requested 16 processors (2 nodes with 8 processors per node), you would set `--num_threads 16`.
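One way to keep the thread count and the requested processors in sync is to compute the count inside the job instead of typing it twice. The sketch below assumes a Torque-style PBS that writes one line per allocated processor to the file named by `$PBS_NODEFILE`; `count_slots` and `NUM_THREADS` are hypothetical names introduced for illustration, and the final line only prints the command rather than running it.

```shell
# Sketch: derive the IDBA thread count from the job's allocation.
# Assumption: the scheduler writes one line per allocated processor
# to the file named by $PBS_NODEFILE (Torque-style PBS behavior).

# count_slots (hypothetical helper): print the number of lines in a node file.
count_slots() {
    wc -l < "$1" | tr -d '[:space:]'
}

# Inside a running job, use the allocation; otherwise fall back to 1 thread.
if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
    NUM_THREADS=$(count_slots "$PBS_NODEFILE")
else
    NUM_THREADS=1
fi

# Print the command that would be run with the derived thread count.
echo "Would run: idba_ud -r read.fa -o output_dir --num_threads $NUM_THREADS"
```

With `#PBS -l nodes=2:ppn=8`, the node file would be expected to hold 16 lines, so the printed command would match the hard-coded `--num_threads 16`.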
Example PBS Script
#PBS -N idbaTest
#PBS -A [Account]
#PBS -l nodes=2:ppn=8
#PBS -l walltime=1:00:00
#PBS -q inferno
#PBS -j oe
#PBS -o idbaResult.out
cd $PBS_O_WORKDIR
module load idba/1.1.1
idba_ud -r read.fa -o output_dir --num_threads 16
- The `#PBS` directives are standard, requesting 1 hour of walltime and 2 nodes with 8 cores per node. More on `#PBS` directives can be found in the PBS guide.
Note
If using `$PBS_O_WORKDIR`, the `.fa` files, as well as any other files required for the job, must be stored in the same folder as the PBS script.
- `$PBS_O_WORKDIR` is simply a variable that represents the folder you submit the PBS script from. The `cd $PBS_O_WORKDIR` line tells the cluster to enter that directory and look there for all the files for the job. Make sure the `.fa` files, and any other files you need, are in the same folder as the PBS script; otherwise the cluster won't be able to find the files it needs.
- Output files will also show up in the same folder as the PBS script.
- The `module load` line loads IDBA.
- The `idba_ud` line executes IDBA. Note that this is just a general template. More info on IDBA can be found by running `idba` without any arguments.
- There are a couple of important things to point out in the execution line:
  - It is placed after the `module load` line, as well as the `cd $PBS_O_WORKDIR` line.
  - `--num_threads 16` is set to the same number of processors requested in the PBS script (16).
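Because a job that cannot find its input only fails once it starts running, it can help to check for the files up front. The following is a minimal sketch, not part of the original script: `check_inputs` is a hypothetical helper, and `read.fa` is the input name used in the example above.

```shell
# check_inputs (hypothetical helper): verify that every named file exists in a directory.
# Returns 0 if all files are present; otherwise prints the first missing path and returns 1.
check_inputs() {
    dir=$1
    shift
    for f in "$@"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing input: $dir/$f" >&2
            return 1
        fi
    done
    return 0
}

# In a PBS script this would sit between `cd $PBS_O_WORKDIR` and the idba_ud line, e.g.:
#   check_inputs "$PBS_O_WORKDIR" read.fa || exit 1
```

Exiting early this way leaves a clear "missing input" message in the `.out` file instead of an error from the assembler itself.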
Submit Job and Check Status
- Make sure you're in the directory that contains the PBS script, the sequence files, and any other files you need.
- Submit with `qsub <pbs script name>`. In this case, `qsub idba.pbs`, or whatever you called the PBS script. You can name PBS scripts whatever you want; just keep the `.pbs` at the end.
- Check job status with `qstat -u username3 -n`, replacing "username3" with your GT username.
- You can delete the job with `qdel 22182721`, replacing the number with the job ID returned after running `qsub`.
- Depending on the resources requested and the queue the job runs on, it may take varying amounts of time for the job to start. To estimate the time until the job executes, run `showstart 22182721`, replacing the number with the job ID returned after running `qsub`. More helpful commands can be found in this guide.
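The commands above can be chained so the job ID does not have to be copied by hand. This is a sketch under two assumptions: that `qsub` prints the job ID in the common `<number>.<server>` form, and that `$USER` holds your cluster username. `job_number` is a hypothetical helper; the submission block is skipped on any machine where `qsub` is not installed.

```shell
# job_number (hypothetical helper): keep only the part of a job ID before the first dot.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Only attempt a submission where qsub is available and the script exists.
if command -v qsub >/dev/null 2>&1 && [ -f idba.pbs ]; then
    full_id=$(qsub idba.pbs)          # assumed to look like 22182721.<server>
    id=$(job_number "$full_id")
    qstat -u "$USER" -n               # check job status
    showstart "$id"                   # estimate the start time
    # To cancel the job instead: qdel "$id"
fi
```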
Collecting Results
- After the job finishes running, all files created will be in the same folder as your PBS script (the directory you ran `qsub` from).
- The `.out` file will be found here as well. It contains the results of the job, as well as diagnostics and a report of resources used during the job. If the job fails or doesn't produce the result you were hoping for, the `.out` file is a great debugging tool.
- You can transfer the resulting files off the cluster using scp or a file transfer service.
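A quick sanity check before transferring anything is to count the assembled sequences. The sketch below assumes that `idba_ud` wrote its final contigs to `contig.fa` inside the `output_dir` directory named in the example script; `count_fasta_records` is a hypothetical helper that simply counts FASTA header lines.

```shell
# count_fasta_records (hypothetical helper): count FASTA header lines (those starting with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Assumption: idba_ud wrote its final contigs to output_dir/contig.fa.
if [ -f output_dir/contig.fa ]; then
    echo "contigs assembled: $(count_fasta_records output_dir/contig.fa)"
else
    echo "output_dir/contig.fa not found; check idbaResult.out for errors"
fi
```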