Updated 2019-02-11

Run Blat on the Cluster

Summary

  • Blat is a suite of programs that produces alignments at the DNA level and at the protein (translated DNA) level
  • The Blat suite is available on the cluster by loading the Blat module
  • Use module avail blat to see all the available versions of blat on the cluster
  • To load blat in your PBS script:
    • Load with module load blat/35. Replace the version with any other version available on the cluster (found with module avail blat)
  • To run blat:
    • In your PBS script, put all lines executing blat or one of blat's programs after the module load lines that load blat
    • Example: If you wanted to use pslReps, the line pslReps in.psl out.psl out.psr would go in the PBS script after the lines that load the correct module for blat.

Example PBS Script

#PBS -N blatTest
#PBS -l nodes=1:ppn=16
#PBS -l walltime=1:00:00
#PBS -q iw-shared-6
#PBS -j oe
#PBS -o blatResult.out

cd $PBS_O_WORKDIR
module load blat/35
pslReps in.psl out.psl out.psr
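  • The #PBS directives are standard: they request 1 node with 16 cores for 1 hour of walltime on the iw-shared-6 queue, and -j oe together with -o blatResult.out writes the job's standard output and standard error to the single file blatResult.out. More on #PBS directives can be found in the PBS guide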

Note

If using $PBS_O_WORKDIR, the .psl files, as well as any other files required for the job, must be stored in the same folder as the PBS script

  • $PBS_O_WORKDIR is a variable that holds the folder you submit the PBS script from. The cd $PBS_O_WORKDIR line tells the cluster to enter that folder, so file names such as in.psl are looked up there. Keep the .psl files and any other files the job needs in the same folder as the PBS script; otherwise the cluster won't be able to find them.
  • The module load line loads blat
  • The pslReps line is a general example of how a program in the blat suite might be executed. It is taken from the blat user guide, which has much more information on the capabilities of blat. The point is that execution lines must come after both of the following:
    • The line that enters the folder containing the files, data, and PBS script, in this case cd $PBS_O_WORKDIR
    • The module load line that loads blat
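
The ordering described above can be made explicit with a small pre-flight check placed between the module load line and the execution line. This is only a sketch: check_inputs and check_tool are hypothetical helper names, not part of blat or PBS, and the file names are the ones used in the example script.

```shell
# Hypothetical helpers for a PBS script; they are not part of blat or PBS.

# Return 0 only if every named input file exists and is non-empty.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Return 0 only if the named program can be found, for example after
# the module load line has put the blat programs on PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Possible use inside the PBS script, after cd and module load:
# check_tool pslReps && check_inputs in.psl && pslReps in.psl out.psl out.psr
```

With a check like this, a job submitted from the wrong folder reports the missing file in the .out file and stops before pslReps is run.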

Submit Job and Check Status

  • Make sure you're in the directory that contains the PBS script, the sequence files, and any other files you need.
  • Submit with qsub <pbs script name>, for example qsub blat.pbs. You can name the PBS script whatever you want, as long as you keep the .pbs extension
  • Check job status with qstat -u username3 -n, replacing "username3" with your GT username
  • You can delete the job with qdel 22182721, replacing the number with the jobid returned after running qsub
  • Depending on the resources requested and queue the job is run on, it may take varying amounts of time for the job to start. To estimate the time until the job executes, run showstart 22182721, replacing the number with the jobid returned after running qsub. More helpful commands can be found in this guide
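
To avoid retyping the job ID, the text that qsub prints can be stored in a variable and reused. The sketch below assumes qsub prints an ID of the form number.server, which is typical for PBS/Torque but should be checked on your cluster; job_number is a hypothetical helper name, and using $USER assumes your login name is your GT username.

```shell
# Hypothetical helper: keep only the part of a job ID before the first dot.
# Assumes the "<number>.<server>" form; an ID with no dot is returned unchanged.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Possible use on the cluster (commented out because qsub, qstat,
# showstart and qdel exist only there):
# full_id=$(qsub blat.pbs)
# jobid=$(job_number "$full_id")
# qstat -u "$USER" -n      # status of your jobs
# showstart "$jobid"       # estimated start time
# qdel "$jobid"            # only if the job should be cancelled
```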

Collecting Results

  • After the job finishes running, all files created will be in the same folder where your PBS script is (same directory you ran qsub from)
  • The .out file will be found here as well. It contains the results of the job, as well as diagnostics and a report of resources used during the job. If the job fails or doesn't produce the result you were hoping for, the .out file is a good place to start debugging.
  • You can transfer the resulting files off the cluster using scp or a file transfer service
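
A quick look at the .out file can be scripted as follows. This is a minimal sketch: inspect_log is a hypothetical helper name, and searching for the word "error" is only a rough heuristic, since a failed job does not necessarily print that word.

```shell
# Hypothetical helper: show the end of a job log and any lines mentioning "error".
# Returns 1 if the log file does not exist.
inspect_log() {
    log=$1
    [ -f "$log" ] || { echo "no such log: $log" >&2; return 1; }
    # The last lines usually include the resource report and any final message.
    tail -n 20 "$log"
    # grep exits non-zero when nothing matches; that is not a failure here.
    grep -i -n 'error' "$log" || echo "no lines mentioning 'error' in $log"
}

# Example, run from the folder the job was submitted from:
# inspect_log blatResult.out
```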