Updated 2023-02-07
Run Bedtools on the Cluster¶
Summary¶
- Bedtools contains a multitude of tools used for genomic analysis and allows for functionality such as shuffling, merging, and intersecting genomic intervals.
- Use module avail bedtools to see all the available versions of bedtools on the cluster
- To load bedtools in your SLURM script:
    - Load its dependent module first with module load gcc/4.9.0
    - Load bedtools with module load bedtools/2.25. Replace the version number with any version you prefer that is available on the cluster (found with module avail)
- To run bedtools:
    - In your SLURM script, put all lines executing bedtools after the module load lines that load bedtools
    - Example: using the intersect tool, the line bedtools intersect -a <filename>.bed -b genes.bed would go in the SLURM script after the lines that load the correct modules for bedtools.
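To try the intersect command on something small before submitting a real job, a minimal sketch like the following may help. The interval values, the read and gene names, and the overlaps.bed output name are hypothetical and only serve as illustration; the bedtools call is skipped when bedtools is not on the PATH (for example, before the module load lines have run).

```shell
#!/bin/bash
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical toy intervals (tab-separated: chromosome, start, end, name).
printf 'chr1\t100\t200\tread1\nchr1\t500\t600\tread2\nchr2\t50\t150\tread3\n' > reads.bed
printf 'chr1\t150\t300\tgeneA\nchr2\t400\t500\tgeneB\n' > genes.bed

# Run the intersect only where bedtools is actually available,
# e.g. after the module load lines on the cluster.
if command -v bedtools >/dev/null 2>&1; then
    bedtools intersect -a reads.bed -b genes.bed > overlaps.bed
    cat overlaps.bed
else
    echo "bedtools not found; load the module first" >&2
fi
```

In this toy data only read1 overlaps a gene interval, so a very short result is expected.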
Example SLURM Script¶
#!/bin/bash
#SBATCH -J bedtoolsTest
#SBATCH -A [Account]
#SBATCH -N 2 --ntasks-per-node=4
#SBATCH -t 20
#SBATCH -p inferno
#SBATCH -o bedtoolsResult.out
cd $SLURM_SUBMIT_DIR
module load gcc/4.9.0
module load bedtools/2.30.0
bedtools intersect -a reads.bed -b genes.bed
- The SLURM directives are standard, requesting 20 min of walltime and 2 nodes with 4 cores per node.
Note
If using $SLURM_SUBMIT_DIR, the .bed files, as well as any other files required for the job, must be stored in the same folder as the SBATCH script.
- $SLURM_SUBMIT_DIR is simply a variable that represents the directory you submit the SLURM script from. Make sure the .bed files and any other files you need are in the same directory as the SLURM script. The cd $SLURM_SUBMIT_DIR line tells the cluster to enter that directory and look there for all the files for the job. If you use $SLURM_SUBMIT_DIR, all your files must be in the same folder as your SBATCH script; otherwise the cluster won't be able to find them.
- Output Files: Any files generated by the job will also show up in the same directory as the SLURM script.
- The module load lines load bedtools and its dependent module.
- The bedtools intersect line is just a general example showing how bedtools might be executed. The line is from the bedtools documentation, which includes much more information on the capabilities of bedtools. The point is to show that the execution lines must come after:
    - Entering the correct folder with all the files and the SLURM script, in this case achieved with cd $SLURM_SUBMIT_DIR
    - Loading bedtools with the module load lines
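Because a missing input file is the most likely failure here, one option is to check for the inputs before calling bedtools. The sketch below is an illustration only: check_inputs is a hypothetical helper, and overlaps.bed is an assumed output name. Since bedtools intersect writes to standard output, redirecting it keeps the overlaps separate from the job's .out file.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if every named file exists, 1 otherwise,
# printing the name of each missing file to stderr.
check_inputs() {
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "Missing input file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# In a job script, this would come after cd "$SLURM_SUBMIT_DIR"
# and after the module load lines.
if check_inputs reads.bed genes.bed && command -v bedtools >/dev/null 2>&1; then
    # overlaps.bed is an assumed output name, not part of the original script.
    bedtools intersect -a reads.bed -b genes.bed > overlaps.bed
fi
```

If either file is absent, the names of the missing files are printed and bedtools is not run, which makes the cause easy to spot in the .out file.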
Submit Job and Check Status¶
- Make sure you're in the directory that contains the SLURM script, the .bed files, and any other files you need.
- Submit with sbatch <SLURM script name>. In this case sbatch bedtools.SLURM or whatever you called the SLURM script. You can name the SLURM scripts whatever you want, just keep the .SLURM at the end.
- Check job status with squeue -u username3, replacing "username3" with your GT username.
- You can delete the job with scancel 22182721, replacing the number with the jobid returned after running sbatch.
- Depending on the resources requested and the queue the job is run on, it may take varying amounts of time for the job to start. To estimate the time until the job executes, run showstart 22182721, replacing the number with the jobid returned after running sbatch. If showstart is not available, the standard SLURM command squeue --start -j 22182721 also reports an estimated start time. More helpful commands can be found in this guide
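The job ID used by these commands can be captured at submission time instead of copied by hand. The following is a minimal sketch, assuming sbatch prints its usual confirmation line of the form "Submitted batch job <jobid>"; parse_job_id is a hypothetical helper, and the Slurm commands run only where sbatch and a script named bedtools.SLURM are present.

```shell
#!/bin/bash
# Hypothetical helper: pull the job ID (the last field) out of
# sbatch's confirmation line, e.g. "Submitted batch job 22182721".
parse_job_id() {
    echo "$1" | awk '{print $NF}'
}

if command -v sbatch >/dev/null 2>&1 && [ -f bedtools.SLURM ]; then
    jobid=$(parse_job_id "$(sbatch bedtools.SLURM)")
    echo "Submitted job $jobid"
    squeue -j "$jobid"           # status of this job only
    squeue --start -j "$jobid"   # estimated start time, if the scheduler reports one
    # scancel "$jobid"           # uncomment to cancel the job
fi
```

Running sbatch with the --parsable option is another way to get the job ID alone, without the surrounding text.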
Collecting Results¶
- All files created will be in the same folder where your SLURM script is (the same directory you ran sbatch from).
- The .out file will be found here as well. It contains the results of the job, as well as diagnostics and a report of resources used during the job. If the job fails or doesn't produce the result you were hoping for, the .out file is a great debugging tool.
- You can transfer the resulting files off the cluster using scp or a file transfer service.
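As a quick first look at the .out file, a small sketch such as the one below counts its lines and how many of them mention "error". summarize_out is a hypothetical helper, and the login node and remote path in the commented scp line are placeholders to be replaced with your own values.

```shell
#!/bin/bash
# Hypothetical helper: report the number of lines in a job output file
# and how many of those lines contain "error" (case-insensitive).
summarize_out() {
    local file=$1
    local total errors
    total=$(wc -l < "$file")
    # grep -c exits non-zero when the count is 0, so guard it with || true.
    errors=$(grep -ci 'error' "$file" || true)
    echo "lines=$((total)) errors=$((errors))"
}

# bedtoolsResult.out is the name set by the -o directive in the example script.
if [ -f bedtoolsResult.out ]; then
    summarize_out bedtoolsResult.out
fi

# To copy the file to your own machine, run something like this locally
# (placeholders in angle brackets are not real values):
# scp username3@<login-node>:<path-to-job-directory>/bedtoolsResult.out .
```

A non-zero error count is only a hint about where to start reading; the file itself has the details.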