Run MAFFT on the Cluster
- MAFFT (Multiple Alignment using Fast Fourier Transform) is used to create multiple sequence alignments of amino acid or nucleotide sequences.
- This guide will cover how to run MAFFT on the cluster.
- MAFFT can be run interactively by loading the module and running mafft to start its interactive shell. However, this guide covers how to run it batch-style.
- For a video tutorial on running MAFFT interactively, you can watch this video, which inspired the example covered in this guide.
- More information about the options available when running MAFFT can be found in its man page.
Walkthrough: Run MAFFT on the Cluster
- This walkthrough takes an example FASTA file and performs a multiple sequence alignment on it.
- The example.fasta file can be found here.
- The SBATCH script can be found here.
- You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
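Before transferring or submitting anything, it can be worth confirming that the input really is FASTA and holds at least two sequences. The function below is a minimal sketch, assuming an uncompressed FASTA file; the name check_fasta is hypothetical and is not part of MAFFT or the cluster environment. It prints the number of sequence records and returns a non-zero status if the file is unreadable, does not start with a '>' header, or contains fewer than two sequences.

```shell
# Hypothetical pre-flight check (not a MAFFT or cluster command):
# is the file FASTA, and does it hold at least two sequences?
check_fasta() {
    local file="$1" first n
    if [ ! -r "$file" ]; then
        echo "check_fasta: cannot read '$file'" >&2
        return 1
    fi
    # The first non-blank line of a FASTA file should be a '>' header.
    first=$(grep -v '^[[:space:]]*$' "$file" | head -n 1)
    case "$first" in
        '>'*) ;;
        *)
            echo "check_fasta: '$file' does not start with a '>' header" >&2
            return 1
            ;;
    esac
    # Each header line starts a new sequence record.
    n=$(grep -c '^>' "$file")
    echo "$n"
    # A multiple alignment needs at least two sequences.
    [ "$n" -ge 2 ]
}
```

For example, after defining the function, check_fasta example.fasta should print how many sequences the example file contains.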
Part 1: The SBATCH Script
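```shell
#!/bin/bash
#SBATCH -JtestMAFFT
#SBATCH -A [Account]
#SBATCH -N1 --ntasks-per-node=2
#SBATCH --mem-per-cpu=2G
#SBATCH -t1
#SBATCH -qinferno
#SBATCH -oReport-%j.out

# Run from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"

module load intel/20.0.4
module load mafft/7.481

# Run L-INS-i once using both allocated cores. Launching it with srun
# would start one copy per task (two here), and both copies would write
# their alignment into example.aln.
mafft-linsi --thread 2 example.fasta > example.aln
```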
- The #SBATCH directives are standard, requesting just 1 minute of walltime and 1 node with 2 cores. More on #SBATCH directives can be found in the Using Slurm on Phoenix Guide.
- $SLURM_SUBMIT_DIR is a variable that holds the directory you submitted the SBATCH script from. Make sure the files you want to use (in this case just example.fasta) are in that directory, which is normally the one containing the SBATCH script.
- Output files (example.aln) will also appear in this directory.
- module load mafft/7.481 loads version 7.481 of MAFFT. To see which MAFFT versions are available, run module avail mafft and load the one you want.
- The mafft-linsi command runs the L-INS-i multiple alignment method on example.fasta and writes the resulting alignment to example.aln. The general format of a MAFFT command is mafft [arguments] input > output.
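As an optional variant of the script's final line, the hypothetical run_linsi function below adds two checks before aligning: that the input exists and is non-empty, and that mafft is on the PATH (that is, the module has been loaded). It calls mafft with --localpair --maxiterate 1000, which the MAFFT documentation gives as the equivalent of mafft-linsi, plus --thread. Defaulting the thread count to SLURM_CPUS_ON_NODE assumes that this variable reflects the cores allocated to the job on the node; pass an explicit count if in doubt.

```shell
# Hypothetical helper (not part of MAFFT): L-INS-i alignment with basic checks.
# Usage: run_linsi INPUT OUTPUT [THREADS]
run_linsi() {
    local input="$1" output="$2"
    # Assumption: use the CPUs Slurm allocated on this node, or 1 outside a job.
    local threads="${3:-${SLURM_CPUS_ON_NODE:-1}}"

    if [ ! -s "$input" ]; then
        echo "run_linsi: input '$input' is missing or empty" >&2
        return 1
    fi
    if ! command -v mafft >/dev/null 2>&1; then
        echo "run_linsi: mafft not on PATH; run 'module load mafft/7.481' first" >&2
        return 1
    fi

    # Equivalent to mafft-linsi: local pairwise alignment plus up to
    # 1000 cycles of iterative refinement.
    mafft --localpair --maxiterate 1000 --thread "$threads" "$input" > "$output"
}
```

Inside the SBATCH script, you could define this function after the module load lines and replace the final line with run_linsi example.fasta example.aln. A missing input then fails fast with a readable message in Report-<jobID>.out instead of an empty alignment.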
Part 2: Submit Job and Check Status
- Make sure you're in the directory that contains the SBATCH script as well as the example.fasta file.
- Submit as normal with sbatch <scriptname.sbatch>, replacing <scriptname.sbatch> with the name of your SBATCH script.
- Check job status with squeue --job <jobID>, replacing <jobID> with the job ID returned by sbatch.
- You can delete the job with scancel <jobID>, again using the job ID returned by sbatch.
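If you submit from a shell script, capturing the job ID saves copying it by hand for squeue and scancel. The sbatch --parsable option prints just the job ID, followed by ;<cluster name> on multi-cluster systems. The helper name parse_jobid and the script name mafft.sbatch in the comments are placeholders for illustration.

```shell
# Hypothetical helper: strip an optional ";<cluster>" suffix from
# `sbatch --parsable` output, leaving only the numeric job ID.
parse_jobid() {
    printf '%s\n' "${1%%;*}"
}

# Example use on a login node (mafft.sbatch is a placeholder name):
#   jobid=$(parse_jobid "$(sbatch --parsable mafft.sbatch)")
#   squeue --job "$jobid"    # status while pending or running
#   scancel "$jobid"         # cancel the job if needed
# After the job ends it drops out of squeue; if job accounting is enabled,
#   sacct -j "$jobid" --format=JobID,State,Elapsed
# shows its final state.
```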
Part 3: Collecting Results
- In the directory where you submitted the SBATCH script, you should see a Report-<jobID>.out file, which contains the command-line output, and an example.aln file, which contains the multiple sequence alignment MAFFT produced from example.fasta.
- The Report-<jobID>.out file should look like this:
- The example.aln file should look like this.
- After the result files are produced, you can move them off the cluster; refer to the file transfer guide for help.
- Congratulations! You successfully ran MAFFT on the cluster.
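For a quick sanity check of example.aln before moving it off the cluster: MAFFT writes its alignment in FASTA format by default, with gaps shown as '-', so every aligned sequence should have the same length once its wrapped lines are joined. The check_alignment function below is a hypothetical sketch using awk; it prints the number of sequences and alignment columns, and returns a non-zero status if the lengths disagree or no sequences are found.

```shell
# Hypothetical post-run check: all records in a FASTA-format alignment
# should have the same number of columns.
check_alignment() {
    awk '
        # A header line starts a new record.
        /^>/ { count++; next }
        # Sequence lines (possibly wrapped) add to the current record length.
        count > 0 { gsub(/[ \t\r]/, ""); len[count] += length($0) }
        END {
            if (count == 0) { print "no sequences found" > "/dev/stderr"; exit 1 }
            for (i = 2; i <= count; i++) {
                if (len[i] != len[1]) {
                    printf "record %d has %d columns, expected %d\n", i, len[i], len[1] > "/dev/stderr"
                    exit 1
                }
            }
            printf "%d sequences, %d columns\n", count, len[1]
        }
    ' "$1"
}
```

Running check_alignment example.aln in the submission directory should report the sequence count and alignment width; a length mismatch usually means the file was truncated or is not a MAFFT alignment.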