Updated 2023-03-31
Run MAFFT on the Cluster
Overview
- MAFFT (Multiple Alignment using Fast Fourier Transform) creates multiple sequence alignments of amino acid or nucleotide sequences.
- This guide covers how to run MAFFT on the cluster.
Tips
- MAFFT can be run interactively by loading the module and running mafft to start the interactive shell. This guide, however, covers how to run it batch-style.
- If you would like to see a video tutorial of running MAFFT interactively, you can watch this video, which inspired the example covered in this guide.
- You can find more information about the options available when running mafft on the man page.
Walkthrough: Run MAFFT on the Cluster
- This walkthrough takes an example FASTA file and performs a multiple sequence alignment.
- The example.fasta file can be found here.
- The SBATCH script can be found here.
- You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
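Before submitting, it can help to confirm that the transferred example.fasta looks like a FASTA file. The following is a minimal sketch, assuming standard FASTA in which every record starts with a ">" header line. The helper name count_fasta_records is hypothetical and not part of MAFFT or the cluster tooling.

```shell
#!/bin/bash
# Hypothetical helper (not part of MAFFT): count records in a FASTA file
# by counting header lines. Assumes every record starts with ">".
count_fasta_records() {
    local fasta="$1"
    if [ ! -s "$fasta" ]; then
        echo "error: $fasta is missing or empty" >&2
        return 1
    fi
    # grep -c exits non-zero when it counts 0 lines; "|| true" keeps the
    # function's exit status at 0 while still printing the count.
    grep -c '^>' "$fasta" || true
}

# Example call, run from the directory holding the file:
#   count_fasta_records example.fasta
```

A multiple sequence alignment needs at least two sequences, so a count below 2 suggests the file did not transfer as expected.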
Part 1: The SBATCH Script
#!/bin/bash
#SBATCH -JtestMAFFT
#SBATCH -A [Account]
#SBATCH -N1 --ntasks-per-node=2
#SBATCH --mem-per-cpu=2G
#SBATCH -t1
#SBATCH -qinferno
#SBATCH -oReport-%j.out
cd $SLURM_SUBMIT_DIR
module load intel/20.0.4
module load mafft/7.481
srun mafft-linsi example.fasta > example.aln
- The #SBATCH directives are standard, requesting just 1 minute of walltime and 1 node with 2 cores. More on #SBATCH directives can be found in the Using Slurm on Phoenix Guide.
- $SLURM_SUBMIT_DIR is a variable that represents the directory you submit the SBATCH script from. Make sure the files you want to use (in this case just example.fasta) are in the same directory as the SBATCH script.
- Output files (example.aln) will also show up in this directory.
- module load mafft/7.481 loads version 7.481 of MAFFT. To see which MAFFT versions are available, run module avail mafft and load the one you want.
- mafft-linsi example.fasta > example.aln runs the L-INS-i multiple alignment method on example.fasta to produce example.aln. The general format for running mafft is mafft [arguments] input > output.
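The general form mafft [arguments] input > output can be sketched as a small wrapper. This is an illustration only: the function name run_mafft and its fallback to --auto are assumptions made for the example, not something this guide's script does. The --localpair --maxiterate 1000 combination is the option set that the MAFFT documentation lists for L-INS-i.

```shell
#!/bin/bash
# Hypothetical wrapper around the general form: mafft [arguments] input > output
# Usage: run_mafft INPUT OUTPUT [mafft arguments...]
run_mafft() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_mafft INPUT OUTPUT [mafft arguments...]" >&2
        return 2
    fi
    local input="$1"
    local output="$2"
    shift 2
    if [ ! -s "$input" ]; then
        echo "error: input $input is missing or empty" >&2
        return 1
    fi
    if ! command -v mafft >/dev/null 2>&1; then
        echo "error: mafft not found; load the module first" >&2
        return 1
    fi
    # Assumption for this sketch: with no extra arguments, fall back to
    # --auto, which lets MAFFT choose a strategy based on the data size.
    if [ "$#" -eq 0 ]; then
        set -- --auto
    fi
    mafft "$@" "$input" > "$output"
}

# Examples (after "module load mafft/7.481"):
#   run_mafft example.fasta example.aln
#   run_mafft example.fasta example.aln --localpair --maxiterate 1000
```

The checks run before the output redirection, so a missing input does not leave behind an empty example.aln.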
Part 2: Submit Job and Check Status
- Make sure you're in the directory that contains the SBATCH script as well as example.fasta.
- Submit as normal, with sbatch <scriptname.sbatch>. In this case, sbatch mafft.sbatch.
- Check job status with squeue --job <jobID>, replacing <jobID> with the job ID returned after running sbatch.
- You can delete the job with scancel <jobID>, replacing <jobID> with the job ID returned after running sbatch.
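The submit-then-check steps above can be sketched as a short script. It assumes sbatch prints its usual confirmation line, "Submitted batch job <jobID>", and that the script is saved as mafft.sbatch. The helper name extract_job_id is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: pull the job ID out of sbatch's confirmation line,
# which normally reads "Submitted batch job <jobID>".
extract_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Sketch of the flow; it only runs where sbatch and the script are present.
if command -v sbatch >/dev/null 2>&1 && [ -f mafft.sbatch ]; then
    job_id=$(extract_job_id "$(sbatch mafft.sbatch)")
    echo "Submitted job $job_id"
    squeue --job "$job_id"
    # scancel "$job_id"   # uncomment to delete the job
fi
```

Capturing the job ID in a variable avoids copying it by hand into the squeue and scancel commands.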
Part 3: Collecting Results
- In the directory where you submitted the SBATCH script, you should see a Report-<jobID>.out file, which contains the command line output, and an example.aln file, which contains the multiple sequence alignment produced by mafft from example.fasta.
- The Report-<jobID>.out file should look like this:
- The example.aln file should look like this.
- After the result files are produced, you can move them off the cluster. Refer to the file transfer guide for help.
- Congratulations! You successfully ran MAFFT on the cluster.
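As a quick sanity check on the result, every sequence in a finished alignment should have the same length once gap characters are counted. The sketch below assumes MAFFT's default FASTA-format output (that is, the job was not run with --clustalout). The helper name check_alignment is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) only if every record in a
# FASTA-format alignment has the same length, gaps included.
check_alignment() {
    awk '
        # A header line closes the previous record and starts a new one.
        /^>/ { if (seen) lens[len]++; seen = 1; len = 0; next }
        # Sequence lines may wrap; strip whitespace and accumulate length.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END {
            if (seen) lens[len]++
            n = 0
            for (l in lens) n++
            # Exactly one distinct length means the rows line up.
            exit (n == 1 ? 0 : 1)
        }
    ' "$1"
}

# Example call:
#   check_alignment example.aln && echo "alignment rows have equal length"
```

A non-zero exit status means the file is missing, contains no records, or holds rows of differing lengths, any of which is worth checking against the Report-<jobID>.out file.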