Updated 2023-03-31

Run MAFFT on the Cluster

Overview

  • MAFFT (Multiple Alignment using Fast Fourier Transform) creates multiple sequence alignments of amino acid or nucleotide sequences.
  • This guide covers how to run MAFFT on the cluster.

Tips

  • MAFFT can be run interactively by loading the module and running mafft, which starts an interactive prompt. This guide, however, covers running it as a batch job.
  • If you would like to see a video tutorial of running MAFFT interactively, you can watch this video, which inspired the example covered in this guide.
  • You can find more information about different options available when running mafft on the man page.

Walkthrough: Run MAFFT on the Cluster

  • This walkthrough will take an example fasta file and perform a multiple sequence alignment.
  • The example.fasta file can be found here.
  • The SBATCH script can be found here.
  • You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
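
Before submitting, it can help to confirm that the FASTA file transferred intact. The sketch below builds a tiny stand-in file and counts its records by counting header lines. The name toy.fasta and its sequences are made up for illustration and are not the example.fasta from this guide; point the grep command at example.fasta to check the real file.

```shell
# Create a small stand-in FASTA file (hypothetical sequences, for illustration only)
cat > toy.fasta <<'EOF'
>seq1
MKTAYIAKQR
>seq2
MKTAYIAKQRQISF
>seq3
MKSAYIAKQ
EOF

# Each record starts with a '>' header line, so counting headers counts sequences
n_seqs=$(grep -c '^>' toy.fasta)
echo "toy.fasta contains $n_seqs sequences"
```

A count that differs from what you expect usually points to an incomplete transfer or a file that is not in FASTA format.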

Part 1: The SBATCH Script

#!/bin/bash
#SBATCH -JtestMAFFT
#SBATCH -A [Account] 
#SBATCH -N1 --ntasks-per-node=2
#SBATCH --mem-per-cpu=2G
#SBATCH -t1
#SBATCH -qinferno
#SBATCH -oReport-%j.out


cd $SLURM_SUBMIT_DIR
module load intel/20.0.4
module load mafft/7.481

srun mafft-linsi example.fasta > example.aln
  • The #SBATCH directives are standard, requesting just 1 minute of walltime and 1 node with 2 cores. More on #SBATCH directives can be found in the Using Slurm on Phoenix Guide.
  • $SLURM_SUBMIT_DIR is a variable that holds the directory you submit the SBATCH script from. Make sure the files you want to use (in this case just example.fasta) are in the same directory as the SBATCH script.
  • Output files (here, example.aln) will also appear in this directory.
  • module load mafft/7.481 loads version 7.481 of MAFFT. To see which MAFFT versions are available, run module avail mafft, and load the one you want.
  • mafft-linsi example.fasta > example.aln runs the L-INS-i multiple alignment method on example.fasta to produce example.aln. The general format of running mafft is mafft [arguments] input > output.
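
As an illustration of the general mafft [arguments] input > output format, here is a minimal sketch of an alternative to the mafft-linsi command. It assumes you would rather let MAFFT pick a strategy itself (the --auto option) and use the cores Slurm allocated (the --thread option together with Slurm's SLURM_CPUS_ON_NODE variable). The guard and the fallback to one thread are additions for illustration, not part of the script in this guide, and the command is run directly rather than through srun.

```shell
# Use the CPU count Slurm allocated on this node; fall back to 1 outside a Slurm job
threads="${SLURM_CPUS_ON_NODE:-1}"

if command -v mafft >/dev/null 2>&1; then
    # --auto lets MAFFT choose an alignment strategy based on the input size;
    # --thread sets the number of threads MAFFT may use
    mafft --auto --thread "$threads" example.fasta > example.aln
else
    echo "mafft not found; run 'module load mafft/7.481' first" >&2
fi
```

L-INS-i is among MAFFT's more accurate but slower methods, so --auto may be a reasonable starting point for larger inputs.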

Part 2: Submit Job and Check Status

  • Make sure you're in the directory that contains both the SBATCH script and example.fasta.
  • Submit as normal with sbatch <scriptname.sbatch>; in this case, sbatch mafft.sbatch.
  • Check job status with squeue --job <jobID>, replacing <jobID> with the job ID returned by sbatch.
  • You can cancel the job with scancel <jobID>, again replacing <jobID> with the job ID returned by sbatch.
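
If you prefer not to copy the job ID by hand, the steps above can be sketched as follows. sbatch confirms a submission with a line of the form "Submitted batch job <jobID>", so the ID is the fourth field. The helper name job_id_from_sbatch and the ID 12345 are hypothetical and used only for demonstration.

```shell
# Extract the job ID from sbatch's confirmation line,
# which has the form "Submitted batch job <jobID>"
job_id_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}

# Demonstration with a made-up confirmation line (12345 is not a real job)
demo_id=$(echo "Submitted batch job 12345" | job_id_from_sbatch)
echo "Parsed job ID: $demo_id"

# On the cluster (commented out here because it needs Slurm):
# jobid=$(sbatch mafft.sbatch | job_id_from_sbatch)
# squeue --job "$jobid"
```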

Part 3: Collecting Results

  • In the directory where you submitted the SBATCH script, you should see a Report-<jobID>.out file which contains the command line output and an example.aln file which contains the multiple sequence alignment done by mafft on example.fasta.
  • The Report-<jobID>.out file should look like this:
  • The example.aln file should look like this.
  • After the result files are produced, you can move them off the cluster. Refer to the file transfer guide for help.
  • Congratulations! You successfully ran MAFFT on the cluster.
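
As an optional sanity check on the result, every record in a multiple sequence alignment should have the same length once gaps are included. The sketch below assumes the alignment is in FASTA format (MAFFT's default output) and runs on a made-up stand-in file, toy.aln. The helper name aligned_lengths is hypothetical; pass it example.aln to check a real run.

```shell
# Stand-in alignment (hypothetical; a real run would produce example.aln)
cat > toy.aln <<'EOF'
>seq1
MKTAYIAKQR----
>seq2
MKTAYIAKQRQISF
>seq3
MKSAYIAKQ-----
EOF

# Print the aligned length of each record; sequences may wrap across lines,
# so lengths are accumulated until the next header
aligned_lengths() {
    awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
         { len += length($0) }
         END { if (seen) print len }' "$1"
}

# In a valid alignment every record has the same length, so one distinct value remains
n_distinct=$(aligned_lengths toy.aln | sort -u | wc -l | tr -d ' ')
echo "distinct aligned lengths: $n_distinct"
```

More than one distinct length suggests the job was interrupted or the output file was written incompletely.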