Updated 2019-06-26

Run MAFFT on the Cluster

Overview

  • MAFFT (Multiple Alignment using Fast Fourier Transform) creates multiple sequence alignments of amino acid or nucleotide sequences.
  • This guide covers how to run MAFFT on the cluster.

Tips

  • MAFFT can be run interactively by loading the module and running mafft, which starts an interactive prompt. This guide instead covers how to run it as a batch job.
  • If you would like a video tutorial on running MAFFT interactively, you can watch this video, which inspired the example covered in this guide.
  • The man page has more information about the options available when running mafft.

Walkthrough: Run MAFFT on the Cluster

  • This walkthrough will take an example fasta file and perform a multiple sequence alignment.
  • The example.fasta file can be found here.
  • The PBS script can be found here.
  • You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
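Before submitting, it can help to confirm that the input really is a FASTA file with the number of sequences you expect. The sketch below counts header lines (lines that start with >). The count_fasta_records helper and the small demo file are made up for illustration and are not part of MAFFT or of this guide's example files.

```shell
# Count FASTA records by counting header lines, which start with '>'.
# count_fasta_records is a hypothetical helper name, not a MAFFT command.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Tiny illustrative file with made-up sequences so the sketch runs anywhere.
demo=$(mktemp)
printf '>seq1\nMKTAYIAK\n>seq2\nMKTAHIAK\n>seq3\nMKSAYIAR\n' > "$demo"

n=$(count_fasta_records "$demo")
echo "sequences: $n"
rm -f "$demo"
```

On the cluster you would call count_fasta_records example.fasta from the directory that holds the input. The job output in this guide reports nseq = 5 for example.fasta, so a count of 5 would be expected there.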

Part 1: The PBS Script

#PBS -N testMAFFT
#PBS -l nodes=1:ppn=2
#PBS -l pmem=2gb
#PBS -l walltime=1:00
#PBS -q force-6
#PBS -j oe
#PBS -o testMAFFT.out

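# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
module load mafft/7.245

# L-INS-i alignment; the aligned sequences are written to example.aln
mafft-linsi example.fasta > example.aln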

  • The #PBS directives are standard: they request 1 node with 2 cores, 2 GB of memory per process, and just 1 minute of walltime. More on #PBS directives can be found in the PBS guide.
  • $PBS_O_WORKDIR is a variable that holds the directory you submit the PBS script from. Make sure the files you want to use (in this case just example.fasta) are in that directory.
  • Output files (here, example.aln) will also appear in this directory.
  • module load mafft/7.245 loads the 7.245 version of MAFFT. To see what MAFFT versions are available, run module avail mafft, and load the one you want.
  • mafft-linsi example.fasta > example.aln runs the L-INS-i multiple alignment method on example.fasta to produce example.aln. The general format of running mafft is mafft [arguments] input > output.
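The script requests two cores (ppn=2), but the command as written does not pass a thread count, and the job output reports nthread = 0, which suggests the run was not multithreaded. MAFFT's --thread option sets the number of threads. The sketch below is one possible variant, not the script this guide ships: it assumes the scheduler exports PBS_NUM_PPN inside a job (as Torque does) and falls back to 2 otherwise. It prints the command and only runs it where mafft-linsi and example.fasta are actually present.

```shell
# Use the cores granted by the scheduler. PBS_NUM_PPN is assumed to be set
# inside a job; the fallback of 2 matches the ppn=2 request in the script.
threads="${PBS_NUM_PPN:-2}"

echo "would run: mafft-linsi --thread $threads example.fasta > example.aln"

# Run only where MAFFT is loaded and the input exists (e.g. inside the job).
if command -v mafft-linsi >/dev/null 2>&1 && [ -f example.fasta ]; then
    mafft-linsi --thread "$threads" example.fasta > example.aln
fi
```

For a five-sequence input the difference is likely negligible; matching the thread count to the requested cores matters more for larger alignments.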

Part 2: Submit Job and Check Status

  • Make sure you're in the directory that contains the PBS script as well as example.fasta.
  • Submit as normal with qsub <pbs script name>; in this case, qsub mafft.pbs.
  • Check the job status with qstat -t 22182721, replacing the number with the job ID returned by qsub.
  • You can delete the job with qdel 22182721, again replacing the number with the job ID returned by qsub.
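Rather than copying the job ID by hand, you can capture what qsub prints and reuse it. The sketch below assumes qsub prints the full job ID on standard output in the form that appears in the PBS epilogue of this guide (26178539.shared-sched.pace.gatech.edu) and keeps only the part before the first dot. job_number is a hypothetical helper name, and the qsub and qstat calls run only where those commands and mafft.pbs exist.

```shell
# Strip the scheduler hostname from a full job ID, leaving the leading part.
# job_number is a hypothetical helper, not a PBS command.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Example using the job ID format shown in the PBS epilogue of this guide.
job_number "26178539.shared-sched.pace.gatech.edu"

# On the cluster: submit, keep the ID, then query the job's status.
if command -v qsub >/dev/null 2>&1 && [ -f mafft.pbs ]; then
    jobid=$(qsub mafft.pbs)
    qstat -t "$(job_number "$jobid")"
fi
```

The same captured ID can be passed to qdel if the job needs to be cancelled.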

Part 3: Collecting Results

  • In the directory where you submitted the PBS script, you should see a testMAFFT.out file which contains the command line output and an example.aln file which contains the multiple sequence alignment done by mafft on example.fasta.
  • The testMAFFT.out file should look like this:

nseq =  5
distance =  local
iterate =  16
cycle =  1
nthread = 0
lastonce = 0
done.
scoremtx = 1
    0 / 5
    1 / 5
    2 / 5
    3 / 5

##### writing hat3
pairlocalalign (aa) Version 7.245 alg=L, model=BLOSUM62, 2.00, -0.10, +0.10, noshift, amax=0.0
0 thread(s)
minimumweight = 0.000500
nthread = 0
blosum 62 / kimura 200
sueff_global = 0.100000
Loading 'hat3' ... 
done.
done.
scoremtx = 1
Gap Penalty = -1.53, +0.00, +0.00
Loading 'hat2' ... done.
Constructing a UPGMA tree ... 

    0 / 5
done.

Progressive alignment ... 

STEP     1 /4 c
STEP     2 /4 c
STEP     3 /4 c
STEP     4 /4 c
done.
tbfast (aa) Version 7.245 alg=A, model=BLOSUM62, 1.53, -0.00, -0.00, noshift, amax=0.0
0 thread(s)
minimumweight = 0.000500
autosubalignment = 0.000000
nthread = 0
randomseed = 0
blosum 62 / kimura 200
poffset = 0
niter = 16
sueff_global = 0.100000
Loading 'hat3' ... done.
done.
scoremtx = 1


    0 / 5
Segment   1/  1    1- 883
STEP 001-001-0  accepted.
STEP 001-001-1  identical.   
STEP 001-002-0  accepted.
STEP 001-002-1  identical.   
STEP 001-003-0  accepted.
STEP 001-003-1  accepted.
STEP 001-004-1  accepted.
STEP 002-004-1  identical.   
STEP 002-003-0  identical.   
STEP 002-003-1  identical.   
STEP 002-002-0  rejected.
STEP 002-002-1  identical.   
STEP 002-001-0  accepted.
STEP 002-001-1  identical.   
STEP 003-001-0  identical.   
STEP 003-001-1  identical.   
STEP 003-002-0  rejected.
STEP 003-002-1  identical.   
STEP 003-003-0  identical.   
STEP 003-003-1  identical.   
STEP 003-004-1  identical.   
STEP 004-004-1  identical.   
Oscillating.

done
dvtditr (aa) Version 7.245 alg=A, model=BLOSUM62, 1.53, -0.00, -0.00, noshift, amax=0.0
0 thread(s)


Strategy:
 L-INS-i (Probably most accurate, very slow)
 Iterative refinement method (<16) with LOCAL pairwise alignment information

If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --leavegappyregion option.

---------------------------------------
Begin PBS Epilogue Wed Jun 26 11:39:17 EDT 2019
Job ID:     26178539.shared-sched.pace.gatech.edu
User ID:    svemuri8
Job name:   testMAFFT
Resources:  neednodes=1:ppn=2,nodes=1:ppn=2,pmem=2gb,walltime=00:01:00
Rsrc Used:  cput=00:00:00,energy_used=0,mem=0kb,vmem=0kb,walltime=00:00:00
Queue:      force-6
Nodes:     
rich133-k37-27-r.pace.gatech.edu
End PBS Epilogue Wed Jun 26 11:39:17 EDT 2019
---------------------------------------
  • The example.aln file should look like this.
  • After the result files are produced, you can move them off the cluster. Refer to the file transfer guide for help.
  • Congratulations! You successfully ran MAFFT on the cluster.
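As an optional check on the result, every row of a multiple sequence alignment should have the same length once gaps are counted. The sketch below assumes example.aln is in MAFFT's default FASTA output format (that is, no --clustalout was used) and sums the line lengths of each record. check_alignment is a hypothetical helper name, and the demo file contains made-up sequences.

```shell
# Print each record's aligned length; exit non-zero if the lengths differ.
# check_alignment is a hypothetical helper, not part of MAFFT.
check_alignment() {
    awk '
        function report() {
            print name, len
            if (count++ == 0) first = len
            else if (len != first) bad = 1
        }
        BEGIN { bad = 0; count = 0; name = "" }
        /^>/  { if (name != "") report(); name = substr($0, 2); len = 0; next }
              { len += length($0) }
        END   { if (name != "") report(); exit bad }
    ' "$1"
}

# Demo: three gapped rows of length 6, the last one wrapped over two lines.
demo=$(mktemp)
printf '>a\nMK-TAY\n>b\nMKQTAY\n>c\nMK-T\nAY\n' > "$demo"
check_alignment "$demo" && echo "all rows have equal length"
rm -f "$demo"
```

On the cluster, check_alignment example.aln would list the five sequence names with their aligned lengths.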