Updated 2021-05-17
Run MAFFT on the Cluster¶
Overview¶
- MAFFT (Multiple Alignment using Fast Fourier Transform) creates multiple sequence alignments of amino acid or nucleotide sequences.
- This guide will cover how to run MAFFT on the cluster.
Tips¶
- MAFFT can be run interactively by loading the module and running `mafft` to start the interactive shell. However, this guide covers how to run it batch-style.
- If you would like to see a video tutorial of running MAFFT interactively, you can watch this video, which inspired the example covered in this guide.
- You can find more information about different options available when running mafft on the man page.
Walkthrough: Run MAFFT on the Cluster¶
- This walkthrough will take an example fasta file and perform a multiple sequence alignment.
- The `example.fasta` file can be found here.
- The PBS script can be found here.
- You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
Part 1: The PBS Script¶
#PBS -N testMAFFT
#PBS -A [Account]
#PBS -l nodes=1:ppn=2
#PBS -l pmem=2gb
#PBS -l walltime=1:00
#PBS -q inferno
#PBS -j oe
#PBS -o testMAFFT.out
cd $PBS_O_WORKDIR
module load mafft/7.245
mafft-linsi example.fasta > example.aln
- The `#PBS` directives are standard, requesting just 1 minute of walltime and 1 node with 2 cores. More on `#PBS` directives can be found in the PBS guide.
- `$PBS_O_WORKDIR` is a variable that represents the directory you submit the PBS script from. Make sure the files you want to use (in this case just `example.fasta`) are in the same directory as the PBS script.
  - Output files (`example.aln`) will also show up in this directory.
- `module load mafft/7.245` loads version 7.245 of MAFFT. To see which MAFFT versions are available, run `module avail mafft` and load the one you want.
- `mafft-linsi example.fasta > example.aln` runs the L-INS-i multiple alignment method on `example.fasta` to produce `example.aln`. The general format for running mafft is `mafft [arguments] input > output`.
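The script requests 2 cores, yet the log shown in Part 3 reports `nthread = 0`, which suggests the alignment ran without multithreading. A minimal sketch of passing the core count to MAFFT through its `--thread` option follows. It assumes the scheduler exports `PBS_NP` (the number of allocated cores) inside the job, which is typical of Torque-style PBS but not confirmed by this guide, so the snippet falls back to 1 when the variable is unset. The guard around the `mafft-linsi` call is only there so the snippet also runs harmlessly outside a job.

```shell
#!/bin/bash
# Assumption: PBS_NP holds the number of cores granted to the job.
# Outside a PBS job (or if the scheduler does not set it), use 1 thread.
threads="${PBS_NP:-1}"

# Run the alignment only when mafft is loaded and the input file is present.
if command -v mafft-linsi >/dev/null 2>&1 && [ -f example.fasta ]; then
    mafft-linsi --thread "$threads" example.fasta > example.aln
else
    echo "mafft-linsi or example.fasta not available; skipping alignment" >&2
fi
```

Inside the PBS script, these lines would take the place of the plain `mafft-linsi` line, after `module load mafft/7.245`.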
Part 2: Submit Job and Check Status¶
- Make sure you're in the directory that contains the PBS script as well as `example.fasta`.
- Submit as normal, with `qsub <pbs script name>`. In this case, `qsub mafft.pbs`.
- Check job status with `qstat -t 22182721`, replacing the number with the job ID returned after running qsub.
- You can delete the job with `qdel 22182721`, again replacing the number with the job ID returned after running qsub.
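Rather than copying the job ID by hand, it can be captured in a shell variable when the job is submitted. The sketch below is illustrative only: it uses the full job ID string from the PBS epilogue in Part 3 as a stand-in for what `qsub` prints, and the variable names `full_id` and `job_num` are hypothetical. The real `qsub`, `qstat`, and `qdel` calls are left as comments so the snippet runs on a machine without the scheduler.

```shell
#!/bin/bash
# On the cluster you would capture the ID at submission time:
#   full_id=$(qsub mafft.pbs)
# Here a sample ID (copied from the epilogue in Part 3) stands in for it.
full_id="26178539.shared-sched.pace.gatech.edu"

# Strip everything from the first dot onward to keep only the numeric part.
job_num="${full_id%%.*}"
echo "$job_num"

# The numeric ID can then be reused:
#   qstat -t "$job_num"
#   qdel "$job_num"
```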
Part 3: Collecting Results¶
- In the directory where you submitted the PBS script, you should see a `testMAFFT.out` file, which contains the command line output, and an `example.aln` file, which contains the multiple sequence alignment done by mafft on `example.fasta`.
- The `testMAFFT.out` file should look like this:
nseq = 5
distance = local
iterate = 16
cycle = 1
nthread = 0
lastonce = 0
done.
scoremtx = 1
0 / 5
1 / 5
2 / 5
3 / 5
##### writing hat3
pairlocalalign (aa) Version 7.245 alg=L, model=BLOSUM62, 2.00, -0.10, +0.10, noshift, amax=0.0
0 thread(s)
minimumweight = 0.000500
nthread = 0
blosum 62 / kimura 200
sueff_global = 0.100000
Loading 'hat3' ...
done.
done.
scoremtx = 1
Gap Penalty = -1.53, +0.00, +0.00
Loading 'hat2' ... done.
Constructing a UPGMA tree ...
0 / 5
done.
Progressive alignment ...
STEP 1 /4 c
STEP 2 /4 c
STEP 3 /4 c
STEP 4 /4 c
done.
tbfast (aa) Version 7.245 alg=A, model=BLOSUM62, 1.53, -0.00, -0.00, noshift, amax=0.0
0 thread(s)
minimumweight = 0.000500
autosubalignment = 0.000000
nthread = 0
randomseed = 0
blosum 62 / kimura 200
poffset = 0
niter = 16
sueff_global = 0.100000
Loading 'hat3' ... done.
done.
scoremtx = 1
0 / 5
Segment 1/ 1 1- 883
STEP 001-001-0 accepted.
STEP 001-001-1 identical.
STEP 001-002-0 accepted.
STEP 001-002-1 identical.
STEP 001-003-0 accepted.
STEP 001-003-1 accepted.
STEP 001-004-1 accepted.
STEP 002-004-1 identical.
STEP 002-003-0 identical.
STEP 002-003-1 identical.
STEP 002-002-0 rejected.
STEP 002-002-1 identical.
STEP 002-001-0 accepted.
STEP 002-001-1 identical.
STEP 003-001-0 identical.
STEP 003-001-1 identical.
STEP 003-002-0 rejected.
STEP 003-002-1 identical.
STEP 003-003-0 identical.
STEP 003-003-1 identical.
STEP 003-004-1 identical.
STEP 004-004-1 identical.
Oscillating.
done
dvtditr (aa) Version 7.245 alg=A, model=BLOSUM62, 1.53, -0.00, -0.00, noshift, amax=0.0
0 thread(s)
Strategy:
L-INS-i (Probably most accurate, very slow)
Iterative refinement method (<16) with LOCAL pairwise alignment information
If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --leavegappyregion option.
---------------------------------------
Begin PBS Epilogue Wed Jun 26 11:39:17 EDT 2019
Job ID: 26178539.shared-sched.pace.gatech.edu
User ID: svemuri8
Job name: testMAFFT
Resources: neednodes=1:ppn=2,nodes=1:ppn=2,pmem=2gb,walltime=00:01:00
Rsrc Used: cput=00:00:00,energy_used=0,mem=0kb,vmem=0kb,walltime=00:00:00
Queue: inferno
Nodes:
rich133-k37-27-r.pace.gatech.edu
End PBS Epilogue Wed Jun 26 11:39:17 EDT 2019
---------------------------------------
- The `example.aln` file should look like this.
- After the result files are produced, you can move the files off the cluster. Refer to the file transfer guide for help.
- Congratulations! You successfully ran MAFFT on the cluster.
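As an optional sanity check on the results, you can confirm that the alignment contains the expected number of sequences (the log above reports `nseq = 5` for the example) and that every aligned sequence has the same length, which should hold for any multiple sequence alignment. The sketch below assumes the output is in FASTA format, MAFFT's default. It runs on a small made-up alignment written to a temporary file, not on the guide's `example.aln`; to check a real result, point `aln` at your file and drop the heredoc and the `rm`.

```shell
#!/bin/bash
# Toy aligned FASTA standing in for example.aln (hypothetical data).
aln="$(mktemp)"
cat > "$aln" <<'EOF'
>seq1
MKT-AYIA
KQR
>seq2
MKTQAY-A
KQR
>seq3
M-TQAYIA
K-R
EOF

# Count sequences: each FASTA record starts with a '>' header line.
nseq=$(grep -c '^>' "$aln")

# Sum the residue/gap characters per record, then count distinct totals.
# A consistent alignment yields exactly one distinct length.
nlen=$(awk '
    /^>/ { if (seen) print len; len = 0; seen = 1; next }
         { len += length($0) }
    END  { if (seen) print len }
' "$aln" | sort -u | wc -l | tr -d ' ')

echo "sequences: $nseq, distinct aligned lengths: $nlen"
rm -f "$aln"
```

If the second number is greater than 1, the file is probably truncated or is not an alignment.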