Updated 2023-05-03
FastTree¶
License¶
FastTree on PACE uses the Georgia Tech license, for which an annual access fee is required per user. See the CoE software documentation for more information about access.
Overview¶
FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory. For large alignments, FastTree is 100-1,000 times faster than PhyML 3.0 or RAxML 7. FastTree is open-source software. FastTree is more accurate than PhyML 3 with default settings, and much more accurate than the distance-matrix methods that are traditionally used for large alignments. FastTree uses the Jukes-Cantor or generalized time-reversible (GTR) models of nucleotide evolution and the JTT (Jones-Taylor-Thornton 1992), WAG (Whelan & Goldman 2001), or LG (Le and Gascuel 2008) models of amino acid evolution. To account for the varying rates of evolution across sites, FastTree uses a single rate for each site (the "CAT" approximation). To quickly estimate the reliability of each split in the tree, FastTree computes local support values with the Shimodaira-Hasegawa test (these are the same as PhyML 3's "SH-like local supports").
Running FastTree Interactively¶
Allocating Resources¶
- To run FastTree interactively, use the salloc command to specify the account, the number of nodes and cores, the time limit, and the QOS.
- Here is an example of an salloc command you can use:
salloc -A [Account] -N 1 -n 8 -t 15 -q embers
- Note that the current system provides FastTreeMP, so FastTree is able to use multiple CPUs.
- This allocates the resources needed to run FastTree.
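FastTreeMP is built with OpenMP, and OpenMP programs read their thread count from the OMP_NUM_THREADS environment variable. The sample output on this page reports "OpenMP (1 threads)", so it may be worth setting the variable explicitly once the allocation starts. The sketch below is only an illustration: the helper name fasttree_threads is hypothetical, and it assumes that SLURM_NTASKS mirrors the -n value passed to salloc.

```shell
# Hypothetical helper: choose a thread count for FastTreeMP from the Slurm allocation.
# Assumption: SLURM_NTASKS mirrors the -n value given to salloc; fall back to 1 elsewhere.
fasttree_threads() {
    n="${SLURM_NTASKS:-1}"
    # Accept only positive integers; anything else falls back to a single thread
    case "$n" in
        ''|*[!0-9]*|0) n=1 ;;
    esac
    printf '%s\n' "$n"
}

# OpenMP programs such as FastTreeMP read this variable at startup
OMP_NUM_THREADS="$(fasttree_threads)"
export OMP_NUM_THREADS
echo "FastTreeMP will use ${OMP_NUM_THREADS} thread(s)"
```

Outside a Slurm allocation the helper falls back to one thread, so the same lines are safe to keep in a script that is also tested on a login node.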
Using an Example Alignment File¶
- The following example shows how to run FastTree interactively using an example alignment file.
- Here is what an alignment file in FASTA format should look like (this one is named alignmentfile):
>44919 AF310437-1 Neisseria meningitidis str- M1976
-TTGAACGCTGGCGGCATGCTTTACACATGCAAGTCGGACG-AGTGGCGAACGGGTGAGTAACATATCGGA-
ACGTACCGAGTAGTGGGGGATAACTGATCGAAAGATCAGCTAATACCGCATACTATTCGAGCGGCCGATATCTGATTAGCTAGT
TGGTGGGGTAAAGGCCTACCAAGGCGACGATCAGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGGACTGAGACACGGCCCA
GACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGTCTGAAGAAGGCCT
TCGGGTTGTAAAGGACTTTTGTCAGGGAAGCGGTACCTGAAGAATAAGCACCGGCTAAC-
TACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGGGCGCAGACGGTTACTTA
AGCAGGATGTGAAATCCCCGGGCTCAACCCGGGAACTGCGTTCTGAACTGGGTGACTCGAGTGTGTCAGAGGGAGGTAGAATTC
CACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAATACCGATGGCGAAGGCAGCCTCCTGGGACAACACTGACGTTCATGCC
CGAAAGCGTGGGTAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCAATTAGCTGTTGGGTAGTAGCGTA
GCTAACGCGTGAAATTGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTG
GATGATGTGGATTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATGCAGGTGCTGCATGGCTGTCGTCAGCTCGTG
TCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCATTAGTTGCCACTCTAATGAGACTGCCGGTGACAAGCCG
GAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTATGACCAGGGCTTCACACGTCATACAATGGTCGGTACAGAGGGTAG
CCAAGCCGCGAGGCGGAGCCAATCTCACAAAACCGATCGTAGTCCGGATTGCACTCTGCAACTCGAGTGCATGAAGTCGGAATC
GCTAGTAATCGCAGGTCAGCATACTGCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCATGGGAGTGGGGGA
TACCAGAAGTAGGTTACCACGGTATGCTTCATGACTGGGGTGAAGTCGTAA------------
>87220 AY289925-1 Acinetobacter calcoaceticus ADP1
------------------------------------------AGCGGCGGACGGGTGAGTAATACTTAGGA-
ATCTGCCTATTAGTGGGGGACAACATCTCGAAAGGGATGCTAATACCGCATACTAATAGATGAGCCTAAGTCGGATTAGCTAGT
TGGTGGGGTAAAGGCCT-
CCAAGGCGACGATCTGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAG
CAGTGGGGAATATTGGACAATGGGGGGAACCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTATGGTTGTAAAGCACTT
TAAGCGAGGAGGCGTTACTCGCAGAATAAGCACCGGCTAAC-
TCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCAAGCGTTAATCGGATTTACTGGGCGTAAAGCGCGCGTAGGCGGCCAATTA
AGTCAAATGTGAAATCCCCGAGCTTAACTTGGGAATTGCATTCGATACTGTTTGGCTAGAGTGTGGGAGAGGATGGTAGAATTC
CAGGTGTAGCGGTGAAATGCGTANAGATCTGGAGGAATACCGATGGCGAAGGCAGCCATCTGGCCTAACACTGACNCTGAGGTG
CGAAAGCATGGGGAGCAAACAGGATTANATACCCTGGTAGTCCATGCCGTAAACGATGTCTACTAGCCGTTGGGTAGTGGCGCA
GCTAACGCGATAAGTAGACCGCCTGGGGAGTACGGTCGCAAGACTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTG
GAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGCCTTGACATACAGGTGCTGCATGGCTGTCGTCAGCTCGTG
TCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTTTCCTTACTTGCAACTTTAAGGATACTGCCAGTGACAAACTG
GAGGAAGGCGGGGACGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTG
CTACCTAGCGATAGGATGCTAATCTCAAAAAGCCGATCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATC
GCTAGTAATCGCGGATCAG-
ATGCCGCGG---------------------------------------------------------------------------
...
...
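Every sequence in an alignment must have the same length once gaps are counted, so a quick check before starting a run can save an allocation. The following is a minimal sketch for FASTA input only. The function name check_alignment and the tiny demo sequences are made up for illustration and are not part of FastTree.

```shell
# Hypothetical helper: print "<sequence count> <aligned length>" for a FASTA alignment.
# Returns 1 if the file has no sequences or if the sequences differ in length.
check_alignment() {
    awk '
        /^>/ { if (seen) lens[len]++; seen = 1; n++; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # sequences may wrap over several lines
        END {
            if (seen) lens[len]++
            k = 0
            for (l in lens) { k++; last = l }
            if (n == 0 || k != 1) exit 1
            print n, last
        }
    ' "$1"
}

# Made-up two-sequence alignment, wrapped over two lines per sequence
demo="$(mktemp)"
cat > "$demo" <<'EOF'
>seqA made-up
ACGT-ACG
TTGA
>seqB made-up
ACGTTACG
T-GA
EOF

check_alignment "$demo"   # prints "2 12"
```

For the real input, check_alignment alignmentfile reports the sequence count and the aligned length in the same way.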
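- With the allocation active, load the module and run FastTree on the alignment, using the same commands as the batch script in the next section:
module load fasttree
FastTreeMP alignmentfile > treefile
- FastTree then reports its progress in the terminal: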
FastTree Version 2.1.10 SSE3, OpenMP (1 threads)
Alignment: alignmentfile
Amino acid distances: BLOSUM45 Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jones-Taylor-Thorton, CAT approximation with 20 rate categories
WARNING! 100.0% NUCLEOTIDE CHARACTERS -- IS THIS REALLY A PROTEIN ALIGNMENT?
Initial topology in 0.05 seconds
Refining topology: 23 rounds ME-NNIs, 2 rounds ME-SPRs, 11 rounds ML-NNIs
Total branch-length 4.597 after 0.60 sec3, 1 of 48 splits
ML-NNI round 1: LogLk = -38446.726 NNIs 10 max delta 42.18 Time 1.66
Switched to using 20 rate categories (CAT approximation)16 of 20
Rate categories were divided by 0.926 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
Use -gamma for approximate but comparable Gamma(20) log-likelihoods
ML-NNI round 2: LogLk = -34660.022 NNIs 5 max delta 27.09 Time 3.05
ML-NNI round 3: LogLk = -34641.621 NNIs 2 max delta 16.08 Time 3.60
ML-NNI round 4: LogLk = -34641.467 NNIs 0 max delta 0.00 Time 3.96
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 5: LogLk = -34641.321 NNIs 0 max delta 0.00 Time 4.81 (final)
Optimize all lengths: LogLk = -34641.321 Time 5.08
Total time: 5.81 seconds Unique: 50/50 Bad splits: 0/47
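The warning in the output above appears because FastTree assumes a protein alignment unless it is given the -nt option, and the example file contains nucleotide sequences. As a rough sketch, the input type could be guessed before choosing the flag. The function name guess_nt_flag and the 90% threshold are assumptions made for this illustration and are not part of FastTree.

```shell
# Hypothetical helper: print "-nt" if a FASTA alignment looks like nucleotide data.
# Heuristic assumed here: at least 90% of the letters are A, C, G, T, U or N.
guess_nt_flag() {
    awk '
        /^>/ { next }                      # skip header lines
        {
            s = toupper($0)
            gsub(/[^A-Z]/, "", s)          # drop gaps, whitespace and other non-letters
            total += length(s)
            gsub(/[^ACGTUN]/, "", s)       # keep only nucleotide letters
            nuc += length(s)
        }
        END { if (total > 0 && nuc / total >= 0.9) print "-nt" }
    ' "$1"
}

# Made-up nucleotide alignment for demonstration
sample="$(mktemp)"
printf '>seq1 made-up\nACGT-ACGTTGA\n>seq2 made-up\nACGTNACG-TGA\n' > "$sample"

flag="$(guess_nt_flag "$sample")"
echo "FastTreeMP ${flag} alignmentfile > treefile"
```

With -nt, FastTree uses its nucleotide models (Jukes-Cantor by default) instead of the amino acid models, so the log will differ from the sample shown on this page.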
Running FastTree in Batch Mode¶
- FastTree can also be run as a batch job. Here is an example batch script:
#!/bin/bash
#SBATCH -J fasttreejob
#SBATCH -A [Account]
#SBATCH -N 1 -n 8
#SBATCH -t 15
#SBATCH -q embers
#SBATCH -o Report-%j.out
#SBATCH -e Report-%j.err
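# Run from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"
# Make the FastTree executables available
module load fasttree
# Build the tree; FastTree assumes protein input, so add -nt for a nucleotide alignment
FastTreeMP alignmentfile > treefile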
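The -o Report-%j.out and -e Report-%j.err directives in the script above ask Slurm to substitute the job ID for %j. When the script is submitted, sbatch replies with a line such as "Submitted batch job 1280063", so the report names can be worked out from that reply. The sketch below uses a hard-coded sample reply rather than a live submission, and the helper names job_id_from_sbatch and report_name are hypothetical.

```shell
# Hypothetical helper: extract the job ID from the confirmation line printed by sbatch.
job_id_from_sbatch() {
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Hypothetical helper: expand the %j placeholder used in the #SBATCH -o and -e patterns.
report_name() {
    printf '%s\n' "$1" | sed "s/%j/$2/g"
}

# Sample reply copied from this page; in a live session use: reply="$(sbatch ex_FastTree.sbatch)"
reply="Submitted batch job 1280063"
jobid="$(job_id_from_sbatch "$reply")"

echo "stdout report: $(report_name 'Report-%j.out' "$jobid")"
echo "stderr report: $(report_name 'Report-%j.err' "$jobid")"
```

Slurm's sbatch also has a --parsable option that prints the job ID without the surrounding text, which may be simpler than parsing the sentence.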
Submitting job and collecting results¶
- Make sure you are in the folder where the SBATCH script is located, and run
sbatch <scriptName.sbatch> #ex: sbatch ex_FastTree.sbatch
- If successful, this will print something like
Submitted batch job 1280063
- The number at the end is the job ID, which is useful for checking the job status or the predicted wait time in the queue.
- After a few seconds, find the estimated wait time in the queue and the job status with
squeue -u <username> #ex: squeue -u gburdell3
- Any files created by the script will show up in the folder where the script was run (unless the script specifies otherwise).
- To find the output file, run ls and look for the name you set with -o in the SBATCH script, in this case something like
Report-1280063.out
- To see the contents of a resulting file, run
vi <output file name> #ex: vi treefile
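- This is what you can expect as output from running FastTree in batch mode with the example alignment file. The warning about a protein alignment appears because this run did not pass the -nt option, so FastTree treated the nucleotide sequences as protein: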
---------------------------------------
Begin Slurm Prolog: Apr-28-2023 01:11:35
Job ID: 1774364
User ID: gburdell3
Account: [Account]
Job name: fasttreeJob
Partition: cpu-small
QOS: embers
---------------------------------------
FastTree Version 2.1.10 SSE3, OpenMP (1 threads)
Alignment: alignmentfile
Amino acid distances: BLOSUM45 Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jones-Taylor-Thorton, CAT approximation with 20 rate categories
WARNING! 100.0% NUCLEOTIDE CHARACTERS -- IS THIS REALLY A PROTEIN ALIGNMENT?
Initial topology in 0.05 seconds
Refining topology: 23 rounds ME-NNIs, 2 rounds ME-SPRs, 11 rounds ML-NNIs
Total branch-length 4.597 after 0.60 sec3, 1 of 48 splits
ML-NNI round 1: LogLk = -38446.726 NNIs 10 max delta 42.18 Time 1.66
Switched to using 20 rate categories (CAT approximation)16 of 20
Rate categories were divided by 0.926 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
Use -gamma for approximate but comparable Gamma(20) log-likelihoods
ML-NNI round 2: LogLk = -34660.022 NNIs 5 max delta 27.09 Time 3.05
ML-NNI round 3: LogLk = -34641.621 NNIs 2 max delta 16.08 Time 3.60
ML-NNI round 4: LogLk = -34641.467 NNIs 0 max delta 0.00 Time 3.96
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 5: LogLk = -34641.321 NNIs 0 max delta 0.00 Time 4.81 (final)
Optimize all lengths: LogLk = -34641.321 Time 5.08
Total time: 5.81 seconds Unique: 50/50 Bad splits: 0/47
---------------------------------------
Begin Slurm Epilog: Apr-28-2023 01:11:42
Job ID: 1774364
Array Job ID: _4294967294
User ID: gburdell3
Account: [Account]
Job name: fasttreeJob
Resources: cpu=1,mem=1G,node=1
Rsrc Used: cput=00:00:10,vmem=1748K,walltime=00:00:10,mem=0,energy_used=0
Partition: cpu-small
QOS: embers
Nodes: atl1-1-02-018-14-2
---------------------------------------
- Congratulations! You have successfully run FastTree on an alignment file on the cluster.
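FastTree writes the tree to treefile in Newick format, so a rough sanity check is possible before passing the file to other tools. The following is only a sketch: the function name looks_like_newick and the sample tree are made up, and the check would be fooled by quoted labels that contain parentheses.

```shell
# Hypothetical helper: rough Newick check (non-empty, ends with ";", balanced parentheses).
looks_like_newick() {
    text="$(tr -d ' \t\r\n' < "$1")"
    [ -n "$text" ] || return 1
    case "$text" in
        *\;) ;;              # a Newick tree ends with a semicolon
        *) return 1 ;;
    esac
    opens="$(printf '%s' "$text" | tr -cd '(' | wc -c | tr -d ' ')"
    closes="$(printf '%s' "$text" | tr -cd ')' | wc -c | tr -d ' ')"
    [ "$opens" -gt 0 ] && [ "$opens" -eq "$closes" ]
}

# Made-up three-taxon tree for demonstration; replace with treefile in real use
tree="$(mktemp)"
printf '((A:0.1,B:0.2)0.95:0.05,C:0.3);\n' > "$tree"

if looks_like_newick "$tree"; then
    echo "tree file looks like Newick"
else
    echo "tree file is empty or malformed"
fi
```

An empty treefile usually means FastTree stopped early, in which case the Report file for the job is the first place to look.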