Updated 2023-03-31

Run RAxML on the Cluster

Overview

  • RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees.
  • This guide will cover how to run RAxML on the Cluster.
  • This is the link to the RAxML Manual

Summary

  • To run RAxML, pass the following flags to raxmlHPC:
    • -m : Model of Binary (Morphological), Nucleotide, Multi-State, or Amino Acid Substitution.
    • -p : Specify a random number seed for the parsimony inferences. This allows you to reproduce your results and helps debug the program.
    • -s : Specify the name of the alignment data file in PHYLIP format.
    • -# : Specify the number of alternative runs on distinct starting trees.
    • -n : Specify the name of the output file.
  • You can find more information and more flags by running raxmlHPC -h after loading the required modules on the Cluster.
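
To make the role of each flag concrete, here is a minimal sketch that assembles the flags listed above into one command line. The function name build_raxml_cmd and its variable names are hypothetical and not part of RAxML; the function only prints the command and does not run it.

```shell
#!/bin/bash
# Hypothetical helper: print the raxmlHPC command for a given set of options.
# It does not run RAxML; use the printed line inside a job script.
build_raxml_cmd() {
    local model="$1"      # -m : substitution model, e.g. BINGAMMA
    local seed="$2"       # -p : random number seed for the parsimony inferences
    local alignment="$3"  # -s : alignment file in PHYLIP format
    local runs="$4"       # -# : number of runs on distinct starting trees
    local run_name="$5"   # -n : name of the output file
    printf 'raxmlHPC -m %s -p %s -s %s -# %s -n %s\n' \
        "$model" "$seed" "$alignment" "$runs" "$run_name"
}

# Print the command used in the walkthrough below.
build_raxml_cmd BINGAMMA 12345 binary.phy 20 T5
```

Keeping the seed as an explicit argument makes it easy to record which value produced a given set of results.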

Walkthrough: Run RAxML on the Cluster

  • This walkthrough will cover how to run an ML search on binary data in a PHYLIP file.
  • The example used in this walkthrough plus many more can be found here.
  • binary.phy can be found here
  • SBATCH Script can be found here
  • You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.

Part 1: The SBATCH Script

#!/bin/bash
#SBATCH -JraxmlTest
#SBATCH -A [Account] 
#SBATCH -N1 --ntasks-per-node=2
#SBATCH --mem-per-cpu=2G
#SBATCH -t3
#SBATCH -qinferno
#SBATCH -oReport-%j.out

cd $SLURM_SUBMIT_DIR
module load gcc/10.3.0
module load mvapich2/2.3.6
module load raxml/8.2.12

raxmlHPC -m BINGAMMA -p 12345 -s binary.phy -# 20 -n T5
  • The #SBATCH directives are standard, requesting just 3 minutes of walltime and 1 node with 2 cores. More on #SBATCH directives can be found in the Using Slurm on Phoenix Guide.
  • $SLURM_SUBMIT_DIR is a variable that represents the directory you submit the SBATCH script from. Make sure the files you want to use are in the same directory as the SBATCH script.
  • Output files will also appear in this directory.
  • module load raxml/8.2.12 loads version 8.2.12 of RAxML. To see which versions of a package are available, run module avail [Software] and load the one you want. The other modules are dependencies that must be loaded before RAxML is loaded.
  • raxmlHPC -m BINGAMMA -p 12345 -s binary.phy -# 20 -n T5 will have RAxML carry out 20 ML searches on 20 randomized stepwise addition parsimony trees.
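
Because the job expects the alignment file next to the SBATCH script, a quick check before submitting can save a failed run. The following is a minimal sketch, not part of RAxML or Slurm: the function name check_phylip_header is hypothetical, and it assumes only that a PHYLIP file starts with a header line giving the number of sequences and the number of sites.

```shell
#!/bin/bash
# Hypothetical pre-flight check: confirm the alignment file is readable and
# that its first line looks like a PHYLIP header (two integers).
check_phylip_header() {
    local file="$1"
    local ntaxa="" nsites="" rest=""
    if [ ! -r "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    # Read only the first line; tolerate an empty file.
    read -r ntaxa nsites rest < "$file" || true
    if [[ "$ntaxa" =~ ^[0-9]+$ && "$nsites" =~ ^[0-9]+$ ]]; then
        echo "ok: $ntaxa sequences, $nsites sites"
    else
        echo "bad header in $file"
        return 1
    fi
}

# Usage from the submit directory (not run here):
#   check_phylip_header binary.phy && sbatch raxml.sbatch
```

This checks only the header line; RAxML itself still validates the full alignment when the job runs.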

Part 2: Submit Job and Check Status

  • Make sure you're in the directory that contains the SBATCH script as well as the input file binary.phy.
  • Submit as normal with sbatch <script name>. In this case, sbatch raxml.sbatch
  • Check job status with squeue --job <jobID>, replacing <jobID> with the job ID returned by sbatch.
  • You can cancel the job with scancel <jobID>, replacing <jobID> with the job ID returned by sbatch.
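
The submit, check, and cancel steps above can be chained by capturing the job ID. This is a minimal sketch that assumes sbatch prints its usual confirmation line, "Submitted batch job <jobID>"; the helper name job_id_from_sbatch is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read sbatch's confirmation line on standard input
# ("Submitted batch job <jobID>") and print only the job ID.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}

# Usage on the cluster (not run here):
#   jobid=$(sbatch raxml.sbatch | job_id_from_sbatch)
#   squeue --job "$jobid"    # check status
#   scancel "$jobid"         # cancel the job if needed
```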

Part 3: Collecting Results

  • In the directory where you submitted the SBATCH script, you should see a Report-<jobID>.out file containing the job's output, 20 RAxML_log.T5.RUN.<n> files, 20 RAxML_parsimonyTree.T5.RUN.<n> files, and 20 RAxML_result.T5.RUN.<n> files (with <n> running from 0 to 19), as well as a RAxML_info.T5 file, a RAxML_bestTree.T5 file, and a binary.phy.reduced file. Use cat or open a file in a text editor to take a look.
  • Report-<jobID>.out should look like this:
---------------------------------------
Begin Slurm Prolog: Dec-25-2022 22:38:26
Job ID:    231349
User ID:   svangala3
Account:   phx-pace-staff
Job name:  raxmlTest
Partition: cpu-small
QOS:       inferno
---------------------------------------


IMPORTANT WARNING: Sequences t2 and t3 are exactly identical


IMPORTANT WARNING: Sequences t2 and t4 are exactly identical

IMPORTANT WARNING
Found 2 sequences that are exactly identical to other sequences in the alignment.
Normally they should be excluded from the analysis.

Just in case you might need it, an alignment file with
sequence duplicates removed is printed to file binary.phy.reduced


This is RAxML version 8.2.12 released by Alexandros Stamatakis on May 2018.

With greatly appreciated code contributions by:
Andre Aberer      (HITS)
Simon Berger      (HITS)
Alexey Kozlov     (HITS)
Kassian Kobert    (HITS)
David Dao         (KIT and HITS)
Sarah Lutteropp   (KIT and HITS)
Nick Pattengale   (Sandia)
Wayne Pfeiffer    (SDSC)
Akifumi S. Tanabe (NRIFS)
Charlie Taylor    (UF)


Alignment has 19 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 1.36%

RAxML rapid hill-climbing mode

Using 1 distinct models/data partitions with joint branch length optimization


Executing 20 inferences on the original alignment using 20 distinct randomized MP trees

All free model parameters will be estimated by RAxML
GAMMA model of rate heterogeneity, ML estimate of alpha-parameter

GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units

Partition: 0
Alignment Patterns: 19
Name: No Name Provided
DataType: BINARY/MORPHOLOGICAL
Substitution Matrix: Uncorrected




RAxML was called as follows:

raxmlHPC -m BINGAMMA -p 12345 -s binary.phy -# 20 -n T5


Partition: 0 with name: No Name Provided
Base frequencies: 0.627 0.373

Inference[0]: Time 0.038043 GAMMA-based likelihood -119.663773, best rearrangement setting 5
Inference[1]: Time 0.037744 GAMMA-based likelihood -119.663772, best rearrangement setting 5
Inference[2]: Time 0.049993 GAMMA-based likelihood -119.622971, best rearrangement setting 5
Inference[3]: Time 0.037221 GAMMA-based likelihood -119.614407, best rearrangement setting 5
Inference[4]: Time 0.037553 GAMMA-based likelihood -119.614408, best rearrangement setting 5
Inference[5]: Time 0.040189 GAMMA-based likelihood -119.663772, best rearrangement setting 5
Inference[6]: Time 0.038556 GAMMA-based likelihood -119.614407, best rearrangement setting 5
Inference[7]: Time 0.037512 GAMMA-based likelihood -119.622971, best rearrangement setting 5
Inference[8]: Time 0.036832 GAMMA-based likelihood -119.614407, best rearrangement setting 5
Inference[9]: Time 0.028690 GAMMA-based likelihood -119.622971, best rearrangement setting 5
Inference[10]: Time 0.036923 GAMMA-based likelihood -119.663771, best rearrangement setting 5
Inference[11]: Time 0.036736 GAMMA-based likelihood -119.663772, best rearrangement setting 5
Inference[12]: Time 0.056471 GAMMA-based likelihood -119.622971, best rearrangement setting 5
Inference[13]: Time 0.049325 GAMMA-based likelihood -119.663771, best rearrangement setting 5
Inference[14]: Time 0.037075 GAMMA-based likelihood -119.663772, best rearrangement setting 5
Inference[15]: Time 0.037341 GAMMA-based likelihood -119.614408, best rearrangement setting 5
Inference[16]: Time 0.044418 GAMMA-based likelihood -119.614408, best rearrangement setting 5
Inference[17]: Time 0.042716 GAMMA-based likelihood -119.614408, best rearrangement setting 5
Inference[18]: Time 0.037358 GAMMA-based likelihood -119.614408, best rearrangement setting 5
Inference[19]: Time 0.061525 GAMMA-based likelihood -119.622971, best rearrangement setting 5


Conducting final model optimizations on all 20 trees under GAMMA-based models ....


WARNING the alpha parameter with a value of 13.041120 estimated by RAxML for partition number 0 with the name "No Name Provided"
is larger than 10.000000. You should do a model test and confirm that you actually need to incorporate a model of rate heterogeneity!
You can run inferences with a plain substitution model (without rate heterogeneity) by specifyng the CAT model and the "-V" option!

Inference[0] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.0
Inference[1] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.1
Inference[2] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.2
Inference[3] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.3
Inference[4] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.4
Inference[5] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.5
Inference[6] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.6
Inference[7] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.7
Inference[8] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.8
Inference[9] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.9
Inference[10] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.10
Inference[11] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.11
Inference[12] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.12
Inference[13] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.13
Inference[14] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.14
Inference[15] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.15
Inference[16] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.16
Inference[17] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.17
Inference[18] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.18
Inference[19] final GAMMA-based Likelihood: -119.545950 tree written to file /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_result.T5.RUN.19


Starting final GAMMA-based thorough Optimization on tree 8 likelihood -119.545950 ....

Final GAMMA-based Score of best tree -119.545950

Program execution info written to /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_info.T5
Best-scoring ML tree written to: /storage/coda1/pace-admins/svangala3/documentation/site_files/docs/slurm-software/test_directory/raxml/RAxML_bestTree.T5

Overall execution time: 0.875795 secs or 0.000243 hours or 0.000010 days

---------------------------------------
Begin Slurm Epilog: Dec-25-2022 22:38:29
Job ID:        231349
Array Job ID:  _4294967294
User ID:       svangala3
Account:       phx-pace-staff
Job name:      raxmlTest
Resources:     cpu=2,mem=4G,node=1
Rsrc Used:     cput=00:00:08,vmem=644K,walltime=00:00:04,mem=0,energy_used=0
Partition:     cpu-small
QOS:           inferno
Nodes:         atl1-1-02-020-27-1
---------------------------------------
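
A report like the one above can be summarized without reading it line by line. The sketch below is not a RAxML feature; the function name summarize_report is hypothetical, and it assumes the report contains the "final GAMMA-based Likelihood" and "Final GAMMA-based Score of best tree" lines in the form shown above.

```shell
#!/bin/bash
# Hypothetical helper: count the per-inference "final GAMMA-based Likelihood"
# lines in a report and print the best-tree score reported at the end.
summarize_report() {
    local report="$1"
    local n best
    # grep -c prints 0 (and exits non-zero) when nothing matches.
    n=$(grep -c 'final GAMMA-based Likelihood' "$report" || true)
    best=$(awk '/^Final GAMMA-based Score of best tree/ {print $NF}' "$report")
    echo "inferences: $n, best score: $best"
}

# Usage (not run here), replacing <jobID> with your job ID:
#   summarize_report Report-<jobID>.out
```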
  • All output files can be found here.
  • After the result files are produced, you can move them off the cluster. Refer to the file transfer guide for help.
  • Congratulations! You successfully ran RAxML on the cluster.