Updated 2023-01-30

Run HMMER on the Cluster

Overview

  • HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
  • This guide will go over how to run HMMER on the Cluster.
  • More information about HMMER can be found in the HMMER User Guide

Summary

  • This tutorial covers how to search a sequence database with a profile.
  • Files used:
    • globins4.sto - An example alignment of four globin sequences, in Stockholm format. This alignment is a subset of a famous old published structural alignment from Don Bashford.
    • globins45.fa - 45 unaligned globin sequences, in FASTA format.
  • We will build a profile of globins4 with hmmbuild and search the database with hmmsearch.
  • More tutorials covering different uses of HMMER can be found in the Tutorial section of the HMMER User Guide.

Walkthrough: Run HMMER on the Cluster

  • This walkthrough will use HMMER to search a sequence database with a profile.
  • You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.

Part 1: The SBATCH Script

#!/bin/bash
#SBATCH -J testHMMER
#SBATCH -A [Account] 
#SBATCH -N 1 --ntasks-per-node=2
#SBATCH --mem-per-cpu=2G
#SBATCH -t 5
#SBATCH -p inferno
#SBATCH -o Report-%testHMMER.out

cd $SLURM_SUBMIT_DIR/tutorial
module load hmmer/3.1b1

# build a profile with hmmbuild
hmmbuild globins4.hmm globins4.sto

# search the sequence database with hmmsearch
hmmsearch globins4.hmm globins45.fa > globins4.out

  • The #SBATCH directives are standard, requesting just 5 minutes of walltime and 1 node with 2 cores. More on #SBATCH directives can be found in the SBATCH guide
  • $SLURM_SUBMIT_DIR is a variable that represents the directory you submit the SBATCH script from. Make sure the files you want to use (in this case the tutorial folder) are in the same directory you put the SBATCH script.
  • Output Files will also show up in this dir as well
  • module load hmmer/3.1b1 loads the 3.1b1 version of HMMER. To see what HMMER versions are available, run module avail hmmer, and load the one you want.
  • The comments in the SBATCH Script explain what each line after loading the module does.

Part 2: Submit Job and Check Status

  • Make sure you're in the dir that contains the SBATCH Script as well as the directory containing the HMMER Tutorial files.
  • Submit as normal, with sbatch <SBATCH script name>. In this case scancel testHMMER.SBATCH
  • Check job status with squeue -t 22182721, replacing the number with the job id returned after running sbatch
  • You can delete the job with squeue 22182721 , again replacing the number with the jobid returned after running sbatch

Part 3: Collecting Results

  • In the directory where you submitted the SBATCH script, you should see a testHMMER.out file which contains the results of the job, a globins4.hmm file which contains the new profile created (this is intended to be fed into HMMER, not for users), and a globins4.out file which contains a list of ranked top hits in a BLAST-like style with information like E-value, score, and bias for each hit.
  • For a more in-depth explanation of the output, visit the HMMER User Guide.
  • The testHMMER.out file should look like this:
---------------------------------------
Begin SBATCH Prologue Wed Jan 11 11:21:55 EDT 2023
Job ID:     26162088.shared-sched.pace.gatech.edu
User ID:    svemuri8
Job name:   testHMMER
Queue:      inferno
End SBATCH Prologue Wed Jan 11 11:21:55 EDT 2023
---------------------------------------
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.1b1 (May 2013); http://hmmer.org/
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file:             globins4.sto
# output HMM file:                  globins4.hmm
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1     globins4                 4   171   149     0.96  0.589

# CPU time: 0.28u 0.00s 00:00:00.28 Elapsed: 00:00:00.31
---------------------------------------
Begin SBATCH Epilogue Wed Jan 11 11:21:56 EDT 2023
Job ID:     26162088.shared-sched.pace.gatech.edu
User ID:    svemuri8
Job name:   testHMMER
Resources:  neednodes=1:ppn=2,nodes=1:ppn=2,pmem=2gb,walltime=00:05:00
Rsrc Used:  cput=00:00:00,energy_used=0,mem=0kb,vmem=0kb,walltime=00:00:01
Queue:      inferno
Nodes:
rich133-g24-25-r.pace.gatech.edu
End SBATCH Epilogue Wed Jan 11 11:21:56 EDT 2023
---------------------------------------
  • The globins4.out file should look like this:
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.1b1 (May 2013); http://hmmer.org/
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  globins4.hmm
# target sequence database:        globins45.fa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       globins4  [M=149]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence   Description
    ------- ------ -----    ------- ------ -----   ---- --  --------   -----------
    8.7e-67  215.6   2.9    9.7e-67  215.4   2.9    1.0  1  MYG_ESCGI
    1.1e-65  211.9   0.1    1.3e-65  211.8   0.1    1.0  1  HBB_MANSP
    7.4e-65  209.3   0.2    8.2e-65  209.2   0.2    1.0  1  HBB_CALAR
    5.5e-64  206.5   1.2    6.1e-64  206.3   1.2    1.0  1  MYG_HORSE
    2.8e-63  204.2   0.1    3.1e-63  204.1   0.1    1.0  1  HBB_URSMA
    9.9e-63  202.4   0.5    1.1e-62  202.3   0.5    1.0  1  HBB_RABIT
    2.6e-62  201.1   1.3    2.8e-62  200.9   1.3    1.0  1  HBA_PONPY
      2e-61  198.2   1.1    2.2e-61  198.1   1.1    1.0  1  HBB_SPECI
      1e-60  195.9   1.7    1.1e-60  195.8   1.7    1.0  1  MYG_LYCPI
    1.1e-60  195.8   0.3    1.2e-60  195.7   0.3    1.0  1  MYG_PROGU
    1.4e-60  195.5   0.7    1.5e-60  195.3   0.7    1.0  1  HBB_SPETO
    1.5e-60  195.3   0.6    1.7e-60  195.2   0.6    1.0  1  HBA_MACFA
    1.9e-60  195.0   1.1    2.1e-60  194.8   1.1    1.0  1  MYG_SAISC
    2.2e-60  194.8   0.1    2.4e-60  194.7   0.1    1.0  1  HBB_SUNMU
    3.5e-60  194.1   0.1    3.9e-60  194.0   0.1    1.0  1  HBB_TRIIN
    3.7e-60  194.0   0.5    4.1e-60  193.9   0.5    1.0  1  HBA_MACSI
    3.3e-59  191.0   0.1    3.7e-59  190.8   0.1    1.0  1  HBB_EQUHE
    4.5e-59  190.5   0.9      5e-59  190.4   0.9    1.0  1  HBA2_GALCR
    1.7e-58  188.7   0.3    1.8e-58  188.6   0.3    1.0  1  HBB_TACAC
    2.9e-58  187.9   0.3    3.2e-58  187.8   0.3    1.0  1  MYG_MOUSE
    5.7e-58  187.0   0.5    6.3e-58  186.8   0.5    1.0  1  HBE_PONPY
    1.6e-57  185.5   0.4    1.8e-57  185.3   0.4    1.0  1  HBA_MESAU
      2e-57  185.2   0.7    2.2e-57  185.0   0.7    1.0  1  HBA2_BOSMU
    6.2e-57  183.6   0.0    6.9e-57  183.5   0.0    1.0  1  HBB_TUPGL
    1.3e-56  182.6   0.2    1.4e-56  182.5   0.2    1.0  1  HBA_AILME
    1.6e-56  182.2   0.0    1.8e-56  182.1   0.0    1.0  1  HBB_ORNAN
    1.7e-56  182.1   0.1    1.9e-56  182.0   0.1    1.0  1  HBB_COLLI
    2.9e-55  178.2   0.1    3.2e-55  178.0   0.1    1.0  1  HBB_LARRI
    8.1e-55  176.7   0.7    8.9e-55  176.6   0.7    1.0  1  HBA_PAGLA
    2.6e-54  175.1   0.1    2.9e-54  174.9   0.1    1.0  1  HBA_ERIEU
    3.2e-54  174.8   0.1    3.6e-54  174.6   0.1    1.0  1  HBA_PROLO
    2.4e-51  165.5   0.2    2.7e-51  165.3   0.2    1.0  1  HBAZ_HORSE
    3.5e-51  164.9   0.1    3.8e-51  164.8   0.1    1.0  1  HBB1_VAREX
      3e-50  161.9   0.5    3.4e-50  161.7   0.5    1.0  1  HBA_FRAPO
    4.6e-50  161.3   0.4    5.1e-50  161.2   0.4    1.0  1  HBA_PHACO
    5.1e-49  157.9   0.1    5.6e-49  157.8   0.1    1.0  1  HBAD_PASMO
    8.4e-49  157.2   0.5    9.3e-49  157.1   0.5    1.0  1  HBA_ANSSE
      1e-48  157.0   0.4    1.1e-48  156.8   0.4    1.0  1  HBAD_CHLME
    1.1e-48  156.8   0.4    1.2e-48  156.7   0.4    1.0  1  HBA_TRIOC
      2e-48  156.0   0.3    2.2e-48  155.8   0.3    1.0  1  HBBL_RANCA
    2.4e-48  155.7   0.3    2.7e-48  155.6   0.3    1.0  1  HBA_COLLI
    1.1e-47  153.6   1.1    1.1e-47  153.5   1.1    1.0  1  HBB2_XENTR
    4.5e-43  138.6   0.1      5e-43  138.5   0.1    1.0  1  HBA4_SALIR
    9.9e-38  121.3   0.4    1.1e-37  121.1   0.4    1.0  1  MYG_MUSAN
    4.7e-36  115.8   0.0    5.2e-36  115.7   0.0    1.0  1  HBB2_TRICR


Domain annotation for each sequence (and alignments):
>> MYG_ESCGI
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  215.4   2.9   9.7e-67   9.7e-67       2     149 .]       1     147 [.       1     147 [. 0.99

  Alignments for each domain:
  == domain 1  score: 215.4 bits;  conditional E-value: 9.7e-67
   globins4   2 vLseaektkvkavWakveadveesGadiLvrlfkstPatqefFekFkdLstedelkksadvkkHgkkvldAlsdalakldekleaklkdLselHakklkv 101
                vLs+ae++ v+++Wakveadv+++G+diL+rlfk +P+t+e+F+kFk+L+te+e+k+s+d+kkHg++vl+Al+ +l+k ++++ea+lk+L+++Ha+k+k+
  MYG_ESCGI   1 VLSDAEWQLVLNIWAKVEADVAGHGQDILIRLFKGHPETLEKFDKFKHLKTEAEMKASEDLKKHGNTVLTALGGILKK-KGHHEAELKPLAQSHATKHKI 99
                69****************************************************************************.99******************* PP

   globins4 102 dpkyfkllsevlvdvlaarlpkeftadvqaaleKllalvakllaskYk 149
                ++ky++++s+++++vl++r+p++f+ad+qaa++K+l+l++k++a+kYk
  MYG_ESCGI 100 PIKYLEFISDAIIHVLHSRHPGDFGADAQAAMNKALELFRKDIAAKYK 147
                ***********************************************7 PP

>> HBB_MANSP
  • After the result files are produced, you can move the files off the cluster, refer to the file transfer guide for help.
  • Congratulations! You successfully ran HMMER on the cluster.