Updated 2019-06-24

Run HMMER on the Cluster

Overview

  • HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
  • This guide will go over how to run HMMER on the Cluster.
  • More information about HMMER can be found in the HMMER User Guide

Summary

  • This tutorial covers how to search a sequence database with a profile.
  • Files used:
    • globins4.sto - An example alignment of four globin sequences, in Stockholm format. This alignment is a subset of a famous old published structural alignment from Don Bashford.
    • globins45.fa - 45 unaligned globin sequences, in FASTA format.
  • We will build a profile of globins4 with hmmbuild and search the database with hmmsearch.
  • More tutorials covering different uses of HMMER can be found in the Tutorial section of the HMMER User Guide.

Walkthrough: Run HMMER on the Cluster

  • This walkthrough will use HMMER to search a sequence database with a profile.
  • The tutorial folder used can be found here. It is listed as hmmer_tutorial but it should be saved as tutorial to work with the PBS Script in this guide.
  • PBS script can be found here
  • You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.

Part 1: The PBS Script

#PBS -N testHMMER
#PBS -l nodes=1:ppn=2
#PBS -l pmem=2gb
#PBS -l walltime=5:00
#PBS -q force-6
#PBS -j oe
#PBS -o testHMMER.out

cd $PBS_O_WORKDIR/tutorial
module load hmmer/3.1b1

# build a profile with hmmbuild
hmmbuild globins4.hmm globins4.sto

# search the sequence database with hmmsearch
hmmsearch globins4.hmm globins45.fa > globins4.out

  • The #PBS directives are standard, requesting just 5 minutes of walltime and 1 node with 2 cores. More on #PBS directives can be found in the PBS guide
  • $PBS_O_WORKDIR is a variable that represents the directory you submit the PBS script from. Make sure the files you want to use (in this case the tutorial folder) are in the same directory you put the PBS script.
  • Output Files will also show up in this dir as well
  • module load hmmer/3.1b1 loads the 3.1b1 version of HMMER. To see what HMMER versions are available, run module avail hmmer, and load the one you want.
  • The comments in the PBS Script explain what each line after loading the module does.

Part 2: Submit Job and Check Status

  • Make sure you're in the dir that contains the PBS Script as well as the directory containing the HMMER Tutorial files.
  • Submit as normal, with qsub <pbs script name>. In this case qsub testHMMER.pbs
  • Check job status with qstat -t 22182721, replacing the number with the job id returned after running qsub
  • You can delete the job with qdel 22182721 , again replacing the number with the jobid returned after running qsub

Part 3: Collecting Results

  • In the directory where you submitted the PBS script, you should see a testHMMER.out file which contains the results of the job, a globins4.hmm file which contains the new profile created (this is intended to be fed into HMMER, not for users), and a globins4.out file which contains a list of ranked top hits in a BLAST-like style with information like E-value, score, and bias for each hit.
  • For a more in-depth explanation of the output, visit the HMMER User Guide.
  • The testHMMER.out file should look like this:
---------------------------------------
Begin PBS Prologue Mon Jun 24 11:23:33 EDT 2019
Job ID:     26162088.shared-sched.pace.gatech.edu
User ID:    svemuri8
Job name:   testHMMER
Queue:      force-6
End PBS Prologue Mon Jun 24 11:23:33 EDT 2019
---------------------------------------
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.1b1 (May 2013); http://hmmer.org/
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file:             globins4.sto
# output HMM file:                  globins4.hmm
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1     globins4                 4   171   149     0.96  0.589

# CPU time: 0.28u 0.00s 00:00:00.28 Elapsed: 00:00:00.31
---------------------------------------
Begin PBS Epilogue Mon Jun 24 11:23:34 EDT 2019
Job ID:     26162088.shared-sched.pace.gatech.edu
User ID:    svemuri8
Job name:   testHMMER
Resources:  neednodes=1:ppn=2,nodes=1:ppn=2,pmem=2gb,walltime=00:05:00
Rsrc Used:  cput=00:00:00,energy_used=0,mem=0kb,vmem=0kb,walltime=00:00:01
Queue:      force-6
Nodes:
rich133-g24-25-r.pace.gatech.edu
End PBS Epilogue Mon Jun 24 11:23:34 EDT 2019
---------------------------------------
  • The globins4.out file should look like this:
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.1b1 (May 2013); http://hmmer.org/
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  globins4.hmm
# target sequence database:        globins45.fa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       globins4  [M=149]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence   Description
    ------- ------ -----    ------- ------ -----   ---- --  --------   -----------
    8.7e-67  215.6   2.9    9.7e-67  215.4   2.9    1.0  1  MYG_ESCGI
    1.1e-65  211.9   0.1    1.3e-65  211.8   0.1    1.0  1  HBB_MANSP
    7.4e-65  209.3   0.2    8.2e-65  209.2   0.2    1.0  1  HBB_CALAR
    5.5e-64  206.5   1.2    6.1e-64  206.3   1.2    1.0  1  MYG_HORSE
    2.8e-63  204.2   0.1    3.1e-63  204.1   0.1    1.0  1  HBB_URSMA
    9.9e-63  202.4   0.5    1.1e-62  202.3   0.5    1.0  1  HBB_RABIT
    2.6e-62  201.1   1.3    2.8e-62  200.9   1.3    1.0  1  HBA_PONPY
      2e-61  198.2   1.1    2.2e-61  198.1   1.1    1.0  1  HBB_SPECI
      1e-60  195.9   1.7    1.1e-60  195.8   1.7    1.0  1  MYG_LYCPI
    1.1e-60  195.8   0.3    1.2e-60  195.7   0.3    1.0  1  MYG_PROGU
    1.4e-60  195.5   0.7    1.5e-60  195.3   0.7    1.0  1  HBB_SPETO
    1.5e-60  195.3   0.6    1.7e-60  195.2   0.6    1.0  1  HBA_MACFA
    1.9e-60  195.0   1.1    2.1e-60  194.8   1.1    1.0  1  MYG_SAISC
    2.2e-60  194.8   0.1    2.4e-60  194.7   0.1    1.0  1  HBB_SUNMU
    3.5e-60  194.1   0.1    3.9e-60  194.0   0.1    1.0  1  HBB_TRIIN
    3.7e-60  194.0   0.5    4.1e-60  193.9   0.5    1.0  1  HBA_MACSI
    3.3e-59  191.0   0.1    3.7e-59  190.8   0.1    1.0  1  HBB_EQUHE
    4.5e-59  190.5   0.9      5e-59  190.4   0.9    1.0  1  HBA2_GALCR
    1.7e-58  188.7   0.3    1.8e-58  188.6   0.3    1.0  1  HBB_TACAC
    2.9e-58  187.9   0.3    3.2e-58  187.8   0.3    1.0  1  MYG_MOUSE
    5.7e-58  187.0   0.5    6.3e-58  186.8   0.5    1.0  1  HBE_PONPY
    1.6e-57  185.5   0.4    1.8e-57  185.3   0.4    1.0  1  HBA_MESAU
      2e-57  185.2   0.7    2.2e-57  185.0   0.7    1.0  1  HBA2_BOSMU
    6.2e-57  183.6   0.0    6.9e-57  183.5   0.0    1.0  1  HBB_TUPGL
    1.3e-56  182.6   0.2    1.4e-56  182.5   0.2    1.0  1  HBA_AILME
    1.6e-56  182.2   0.0    1.8e-56  182.1   0.0    1.0  1  HBB_ORNAN
    1.7e-56  182.1   0.1    1.9e-56  182.0   0.1    1.0  1  HBB_COLLI
    2.9e-55  178.2   0.1    3.2e-55  178.0   0.1    1.0  1  HBB_LARRI
    8.1e-55  176.7   0.7    8.9e-55  176.6   0.7    1.0  1  HBA_PAGLA
    2.6e-54  175.1   0.1    2.9e-54  174.9   0.1    1.0  1  HBA_ERIEU
    3.2e-54  174.8   0.1    3.6e-54  174.6   0.1    1.0  1  HBA_PROLO
    2.4e-51  165.5   0.2    2.7e-51  165.3   0.2    1.0  1  HBAZ_HORSE
    3.5e-51  164.9   0.1    3.8e-51  164.8   0.1    1.0  1  HBB1_VAREX
      3e-50  161.9   0.5    3.4e-50  161.7   0.5    1.0  1  HBA_FRAPO
    4.6e-50  161.3   0.4    5.1e-50  161.2   0.4    1.0  1  HBA_PHACO
    5.1e-49  157.9   0.1    5.6e-49  157.8   0.1    1.0  1  HBAD_PASMO
    8.4e-49  157.2   0.5    9.3e-49  157.1   0.5    1.0  1  HBA_ANSSE
      1e-48  157.0   0.4    1.1e-48  156.8   0.4    1.0  1  HBAD_CHLME
    1.1e-48  156.8   0.4    1.2e-48  156.7   0.4    1.0  1  HBA_TRIOC
      2e-48  156.0   0.3    2.2e-48  155.8   0.3    1.0  1  HBBL_RANCA
    2.4e-48  155.7   0.3    2.7e-48  155.6   0.3    1.0  1  HBA_COLLI
    1.1e-47  153.6   1.1    1.1e-47  153.5   1.1    1.0  1  HBB2_XENTR
    4.5e-43  138.6   0.1      5e-43  138.5   0.1    1.0  1  HBA4_SALIR
    9.9e-38  121.3   0.4    1.1e-37  121.1   0.4    1.0  1  MYG_MUSAN
    4.7e-36  115.8   0.0    5.2e-36  115.7   0.0    1.0  1  HBB2_TRICR


Domain annotation for each sequence (and alignments):
>> MYG_ESCGI
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  215.4   2.9   9.7e-67   9.7e-67       2     149 .]       1     147 [.       1     147 [. 0.99

  Alignments for each domain:
  == domain 1  score: 215.4 bits;  conditional E-value: 9.7e-67
   globins4   2 vLseaektkvkavWakveadveesGadiLvrlfkstPatqefFekFkdLstedelkksadvkkHgkkvldAlsdalakldekleaklkdLselHakklkv 101
                vLs+ae++ v+++Wakveadv+++G+diL+rlfk +P+t+e+F+kFk+L+te+e+k+s+d+kkHg++vl+Al+ +l+k ++++ea+lk+L+++Ha+k+k+
  MYG_ESCGI   1 VLSDAEWQLVLNIWAKVEADVAGHGQDILIRLFKGHPETLEKFDKFKHLKTEAEMKASEDLKKHGNTVLTALGGILKK-KGHHEAELKPLAQSHATKHKI 99
                69****************************************************************************.99******************* PP

   globins4 102 dpkyfkllsevlvdvlaarlpkeftadvqaaleKllalvakllaskYk 149
                ++ky++++s+++++vl++r+p++f+ad+qaa++K+l+l++k++a+kYk
  MYG_ESCGI 100 PIKYLEFISDAIIHVLHSRHPGDFGADAQAAMNKALELFRKDIAAKYK 147
                ***********************************************7 PP

>> HBB_MANSP
  • After the result files are produced, you can move the files off the cluster, refer to the file transfer guide for help.
  • Congratulations! You successfully ran HMMER on the cluster.