Updated 2023-01-30
Run HMMER on the Cluster
Overview
- HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
- This guide will go over how to run HMMER on the Cluster.
- More information about HMMER can be found in the HMMER User Guide.
Summary
- This tutorial covers how to search a sequence database with a profile.
- Files used:
  - globins4.sto - An example alignment of four globin sequences, in Stockholm format. This alignment is a subset of a famous old published structural alignment from Don Bashford.
  - globins45.fa - 45 unaligned globin sequences, in FASTA format.
- We will build a profile from globins4.sto with hmmbuild and use it to search the globins45.fa database with hmmsearch.
- More tutorials covering different uses of HMMER can be found in the Tutorial section of the HMMER User Guide.
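Before building the profile, it can help to confirm that the two input files are in the formats described above. The sketch below is a hypothetical helper (`describe_inputs` is not part of HMMER). It assumes the Stockholm file starts with the standard `# STOCKHOLM 1.0` header line and that every FASTA record begins with a `>` line.

```bash
# describe_inputs STO FASTA
# Hypothetical helper, not part of HMMER: checks that STO begins with the
# Stockholm header and prints how many FASTA records FASTA contains.
describe_inputs() {
  local sto=$1 fasta=$2 header n
  # read only the first line of the alignment file
  IFS= read -r header < "$sto" || return 1
  if [[ $header != "# STOCKHOLM 1.0"* ]]; then
    echo "$sto: missing '# STOCKHOLM 1.0' header" >&2
    return 1
  fi
  # every FASTA record starts with a '>' line, so count those lines
  n=$(grep -c '^>' "$fasta")
  echo "$fasta: $n sequences"
}
```

Run from the `tutorial` folder as `describe_inputs globins4.sto globins45.fa`. For the tutorial database, the count should match the 45 sequences described above.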
Walkthrough: Run HMMER on the Cluster
- This walkthrough will use HMMER to search a sequence database with a profile.
- You can transfer the files to your account on the cluster to follow along. The file transfer guide may be helpful.
Part 1: The SBATCH Script
```bash
#!/bin/bash
#SBATCH -J testHMMER
#SBATCH -A [Account]
#SBATCH -N 1 --ntasks-per-node=2
#SBATCH --mem-per-cpu=2G
#SBATCH -t 5
#SBATCH -p inferno
#SBATCH -o testHMMER.out
# move into the folder that holds the tutorial input files
cd "$SLURM_SUBMIT_DIR/tutorial" || exit 1
# load HMMER (run `module avail hmmer` to list available versions)
module load hmmer/3.1b1
# build a profile with hmmbuild
hmmbuild globins4.hmm globins4.sto
# search the sequence database with hmmsearch
hmmsearch globins4.hmm globins45.fa > globins4.out
```
- The `#SBATCH` directives are standard, requesting just 5 minutes of walltime and 1 node with 2 cores (2 GB of memory per core). More on `#SBATCH` directives can be found in the SBATCH guide.
- `$SLURM_SUBMIT_DIR` is a variable that represents the directory you submit the SBATCH script from. Make sure the files you want to use (in this case the `tutorial` folder) are in the same directory as the SBATCH script.
- The Slurm report file named by `-o` is written to the submission directory, while the files created by `hmmbuild` and `hmmsearch` are written to the `tutorial` folder, because the script changes into it before running them.
- `module load hmmer/3.1b1` loads the 3.1b1 version of HMMER. To see which HMMER versions are available, run `module avail hmmer`, and load the one you want.
- The comments in the SBATCH script explain what each line after loading the module does.
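If the module fails to load or the `tutorial` folder is not where `$SLURM_SUBMIT_DIR` expects it, the HMMER commands fail with less obvious errors. The following is a hedged sketch of a pre-flight check that could go right after the `module load` line. `preflight` is a hypothetical helper, and the file names are the tutorial's.

```bash
# preflight DIR
# Hypothetical helper: verifies that hmmbuild and hmmsearch are on PATH
# (i.e. the module loaded) and that the tutorial inputs exist in DIR.
# Returns 0 if everything is present, 1 otherwise.
preflight() {
  local dir=$1 status=0 cmd f
  for cmd in hmmbuild hmmsearch; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "not on PATH: $cmd (did 'module load hmmer/...' succeed?)" >&2
      status=1
    fi
  done
  for f in globins4.sto globins45.fa; do
    if [[ ! -f "$dir/$f" ]]; then
      echo "missing input: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

In the SBATCH script it would be called as `preflight "$SLURM_SUBMIT_DIR/tutorial" || exit 1`, so the job stops early with a readable message instead of running HMMER on missing inputs.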
Part 2: Submit Job and Check Status
- Make sure you're in the directory that contains the SBATCH script as well as the `tutorial` directory containing the HMMER tutorial files.
- Submit as normal, with `sbatch <SBATCH script name>`. In this case, `sbatch testHMMER.SBATCH`.
- Check the job status with `squeue -j 22182721`, replacing the number with the job ID returned after running `sbatch`.
- You can delete the job with `scancel 22182721`, again replacing the number with the job ID returned after running `sbatch`.
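Copying the job ID by hand is error-prone. A minimal sketch, assuming `sbatch` prints its usual `Submitted batch job <id>` line: `jobid_from_sbatch` is a hypothetical helper that keeps only the last word of that line so it can be passed to `squeue -j` and `scancel`.

```bash
# jobid_from_sbatch LINE
# Hypothetical helper: extracts the job ID from sbatch's
# "Submitted batch job <id>" message (the last space-separated word).
jobid_from_sbatch() {
  local line=$1
  local id=${line##* }
  # reject anything that is not purely numeric (e.g. an sbatch error message)
  if [[ ! $id =~ ^[0-9]+$ ]]; then
    echo "could not parse job ID from: $line" >&2
    return 1
  fi
  echo "$id"
}
```

On the cluster this might be used as `jobid=$(jobid_from_sbatch "$(sbatch testHMMER.SBATCH)")`, followed by `squeue -j "$jobid"` or `scancel "$jobid"`. Alternatively, `sbatch --parsable` prints only the job ID (plus the cluster name, separated by a semicolon, when one is set), which avoids the parsing step altogether.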
Part 3: Collecting Results
- In the directory where you submitted the SBATCH script, you should see a `testHMMER.out` file, which contains the job's report. Inside the `tutorial` folder you should see a `globins4.hmm` file, which contains the newly built profile (meant to be read by HMMER programs rather than by people), and a `globins4.out` file, which lists the top hits ranked in a BLAST-like style, with information such as E-value, score, and bias for each hit.
- For a more in-depth explanation of the output, visit the HMMER User Guide.
- The `testHMMER.out` file should look like this:
```
---------------------------------------
Begin SBATCH Prologue Wed Jan 11 11:21:55 EDT 2023
Job ID: 26162088.shared-sched.pace.gatech.edu
User ID: svemuri8
Job name: testHMMER
Queue: inferno
End SBATCH Prologue Wed Jan 11 11:21:55 EDT 2023
---------------------------------------
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.1b1 (May 2013); http://hmmer.org/
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file: globins4.sto
# output HMM file: globins4.hmm
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# idx name nseq alen mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1 globins4 4 171 149 0.96 0.589
# CPU time: 0.28u 0.00s 00:00:00.28 Elapsed: 00:00:00.31
---------------------------------------
Begin SBATCH Epilogue Wed Jan 11 11:21:56 EDT 2023
Job ID: 26162088.shared-sched.pace.gatech.edu
User ID: svemuri8
Job name: testHMMER
Resources: neednodes=1:ppn=2,nodes=1:ppn=2,pmem=2gb,walltime=00:05:00
Rsrc Used: cput=00:00:00,energy_used=0,mem=0kb,vmem=0kb,walltime=00:00:01
Queue: inferno
Nodes:
rich133-g24-25-r.pace.gatech.edu
End SBATCH Epilogue Wed Jan 11 11:21:56 EDT 2023
---------------------------------------
```
- The `globins4.out` file should look like this:
```
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.1b1 (May 2013); http://hmmer.org/
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file: globins4.hmm
# target sequence database: globins45.fa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: globins4 [M=149]
Scores for complete sequences (score includes all domains):
--- full sequence --- --- best 1 domain --- -#dom-
E-value score bias E-value score bias exp N Sequence Description
------- ------ ----- ------- ------ ----- ---- -- -------- -----------
8.7e-67 215.6 2.9 9.7e-67 215.4 2.9 1.0 1 MYG_ESCGI
1.1e-65 211.9 0.1 1.3e-65 211.8 0.1 1.0 1 HBB_MANSP
7.4e-65 209.3 0.2 8.2e-65 209.2 0.2 1.0 1 HBB_CALAR
5.5e-64 206.5 1.2 6.1e-64 206.3 1.2 1.0 1 MYG_HORSE
2.8e-63 204.2 0.1 3.1e-63 204.1 0.1 1.0 1 HBB_URSMA
9.9e-63 202.4 0.5 1.1e-62 202.3 0.5 1.0 1 HBB_RABIT
2.6e-62 201.1 1.3 2.8e-62 200.9 1.3 1.0 1 HBA_PONPY
2e-61 198.2 1.1 2.2e-61 198.1 1.1 1.0 1 HBB_SPECI
1e-60 195.9 1.7 1.1e-60 195.8 1.7 1.0 1 MYG_LYCPI
1.1e-60 195.8 0.3 1.2e-60 195.7 0.3 1.0 1 MYG_PROGU
1.4e-60 195.5 0.7 1.5e-60 195.3 0.7 1.0 1 HBB_SPETO
1.5e-60 195.3 0.6 1.7e-60 195.2 0.6 1.0 1 HBA_MACFA
1.9e-60 195.0 1.1 2.1e-60 194.8 1.1 1.0 1 MYG_SAISC
2.2e-60 194.8 0.1 2.4e-60 194.7 0.1 1.0 1 HBB_SUNMU
3.5e-60 194.1 0.1 3.9e-60 194.0 0.1 1.0 1 HBB_TRIIN
3.7e-60 194.0 0.5 4.1e-60 193.9 0.5 1.0 1 HBA_MACSI
3.3e-59 191.0 0.1 3.7e-59 190.8 0.1 1.0 1 HBB_EQUHE
4.5e-59 190.5 0.9 5e-59 190.4 0.9 1.0 1 HBA2_GALCR
1.7e-58 188.7 0.3 1.8e-58 188.6 0.3 1.0 1 HBB_TACAC
2.9e-58 187.9 0.3 3.2e-58 187.8 0.3 1.0 1 MYG_MOUSE
5.7e-58 187.0 0.5 6.3e-58 186.8 0.5 1.0 1 HBE_PONPY
1.6e-57 185.5 0.4 1.8e-57 185.3 0.4 1.0 1 HBA_MESAU
2e-57 185.2 0.7 2.2e-57 185.0 0.7 1.0 1 HBA2_BOSMU
6.2e-57 183.6 0.0 6.9e-57 183.5 0.0 1.0 1 HBB_TUPGL
1.3e-56 182.6 0.2 1.4e-56 182.5 0.2 1.0 1 HBA_AILME
1.6e-56 182.2 0.0 1.8e-56 182.1 0.0 1.0 1 HBB_ORNAN
1.7e-56 182.1 0.1 1.9e-56 182.0 0.1 1.0 1 HBB_COLLI
2.9e-55 178.2 0.1 3.2e-55 178.0 0.1 1.0 1 HBB_LARRI
8.1e-55 176.7 0.7 8.9e-55 176.6 0.7 1.0 1 HBA_PAGLA
2.6e-54 175.1 0.1 2.9e-54 174.9 0.1 1.0 1 HBA_ERIEU
3.2e-54 174.8 0.1 3.6e-54 174.6 0.1 1.0 1 HBA_PROLO
2.4e-51 165.5 0.2 2.7e-51 165.3 0.2 1.0 1 HBAZ_HORSE
3.5e-51 164.9 0.1 3.8e-51 164.8 0.1 1.0 1 HBB1_VAREX
3e-50 161.9 0.5 3.4e-50 161.7 0.5 1.0 1 HBA_FRAPO
4.6e-50 161.3 0.4 5.1e-50 161.2 0.4 1.0 1 HBA_PHACO
5.1e-49 157.9 0.1 5.6e-49 157.8 0.1 1.0 1 HBAD_PASMO
8.4e-49 157.2 0.5 9.3e-49 157.1 0.5 1.0 1 HBA_ANSSE
1e-48 157.0 0.4 1.1e-48 156.8 0.4 1.0 1 HBAD_CHLME
1.1e-48 156.8 0.4 1.2e-48 156.7 0.4 1.0 1 HBA_TRIOC
2e-48 156.0 0.3 2.2e-48 155.8 0.3 1.0 1 HBBL_RANCA
2.4e-48 155.7 0.3 2.7e-48 155.6 0.3 1.0 1 HBA_COLLI
1.1e-47 153.6 1.1 1.1e-47 153.5 1.1 1.0 1 HBB2_XENTR
4.5e-43 138.6 0.1 5e-43 138.5 0.1 1.0 1 HBA4_SALIR
9.9e-38 121.3 0.4 1.1e-37 121.1 0.4 1.0 1 MYG_MUSAN
4.7e-36 115.8 0.0 5.2e-36 115.7 0.0 1.0 1 HBB2_TRICR
Domain annotation for each sequence (and alignments):
>> MYG_ESCGI
# score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc
--- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ----
1 ! 215.4 2.9 9.7e-67 9.7e-67 2 149 .] 1 147 [. 1 147 [. 0.99
Alignments for each domain:
== domain 1 score: 215.4 bits; conditional E-value: 9.7e-67
globins4 2 vLseaektkvkavWakveadveesGadiLvrlfkstPatqefFekFkdLstedelkksadvkkHgkkvldAlsdalakldekleaklkdLselHakklkv 101
vLs+ae++ v+++Wakveadv+++G+diL+rlfk +P+t+e+F+kFk+L+te+e+k+s+d+kkHg++vl+Al+ +l+k ++++ea+lk+L+++Ha+k+k+
MYG_ESCGI 1 VLSDAEWQLVLNIWAKVEADVAGHGQDILIRLFKGHPETLEKFDKFKHLKTEAEMKASEDLKKHGNTVLTALGGILKK-KGHHEAELKPLAQSHATKHKI 99
69****************************************************************************.99******************* PP
globins4 102 dpkyfkllsevlvdvlaarlpkeftadvqaaleKllalvakllaskYk 149
++ky++++s+++++vl++r+p++f+ad+qaa++K+l+l++k++a+kYk
MYG_ESCGI 100 PIKYLEFISDAIIHVLHSRHPGDFGADAQAAMNKALELFRKDIAAKYK 147
***********************************************7 PP
>> HBB_MANSP
```
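To pull the strongest hits out of `globins4.out` without reading the whole report, a short filter over the "Scores for complete sequences" table can help. This is a sketch that assumes the column layout shown above (full-sequence E-value, score, and bias first, sequence name in the ninth column). `list_hits` is a hypothetical helper name. For scripted use, HMMER's `--tblout` option writes a tabular version of the same per-sequence hits and is the sturdier choice.

```bash
# list_hits FILE [MAX_EVALUE]
# Hypothetical helper: prints "sequence full-seq-E-value full-seq-score" for
# every row of hmmsearch's per-sequence hit table with E-value <= MAX_EVALUE.
list_hits() {
  local file=$1 max=${2:-1e-50}
  awk -v max="$max" '
    /^Scores for complete sequences/ { in_table = 1; next }
    /^Domain annotation/             { in_table = 0 }
    # hit rows start with a numeric E-value such as 8.7e-67 or 0.0012
    in_table && $1 ~ /^[0-9.]+(e[-+]?[0-9]+)?$/ && ($1 + 0) <= (max + 0) {
      print $9, $1, $2
    }
  ' "$file"
}
```

For example, `list_hits tutorial/globins4.out 1e-60` would list only the hits whose full-sequence E-value is at or below 1e-60. Header, separator, and inclusion-threshold lines are skipped because their first field is not a number.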
- After the result files are produced, you can move them off the cluster; refer to the file transfer guide for help.
- Congratulations! You successfully ran HMMER on the cluster.