HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment

HHblits uses a profile hidden Markov model (an .hhm file) to search existing sequence databases for proteins that match the profile. HHblits is part of the HH-suite software package (GitHub link). On COSMIC2, HHblits is available so that users can take the .hhm output from Model Angelo and find which known sequences match each profile.

Input: 

  • .hhm file (profile hidden Markov model in HH-suite format)
  • Databases:
    • You can choose the database to search with your .hhm file
    • UniClust30:
      • clusters UniProt database sequences at the level of 30% pairwise sequence identity
    • BFD:
      • Derived from environmental samples (2.5 billion proteins)
    • PDB70:
      • Sequences from the Protein Data Bank, clustered at a maximum of 70% pairwise sequence identity

Output:

  • .hhr file – list of database proteins that match the input .hhm profile
  • Aligned FASTA sequence file (.a3m) – sequence alignment for hits in .hhr file
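
For readers who also have HH-suite installed locally, the inputs and outputs listed above map onto an hhblits command line roughly as sketched below. This is a minimal illustration, not a description of how COSMIC2 invokes HHblits. The helper names `build_hhblits_command` and `run_hhblits`, the file name `chain_A.hhm`, and the database path are hypothetical; the flags `-i`, `-d`, `-o`, `-oa3m`, `-n`, and `-cpu` are standard hhblits options.

```python
import subprocess
from pathlib import Path


def build_hhblits_command(query_hhm, database_prefix, out_dir, iterations=1, cpus=4):
    """Assemble an hhblits command line for one query profile.

    database_prefix is the database basename that hhblits expects after -d
    (a placeholder path here, not a real COSMIC2 location).
    """
    query = Path(query_hhm)
    out = Path(out_dir)
    return [
        "hhblits",
        "-i", str(query),                              # input .hhm profile
        "-d", str(database_prefix),                    # database to search
        "-o", str(out / (query.stem + ".hhr")),        # hit list
        "-oa3m", str(out / (query.stem + ".a3m")),     # alignment of the hits
        "-n", str(iterations),                         # number of search iterations
        "-cpu", str(cpus),                             # CPU threads to use
    ]


def run_hhblits(command):
    """Run the assembled command; requires hhblits on PATH and a local database."""
    subprocess.run(command, check=True)


# Build (but do not run) the command for a hypothetical chain profile.
command = build_hhblits_command("chain_A.hhm", "/path/to/database", "results")
print(" ".join(command))
```

Keeping the command assembly separate from `run_hhblits` lets you inspect the exact command before starting a search, which can take a while against the larger databases.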

 

Example: Using output from Model Angelo to find sequence matches with HHblits

After running Model Angelo, you will have one .hhm file per chain. An example from a KIFBP run in Model Angelo shows this result for one of the chains:

After running HHblits using UniClust30, the .hhr file has the following list of proteins:

It’s hard to see in this file, but the top hits are all KIFBP from different species, indicating that the approach succeeded in identifying the type of protein.
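
Since the hit list can be hard to read by eye, one option is to pull the top hits out of the .hhr file with a short script. The sketch below assumes the usual .hhr layout: a header line beginning with "No Hit", followed by one fixed-width line per hit (rank in the first 3 characters, hit name in the next 30-character column, then probability and E-value), ending at the first blank line. The function name `parse_hhr_hits` and the file name in the usage comment are hypothetical.

```python
def parse_hhr_hits(text, max_hits=10):
    """Return the first max_hits rows of the hit table in an .hhr file.

    Assumes the fixed-width hit table described above; other layouts
    would need a different parser.
    """
    hits = []
    in_table = False
    for line in text.splitlines():
        if line.lstrip().startswith("No Hit"):
            in_table = True          # header row of the hit table
            continue
        if not in_table:
            continue
        if not line.strip():
            break                    # a blank line ends the hit table
        fields = line[35:].split()   # numeric columns after the hit name
        hits.append({
            "rank": int(line[:3]),
            "hit": line[4:34].strip(),
            "probability": float(fields[0]),
            "e_value": float(fields[1]),
        })
        if len(hits) >= max_hits:
            break
    return hits


# Usage (file name is a placeholder):
# with open("chain_A.hhr") as handle:
#     for hit in parse_hhr_hits(handle.read(), max_hits=5):
#         print(hit["rank"], hit["hit"], hit["probability"], hit["e_value"])
```

Printing only the rank, name, probability, and E-value of the first few hits makes it easier to check whether the top matches agree on the protein family.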

Reference

Remmert, M., Biegert, A., Hauser, A. et al. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9, 173–175 (2012). https://doi.org/10.1038/nmeth.1818