Model Angelo – Automated model building into cryo-EM reconstructions

Model Angelo uses deep learning to build atomic models into cryo-EM density maps. It combines chain tracing, amino acid geometry, and cryo-EM density to build reliable starting models for subsequent refinement.

Model Angelo can be run in two modes:

  1. Build known sequence into map
    • Given a FASTA sequence, build atomic model
  2. Identify sequence in map
    • Given only a map, output sequence probabilities for the built segments

Inputs:

  • 3D reconstruction
    • Sharpened map into which the model will be built
  • Optional: FASTA sequence
    • Sequence for proteins to be built
  • Optional: Mask for 3D reconstruction
    • To remove density that should be ignored during model building
  • Optional: (Advanced) Build model without input sequence – Yes/No

Outputs:

  • output/output.cif
    • Full, final output
  • output/output_raw.cif
    • Shows all sequence regions built into the map. May be helpful for hard-to-build or low-confidence areas, providing a starting point for further model refinement
  • output/hmm_profiles
    • Hidden Markov Model files for identifying sequences in the map using HHblits
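
After downloading and unzipping the results, it can be handy to confirm which of the outputs listed above are present. The following is a minimal sketch, assuming the unzipped folder is named output and contains the entries exactly as named in the list; the function names check_outputs and report are hypothetical helpers, not part of Model Angelo.

```python
from pathlib import Path

# Output names taken from the list above; the unzipped folder layout is an assumption.
EXPECTED = ("output.cif", "output_raw.cif", "hmm_profiles")


def check_outputs(output_dir):
    """Return a dict mapping each expected output name to True if it exists."""
    root = Path(output_dir)
    return {name: (root / name).exists() for name in EXPECTED}


def report(output_dir):
    """Print one line per expected output, marking it as found or missing."""
    for name, present in check_outputs(output_dir).items():
        print(f"{name}: {'found' if present else 'missing'}")
```

Calling report("output") prints one status line per file or folder. A missing entry is not necessarily an error, since which outputs are written may depend on the mode that was run.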

Example #1: Building known sequence into density map

Here is a map of kinesin-binding protein (EMD-24677), into which we will build the following FASTA sequence:

FASTA sequence:

>sp|Q96EK5|KBP_HUMAN KIF-binding protein OS=Homo sapiens OX=9606 GN=KIFBP PE=1 SV=1
MANVPWAEVCEKFQAALALSRVELHKNPEKEPYKSKYSARALLEEVKALLGPAPEDEDER
PEAEDGPGAGDHALGLPAEVVEPEGPVAQRAVRLAVIEFHLGVNHIDTEELSAGEEHLVK
CLRLLRRYRLSHDCISLCIQAQNNLGILWSEREEIETAQAYLESSEALYNQYMKEVGSPP
LDPTERFLPEEEKLTEQERSKRFEKVYTHNLYYLAQVYQHLEMFEKAAHYCHSTLKRQLE
HNAYHPIEWAINAATLSQFYINKLCFMEARHCLSAANVIFGQTGKISATEDTPEAEGEVP
ELYHQRKGEIARCWIKYCLTLMQNAQLSMQDNIGELDLDKQSELRALRKKELDEEESIRK
KAVQFGTGELCDAISAVEEKVSYLRPLDFEEARELFLLGQHYVFEAKEFFQIDGYVTDHI
EVVQDHSALFKVLAFFETDMERRCKMHKRRIAMLEPLTVDLNPQYYLLVNRQIQFEIAHA
YYDMMDLKVAIADRLRDPDSHIVKKINNLNKSALKYYQLFLDSLRDPNKVFPEHIGEDVL
RPAMLAKFRVARLYGKIITADPKKELENLATSLEHYKFIVDYCEKHPEAAQEIEVELELS
KEMVSLLPTKMERFRTKMALT
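
Before uploading a sequence, a quick sanity check of the FASTA file can catch copy-and-paste problems such as a missing header or stray characters. The sketch below assumes the standard FASTA convention in which each header line starts with ">"; read_fasta and unexpected_residues are hypothetical helper names, and the check only covers the 20 standard amino acid codes.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' header line")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def unexpected_residues(sequence):
    """Return the sorted characters that are not standard amino acid codes."""
    return sorted(set(sequence) - VALID_RESIDUES)
```

For each (header, sequence) pair returned by read_fasta, printing len(sequence) and unexpected_residues(sequence) gives a quick view of the sequence length and any characters worth checking before submission.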

Running the 3D map + FASTA Sequence in Model Angelo on COSMIC2 provides the following output:

With output/output.cif superimposed on the map, you can see that Model Angelo did a reasonable job of finding helices:

Example #2: Building unknown sequence into density map

For this example, we will run Model Angelo on the KIFBP map above, but without an input sequence. To do this, check the box labeled “Build model without input sequence” under the advanced settings.

After running, you will see the following output on COSMIC2:

The output/output.cif file shows a variety of segments identified by Model Angelo:

Note: in ChimeraX, these segments appear as separate chains with names such as ‘AH’ or ‘a’.

You can use the Hidden Markov Model files to identify potential sequences in existing databases. These files are named according to the chain names seen in ChimeraX. To find them, download the full output.zip file, uncompress it, and look in the folder output/hmm_profiles. You can then use these .hhm files as input to HHblits.
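
When many chains are built, preparing one HHblits search per profile by hand gets tedious. The sketch below collects the .hhm files from a folder and builds one command per file using the HHblits options -i (input), -d (database) and -o (results file). The function name hhblits_commands, the results folder name, and the database path are hypothetical; substitute the HHblits database you have installed locally.

```python
from pathlib import Path


def hhblits_commands(profile_dir, database, results_dir="hhblits_results"):
    """Build one HHblits command (as an argument list) per .hhm profile."""
    commands = []
    for hhm in sorted(Path(profile_dir).glob("*.hhm")):
        chain = hhm.stem  # file name without extension, taken to be the chain name
        result = Path(results_dir) / f"{chain}.hhr"
        commands.append(
            ["hhblits", "-i", str(hhm), "-d", str(database), "-o", str(result)]
        )
    return commands
```

Each returned list can be passed to subprocess.run once the results folder exists and HHblits is installed, or joined with spaces and pasted into a terminal. Building the commands without running them makes it easy to review them first.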

Reference

ModelAngelo: Automated Model Building in Cryo-EM Maps. Jamali, Kimanius, Scheres. arXiv:2210.00006