AlphaFold Multimer: Protein complex prediction

AlphaFold Multimer is an extension of AlphaFold2 built specifically to predict protein-protein complexes. We recommend starting with ColabFold, which is usually the quickest way to get going. However, because ColabFold runs in Google Colab notebooks, memory limits can make running AlphaFold Multimer challenging there.

New to AlphaFold? Check out this great set of presentations from EMBL-EBI training: How to interpret AlphaFold structures.

When should you use AlphaFold2 on COSMIC²?

Given the convenience of the ColabFold notebook, when should you use AlphaFold2 on COSMIC²?

  • To leverage larger sequence databases to help with sequences with fewer homologs
  • To run AlphaFold Multimer on larger proteins that may time out on Google Colab notebooks
  • To compare the complete AlphaFold Multimer package versus other software (instead of adapted versions)

Running AlphaFold Multimer on COSMIC²

We run default AlphaFold2 parameters for all jobs, which includes using Amber for relaxing predicted PDB models.

Input: a FASTA file containing the protein sequences of interest. Because this is a multimer prediction, include every sequence you would like to fold together.

  • Upload data via browser upload (not Globus!)
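Before uploading, it can help to confirm that the FASTA file really contains every chain you intend to fold together. The following is a minimal sketch of such a check, not part of COSMIC² or AlphaFold; the function names `parse_fasta` and `check_multimer_fasta` are hypothetical, and the rule that a multimer job needs at least two records is an assumption based on the input description above.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_multimer_fasta(text: str) -> List[Tuple[str, str]]:
    """Return the parsed records, raising ValueError on obvious problems."""
    records = parse_fasta(text)
    # Assumption: a multimer job should list at least two chains.
    if len(records) < 2:
        raise ValueError("expected at least two sequences for a multimer job")
    for header, seq in records:
        if not seq:
            raise ValueError(f"record '{header}' has no sequence")
        if not seq.isalpha():
            raise ValueError(f"record '{header}' contains non-letter characters")
    return records
```

For a homodimer, the same sequence simply appears once per copy, each under its own header, as in the leucine zipper example further down this page.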

By default, all sequences are folded simultaneously with AlphaFold Multimer. Future releases will add other folding modes, such as folding one protein and then the other.

Checking job status

After you submit your job, you can check your job status by clicking on the hyperlink ‘List’ which is next to the label ‘Intermediate Results.’ This will open a new window that lists the current job directory to show you files as they are generated. Watch a video here.

AlphaFold Multimer outputs on COSMIC²

We provide individual files for download on the output page as well as the full output from AlphaFold Multimer (“output.tar.gz”).

Individual files available for download:

  • Per-residue confidence score (pLDDT)
    • Plot provided for each predicted model ending with suffix “_plddt.png”
    • Per-residue values provided in a text file ending with suffix “_plddt.txt”
  • Multimer output files
    • Five predicted multimer models, in files ending with suffix “_multimer.pdb”
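If you download the full archive rather than the individual files, a short script can show what it contains. This is a minimal sketch using Python's standard `tarfile` module; the helper name `list_outputs` is hypothetical, and it is an assumption that files with the suffixes listed above also appear inside “output.tar.gz” (a suffix with no matches simply returns an empty list).

```python
import tarfile
from typing import Dict, List, Sequence


def list_outputs(
    archive_path: str,
    suffixes: Sequence[str] = ("_multimer.pdb", "_plddt.png", "_plddt.txt"),
) -> Dict[str, List[str]]:
    """Group archive member names by the output suffixes described above."""
    found: Dict[str, List[str]] = {suffix: [] for suffix in suffixes}
    with tarfile.open(archive_path, "r:gz") as tar:
        for name in tar.getnames():
            for suffix in suffixes:
                if name.endswith(suffix):
                    found[suffix].append(name)
    # Sort so the listing is stable regardless of archive order.
    for names in found.values():
        names.sort()
    return found
```

Calling `list_outputs("output.tar.gz")` only reads the member names; nothing is extracted to disk.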

How to interpret AlphaFold Multimer outputs

Example target sequences for a leucine zipper homodimer:

> chain a
XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
> chain b
XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER

Runtime: 26 minutes.

First, check the ranking of predicted models in ranking_debug.json to see which model has the highest score. Note that for multimer runs, AlphaFold's default ranking score is a weighted combination of ipTM and pTM rather than pLDDT. You can then assess the per-residue quality of a prediction by looking at its pLDDT file. Shown here is the top-scoring model (left) and the associated pLDDT plot (right):
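The ranking file can also be read programmatically. The sketch below assumes the usual AlphaFold layout for ranking_debug.json: an "order" list of model names from best to worst, plus one dictionary of per-model scores. The name of that score dictionary differs between monomer and multimer runs, so the code deliberately does not hard-code it. The function name `ranked_models` is hypothetical.

```python
import json
from typing import List, Tuple


def ranked_models(ranking_json: str) -> List[Tuple[str, float]]:
    """Return (model_name, score) pairs in the order given by the file."""
    data = json.loads(ranking_json)
    order = data["order"]  # model names, best first
    # Assumption: exactly one other key holds the per-model scores.
    score_keys = [key for key in data if key != "order"]
    if len(score_keys) != 1:
        raise ValueError(f"expected one score table, found {score_keys}")
    scores = data[score_keys[0]]
    return [(name, scores[name]) for name in order]
```

To use it on a downloaded file, read the text first, for example `ranked_models(open("ranking_debug.json").read())`; the first pair in the returned list is the top-ranked model.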

 

The atomic model is colored from the N- to the C-terminus (blue to red) and shown as a homodimer. In the pLDDT plot (right), residues 1–34 correspond to the first subunit and residues 35–68 to the second. You can see that the N- and C-termini have the lowest scores.
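Because the pLDDT plot numbers residues continuously across subunits, it can be useful to convert a plot position back to a subunit and a residue number within that subunit. The helper below is a small illustrative sketch (the name `locate_residue` is hypothetical); it assumes the subunits are concatenated in the same order as the sequences in the input FASTA file.

```python
from typing import List, Tuple


def locate_residue(index: int, chain_lengths: List[int]) -> Tuple[int, int]:
    """Map a 1-based position on the plot to (subunit number, residue number).

    Both returned numbers are 1-based.
    """
    if index < 1:
        raise ValueError("residue positions start at 1")
    offset = 0
    for chain_number, length in enumerate(chain_lengths, start=1):
        if index <= offset + length:
            return chain_number, index - offset
        offset += length
    raise ValueError(f"position {index} is beyond the total length {offset}")


# The leucine zipper homodimer above has two subunits of 34 residues each.
print(locate_residue(34, [34, 34]))  # (1, 34): last residue of the first subunit
print(locate_residue(35, [34, 34]))  # (2, 1): first residue of the second subunit
```

The same function works for heteromers by passing each chain's own length, for example `[120, 87, 87]` for a hypothetical three-chain complex.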

These files appear on the output page for each model. Note that you can also download the entire AlphaFold result as a compressed file.

Citation:

Protein complex prediction with AlphaFold-Multimer. bioRxiv 2021.10.04.463034