AlphaFold2: Highly accurate protein structure prediction

AlphaFold2 leverages multiple sequence alignments and neural networks to predict protein structures. COSMIC² offers the full AlphaFold2 software package for use by the structural biology community.

May 2023 update:

  • We have merged the previous AlphaFold2 and AlphaFold Multimer jobs into a single task option on COSMIC². To run multimer, select ‘multimer’ from the model choice.
  • We are now running AlphaFold v2.3.1, which can predict up to ~5500 amino acids.

New to AlphaFold? Check out this great set of presentations from EMBL-EBI training: How to interpret AlphaFold structures.

Outline

  1. When should you use AlphaFold2 on COSMIC2?
  2. Running AlphaFold2 on COSMIC2
  3. AlphaFold2 – Monomer
  4. AlphaFold2 – Multimer

When should you use AlphaFold2 on COSMIC²?

Given the convenient ColabFold notebook, when should you use AlphaFold2 on COSMIC2?

  • To leverage larger sequence databases to help with sequences with fewer homologs
  • To run AlphaFold2 on larger proteins that may time out on Google Colab notebooks
  • To compare the complete AlphaFold2 package versus other software (instead of adapted versions)

Running AlphaFold2 on COSMIC²

We run AlphaFold2 with its default parameters for all jobs. AMBER relaxation of the predicted PDB models is controlled by the ‘Skip calculating relaxed models’ option listed under Options.

Access COSMIC2: https://cosmic2.sdsc.edu

Input: a FASTA protein sequence file containing your sequence of interest. To predict multisubunit complexes, include multiple chains (see examples below).

  • Upload data via browser upload (not Globus!)
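
If you prepare input files with a script, the following is a minimal Python sketch of writing a multi-chain FASTA file. The helper name write_fasta, the output file name, and the residue check are illustrative assumptions and are not part of COSMIC² or AlphaFold2; the sequence is the leucine zipper example used later on this page.

```python
from pathlib import Path

# Standard one-letter amino acid codes, plus X for an unknown residue.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def write_fasta(chains, path):
    """Write (name, sequence) pairs to a FASTA file, one record per chain.

    Hypothetical helper for illustration; returns the number of chains written.
    """
    lines = []
    for name, sequence in chains:
        sequence = sequence.strip().upper()
        unexpected = set(sequence) - VALID_RESIDUES
        if not sequence or unexpected:
            raise ValueError(
                f"chain {name!r}: empty sequence or invalid residues {sorted(unexpected)}"
            )
        lines.append(f">{name}")
        lines.append(sequence)
    Path(path).write_text("\n".join(lines) + "\n")
    return len(chains)


# Two identical records describe a homodimer for the multimer model.
zipper = "XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER"
n_chains = write_fasta([("chain_a", zipper), ("chain_b", zipper)], "dimer.fasta")
```

A single-chain job would pass one (name, sequence) pair instead of two.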

Options:

  • Number of predictions per model: [Default = 5]
    • Indicate how many models you would like generated during prediction.
  • Database: [Default = full_dbs]
    • Choose which sequence database to search during prediction. The default is the full database (“full_dbs”). The reduced database (“reduced_dbs”) runs faster but may reduce accuracy.
  • Model: [Default = monomer_ptm]
    • Indicate which model to use for prediction:
      • monomer – single chain prediction, no 3D confidence score (PAE)
      • monomer_ptm – single chain prediction and outputs 3D confidence PAE score. From DeepMind: “Slightly less accurate than monomer”
      • multimer – multi-chain prediction
  • Latest date (YYYY-mm-dd) to use for template search (if using templates): [Default = 2023-03-26]
    • If using PDB templates, only templates released on or before this date are used.
  • Skip calculating relaxed models: [Default = True].
    • When set to True, this skips the AMBER relaxation step, an energy minimization that refines amino acid side-chain positions. AMBER relaxation takes additional time and most users do not need it. AMBER relaxed models are required for users who need accurate side-chain positions (e.g., phasing X-ray diffraction datasets).

Preparing and submitting a prediction job on COSMIC2

Checking job status

After you submit your job, you can check your job status by clicking on the hyperlink ‘List’ which is next to the label ‘Intermediate Results.’ This will open a new window that lists the current job directory to show you files as they are generated. Watch a video here.

AlphaFold2 outputs on COSMIC²

We provide individual files for download on the output page as well as the full output from AlphaFold2 (“output.tar.gz”).

Individual files available for download:

  • Relaxed predicted PDB models
    • AMBER relaxed models
  • Per-residue confidence score (pLDDT)
    • Plot provided for each predicted model ending with suffix “_plddt.png”
    • Per residue values provided in text file “_plddt.txt”
  • Predicted aligned error (PAE) if using pTM option for job submission
    • Provides estimate of 3D confidence of predicted structure
    • Provided for each predicted model ending with suffix “_PAE.png”

AlphaFold2 – Monomer

Example target sequence:

> test job
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK

Runtime: 26 minutes.

First, check the ranking of predicted models in ranking_debug.json to see which model has the highest score (as ranked by pLDDT). You can assess the quality of prediction by looking at the pLDDT file and predicted aligned error (if you used the option pTM).
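
As a rough sketch, the ranking file can also be read with a few lines of Python. This assumes the layout written by AlphaFold v2.3.1: an “order” list of model names from best to worst, plus one score table keyed “plddts” for monomer runs (or “iptm+ptm” for multimer runs). The function names and the example values below are illustrative only and are not real job output.

```python
import json


def pick_best(ranking):
    """Return (model_name, score, metric) for the top-ranked prediction."""
    # "order" lists model names from best to worst.
    top = ranking["order"][0]
    # The score table is the only other key: "plddts" or "iptm+ptm".
    metric = next(key for key in ranking if key != "order")
    return top, ranking[metric][top], metric


def load_ranking(path):
    """Read ranking_debug.json from disk into a dictionary."""
    with open(path) as handle:
        return json.load(handle)


# Made-up values that mimic a monomer_ptm run with two predictions.
example = {
    "plddts": {"model_1_ptm_pred_0": 80.0, "model_2_ptm_pred_0": 91.5},
    "order": ["model_2_ptm_pred_0", "model_1_ptm_pred_0"],
}
print(pick_best(example))
```

For a real job, call pick_best(load_ranking("ranking_debug.json")) after unpacking the full output.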

What you can see is that the C-terminus of this predicted structure has low confidence for pLDDT in addition to low confidence for the predicted aligned error (PAE). The PAE plot tells you that AlphaFold2 has low confidence for the 3D position of amino acids 58 & 59 relative to the rest of the molecule.
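
To locate low-confidence stretches like this without reading the plot, a short Python sketch can scan the per-residue pLDDT values. It assumes the values are already loaded into a list (the layout of the “_plddt.txt” file is not described here, so loading is left out) and uses a cutoff of 70, the commonly used boundary below which pLDDT is considered low. The function name and the example scores are hypothetical.

```python
def low_confidence_ranges(plddt, cutoff=70.0):
    """Return 1-based (start, end) residue ranges with pLDDT below cutoff."""
    ranges = []
    start = None
    for index, value in enumerate(plddt, start=1):
        if value < cutoff:
            # Open a new range if we are not already inside one.
            if start is None:
                start = index
        elif start is not None:
            # A confident residue closes the current range.
            ranges.append((start, index - 1))
            start = None
    if start is not None:
        ranges.append((start, len(plddt)))
    return ranges


# Illustrative values only: 57 confident residues, then a weak C-terminus.
scores = [92.0] * 57 + [61.5, 48.3]
print(low_confidence_ranges(scores))  # [(58, 59)]
```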

These files will be populated into the output page for each model. Note you can also download the entire AlphaFold result as a compressed file.

Citation:

Highly accurate protein structure prediction with AlphaFold. Jumper et al. Nature. 2021 Aug;596(7873):583-589.

AlphaFold2 – Multimer

Example target sequences for a leucine zipper homodimer:

> chain a
XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
> chain b
XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER

Runtime: 26 minutes.

First, check the ranking of predicted models in ranking_debug.json to see which model has the highest score (multimer predictions are ranked by a combined ipTM and pTM confidence score rather than by pLDDT). You can then assess the per-residue quality of the prediction by looking at the pLDDT file. Shown here are the top-scoring model (left) and the associated pLDDT plot (right):

 

The atomic model is colored from the N- to the C-terminus (blue to red) and shown as a homodimer. In the pLDDT plot (right), the first subunit corresponds to residues 1–34 and the second subunit to residues 35–68. You can see that the N- and C-termini have the lowest scores.
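
Because the pLDDT plot numbers residues continuously across subunits, it can help to convert a plot position back to a chain and a residue number. The sketch below assumes chains are concatenated in the order they appear in the FASTA file; the function name chain_position is hypothetical.

```python
def chain_position(index, chain_lengths):
    """Map a 1-based index on the concatenated plot axis to (chain, residue)."""
    if index < 1:
        raise ValueError("index is 1-based")
    offset = 0
    for chain_number, length in enumerate(chain_lengths, start=1):
        if index <= offset + length:
            return chain_number, index - offset
        offset += length
    raise ValueError("index beyond the last chain")


# The leucine zipper homodimer has two chains of 34 residues each.
lengths = [34, 34]
print(chain_position(34, lengths))  # (1, 34)
print(chain_position(35, lengths))  # (2, 1)
print(chain_position(68, lengths))  # (2, 34)
```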

These files will be populated into the output page for each model. Note you can also download the entire AlphaFold result as a compressed file.

Citation:

Protein complex prediction with AlphaFold-Multimer. Evans et al. bioRxiv 2021.10.04.463034