AlphaFold2: Highly accurate protein structure prediction

AlphaFold2 uses multiple sequence alignments and neural networks to predict protein structures. COSMIC² offers the full AlphaFold2 software package for use by the structural biology community. ColabFold, a Google Colab notebook that runs jobs on Google Cloud machines, is likely to return results faster for small proteins with deep sequence coverage (many homologs).

New to AlphaFold? Check out this great set of presentations from EMBL-EBI training: How to interpret AlphaFold structures.

When should you use AlphaFold2 on COSMIC²?

Given the convenient ColabFold notebook, when should you use AlphaFold2 on COSMIC²?

  • To search larger sequence databases, which can help with sequences that have few homologs
  • To run AlphaFold2 on larger proteins that may time out in Google Colab notebooks
  • To compare the complete, unmodified AlphaFold2 package against other software, rather than an adapted version

Running AlphaFold2 on COSMIC²

All jobs run with the default AlphaFold2 parameters, which include relaxing the predicted PDB models with Amber.

Input: a FASTA file containing your protein sequence of interest.

  • Upload the file through your browser
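Before uploading, it can help to confirm that the file holds exactly one protein sequence. The sketch below is a hypothetical pre-upload check, not part of COSMIC²: it assumes a single-record FASTA file whose sequence uses only the 20 standard one-letter amino-acid codes, and the names `parse_single_fasta` and `check_fasta_file` are illustrative.

```python
from pathlib import Path

# The 20 standard one-letter amino-acid codes. Assumption: this is stricter than
# AlphaFold2 may require; relax it if your sequence legitimately uses other codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_single_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for FASTA text that holds exactly one record."""
    header = None
    seq_parts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected one sequence but found several '>' headers")
            header = line[1:].strip()
        elif header is None:
            raise ValueError("sequence data appears before any '>' header")
        else:
            seq_parts.append(line.upper())
    if header is None:
        raise ValueError("no '>' header line found")
    sequence = "".join(seq_parts)
    if not sequence:
        raise ValueError("the header is not followed by any sequence")
    unknown = sorted(set(sequence) - STANDARD_AA)
    if unknown:
        raise ValueError("non-standard residue codes: " + "".join(unknown))
    return header, sequence


def check_fasta_file(path: str) -> tuple[str, str]:
    """Read a FASTA file from disk and validate it with parse_single_fasta."""
    return parse_single_fasta(Path(path).read_text())


if __name__ == "__main__":
    example = ">test job\nPIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK\n"
    header, sequence = parse_single_fasta(example)
    print(header, len(sequence))  # prints: test job 59
```

A file with several records or unexpected characters raises a `ValueError` that names the problem, so it can be fixed before the job is submitted.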

Options:

  • Database: [Default = full_dbs]
    • Choose which databases AlphaFold2 searches during prediction. The default is the full databases (“full_dbs”). The reduced databases (“reduced_dbs”) are faster but may lower accuracy. “casp14” refers to the settings AlphaFold2 used in the CASP14 prediction competition.
  • Generate predicted template modeling (pTM) score: [Default = True]

Checking job status

After you submit your job, you can check its status by clicking the ‘List’ link next to the ‘Intermediate Results’ label. This opens a new window listing the current job directory, so you can see files as they are generated.

AlphaFold2 outputs on COSMIC²

The output page provides individual files for download as well as the full AlphaFold2 output (“output.tar.gz”).
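If you download the full archive, you may only need a few of its files. The sketch below uses Python's standard `tarfile` module to list the archive and extract only files with chosen suffixes. It assumes a gzip-compressed tar archive such as “output.tar.gz”; the directory layout inside the archive is not documented here, so list it first. The helper names are hypothetical.

```python
import tarfile
from pathlib import Path


def list_outputs(archive: str) -> list[str]:
    """Return the names of all members stored in the archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()


def extract_matching(archive: str, dest: str, suffixes: tuple[str, ...]) -> list[str]:
    """Extract only regular files whose names end with one of `suffixes`.

    Returns the names of the extracted members.
    """
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        members = [m for m in tar.getmembers() if m.isfile() and m.name.endswith(suffixes)]
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter refuses members that would land outside dest.
            tar.extractall(dest, members=members, filter="data")
        else:
            tar.extractall(dest, members=members)
    return [m.name for m in members]
```

For example, `extract_matching("output.tar.gz", "alphafold_results", (".pdb", "_plddt.txt", ".json"))` would pull out just the PDB models, the pLDDT text files and any JSON files such as ranking_debug.json, skipping the larger intermediate files.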

Individual files available for download:

  • Relaxed predicted PDB models
    • Models relaxed with Amber
  • Per-residue confidence score (pLDDT)
    • A plot for each predicted model, in a file ending with “_plddt.png”
    • Per-residue values in a text file ending with “_plddt.txt”
  • Predicted aligned error (PAE), if the pTM option was enabled at job submission
    • Estimates confidence in the relative 3D positions of residues in the predicted structure
    • A plot for each predicted model, in a file ending with “_PAE.png”
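The per-residue pLDDT values can be summarized in a few lines of code. This is a sketch under an assumption: the exact layout of the “_plddt.txt” file is not described here, so the reader below takes the last whitespace-separated field on each line as the pLDDT value and skips lines where that field is not a number. The confidence bands (90, 70 and 50) are the ones commonly used to interpret AlphaFold pLDDT; the 70 cutoff for flagging residues is an adjustable choice.

```python
from statistics import mean

# pLDDT confidence bands (0-100 scale) as commonly used for AlphaFold models.
BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low")]


def read_plddt(path: str) -> list[float]:
    """Read per-residue pLDDT values from a *_plddt.txt file.

    ASSUMPTION: one residue per line, with the pLDDT value as the last
    whitespace-separated field; lines whose last field is not a number
    (for example, a possible header) are skipped.
    """
    values = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if not fields:
                continue
            try:
                values.append(float(fields[-1]))
            except ValueError:
                continue  # not a numeric pLDDT value
    return values


def band(score: float) -> str:
    """Map a pLDDT score to its confidence band."""
    for cutoff, label in BANDS:
        if score >= cutoff:
            return label
    return "very low"


def summarize(values: list[float], cutoff: float = 70.0) -> dict:
    """Mean pLDDT, its band, and the 1-based positions of residues below `cutoff`."""
    avg = mean(values)
    low = [i for i, v in enumerate(values, start=1) if v < cutoff]
    return {"mean": avg, "band": band(avg), "low_residues": low}
```

Check one of your own “_plddt.txt” files against the assumed layout before relying on the parsed values.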

How to interpret AlphaFold2 outputs

Example target sequence:

> test job
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK

Runtime: 26 minutes.

First, check the ranking of the predicted models in ranking_debug.json to see which model has the highest score (models are ranked by mean pLDDT). Then assess the quality of the prediction using the pLDDT files and, if you enabled the pTM option, the predicted aligned error (PAE).
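Reading the ranking programmatically is straightforward with the standard `json` module. This sketch assumes ranking_debug.json follows AlphaFold2's single-chain layout: an "order" list of model names, best first, and a "plddts" mapping from each model name to its mean pLDDT. `rank_models` is a hypothetical helper name.

```python
import json


def rank_models(path: str) -> list[tuple[str, float]]:
    """Return (model name, mean pLDDT) pairs from ranking_debug.json, best first.

    ASSUMPTION: the file has an "order" list of model names (best first) and a
    "plddts" mapping from model name to score, as in AlphaFold2 monomer runs.
    """
    with open(path) as fh:
        ranking = json.load(fh)
    scores = ranking["plddts"]
    return [(name, float(scores[name])) for name in ranking["order"]]
```

For example, `best_name, best_score = rank_models("ranking_debug.json")[0]` gives the top model. In standard AlphaFold2 output, that model is the one written out as ranked_0.pdb.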

For this target, the C-terminus of the predicted structure has low pLDDT and high predicted aligned error (PAE). The PAE plot shows that AlphaFold2 has low confidence in the 3D positions of residues 58 and 59 relative to the rest of the molecule.
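The same reading of the PAE plot can be made numerically: a residue whose expected error against every other residue is high is poorly placed relative to the rest of the molecule. The sketch below works on a PAE matrix given as a nested list of values in Å. In standard AlphaFold2 output, runs with pTM models store this matrix under the "predicted_aligned_error" key of each result_model_*.pkl pickle in the full archive; that is an assumption to check against your own output, and loading it requires numpy (`.tolist()` converts the array). Only unpickle files you trust. The 15 Å cutoff is a hypothetical choice, not an AlphaFold2 default.

```python
def mean_pae_per_residue(pae: list[list[float]]) -> list[float]:
    """For each residue, average its PAE against all other residues.

    Row and column values are both averaged, so the result does not depend on
    which matrix axis holds the residue used for alignment.
    """
    n = len(pae)
    if any(len(row) != n for row in pae):
        raise ValueError("PAE matrix must be square")
    result = []
    for k in range(n):
        others = [j for j in range(n) if j != k]
        if not others:
            result.append(0.0)  # a single residue has nothing to be placed against
            continue
        row_mean = sum(pae[k][j] for j in others) / len(others)
        col_mean = sum(pae[j][k] for j in others) / len(others)
        result.append((row_mean + col_mean) / 2)
    return result


def poorly_placed_residues(pae: list[list[float]], cutoff: float = 15.0) -> list[int]:
    """1-based residues whose mean PAE against the rest exceeds `cutoff` (Å)."""
    return [k + 1 for k, v in enumerate(mean_pae_per_residue(pae)) if v > cutoff]
```

For the example target, residues flagged this way would be expected to match the low-confidence C-terminal residues visible in the PAE plot.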

These plots and files appear on the output page for each model. You can also download the entire AlphaFold2 result as a compressed file.

Citation:

Highly accurate protein structure prediction with AlphaFold. Jumper et al. Nature. 2021 Aug;596(7873):583-589.