AlphaFold2: Highly accurate protein structure prediction
AlphaFold2 leverages multiple sequence alignments and neural networks to predict protein structures. COSMIC² offers the full AlphaFold2 software package for use by the structural biology community. Alternatively, ColabFold is a Google Colab notebook that runs jobs on Google Cloud machines; for small proteins with many homologous sequences, it will likely return results faster.
New to AlphaFold? Check out this great set of presentations from EMBL-EBI training: How to interpret AlphaFold structures.
When should you use AlphaFold2 on COSMIC²?
Given how convenient the ColabFold notebook is, use AlphaFold2 on COSMIC² for the following purposes:
- To search larger sequence databases, which can help with sequences that have few homologs
- To run AlphaFold2 on larger proteins that may time out in Google Colab notebooks
- To compare the complete AlphaFold2 package, rather than adapted versions, against other software
Running AlphaFold2 on COSMIC²
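We run AlphaFold2 with its default parameters for all jobs, apart from the options listed below. Note that AMBER relaxation of the predicted PDB models, part of the default AlphaFold2 pipeline, is skipped on COSMIC² unless you set "Skip calculating relaxed models" to False.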
Input: a FASTA protein sequence file containing your sequence of interest.
- Upload data via browser upload (not Globus!)
- Database: [Default = full_dbs]
- You can choose which databases to use for prediction. The default is the full databases (“full_dbs”). Reduced databases (“reduced_dbs”) are faster but may be less accurate. “casp14” reproduces the settings used in the CASP14 prediction competition (full databases with 8 model ensemblings), which makes it slower.
- Generate predicted template modeling (pTM) score: [Default = True]
- When set to True, this calculates the pTM score for each prediction. pTM estimates the global accuracy of the predicted 3D structure, and a plot of the predicted aligned error (PAE) is generated for each predicted model.
- Skip calculating relaxed models: [Default = True]
- When set to True, this skips AMBER relaxation, a restrained energy minimization with the AMBER force field that refines side-chain positions and removes stereochemical violations such as steric clashes. Relaxation adds runtime and is usually unnecessary. Users who need accurate side-chain positions (e.g., for phasing X-ray diffraction datasets) should generate AMBER-relaxed models; most users do not need this step.
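Before uploading, it can help to confirm that your FASTA file holds exactly one protein sequence. The sketch below is a hypothetical pre-upload check, not part of COSMIC² or AlphaFold2; the helper name `read_single_fasta` is illustrative, and it assumes that only the 20 standard one-letter amino-acid codes plus X (unknown residue) should appear.

```python
# Hypothetical pre-upload check; not part of COSMIC² or AlphaFold2.
# Assumption: only the 20 standard one-letter codes plus 'X' are allowed.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def read_single_fasta(text: str) -> tuple[str, str]:
    """Parse a FASTA string holding exactly one protein sequence.

    Returns (header, sequence). Raises ValueError if the text does not
    contain exactly one record or if the sequence has unexpected characters.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header_positions = [i for i, line in enumerate(lines) if line.startswith(">")]
    if header_positions != [0]:
        raise ValueError("expected exactly one FASTA record starting with '>'")
    header = lines[0][1:].strip()
    # Join wrapped sequence lines and normalise to upper case.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("record has no sequence")
    bad = sorted(set(sequence) - VALID_RESIDUES)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {bad}")
    return header, sequence
```

For a two-line file containing `>test job` followed by the example sequence on this page, it returns the header `test job` and a 59-residue sequence.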
Checking job status
After you submit your job, you can check its status by clicking the ‘List’ hyperlink next to the ‘Intermediate Results’ label. This opens a new window that lists the contents of the current job directory, so you can see files as they are generated. Watch a video here.
AlphaFold2 outputs on COSMIC²
We provide individual files for download on the output page, as well as the full output from AlphaFold2 (“output.tar.gz”).
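If you download output.tar.gz, Python's standard `tarfile` module can list its contents without extracting it. The sketch below groups member file names by the suffixes mentioned on this page; the function name `summarize_archive` is hypothetical, and because the internal layout of the archive is not documented here, the suffix list is an assumption you may need to adjust.

```python
import tarfile
from collections import defaultdict

# Assumption: suffixes taken from the file list on this page; the archive's
# internal layout is not documented, so unmatched files go to "other".
SUFFIXES = ("_plddt.png", "_plddt.txt", "_PAE.png", ".pdb", ".json")


def summarize_archive(path: str) -> dict[str, list[str]]:
    """Group regular-file names in a .tar.gz archive by known suffix."""
    groups: dict[str, list[str]] = defaultdict(list)
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            for suffix in SUFFIXES:
                if member.name.endswith(suffix):
                    groups[suffix].append(member.name)
                    break
            else:
                groups["other"].append(member.name)
    return {key: sorted(names) for key, names in groups.items()}
```

Calling `summarize_archive("output.tar.gz")` returns a dictionary such as `{".pdb": [...], "_plddt.png": [...], ...}`, which makes it easy to check that the expected plots and models are present before extracting.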
Individual files available for download:
- Unrelaxed predicted PDB models
- AMBER-relaxed models (only if relaxation was not skipped)
- Per-residue confidence score (pLDDT)
- Plot for each predicted model, in a file ending with “_plddt.png”
- Per-residue values in a text file ending with “_plddt.txt”
- Predicted aligned error (PAE), if you enabled the pTM option when submitting the job
- Estimates the confidence in the relative 3D positions of residues in the predicted structure
- Plot for each predicted model, in a file ending with “_PAE.png”
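AlphaFold2 also stores each residue's pLDDT in the B-factor column of its PDB models, so per-residue scores can be recovered from a model file directly. This sketch (hypothetical helper `plddt_from_pdb`) reads the fixed-column ATOM records and takes the value from each CA atom; it assumes a single-chain model. The “_plddt.txt” file is an alternative source, but its column layout is not documented here, so the sketch does not parse it.

```python
def plddt_from_pdb(pdb_text: str) -> dict[int, float]:
    """Return {residue_number: pLDDT} from the B-factor of each CA atom.

    Uses the fixed-column PDB format (1-based, inclusive columns):
    atom name 13-16, residue number 23-26, B-factor 61-66.
    Assumes a single chain, as for a monomer prediction.
    """
    scores: dict[int, float] = {}
    for line in pdb_text.splitlines():
        # Only protein ATOM records; one CA atom per residue.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            scores[residue_number] = float(line[60:66])
    return scores
```

For example, reading `ranked_0.pdb` (the top-ranked model in standard AlphaFold2 output) and computing `sum(scores.values()) / len(scores)` gives the mean pLDDT of that model.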
How to interpret AlphaFold2 outputs
Example target sequence:
>test job
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK
Runtime: 26 minutes.
First, check the ranking of predicted models in ranking_debug.json to see which model has the highest score (as ranked by pLDDT). You can then assess the quality of the prediction using the pLDDT files and, if you enabled the pTM option, the predicted aligned error (PAE) plots.
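The ranking can also be read programmatically. For monomer runs, AlphaFold2 writes ranking_debug.json as an object with a `plddts` map (model name to mean pLDDT) and an `order` list from best to worst. The sketch below assumes that layout (multimer runs use a different score key), and the helper name `ranked_models` is hypothetical.

```python
import json


def ranked_models(ranking_json: str) -> list[tuple[str, float]]:
    """Return [(model_name, mean_pLDDT), ...] from best to worst.

    Assumes the monomer layout of ranking_debug.json:
    {"plddts": {name: mean_pLDDT, ...}, "order": [best, ..., worst]}.
    """
    ranking = json.loads(ranking_json)
    scores = ranking["plddts"]
    # Follow AlphaFold2's own ordering rather than re-sorting.
    return [(name, scores[name]) for name in ranking["order"]]
```

Call it with the file contents, e.g. `ranked_models(open("ranking_debug.json").read())`; the first tuple is the top-ranked model.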
In this example, the C-terminus of the predicted structure has low pLDDT and high predicted aligned error (PAE), both indicating low confidence. The PAE plot shows that AlphaFold2 has low confidence in the 3D position of residues 58 and 59 relative to the rest of the molecule.
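To locate low-confidence regions like this C-terminus, you can scan per-residue pLDDT values for contiguous stretches below a cutoff. The sketch below defaults to 70, the boundary below which the commonly used AlphaFold confidence bands label residues as low (50 to 70) or very low (below 50). The cutoff and the helper name `low_confidence_segments` are illustrative choices, and this check uses pLDDT only, not PAE.

```python
def low_confidence_segments(plddt: dict[int, float],
                            cutoff: float = 70.0) -> list[tuple[int, int]]:
    """Return inclusive (start, end) residue ranges where pLDDT < cutoff.

    `plddt` maps residue number to pLDDT. A gap in residue numbering
    ends the current segment.
    """
    segments = []
    start = prev = None
    for residue in sorted(plddt):
        if plddt[residue] < cutoff:
            if start is not None and residue == prev + 1:
                prev = residue  # extend the current low-confidence run
            else:
                if start is not None:
                    segments.append((start, prev))
                start = prev = residue  # begin a new run
        elif start is not None:
            segments.append((start, prev))  # a confident residue ends the run
            start = prev = None
    if start is not None:
        segments.append((start, prev))
    return segments
```

It accepts any mapping from residue number to pLDDT, so a 59-residue model whose last two residues score below 70 would yield `[(58, 59)]`.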
These files appear on the output page for each model. You can also download the entire AlphaFold2 result as a compressed archive (“output.tar.gz”).