ColabFold – Making protein folding accessible to all

We have implemented ColabFold on COSMIC² to run AlphaFold predictions for single proteins or multi-subunit complexes. ColabFold is faster than AlphaFold and gives slightly different results because the two use different sequence alignment steps.

New to AlphaFold? Check out this great set of presentations from EMBL-EBI training: How to interpret AlphaFold structures.

ColabFold Tutorial presented at the Boston Protein Design and Modeling Club: Video, Slides

When should you use ColabFold on COSMIC²?

The ColabFold notebook on Google Colab is convenient, so when should you use ColabFold on COSMIC² instead?

  • To run larger sequences than Google Colab allows
  • To be able to walk away from the job (instead of keeping tabs open)
  • To run AlphaFold with templates

Running ColabFold on COSMIC²

Access COSMIC2: https://cosmic2.sdsc.edu

Input: a FASTA protein sequence file containing one or more sequences of interest.

  • Upload data via browser upload (not Globus!)
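Before uploading, it can help to sanity-check the FASTA file locally. The following is a minimal sketch, not COSMIC² code: the function names `read_fasta` and `check_records` and the allowed-character set are assumptions made for illustration, and the checks COSMIC² itself applies are not documented here.

```python
from typing import List, Tuple

# Assumption: the 20 standard amino acids plus X (unknown residue),
# which appears in the example sequences further down this page.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_records(records: List[Tuple[str, str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if not records:
        problems.append("no sequences found")
    for header, seq in records:
        if not seq:
            problems.append(f"{header!r}: empty sequence")
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            problems.append(f"{header!r}: unexpected characters {bad}")
    return problems
```

For example, `check_records(read_fasta(open("my_input.fasta").read()))` returns an empty list when nothing suspicious is found (`my_input.fasta` is a placeholder file name).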

  • Number of Models: Increasing the number of models can help with challenging predictions.
  • Number of Recycles: Increasing the number of recycles can help generate higher-confidence models.
  • Use Amber relaxation? Performs MD-based relaxation of the model. Not usually needed, but important when using the outputs for molecular replacement.
  • Use templates? Downloads templates for the best-matching sequences that have an associated PDB entry.

By default, all sequences are folded simultaneously.

Preparing and submitting a prediction job to COSMIC2

Checking job status

After you submit your job, you can check its status by clicking the ‘List’ hyperlink next to the label ‘Intermediate Results.’ This opens a new window listing the current job directory, so you can see files as they are generated. Watch a video here.

ColabFold outputs on COSMIC²

We provide individual files for download on the output page as well as the full output from ColabFold (“output.tar.gz”).

Individual files available for download:

  • Per-residue confidence score (pLDDT)
    • Plot provided for each predicted model ending with suffix “_plddt.png”
    • Per-residue values provided in a text file ending with suffix “_plddt.txt”
  • PDB output files
    • Five predicted multimer files ending with suffix “_multimer.pdb”
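If you download the full “output.tar.gz” instead of individual files, the archive contents can be grouped by the suffixes mentioned on this page. This is a rough sketch using only the Python standard library; the helper names `group_by_suffix` and `list_archive` are hypothetical, and the exact file names and directory layout inside the archive are not specified here, so only suffix matching is assumed.

```python
import tarfile
from collections import defaultdict
from typing import Dict, Iterable, List

# Suffixes named on this page; anything else in the archive is ignored.
SUFFIXES = ("_coverage.png", "_plddt.png", "_plddt.txt", "_PAE.png", "_multimer.pdb")


def group_by_suffix(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group archive member names by the first matching known suffix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                groups[suffix].append(name)
                break
    return dict(groups)


def list_archive(path: str = "output.tar.gz") -> Dict[str, List[str]]:
    """Open a gzip-compressed tar archive and group its members by suffix."""
    with tarfile.open(path, "r:gz") as tar:
        return group_by_suffix(tar.getnames())
```

Calling `list_archive()` in the directory containing the downloaded archive returns a dictionary such as `{"_plddt.png": [...], "_multimer.pdb": [...]}`, which makes it easy to confirm that a plot and a PDB file exist for each predicted model.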

How to interpret ColabFold outputs

Example target sequences for a leucine zipper homodimer:

> chain a
XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
> chain b
XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER

Runtime: 14 minutes.

First, check the sequence coverage (_coverage.png) and the scoring of the models (_plddt.png). Also assess the 3D confidence of the models using the predicted aligned error (_PAE.png). Shown here are the top-scoring model (left) and the associated pLDDT plot (right):

 

The atomic model is colored from N- to C-terminus (blue to red) and shown as a homodimer. In the pLDDT plot (right), the first subunit corresponds to residues 1 to 34 and the second subunit to residues 35 to 68. You can see that the N- and C-termini have the lowest scores.
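Because the pLDDT plot numbers residues continuously across subunits, it is sometimes useful to convert a position on the plot back to a chain and a residue number within that chain. A minimal sketch of that arithmetic, assuming the chains are concatenated in the order they appear in the FASTA file (the function names are illustrative only):

```python
from typing import List, Tuple


def chain_ranges(lengths: List[int]) -> List[Tuple[int, int]]:
    """Return the 1-based inclusive (start, end) plot range for each chain."""
    ranges: List[Tuple[int, int]] = []
    start = 1
    for n in lengths:
        ranges.append((start, start + n - 1))
        start += n
    return ranges


def locate(index: int, lengths: List[int]) -> Tuple[int, int]:
    """Map a 1-based plot index to (chain number, residue number within that chain)."""
    for chain, (lo, hi) in enumerate(chain_ranges(lengths), start=1):
        if lo <= index <= hi:
            return chain, index - lo + 1
    raise ValueError(f"index {index} is outside the range 1..{sum(lengths)}")


# The leucine zipper example has two chains of 34 residues each.
print(chain_ranges([34, 34]))  # [(1, 34), (35, 68)]
print(locate(35, [34, 34]))    # (2, 1): first residue of the second subunit
```

The same functions work for heteromeric complexes: pass the length of each chain in FASTA order.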

These files appear on the output page for each model. You can also download the entire ColabFold result as a compressed file.

Citation:

 

ColabFold – Making protein folding accessible to all
Milot Mirdita, Konstantin Schütze, Yoshitaka Moriwaki, Lim Heo, Sergey Ovchinnikov, Martin Steinegger
bioRxiv 2021.08.15.456425; doi: https://doi.org/10.1101/2021.08.15.456425