Thursday, July 22, 2021

AlphaFold structure predictions freely available in UniProt

Today marks a historic moment in the world of molecular structural research. AlphaFold structure predictions covering the proteomes of 21 species have been made freely available to the scientific community, and we are delighted to announce that they are also accessible through the UniProt website and the ProtVista protein viewer. AlphaFold is an artificial intelligence (AI) system developed by DeepMind that predicts the three-dimensional (3D) structure of a protein from its amino-acid sequence.


Currently, the Protein Data Bank (PDB) contains over 180,000 macromolecular structures, which cover ~55,000 UniProt Knowledgebase (UniProtKB) proteins. Only ~7,300 human UniProt proteins have PDB structures, whereas AlphaFold predicts structures for 20,610. This covers most of the human reference proteome! With more than 220 million proteins in UniProtKB today, AlphaFold represents a great opportunity to predict structures for millions of proteins in thousands of species. The availability of large numbers of predicted models can provide important insights into the function of proteins and their role in the biology of each species. The models can also be used to kick-start experimental de novo structure determination.

In November 2020, AlphaFold was recognised as the best-performing method for predicting 3D protein structure by the assessors of the 14th Critical Assessment of protein Structure Prediction (CASP) experiment. The best-predicted 95% of residues in AlphaFold models had a median alpha-carbon root-mean-square deviation (RMSD) of 0.96 Å from the experimental models, compared to 2.83 Å for the next-best method. Thus, the AlphaFold predictions were very similar to the experimentally determined structures of the proteins included in this round of CASP. Each AlphaFold prediction also includes a predicted model-quality score for individual residues, indicating regions of high quality and those where the model is probably less reliable.
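
To make the RMSD figure above more concrete, here is a minimal Python sketch of an alpha-carbon RMSD restricted to the best-fitting fraction of residues. It is an illustration only, not the CASP assessors' procedure: the function name `rmsd_best_fraction` is hypothetical, the coordinates are assumed to be already superposed in the same frame, and the trimming is a simple one-pass selection of the residues with the smallest deviations.

```python
import math


def rmsd_best_fraction(predicted, experimental, fraction=0.95):
    """C-alpha RMSD (in angstroms) over the best-fitting `fraction` of residues.

    Both inputs are equal-length lists of (x, y, z) tuples, one per residue.
    Assumption: the two structures are already superposed; no fitting is done here.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")

    # Squared deviation between matching alpha carbons, one value per residue.
    sq_dev = [
        sum((p - e) ** 2 for p, e in zip(pc, ec))
        for pc, ec in zip(predicted, experimental)
    ]

    # Number of residues to keep (at least one); the small epsilon guards
    # against floating-point round-off in fraction * len(sq_dev).
    keep = max(1, int(fraction * len(sq_dev) + 1e-9))

    # Keep the residues with the smallest deviations and average over them.
    best = sorted(sq_dev)[:keep]
    return math.sqrt(sum(best) / keep)
```

For a toy model of 20 residues in which 19 alpha carbons are each 1 Å from their experimental positions and one is 100 Å away, this returns 1.0 at the default 95% and roughly 22.4 when all residues are included (`fraction=1.0`), which shows why trimming the worst-fitting residues keeps a single badly placed region from dominating the score.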

And the future is looking bright. In the coming months, AlphaFold will be expanded to cover a large proportion of all catalogued proteins in the UniProt UniRef90 clusters. This means that for nearly every known sequence in the UniProt data resource there will be either an experimentally determined structure in PDB or an AlphaFold model in the AlphaFold database. This development represents a step-change for molecular biology: for the first time in history, a high-quality 3D model will be readily available for almost every protein of known sequence.

Open data resources like PDB and UniProt have been key to the successful development of AlphaFold. Making AlphaFold structure predictions freely accessible to the scientific community is a success story for academia-industry collaboration and for open science.
