Thursday, July 22, 2021

AlphaFold structure predictions freely available in UniProt

Today marks a historic moment in the world of molecular structural research. AlphaFold structure predictions covering the proteomes of 21 species have been made freely available to the scientific community, and we are delighted to announce that they are also accessible through the UniProt website and the ProtVista protein viewer. AlphaFold is an Artificial Intelligence (AI) system developed by DeepMind that predicts the three-dimensional (3D) structure of a protein from its amino-acid sequence.

Currently, the Protein Data Bank (PDB) contains over 180,000 macromolecular structures, which cover ~55,000 UniProt Knowledgebase (UniProtKB) proteins. There are ~7,300 human UniProt proteins with PDB structures, and AlphaFold predicts structures for 20,610. This covers most of the human reference proteome! With more than 220 million proteins in UniProtKB today, AlphaFold represents a great opportunity to predict structures for millions of proteins in thousands of species. The availability of large numbers of predicted models can provide important insights into the function of proteins and their role in the biology of each species. The models can also be used to kick-start experimental de novo structure determination.

In November 2020, AlphaFold was recognised as the best-performing method for predicting 3D protein structure by the assessors of the 14th CASP experiment. The best-predicted 95% of residues in AlphaFold models had a median alpha-carbon RMSD of 0.96 Å to the experimental models, compared to 2.83 Å for the next-best method. Thus, the AlphaFold predictions were very similar to the experimentally determined structures of the proteins included in this round of CASP. Each AlphaFold prediction also includes a predicted model-quality score for every residue, indicating regions of high quality and those where the model is probably less reliable.
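To make the RMSD figure above more concrete, here is a minimal Python sketch of how an alpha-carbon RMSD over the best-matching 95% of residues could be computed. It is an illustration only, not the procedure used by the CASP assessors: it assumes the two coordinate sets are already superposed and listed in the same residue order, and the function name `rmsd_best_fraction` and the toy coordinates are hypothetical.

```python
import math


def rmsd_best_fraction(predicted, experimental, fraction=0.95):
    """RMSD (in angstroms) over the best-matching `fraction` of residues.

    Both arguments are lists of (x, y, z) alpha-carbon coordinates, given in
    the same residue order and assumed to be already superposed.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")

    # Squared distance between each predicted and experimental alpha carbon,
    # sorted so that the best-matching residues come first.
    sq_dists = sorted(
        sum((p - e) ** 2 for p, e in zip(pred, expt))
        for pred, expt in zip(predicted, experimental)
    )

    # Number of residues to keep (at least one).
    keep = max(1, round(fraction * len(sq_dists)))
    return math.sqrt(sum(sq_dists[:keep]) / keep)


# Toy example: 20 residues, 19 of them off by 1 Å and one off by 10 Å.
predicted = [(3.8 * i, 0.0, 0.0) for i in range(20)]
experimental = [(3.8 * i, 1.0, 0.0) for i in range(19)] + [(3.8 * 19, 10.0, 0.0)]

print(rmsd_best_fraction(predicted, experimental))       # best 95% of residues
print(rmsd_best_fraction(predicted, experimental, 1.0))  # all residues
```

Dropping the worst 5% of residues keeps a single badly placed loop or terminus from dominating the score, which is why the two printed values differ so much in the toy example.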

And the future is looking bright. In the coming months, AlphaFold will be expanded to cover a large proportion of all catalogued proteins in UniProt (the UniRef90 clusters). This means that for almost every known sequence in the UniProt data resource there will be either an experimentally determined structure in PDB or an AlphaFold model in the AlphaFold database. This development represents a step-change for molecular biology: for the first time in history, a high-quality 3D model will be readily available for almost every protein of known sequence.

Open data resources like PDB and UniProt have been key to the successful development of AlphaFold. Making AlphaFold structure predictions freely accessible to the scientific community is a success story for academia-industry collaboration and for open science.


Monday, July 5, 2021

Switching off disease?


Our bodies are living economies.

But they are unusual economies, because their component cells are interdependent: they all have the same business plan and work towards shared goals. Their employees are also unusual, because they are mainly proteins, which are the focus of our work at UniProt.


The shared goal of every cell in an organism is to maintain the function of the body. To reach this goal, each individual cell has a specific purpose that depends on where it belongs and what it should do. For example:


     Does a cell need to pick up oxygen from the lungs?

     Or does it need to make insulin, so that it can send signals to other cells to use sugar for energy?

     Or does it need to send electrical messages to muscle cells, telling them to move?

The genome: the business plan for the cell

The business plan (the genome) consists of thousands of genes encoded by DNA molecules kept in the company head office, the cell nucleus. Genes describe the main building blocks of the cell, proteins, and how they should be made.


Proteins are responsible for creating the components of cells and bodies - including fats, sugars and even other proteins. These products may be used locally, inside the cell, or exported.


However, there is a big logistical challenge for every cell. Our genetic material, DNA, stays inside the nucleus, so how does it communicate its 'memos' or instructions to the rest of the cell, telling it which proteins to make?


The answer, as in so many things, lies in sending the right message, at the right time.

Transcription factors: the business managers of the cell

Messages from the nucleus to the rest of the cell are made with the help of proteins known as transcription factors, which control how the information in genes is copied (transcribed) into dedicated messenger RNA molecules. RNA is related to DNA but is shorter-lived, and it can leave the nucleus and carry memos to the protein synthesis machinery. Transcription factors work as a team, and include generalists, needed to transcribe any gene, and specialists, which ensure that the correct genes are transcribed in a particular cell at a specific moment in time. Generalist transcription factors are active in almost all cells and work with the basic cellular machinery to make messenger RNAs. Specialist transcription factors are required to help make decisions such as: which genes need to be switched on in a muscle cell when the muscle is active? Or in a cell that is responding to a viral invasion? Or when an egg has just been fertilized by a sperm in order to produce a new human? Or in any other process that requires a gene to be switched on or off?


Diseases are often the result of errors in the genome (business plan) that generate faulty products (proteins). As transcription factors are like molecular switches, turning genes on or off, they could potentially be used to turn off a damaged gene in a genetic disease.

Transcription factors as drug targets?

It’s early days, but a transcription factor could be a promising new target for treating the crippling genetic illness sickle cell disease (SCD). Starting in childhood, people with sickle cell disease suffer from chronic pain, recurrent infections, shortness of breath and sometimes strokes. The condition is caused by mutations in beta-globin, one of the proteins that carry oxygen around the body inside red blood cells; the mutations make the protein abnormally sticky. The name of the disease comes from its effect on the red blood cells, which become sickle-shaped (like a crescent moon) and block small vessels, such as the capillaries, stopping oxygen getting to the tissues and sometimes causing vessels to burst. Most treatments only reduce the symptoms and do not prevent the disease. The only curative therapy involves a stem cell transplant and, even if it is successful, people spend the rest of their lives dealing with the side-effects of immune-suppressing drugs.

However, back in 1948, it was discovered that a few people with sickle cell disease had much milder symptoms if their cells produced a protein related to beta-globin, called gamma-globin (also known as hemoglobin subunit gamma-1). Gamma-globin reduces symptoms in sickle cell disease by reducing the stickiness of the abnormal beta-globin. In 2008, researchers made the exciting, and rather surprising, discovery that the gamma-globin gene can be switched on by turning off the transcription factor BCL11A.
So, if we could switch off BCL11A in the cells which develop into red blood cells, we could potentially develop a new treatment for this devastating illness. And there is now evidence that this approach may work...

In January 2021, a small clinical trial of a drug that switches off BCL11A demonstrated safety, and both an increase in gamma-globin in red blood cells and an apparent reduction in complications of sickle cell disease. Longer-term and larger clinical trials are in progress.


References and additional reading:


Esrick EB, Lehmann LE, Biffi A, Achebe M, Brendel C, Ciuculescu MF, et al. Post-Transcriptional Genetic Silencing of BCL11A to Treat Sickle Cell Disease. N Engl J Med. 2021 Jan 21;384:205-215. doi: 10.1056/NEJMoa2029392. PMID: 33283990


Sankaran VG, Menne TF, Xu J, Akie TE, Lettre G, Van Handel B, et al. Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science. 2008 Dec 19;322(5909):1839-42. doi: 10.1126/science.1165409. PMID: 19056937


Orkin SH, Bauer DE. Emerging Genetic Therapy for Sickle Cell Disease. Annu Rev Med. 2019 Jan 27;70:257-271. doi: 10.1146/annurev-med-041817-125507. PMID: 30355263