Thursday, November 25, 2021

Navigating the use of tools on the new UniProt website


One of the biggest changes we have made on the new UniProt website is to improve your use of the tools associated with our data - BLAST, Align, ID mapping and Peptide search.


You can easily access all of these from the homepage of the new website, or from the toolbar of any entry.


When you open BLAST, you can search by any UniProtKB ID (e.g. P05067 or A4_HUMAN) or a UniParc protein sequence archive ID (e.g. UPI0000000001) to autofill the protein sequence. Another major change is that you can now select the species you wish to BLAST against. If you are only interested in dog, start to type the common (dog) or scientific (Canis lupus familiaris) name and the autocomplete will help you select the correct taxonomic group (TaxID:9615) to search for a match. You can choose to limit your search to any level of the taxonomic tree, so you can search your dog sequence against, for example, all mammalian sequences by using the appropriate taxonomic identifier (TaxID:40674).

You can also upload a list of sequences to BLAST, up to a maximum of 20, from an external file. You can reset the BLAST parameters, should you wish, and name your job so you can easily identify it when you return, up to a week later.
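If you prepare your sequence file with a script, it can help to check the 20-sequence limit before uploading. The following is a minimal sketch only: it assumes the file is in FASTA format, and the function names `split_fasta` and `check_batch` are hypothetical helpers written for this illustration, not part of any UniProt software.

```python
MAX_BLAST_SEQUENCES = 20  # upload limit quoted in this post


def split_fasta(text):
    """Split multi-FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_batch(text, limit=MAX_BLAST_SEQUENCES):
    """Return the parsed records, or raise if there are more than `limit`."""
    records = split_fasta(text)
    if len(records) > limit:
        raise ValueError(
            f"{len(records)} sequences found; the limit is {limit}"
        )
    return records
```

Passing `limit=100` would apply the same check to the larger limit mentioned below for Peptide Search.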

Once your job has run, it will appear in the new Tool Results view. Here you can see all of your jobs that have completed over the last 7 days, identified by the names you have chosen to give them.

We have also increased the number of ways you can view the results when you open your job. For example, in the case of a Sequence Alignment, you can take a look at the traditional view of the aligned amino acids, with a summary of any one of the entries you select acting as a reference sequence below, or you can see a phylogenetic view or a percent identity matrix.
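To give a feel for what a percent identity view summarises, here is a small sketch that compares every pair of sequences in an alignment. It rests on assumptions that may not match the website: sequences are already aligned to equal length, gaps are written as `-`, and identity is counted only over columns where both sequences have a residue. Alignment tools differ in how they choose the denominator, so treat the numbers as illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical residues over columns where neither row is a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """All-against-all percent identities for a list of aligned sequences."""
    return [[percent_identity(a, b) for b in aligned] for a in aligned]
```

For example, `percent_identity("MKT-LV", "MKTALI")` compares five gap-free columns, four of which match, giving 80.0.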


Loading data from a file (up to a maximum of 100 sequences) and restricting the search criteria by species are features also available in the Peptide Search tool, where users such as proteomics scientists can match peptides to proteins. In addition, an increasing number of database identifiers are now available in the ID mapping tool for users to switch between.

We hope you find this first look at the new beta version of the UniProt website useful and easy to navigate. Please use the contact form, accessible from every page, to let us know if you have any problems at all, or if you can think of any way in which the site can be further improved.

Wednesday, October 27, 2021

A New Look to UniProt


The UniProt Consortium is happy to announce a new look and feel to our web pages, designed to improve the user experience and enhance your journey of discovery through the world of protein science. We have only released the beta version of the site so far, but please take time to explore and send us your feedback - we will be working hard over the next few months to improve the site, ready for it to replace our current website in 2022.

Here are a few examples to get you started, but please make use of the entire website and let us know where we can still make further improvements.

You can find your favorite protein with a simple search - for example, by typing a protein name such as ‘hemoglobin’, a function description such as ‘apoptosis’, or even a disease name such as ‘cystic fibrosis’ in the search box.


This brings up many hits in our new ‘card view’ but it is easy to filter your search results down to more manageable numbers by selecting a species or by limiting the results to reviewed, expertly curated entries using the left-hand menu.
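The same kind of filtered search can also be expressed as a query string for programmatic use. The sketch below only builds a URL and does not contact any server. It assumes the REST endpoint `https://rest.uniprot.org/uniprotkb/search` and the query fields `organism_id` and `reviewed`; please check these against the current UniProt API documentation before relying on them. The function name `build_search_url` is a hypothetical helper for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the UniProt API documentation.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(text, organism_id=None, reviewed_only=False, size=25):
    """Build a search URL combining free text with optional filters."""
    clauses = [f"({text})"]
    if organism_id is not None:
        # Restrict hits to one species by its taxonomic identifier.
        clauses.append(f"(organism_id:{organism_id})")
    if reviewed_only:
        # Keep only reviewed, expertly curated entries.
        clauses.append("(reviewed:true)")
    params = {
        "query": " AND ".join(clauses),
        "format": "json",
        "size": size,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Example: reviewed human (TaxID 9606) entries matching 'cystic fibrosis'.
url = build_search_url("cystic fibrosis", organism_id=9606, reviewed_only=True)
```

Keeping each filter as its own bracketed clause mirrors the way the left-hand menu narrows a search one choice at a time.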

The card view gives a quick overview of what data you can find in each entry - the CFTR protein has 210 reviewed variants and 38 3D structures, for example. The entries are ranked by the amount of content, so you will always see those with the most information at the top of your search.

We have embedded our ProtVista feature viewer directly into the entry so that you can both see and read about protein sequence features at the same time.

You can also see the structured Gene Ontology annotation in a summarized ribbon view, developed by the Alliance of Genome Resources.

You can view protein 3D structures, both experimental (when available) and as predicted by AlphaFold.

And take a look at the subcellular location of your protein, pictured using SwissBioPics.


We hope you find this first look at the new UniProt website useful and easy to navigate. Please use the contact form, accessible from every page, to let us know if you have any problems at all, or if you can think of any way in which the site can be further improved.

Thursday, July 22, 2021

AlphaFold structure predictions freely available in UniProt

Today marks a historic moment in the world of molecular structural research. AlphaFold structure predictions covering the proteomes of 21 species have been made freely available to the scientific community, and we are delighted to announce that they are also accessible through the UniProt website and the ProtVista protein viewer. AlphaFold is an Artificial Intelligence (AI) system developed by DeepMind that predicts the three-dimensional (3D) structure of a protein from its amino-acid sequence.

Currently, the Protein Data Bank (PDB) contains over 180,000 macromolecular structures which cover ~55,000 UniProt Knowledgebase (UniProtKB) proteins. There are ~7,300 human UniProt proteins with PDB structures and AlphaFold predicts structures for 20,610. This covers most of the human reference proteome! With more than 220 million proteins in UniProtKB today, AlphaFold represents a great opportunity to predict structures for millions of proteins in thousands of species. The availability of large numbers of predicted models can provide important insights into the function of proteins and their role in each species' biology. They can also be used to kick-start experimental de novo structure determination.

In November 2020, AlphaFold was recognised as the best-performing method for predicting 3D protein structure by the assessors of the 14th CASP experiment. The best-predicted 95% of residues in AlphaFold models had a median alpha carbon RMSD of 0.96 Å to the experimental models, compared to 2.83 Å for the next-best method. Thus, the AlphaFold predictions were very similar to the experimentally determined structures of the proteins included in this round of CASP. The AlphaFold prediction also includes a predicted model-quality score for individual residues indicating regions of high quality and those where the model is probably less reliable. 
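For readers unfamiliar with RMSD (root-mean-square deviation), the measure quoted above, it is the square root of the average squared distance between matching atoms in two structures. The sketch below shows the arithmetic only. It assumes the two sets of alpha-carbon coordinates are already superimposed and listed in the same residue order; it does not perform the superposition step, nor the selection of the best-predicted 95% of residues used in the CASP figures.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units (e.g. Å) between two equal-length lists of
    (x, y, z) coordinates that are assumed to be already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the matching atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

With two atoms, one placed identically and one displaced by 2 Å, the result is the square root of (0 + 4) / 2, about 1.41 Å.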

And the future is looking bright. In the coming months AlphaFold will be expanded to cover a large proportion of all catalogued proteins in the UniProt UniRef90 clusters. This means that for almost every known sequence in the UniProt data resource there will be either an experimentally determined structure in PDB, or an AlphaFold model in the AlphaFold database. This development represents a step-change for molecular biology - for the first time in history, for almost every protein of known sequence, a high-quality 3D model will be readily available.

Open data resources like PDB and UniProt have been key in the successful development of AlphaFold. Making AlphaFold structure predictions freely accessible to the scientific community is a success story for academia-industry collaboration and for open science.


Monday, July 5, 2021

Switching off disease?


Our bodies are living economies.

But unusual economies, because their component cells are interdependent - they all have the same business plan and work towards shared goals. Their employees are also unusual, because they are mainly proteins, and are the focus of our work at UniProt.


The shared goal of each cell in an organism is to maintain the function of the body. To reach this goal, each individual cell has a specific purpose which depends upon where each cell belongs and what it should do. For example:


     Does a cell need to pick up oxygen from the lungs?

     Or does it need to make insulin, so that it can send signals to other cells to use sugar for energy?

     Or does it need to send electrical messages to muscle cells, telling them to move...

The genome; the business plan for the cell

The business plan (the genome) consists of thousands of genes encoded by DNA molecules kept in the company head office - the cell nucleus. Genes describe the main building blocks of the cell, proteins, and how they should be made.


Proteins are responsible for creating the components of cells and bodies - including fats, sugars and even other proteins. These products may be used locally, inside the cell, or exported.


However, there is a big logistical challenge for every cell. Our genetic material, DNA, stays inside the nucleus, so how does it communicate its 'memos' or instructions to the rest of the cell, telling it which proteins to make?


The answer, as in so many things, lies in sending the right message, at the right time.

Transcription factors; the business managers of the cell

Messages from the nucleus to the rest of the cell are made by proteins, known as transcription factors, that copy (transcribe) the information in genes into dedicated messenger RNA molecules. RNA is related to DNA, but shorter-lived, and can leave the nucleus and carry memos to the protein synthesis machinery. Transcription factors work as a team, and include generalists, needed to transcribe any gene, and specialists, which ensure that the correct genes are transcribed in a particular cell at a specific moment in time. Generalist transcription factors are active in almost all cells and work with the basic cellular machinery to make messenger RNAs. Specialist transcription factors are required to help make decisions such as: which genes need to be switched on in a muscle cell when the muscle is active? Or in a cell that is responding to a viral invasion? Or when an egg has just been fertilized by a sperm in order to produce a new human? Or in any other process which requires a gene to be switched on, or off.


Diseases are often the result of errors in the genome (business plan) that generate faulty products (proteins). As transcription factors are like molecular switches, turning genes on or off, they could potentially be used to turn off a damaged gene in a genetic disease.

Transcription factors as drug targets?

It’s early days, but a transcription factor could be a promising new target for treating the crippling genetic illness, sickle cell disease (SCD). Starting in childhood, people with sickle cell disease suffer with chronic pain, recurrent infections, shortness of breath and sometimes strokes. The condition is caused by mutations in one of the proteins that carry oxygen around the body inside red blood cells, known as beta-globin, which cause it to become abnormally sticky. The name of the disease comes from its effect on the red blood cells, which become sickle-shaped (like a crescent moon) and block small vessels, like the capillaries, stopping oxygen getting to the tissues and sometimes causing vessels to burst. Most treatments only reduce the symptoms, but don’t prevent the disease. The only true therapy involves a stem cell transplant, and if successful, people spend the rest of their lives dealing with the side-effects of immune-suppressing drugs.

However, back in 1948, it was discovered that a few people with sickle cell disease had much milder symptoms if their cells produced a protein related to beta-globin, called gamma-globin (also known as hemoglobin subunit gamma-1). Gamma-globin reduces symptoms in sickle cell disease by reducing the stickiness of the abnormal beta-globin. In 2008, researchers made the exciting, and rather surprising, discovery that the gamma-globin gene can be switched on by turning off the transcription factor BCL11A.
So, if we could switch off BCL11A in the cells which develop into red blood cells, we could potentially develop a new treatment for this devastating illness. And there is now evidence that this approach may work...

In January 2021, a small clinical trial of a drug that switches off BCL11A demonstrated safety, and both an increase in gamma-globin in red blood cells and an apparent reduction in complications of sickle cell disease. Longer-term and larger clinical trials are in progress.


References and additional reading:


Esrick EB, Lehmann LE, Biffi A, Achebe M, Brendel C, Ciuculescu MF, et al. Post-Transcriptional Genetic Silencing of BCL11A to Treat Sickle Cell Disease. NEJM. 2021 Jan 21;384:205-215. doi: 10.1056/NEJMoa2029392 PMID: 33283990


Sankaran VG, Menne TF, Xu J, Akie TE, Lettre G, Van Handel B, et al. Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science. 2008 Dec 19;322(5909):1839-42. doi: 10.1126/science.1165409. PMID: 19056937


Orkin SH and Bauer DE. Emerging Genetic Therapy for Sickle Cell Disease. Annu Rev Med. 2019 Jan 27;70:257-271. doi: 10.1146/annurev-med-041817-125507. PMID: 30355263