Tuesday, April 14, 2020

Scientists at home, UniProt to the rescue!

Many of you who work in the lab have switched to working remotely. Although your daily routine and the continuity of your research may have been affected, your contribution to knowledge can continue in new ways.
Are you at home itching to contribute to science? UniProt to the rescue!
Improve our resource for the community and receive credit for it.
We have the proteins and you have the expertise. You can now use that expertise by adding publications to protein entries.

What you need:

1. An ORCID, your personal researcher ID (used for validation and for credit)
2. A protein of interest
3. A publication with a PubMed ID (PMID) about the protein of interest. You don’t have to be the author of the publication.
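
If you want to double-check that an ORCID has been typed correctly before logging in, the final character of an ORCID iD is a check character computed with the ISO 7064 MOD 11-2 algorithm. The snippet below is a minimal sketch of that check; the function names are illustrative and are not part of any UniProt or ORCID tool. It only tests the form of the identifier, not whether the ORCID is actually registered.

```python
import re

# Four hyphen-separated groups of four; the last character may be a digit or "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    for candidate in ("0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(candidate, is_valid_orcid(candidate))
```

A mistyped digit will usually fail the check, which makes this a cheap sanity test before filling in a form.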

What to do (Figure 1):

1. Identify the protein of interest in UniProt (this also includes proteins from the special UniProt COVID-19 website, which can be found at https://covid-19.uniprot.org/uniprotkb?query=*)
2. Select the “Add a publication” link in the top menu of the entry page
3. Log in with your ORCID
4. Fill in the submission form
   a. Enter the PubMed ID (PMID) to retrieve the publication
   b. Confirm that the publication is correct and that it is about the protein of interest
   c. Select the topics the paper covers
   d. Add short statements about protein name, function, disease, or other aspects, as described in the publication
   e. Submit
5. Reply to review questions, if any
6. After review, check your publication on the website in the next release




Figure 1. From publication to UniProtKB entry.

A sample blank submission form can be found here:

This is how your publication will be displayed on the publications page of the UniProt entry, under Community, with your ORCID as the contributing source for the publication and information:
https://www.uniprot.org/uniprot/O58649/publications?query=&fil=Community

Publications submitted can be tracked here

Follow the growth of contributions:

Learn more here:


Friday, March 13, 2020

To be or not to be an enzyme: pseudoenzymes in UniProt


Enzymes are essential for many biological processes. Without them, common tasks such as digesting food or replicating DNA would not be possible.
In recent years, triggered in part by the expansion of the analysis and annotation of complete genomes, it has become apparent that several enzyme families in a wide range of species contain members that look like enzymes but fail to behave like enzymes. For example, in human, between 5 and 10% of the members of several of these families are such enzyme-like proteins. Whilst these proteins have sequences and 3D structural features similar to those of active enzymes, they tend to lack essential amino acid residues, such as those involved in catalysis and/or substrate binding, making them incapable of catalysing chemical reactions. Based on these characteristics, scientists decided to call them pseudoenzymes.
Why are genes coding for pseudoenzymes maintained in the genome? It turns out that, despite their lack of enzymatic activity, this group of proteins carries out essential functions in cells. For example, they help assemble signalling cascades by acting as scaffolds, they regulate the activity of other enzymes, and they ensure that proteins are localized to the right cellular compartment. Consequently, they have become potential targets for the design of therapeutic treatments.
To support the growing interest in pseudoenzyme biology, UniProt recently revisited this important group of proteins. In collaboration with the pseudoenzyme community, we implemented changes to enhance their identification and discoverability. The outcome of this project was published in two articles, in Science Signaling and The FEBS Journal.

Ultimately, this effort will provide the scientific community with a comprehensive resource for pseudoenzymes, which in turn will lead to a better understanding of the evolution of these molecules and their active counterparts and the aetiology of related diseases. It will also support the ongoing quest to target pseudoenzymes for therapeutic treatments and offer some insight into the expanding field of enzyme engineering.


Friday, February 7, 2020

SARS-CoV-2 (Coronavirus) - UniProtKB acts to serve community need


UniProt has launched a COVID-19 portal, https://covid-19.uniprot.org/, for the latest pre-release data. The portal will be updated independently of the general UniProt 8-week release cycle. You can also find the data on the FTP site at ftp://ftp.uniprot.org/pub/databases/uniprot/pre_release/ .

The 2019–20 COVID-19 outbreak is a viral epidemic which started in mainland China but has since spread to several other countries and territories. Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the virus responsible, was first identified in Wuhan, the capital of China's Hubei province. It is an enveloped, single-stranded RNA virus whose particles are decorated with petal-shaped surface projections reminiscent of the solar corona. Coronaviruses are found in many vertebrate species and cause respiratory diseases, such as the common cold or SARS. The more recent SARS-CoV-2 emerged from a still-unknown animal reservoir and can be transmitted from human to human.



Coronaviruses possess the largest genomes among all known RNA viruses. The 30 kilobase genome of the Wuhan seafood market strain has been sequenced (MN908947, NC_045512) and encodes a total of 13-14 proteins. To fast-track scientific research, these proteins have been manually annotated by UniProt biocurators and the entries made available as a pre-release dataset. This dataset provides pre-release access to the SARS-CoV-2 protein sequences in UniProt in response to the current public health emergency. The data will become part of a future UniProt release and may be subject to further changes. A high-resolution crystal structure of the SARS-CoV-2 3CL hydrolase (PDB 6LU7) has been determined by Zihe Rao and Haitao Yang's research team at ShanghaiTech University and is cross-referenced from P0DTD1.


Two copies of the 3C-like hydrolase (P0DTD1-PRO_0000449623) in a catalytically active assembly


In common with other public domain resources, UniProt has moved rapidly to make these valuable data publicly available at the time when they are most needed, and hopes that this will assist clinical researchers in their efforts to combat the virus. To download the entire dataset of protein sequences, expertly curated for function and fully cross-referenced to additional resources, click here.
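
Once the protein sequences have been downloaded, they can be read with a few lines of code. The sketch below assumes the sequences are in FASTA format with UniProt-style headers of the form `db|ACCESSION|ENTRY_NAME description`; the exact file names and formats in the pre-release directory are not specified here, so check what is actually provided. The function names, the second accession, the entry names, and the sequences in the example are placeholders for illustration only.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_from_header(header: str) -> str:
    """Extract the accession from a 'db|ACCESSION|ENTRY_NAME ...' header."""
    parts = header.split("|")
    if len(parts) >= 3:
        return parts[1]
    words = header.split()
    return words[0] if words else ""


if __name__ == "__main__":
    # Placeholder records: the sequences and entry names are made up.
    sample = [
        ">sp|P0DTD1|ENTRY_A placeholder description",
        "MKT",
        "AAG",
        ">tr|X00000|ENTRY_B placeholder description",
        "GG",
    ]
    for header, sequence in read_fasta(sample):
        print(accession_from_header(header), len(sequence))
```

To read a real file, pass an open file handle instead of the sample list, for example `read_fasta(open(path))`, where `path` points to the file you downloaded.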