Friday, March 13, 2020

To be or not to be an enzyme: pseudoenzymes in UniProt


Enzymes are essential for many biological processes. Without them, common tasks such as digesting food or replicating DNA would not be possible.
In recent years, triggered in part by the expanding analysis and annotation of complete genomes, it has become apparent that several enzyme families in a wide range of species contain members that look like enzymes but fail to behave like them. In humans, for example, these enzyme-like proteins make up between 5 and 10% of several such families. Whilst these proteins have sequences and 3D structural features similar to those of active enzymes, they tend to lack essential amino acid residues, such as those involved in catalysis and/or substrate binding, which makes them incapable of catalysing chemical reactions. Based on these characteristics, scientists decided to call them pseudoenzymes.
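The idea that a pseudoenzyme resembles an active enzyme but lacks key catalytic residues can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the function name, the toy sequences and the catalytic positions are invented for illustration and do not correspond to any UniProt entry or tool. Real pseudoenzyme annotation relies on curated evidence, not on a simple check like this one.

```python
def missing_catalytic_residues(sequence, catalytic_sites):
    """Return the catalytic sites that a sequence fails to conserve.

    catalytic_sites maps a 1-based position to the expected residue
    (one-letter amino acid code). The result maps each non-conserved
    position to a tuple (expected, observed); observed is None when
    the position lies outside the sequence.
    """
    missing = {}
    for position, expected in catalytic_sites.items():
        if 0 < position <= len(sequence):
            observed = sequence[position - 1]
        else:
            observed = None
        if observed != expected:
            missing[position] = (expected, observed)
    return missing


# Hypothetical toy data: a histidine at position 3 and a cysteine at
# position 6 are assumed to be required for catalysis.
catalytic_sites = {3: "H", 6: "C"}
active_enzyme = "MKHLACGT"  # both catalytic residues present
candidate = "MKHLASGT"      # cysteine at position 6 replaced by serine

print(missing_catalytic_residues(active_enzyme, catalytic_sites))
print(missing_catalytic_residues(candidate, catalytic_sites))
```

An empty result means every expected catalytic residue is present; a non-empty result flags the sequence as a possible pseudoenzyme candidate that would still need expert review.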
Why are genes coding for pseudoenzymes maintained in the genome? It turns out that, despite their lack of enzymatic activity, this group of proteins carries out essential functions in cells. For example, they help assemble signalling cascades by acting as scaffolds, they regulate the activity of other enzymes and ensure that proteins are localized to the right cellular compartment. Consequently, they have become potential targets for the design of therapeutic treatments.
To support the growing interest in pseudoenzyme biology, UniProt recently revisited this important group of proteins. In collaboration with the pseudoenzyme community, we implemented changes to enhance their identification and discoverability. The outcomes of this project were published in two articles, in Science Signaling and The FEBS Journal.

Ultimately, this effort will provide the scientific community with a comprehensive resource for pseudoenzymes, which in turn will lead to a better understanding of the evolution of these molecules and their active counterparts and the aetiology of related diseases. It will also support the ongoing quest to target pseudoenzymes for therapeutic treatments and offer some insight into the expanding field of enzyme engineering.


Friday, February 7, 2020

SARS-CoV-2 (Coronavirus) - UniProtKB acts to serve community need


UniProt has launched a COVID-19 portal, https://covid-19.uniprot.org/, for the latest pre-release data. It will be updated independently of the general 8-week UniProt release cycle. The data are also available by FTP at ftp://ftp.uniprot.org/pub/databases/uniprot/pre_release/.

The 2019–20 COVID-19 outbreak is a viral epidemic that started in mainland China but has since spread to several other countries and territories. The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) was first identified in Wuhan, the capital of China's Hubei province. It is an enveloped single-stranded RNA virus. The particles are decorated with petal-shaped surface projections reminiscent of the solar corona. Coronaviruses are found in many vertebrate species and cause respiratory diseases, such as the common cold or SARS. The more recent SARS-CoV-2 has emerged from a still unknown animal reservoir and can be transmitted from human to human.



Coronaviruses possess the largest genomes among all known RNA viruses. The 30 kilobase genome of the Wuhan seafood market strain has been sequenced (MN908947, NC_045512); it encodes a total of 13-14 proteins. To fast-track scientific research, these proteins have been manually annotated by UniProt biocurators and the entries made available as a pre-release dataset. This dataset provides pre-release access to the SARS-CoV-2 protein sequences in UniProt in response to the current public health emergency. The data will become part of a future UniProt release and may be subject to further changes. A high-resolution crystal structure of the SARS-CoV-2 3CL hydrolase (6lu7) has been determined by Zihe Rao and Haitao Yang's research team at ShanghaiTech University and is cross-referenced from P0DTD1.


Two copies of the 3C-like hydrolase (P0DTD1 - PRO_0000449623)
in a catalytically active assembly


In common with other public domain resources, UniProt has moved rapidly to make these valuable data publicly available at the time when they are most needed, and hopes that this will assist clinical researchers in their efforts to combat the virus. To download the entire dataset of protein sequences, expertly curated for function and fully cross-referenced to additional resources, click here.