Friday, February 7, 2020

SARS-CoV-2 (Coronavirus) - UniProtKB acts to serve community need


UniProt has launched a COVID-19 portal, https://covid-19.uniprot.org/, for the latest pre-release data. The portal will be updated independently of the general UniProt 8-week release cycle. You can also find the data on FTP at ftp://ftp.uniprot.org/pub/databases/uniprot/pre_release/ .

The 2019–20 COVID-19 outbreak is a viral epidemic which started in mainland China but has since spread to several other countries and territories. The causative agent, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), was first identified in Wuhan, the capital of China's Hubei province. It is an enveloped single-stranded RNA virus whose particles are decorated with petal-shaped surface projections reminiscent of the solar corona. Coronaviruses are found in many vertebrate species and cause respiratory diseases, such as the common cold or SARS. SARS-CoV-2 emerged from a still unknown animal reservoir and can be transmitted from human to human.



Coronaviruses possess the largest genomes among all known RNA viruses. The 30 kilobase genome of the Wuhan seafood market strain has been sequenced (MN908947, NC_045512) and encodes a total of 13-14 proteins. To fast-track scientific research, these proteins have been manually annotated by UniProt biocurators and the entries made available as a pre-release dataset, which provides early access to the SARS-CoV-2 protein sequences in UniProt in response to the current public health emergency. The data will become part of a future UniProt release and may be subject to further changes. A high-resolution crystal structure of the SARS-CoV-2 3CL hydrolase (PDB 6LU7) has been determined by Zihe Rao and Haitao Yang's research team at ShanghaiTech University and is cross-referenced from P0DTD1.


Two copies of the 3C-like hydrolase (P0DTD1 - PRO_0000449623) in a catalytically active assembly


In common with other public domain resources, UniProt has moved rapidly to make these valuable data publicly available at the time when they are most needed, and hopes that this will assist clinical researchers in their efforts to combat the virus. To download the entire dataset of protein sequences, expertly curated for function and fully cross-referenced to additional resources, click here.
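For readers who want to work with the downloaded sequences programmatically, the following is a minimal Python sketch of reading FASTA-formatted protein records. It assumes the download is available in FASTA format and that headers follow the usual UniProtKB layout of `db|accession|entry_name description`; the record shown is a hypothetical toy entry, not a real SARS-CoV-2 sequence, and the function names are illustrative only.

```python
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_header(header: str) -> Dict[str, str]:
    """Split a header assumed to look like 'db|accession|entry_name description'."""
    ident, _, description = header.partition(" ")
    db, accession, entry_name = ident.split("|")
    return {
        "db": db,
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


# Hypothetical toy record for illustration only (not real data).
EXAMPLE = """>sp|X0XXX0|DEMO_VIRUS Demo protein OS=Example virus OX=0 PE=4 SV=1
MKTAYIAKQR
QISFVKSHFS
"""

records = [(parse_header(h), seq) for h, seq in read_fasta(EXAMPLE)]
for meta, seq in records:
    print(meta["accession"], meta["entry_name"], len(seq))
```

To use this on a real download, read the file contents into a string and pass it to `read_fasta` in place of `EXAMPLE`.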

Tuesday, February 4, 2020

Understanding protein complexes with UniProtKB and the Complex Portal


Very few proteins are solitary biological entities which act independently. Understanding the context in which a protein carries out its function or regulates the function of other proteins is essential for a complete overview of how a protein works. UniProt and the Complex Portal help the scientific community to understand the mechanistic importance and physiological contributions of individual proteins by presenting data on interactions, protein networks, and the reactions and pathways in which proteins play a role.

In the recent publication entitled “Caenorhabditis elegans phosphatase complexes in UniProt and Complex Portal,” the two databases consider the phosphatome of the model organism C. elegans and comprehensively review the way in which data characterising phosphatase-containing macromolecular complexes are presented. The databases use complementary curation approaches: UniProt is a protein-centric database and provides data on the complexes that proteins may form or contribute to, whilst the Complex Portal presents data on macromolecular complexes and includes details of any specific protein functions that may be necessary for complex formation or function. This is the first study to compare and contrast the curation of phosphatase-containing complexes between the databases, and it illustrates how collaborative efforts can answer a variety of research questions and widen investigative avenues.

Phosphatases regulate intracellular signalling by catalysing the removal of phosphate groups from a diverse range of substrates. Phosphatase dysregulation has been implicated in an increasing number of diseases, and as more than 50% of human phosphatases have a counterpart in C. elegans, this organism can complement human and mouse studies of the disease process. This study explores the dynamic, context-specific roles of particular phosphatases within complexes and discusses how users may access and use data that compare complex function and regulation as determined by cellular metal ion concentrations, subcellular location, and interactions with other proteins and complexes.



The examples used give a small insight into this intriguing area, and such a study enables us to address why, in many species, there are fewer phosphatases in the phosphatome than there are kinases in the kinome.

The endeavour to organise data in a way that facilitates growth is of significant importance. As research and data generation develop and become more complex, so too will data curation in databases such as UniProt and the Complex Portal. In this instance, the accommodation of the biologically diverse functions of phosphatases within complexes demonstrates that these databases are not limited in scope and shows how powerful they are as data analysis tools.