Friday, February 7, 2020

SARS-CoV-2 (Coronavirus) - UniProtKB acts to serve community need

UniProt has launched a COVID-19 portal for the latest pre-release data. The portal will be updated independently of the general 8-week UniProt release cycle. You can also find the data on FTP here.

The 2019–20 COVID-19 outbreak is a viral epidemic that started in mainland China and has since spread to several other countries and territories. The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) was first identified in Wuhan, the capital of China's Hubei province. It is an enveloped, single-stranded RNA virus. The particles are decorated with petal-shaped surface projections reminiscent of the solar corona. Coronaviruses are found in many vertebrate species and cause respiratory diseases, such as the common cold or SARS. The more recent SARS-CoV-2 has emerged from a still-unknown animal reservoir and can be transmitted from human to human.

Coronaviruses possess the largest genomes among all known RNA viruses. The 30 kilobase genome of the Wuhan seafood market strain has been sequenced (MN908947, NC_045512) and encodes a total of 13-14 proteins. To fast-track scientific research, these proteins have been manually annotated by UniProt biocurators and the entries made available as a pre-release dataset. This file provides pre-release access to the SARS-CoV-2 protein sequences in UniProt for the current public health emergency. The data will become part of a future UniProt release and may be subject to further changes. A high-resolution crystal structure of the SARS-CoV-2 3CL hydrolase (PDB 6LU7) has been determined by Zihe Rao and Haitao Yang's research team at ShanghaiTech University and is cross-referenced from P0DTD1.

Two copies of the 3C-like hydrolase (P0DTD1-PRO_0000449623) in a catalytically active assembly

In common with other public domain resources, UniProt has moved rapidly to make these valuable data publicly available at the time when they are most needed, and hopes that this will assist clinical researchers in their efforts to combat the virus. To download the entire dataset of protein sequences, expertly curated for function and fully cross-referenced to additional resources, click here.
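For readers who want to work with the downloaded sequences programmatically, the following is a minimal sketch, assuming the dataset is provided as FASTA text with UniProt-style headers of the form `db|ACCESSION|ENTRY_NAME description`. The post does not state the file format, so treat that as an assumption. The function names and the example records are hypothetical; only the accession P0DTD1 comes from the text above, and the sequences shown are placeholders, not real SARS-CoV-2 data.

```python
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_of(header: str) -> str:
    """Return the accession from a 'db|ACCESSION|ENTRY_NAME ...' header.

    Falls back to the first whitespace-delimited token for other header styles.
    """
    parts = header.split("|")
    if len(parts) >= 3:
        return parts[1]
    tokens = header.split()
    return tokens[0] if tokens else ""


def index_by_accession(text: str) -> Dict[str, str]:
    """Map each accession to its full sequence."""
    return {accession_of(h): seq for h, seq in read_fasta(text)}


# Placeholder records: entry names, descriptions, and sequences are made up.
EXAMPLE = """>sp|P0DTD1|EXAMPLE_NAME Hypothetical description
MKTAYIAK
QRQISFVK
>sp|X00000|OTHER_NAME Another hypothetical record
MSDNG
"""

sequences = index_by_accession(EXAMPLE)
print(len(sequences["P0DTD1"]))
```

In practice, the `EXAMPLE` string would be replaced by the contents of the downloaded file, for instance read with `open(path).read()`, where `path` is wherever the file was saved locally.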

