Friday, July 19, 2019

Become a contributor of literature and knowledge in UniProt

The UniProt Knowledgebase (UniProtKB) contains a vast amount of protein sequence and function information. Expert curation in UniProtKB includes a critical review of experimental data from the literature as well as predicted data from sequence analysis tools. A representative set of publications is selected as evidence for the data. Thus, many literature articles with potentially relevant content may not be associated with a protein entry.

We have developed a prototype for literature submission where you will be able to add publications that you deem relevant to a protein entry, along with performing several optional tasks, such as classifying the article and adding annotations. 

Contribute in a few simple steps:
  1. Find your UniProt protein entry of interest
  2. Sign in with your ORCID ID (you can create one during the submission process if you do not already have one)
  3. Fill in the submission form (retrieve publication and add annotation)
  4. Submit

Bibliography submission prototype:

The publication and annotations will be included in the publications section of the UniProt entry in a future UniProt release. ORCIDs are used to validate and credit your contribution. The publications section currently provides all expert-curated literature as well as an additional set of computationally mapped literature.

Why should you contribute?
  • You are the expert
  • It will help scale up curation
  • It will provide a comprehensive set of articles related to a given protein entry

Benefits to you
  • You will be credited for the papers and annotations contributed
  • Your contribution will be citable and can be used to broaden the impact of your research
  • You can play a role in improving the database
  • A better database will better support the whole research community

With the community expert contributions, UniProt will enable access to a more comprehensive set of articles and annotations, enabling discovery and benefitting the wider research community.

Try it!

Access to prototype:

Tuesday, May 21, 2019

Highlights from ‘UniProt genomic mapping for deciphering functional effects of missense variants’

Understanding the association of genetic variation with its functional consequences in proteins is essential for interpreting genomic data and identifying causal variants in diseases. It is crucial to integrate knowledge from genomic annotation with known protein function. UniProt is mapping genomic and protein data to build a better understanding of the functional effects of variants. UniProt’s recent publication titled ‘UniProt genomic mapping for deciphering functional effects of missense variants’ describes this work of mapping UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants, to the human genome (GRCh38). This mapping allowed the creation of public genome track hubs for viewing the location of variants within protein functional domains on genome browsers (Fig 1), and it also allows data integration and comparison with other resources that map their data to the genome.

The genome track hubs and related UniProtKB files are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.
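To illustrate the idea of mapping a protein position to the genome, here is a minimal sketch in Python. It is not the pipeline used in the paper: the function name `residue_to_genomic`, the exon coordinates, and the simplifying assumptions (1-based inclusive coordinates, coding segments listed in ascending genomic order, no frameshifts or special cases) are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def residue_to_genomic(
    residue: int,
    cds_exons: List[Tuple[int, int]],
    strand: str = "+",
) -> List[int]:
    """Return the three genomic positions of the codon encoding `residue`.

    Assumptions (illustrative only):
    - `residue` is a 1-based amino acid position.
    - `cds_exons` holds coding segments as (start, end), 1-based inclusive,
      sorted in ascending genomic order.
    - `strand` is "+" or "-".
    """
    if residue < 1:
        raise ValueError("residue must be >= 1")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    # List every coding base in transcript order (5' to 3').
    segments = cds_exons if strand == "+" else list(reversed(cds_exons))
    coding: List[int] = []
    for start, end in segments:
        bases = range(start, end + 1)
        coding.extend(bases if strand == "+" else reversed(bases))

    # Residue n is encoded by coding bases 3(n-1) .. 3(n-1)+2 (0-based).
    first = 3 * (residue - 1)
    if first + 3 > len(coding):
        raise ValueError("residue lies beyond the end of the coding sequence")
    return coding[first:first + 3]


# Hypothetical two-exon gene: 5 coding bases, then 7 coding bases.
exons = [(100, 104), (200, 206)]
print(residue_to_genomic(2, exons, "+"))  # codon spans the exon junction
print(residue_to_genomic(1, exons, "-"))
```

Note that a codon can span an exon junction, so a single residue may map to non-contiguous genomic positions. This is one reason positional annotations are usually mapped as sets of genomic intervals rather than a single range.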

Fig 1: Protein functional domains in genome browsers
The paper compared ClinVar’s clinically annotated single nucleotide polymorphism (SNP) data to UniProt features and variant annotation. To get an overview of variants in different functional features, we examined ClinVar gold star rated SNPs that overlap selected protein features. As illustrated in Fig 2, a missense variant in a key functional feature of a protein may alter the protein’s structure and function and, if the effect is severe enough, might be classified as harmful. This suggests that a functional feature could be a useful attribute to include in variant prediction algorithms, including machine‐learning approaches.
Fig 2: Percentage of ClinVar SNPs in each annotation category that exist in each feature type. The underlying data table is in the supplemental methods.

The paper also presents a direct comparison of ClinVar SNP annotation with UniProtKB natural variant annotation that affects the same amino acid. In general, the annotations agree, with ~86% of co-located UniProtKB disease-associated variants mapping to 'pathogenic' ClinVar SNPs. The reasons for disagreements were examined and discussed.
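As a rough illustration of how such an agreement figure can be computed, here is a minimal Python sketch. It is an assumption-laden toy, not the paper's method: the function name `concordance`, the category labels, the accession, and the records are all hypothetical, and the real comparison involves more annotation categories and matching rules.

```python
from typing import Dict, Optional, Tuple

# Key: (protein accession, amino acid position). Both are hypothetical here.
VariantKey = Tuple[str, int]


def concordance(
    uniprot: Dict[VariantKey, str],
    clinvar: Dict[VariantKey, str],
) -> Optional[float]:
    """Fraction of co-located UniProtKB 'disease' variants that are
    'pathogenic' in ClinVar. Returns None if no variants are co-located."""
    shared = [
        key for key, category in uniprot.items()
        if category == "disease" and key in clinvar
    ]
    if not shared:
        return None
    agree = sum(1 for key in shared if clinvar[key] == "pathogenic")
    return agree / len(shared)


# Made-up example records for a made-up accession.
uniprot_variants = {
    ("X00001", 10): "disease",
    ("X00001", 25): "disease",
    ("X00001", 40): "disease",
    ("X00001", 55): "polymorphism",  # ignored: not disease-associated
    ("X00001", 70): "disease",       # ignored: no ClinVar record here
}
clinvar_variants = {
    ("X00001", 10): "pathogenic",
    ("X00001", 25): "pathogenic",
    ("X00001", 40): "benign",
    ("X00001", 55): "benign",
}
print(concordance(uniprot_variants, clinvar_variants))
```

Only variants present in both resources at the same amino acid enter the denominator, which mirrors the "co-located" wording above.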

A related publication by UniProtKB/Swiss-Prot curators looked in more detail at the concordance of variant interpretations from UniProtKB/Swiss-Prot with those of ClinVar. This publication also looked at the effect of re-curating UniProtKB/Swiss-Prot variants using guidelines of the American College of Medical Genetics and Genomics (ACMG) and tools from ClinGen. See “An enhanced workflow for variant interpretation in UniProtKB/Swiss-Prot improves consistency and reuse in ClinVar”.

The work described in these papers provides a basis for better integration and standardization of UniProtKB annotation with ClinVar and ClinGen.

UniProt hopes to investigate these and related topics in the future, and as a publicly funded resource, UniProt encourages others to further analyze the data as well.

Wednesday, January 23, 2019

Uncovering the mysteries of marine mammals with UniProtKB

Over millions of years of evolution, distantly related species such as dolphins, seals and manatees made the move from terrestrial environments to water, in the process undergoing extraordinary changes in physiology and physical abilities. These mammals acquired limbs adapted to swimming, an enhanced capacity to store and transport oxygen to enable underwater foraging and decreased bone density - changes that occurred independently in each species through convergent evolution. 

Dolphin by Ed Dunens is licensed under CC BY 2.0

Characterizing the underlying molecular landscape of these evolutionary adaptations could now become simpler thanks to the proteomes of twelve new marine mammal species available in the latest UniProt release. This diverse group includes cetaceans (whales and dolphins, the most aquatically adapted set), sirenians (the herbivorous sea cows) and pinnipeds (the carnivorous seals and walruses), as well as partially marine species such as the polar bear and sea otter.

Some of these species, Weddell seals in particular, are known to dive deep and remain underwater for extended periods, depriving vital organs of oxygen without sustaining organ damage, a feat impossible for most land-based animals. This phenomenon is not unlike the hypoxia experienced by humans during a heart attack or stroke, which, when followed by reoxygenation, often leads to permanent organ damage or even death. Ongoing studies focused on a number of different marine and non-marine species have the potential to identify therapeutic targets to alleviate injury sustained during heart attacks and strokes (Ref 1). Dr. Benjamin Neely, at the Marine Biochemical Sciences group at NIST Charleston working alongside academics and veterinarians, is compiling blood proteomes from over 70 different species, including many marine mammals, as part of the Comparative Mammal Proteome Aggregator Resource (Ref 2). ‘The availability of these proteomes in UniProtKB alongside their terrestrial relatives will aid us greatly in this endeavour,’ he says. ‘Improving our understanding of advantageous biological adaptations will facilitate biomimetic studies’. Due to the high degree of homology within mammalian species, insights gleaned from comparative proteomics are likely applicable to humans.

Weddell seal by Robert Nunn is licensed under CC BY-NC 2.0

Comparative studies are also important for probing the challenges faced by these often protected species and for understanding the conflicts with human uses of shared environments. Major changes in the abundance of whales and other marine species in the last few years have been linked to similar changes in commercial fishing practices worldwide. Research into marine mammal conservation thus also has the potential to impact fisheries and coastal communities around the world.

The UniProt Proteomes portal supports searching by free text (scientific or common name) and search results can be filtered using the options on the panel to the left (Fig 3).

We hope that the inclusion of these proteomes in UniProt will improve the use of proteomic tools by biomedical researchers, evolutionary biologists and conservationists alike, help address some foundational questions and accelerate biomedical discoveries.


  1. Sobolesky P, Parry C, Boxall B, Wells R, Venn-Watson S, Janech MG. Proteomic
    Analysis of Non-depleted Serum Proteins from Bottlenose Dolphins Uncovers a High
    Vanin-1 Phenotype. Sci Rep. 2016 Sep 26;6:33879.