Tuesday, May 21, 2019

Highlights from ‘UniProt genomic mapping for deciphering functional effects of missense variants’

Understanding the association of genetic variation with its functional consequences in proteins is essential for the interpretation of genomic data and identifying causal variants in diseases. It is crucial to integrate knowledge from genomic annotation with known protein function. UniProt is mapping genomic and protein data to build a better understanding of functional effects of variants. UniProt’s recent publication titled ‘UniProt genomic mapping for deciphering functional effects of missense variants’ describes this work of mapping UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants, to the human genome (GRCh38). This mapping allowed the creation of public genome track hubs for viewing the locations of variants within protein functional domains on genome browsers (Fig 1) and also allows data integration and comparison with other resources that map their data to the genome.
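To give a feel for what mapping a positional annotation to the genome involves, here is a minimal sketch in Python. It is not UniProt's pipeline: the function name `residue_to_genome`, the exon coordinates and the input layout are all invented for illustration. It assumes you already have the coding portion of each exon as inclusive, 1-based genomic intervals, and it returns the three genomic positions of the codon for a given amino acid.

```python
from typing import List, Tuple


def residue_to_genome(pos: int, exons: List[Tuple[int, int]], strand: str) -> List[int]:
    """Return the 1-based genomic coordinates of the codon for residue `pos`.

    `exons` holds the coding part of each exon as inclusive (start, end)
    genomic intervals with start <= end. `strand` is '+' or '-'.
    Hypothetical helper for illustration only.
    """
    # Put the exons in transcript order: ascending on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))

    # Lay out every coding base in the order it is translated.
    coding: List[int] = []
    for start, end in ordered:
        bases = range(start, end + 1)
        coding.extend(bases if strand == "+" else reversed(bases))

    first = (pos - 1) * 3  # index of the first base of the codon
    if pos < 1 or first + 3 > len(coding):
        raise ValueError("position outside coding sequence")
    return coding[first:first + 3]


# Toy gene with two coding exons (made-up coordinates).
toy_exons = [(100, 104), (200, 206)]
print(residue_to_genome(2, toy_exons, "+"))  # codon that spans the intron
```

Expanding every base into a list keeps the sketch short; a real mapping would work on intervals and would also have to deal with isoforms and sequence differences between the protein and the reference genome.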

The genome track hubs and related UniProtKB files are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.

Fig 1: Protein functional domains in genome browsers
The paper compared ClinVar’s clinically annotated single nucleotide polymorphism (SNP) data to UniProt features and variant annotation. To get an overview of variants in different functional features, we examined ClinVar gold-star-rated SNPs that overlap selected protein features. As illustrated in Fig 2, a missense variant in a key functional feature of a protein may alter the protein’s structure and function and, if severe enough, might be classified as harmful. This suggests that a functional feature could be a useful attribute to include in variant prediction algorithms, including machine‐learning approaches.
Fig 2: Percentage of ClinVar SNPs in each annotation category that exist in each feature type. The underlying data table is in the supplemental methods.
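The overlap analysis described above boils down to asking, for each variant, which annotated features cover its position. The sketch below shows that idea on toy data. The function name, feature coordinates, variant positions and significance labels are all made up for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import List, Tuple


def count_variants_by_feature(
    features: List[Tuple[str, int, int]],
    variants: List[Tuple[int, str]],
) -> Counter:
    """Count variants per (feature type, significance) pair.

    `features` are (feature_type, start, end) in protein coordinates,
    inclusive. `variants` are (position, significance). A variant is
    counted once for every feature that covers its position.
    """
    counts: Counter = Counter()
    for pos, significance in variants:
        for feature_type, start, end in features:
            if start <= pos <= end:
                counts[(feature_type, significance)] += 1
    return counts


# Invented example data for a single hypothetical protein.
toy_features = [("active site", 57, 57), ("binding site", 100, 110), ("domain", 40, 120)]
toy_variants = [(57, "pathogenic"), (105, "benign"), (300, "benign")]

counts = count_variants_by_feature(toy_features, toy_variants)
for key, n in sorted(counts.items()):
    print(key, n)
```

Dividing each count by the total number of variants in its significance category would give percentages of the kind plotted in Fig 2.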

The paper also presents a direct comparison of ClinVar SNP annotation with UniProtKB natural variant annotation that affects the same amino acid. In general the annotations agree, with ~86% of co-located UniProtKB disease-associated variants mapping to 'pathogenic' ClinVar SNPs. The reasons for disagreements were examined and discussed.
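As a rough illustration of how such an agreement figure can be computed, the sketch below joins two annotation sets on a shared (protein, position) key and reports the fraction of co-located disease-associated variants that are labelled pathogenic on the other side. The function name, keys and labels are hypothetical, the data are invented, and the result has nothing to do with the ~86% reported in the paper.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (protein identifier, amino-acid position)


def disease_concordance(uniprot: Dict[Site, str], clinvar: Dict[Site, str]) -> float:
    """Fraction of co-located 'disease' variants that ClinVar calls 'pathogenic'.

    Both inputs map a site to a simplified label. Only sites present in
    both resources are compared. Labels are illustrative placeholders.
    """
    shared = uniprot.keys() & clinvar.keys()
    disease_sites = [site for site in shared if uniprot[site] == "disease"]
    if not disease_sites:
        raise ValueError("no co-located disease-associated variants")
    agree = sum(clinvar[site] == "pathogenic" for site in disease_sites)
    return agree / len(disease_sites)


# Invented toy annotations.
toy_uniprot = {
    ("PROT_A", 10): "disease",
    ("PROT_A", 20): "disease",
    ("PROT_B", 5): "polymorphism",
    ("PROT_C", 7): "disease",  # not present in the ClinVar toy set
}
toy_clinvar = {
    ("PROT_A", 10): "pathogenic",
    ("PROT_A", 20): "benign",
    ("PROT_B", 5): "benign",
    ("PROT_D", 1): "pathogenic",  # not present in the UniProt toy set
}

print(disease_concordance(toy_uniprot, toy_clinvar))
```

A real comparison would need to decide how to treat intermediate categories such as likely pathogenic or uncertain significance, which is where many of the disagreements tend to sit.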

A related publication by UniProtKB/Swiss-Prot curators looked in more detail at the concordance of variant interpretations from UniProtKB/Swiss-Prot with those of ClinVar. This publication also looked at the effect of re-curating UniProtKB/Swiss-Prot variants using guidelines of the American College of Medical Genetics and Genomics (ACMG) and tools from ClinGen. See “An enhanced workflow for variant interpretation in UniProtKB/Swiss-Prot improves consistency and reuse in ClinVar”.

The work described in these papers provides a basis for better integration and standardization of UniProtKB annotation with ClinVar and ClinGen.

UniProt hopes to investigate these and related topics in the future, and as a publicly funded resource, UniProt encourages others to further analyze the data as well.

Wednesday, January 23, 2019

Uncovering the mysteries of marine mammals with UniProtKB

Over millions of years of evolution, distantly related species such as dolphins, seals and manatees made the move from terrestrial environments to water, in the process undergoing extraordinary changes in physiology and physical abilities. These mammals acquired limbs adapted to swimming, an enhanced capacity to store and transport oxygen to enable underwater foraging and decreased bone density - changes that occurred independently in each species through convergent evolution. 

Dolphin by Ed Dunens is licensed under CC BY 2.0

Characterizing the underlying molecular landscape of these evolutionary adaptations could now become simpler thanks to the proteomes of twelve new marine mammal species available in the latest UniProt release. This diverse group includes cetaceans (whales and dolphins, the most aquatically adapted set), sirenians (the herbivorous sea cows) and pinnipeds (the carnivorous seals and walruses), as well as partially marine species such as the polar bear and sea otter.

Some of these species, Weddell seals in particular, are known to dive deep and remain underwater for extended periods, depriving vital organs of oxygen without causing organ damage, a feat impossible for most land-based animals. This phenomenon is not unlike the hypoxia experienced by humans during a heart attack or stroke, which, when followed by reoxygenation, often leads to permanent organ damage or even death. Ongoing studies focused on a number of different marine and non-marine species have the potential to identify therapeutic targets to alleviate injury sustained during heart attacks and strokes (Ref 1). Dr. Benjamin Neely of the Marine Biochemical Sciences group at NIST Charleston, working alongside academics and veterinarians, is compiling blood proteomes from over 70 different species, including many marine mammals, as part of the Comparative Mammal Proteome Aggregator Resource (Ref 2). ‘The availability of these proteomes in UniProtKB alongside their terrestrial relatives will aid us greatly in this endeavour,’ he says. ‘Improving our understanding of advantageous biological adaptations will facilitate biomimetic studies.’ Due to the high degree of homology within mammalian species, insights gleaned from comparative proteomics are likely applicable to humans.

Weddell seal by Robert Nunn is licensed under CC BY-NC 2.0

Comparative studies are also important for probing the challenges faced by these often protected species and in understanding the conflicts with human uses of shared environments. Major changes in the abundance of whales and other marine species in the last few years have been linked to similar changes in commercial fishing practices worldwide. Research into marine mammal conservation thus also has the potential to impact fisheries and coastal communities around the world.

The UniProt Proteomes portal supports searching by free text (scientific or common name) and search results can be filtered using the options on the panel to the left (Fig 3).

We hope that the inclusion of these proteomes into UniProt will improve utilization of proteomic tools by biomedical researchers, evolutionary biologists and conservationists alike, help address some foundational questions and accelerate biomedical discoveries.


  1. Sobolesky P, Parry C, Boxall B, Wells R, Venn-Watson S, Janech MG. Proteomic
    Analysis of Non-depleted Serum Proteins from Bottlenose Dolphins Uncovers a High
    Vanin-1 Phenotype. Sci Rep. 2016 Sep 26;6:33879.
  2. https://www.nist.gov/programs-projects/comparative-mammalian-proteome-aggregator-resource-compare-program

Thursday, November 29, 2018

Using UniProtKB to explore the world of protein structure

Protein structures are used to understand the architecture of a protein, to explain how a protein interacts with its ligands or cofactors and to study the composition of protein complexes. They help us to identify the position and nature of post-translational modifications and, as 3D structure is more evolutionarily conserved than primary sequence, can also be used to predict protein function. Identifying proteins that share a conserved protein fold may also help to ascertain a molecular function that is common to them all. Understanding how topology affects the active sites of enzymes or identifying sequence-conserved regions, such as binding sites or areas of electrostatic potential, on the surface of a protein can also give valuable clues to the role a protein plays in a cell.

Annotation of proteins based on structure-based analyses is an integral part of the work of the UniProt Knowledgebase (UniProtKB). UniProt works closely with the Protein Data Bank in Europe (PDBe) to map 3D structural entries (~100,000) to the appropriate UniProtKB entries at the individual residue level [1]. It then becomes possible to use the UniProtKB advanced search functionality to ask questions such as ‘How many proteins in the human proteome have at least a partial 3D structure?’
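For readers who prefer scripting to the search form, a question like this can also be phrased as a query string. The sketch below only builds a search URL and does not contact any server. The endpoint `https://rest.uniprot.org/uniprotkb/search` and the field names `organism_id`, `structure_3d` and `reviewed` are assumptions based on the current UniProt REST service, which is newer than this post, so please check them against the live API documentation before relying on them. The helper name `build_structure_query` is invented for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint of the current UniProt REST service; verify before use.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_structure_query(taxon_id: int = 9606, reviewed_only: bool = False) -> str:
    """Build a search URL for entries of one organism that have 3D structure data.

    The field names used here are assumptions, not guaranteed to match
    the service. 9606 is the NCBI taxonomy identifier for human.
    """
    terms = [f"organism_id:{taxon_id}", "structure_3d:true"]
    if reviewed_only:
        terms.append("reviewed:true")
    query = " AND ".join(f"({term})" for term in terms)
    # urlencode percent-encodes the brackets and colons and turns spaces into '+'.
    return f"{BASE_URL}?{urlencode({'query': query})}"


print(build_structure_query())
print(build_structure_query(reviewed_only=True))
```

Sending the request and reading off the number of hits is left out on purpose, since that part depends on the response format the service uses at the time.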

Searching for structural data in UniProtKB

Once you have found the protein you are interested in, use our navigation tool in the entry to move to the Structure section, where you may either find more information in the table view or visualise a 3D image. The table view lists all the structures available for that molecule, gives details of the method by which each structure was determined (e.g. X-ray, NMR, Electron Microscopy) and provides an accurate residue-level mapping to the region of amino acid sequence covered by each structure. Links to a number of external data repositories and resources enable you to access more detailed information. To help our users visualize the structure, we have recently incorporated the LiteMol Viewer, an HTML5 web application that not only provides cartoon, surface and ball-and-stick visualizations but also links you to the PDBe database, allowing you to view and explore validation and annotation data.


Visualising Bloom's syndrome helicase (P54132) in complex with ADP and duplex DNA.

Hovering over the structure will show you the amino-acid residue-level mappings, and with a single click you can zoom in to a more detailed view, for example to visualize the details of cofactor binding.

Zooming in on Bloom's syndrome helicase to show ADP binding

Knowing the shape of a protein can give you valuable clues to the function of that molecule. Use UniProtKB to explore the links between sequence, structure and function and understand how molecule topology can drive cellular phenotype. 

Want to learn more?

Go to our pre-recorded webinars to learn more about the annotation of structural data in UniProtKB.