Tuesday, February 4, 2020

Understanding protein complexes with UniProtKB and the Complex Portal

Very few proteins are solitary biological entities that act independently. Understanding the context in which a protein carries out its function, or regulates the function of other proteins, is essential for a complete overview of how that protein works. UniProt and the Complex Portal help the scientific community understand the mechanistic importance and physiological contributions of individual proteins by presenting data on interactions, protein networks, and the reactions and pathways in which proteins play a role.

In the recent publication entitled “Caenorhabditis elegans phosphatase complexes in UniProt and Complex Portal,” the two databases examine the phosphatome of the model organism C. elegans and comprehensively review how data characterising phosphatase-containing macromolecular complexes are presented. The databases use complementary curation approaches: UniProt is a protein-centric database and provides data on the complexes that proteins may form or contribute to, whilst the Complex Portal presents data on macromolecular complexes and includes details of any specific protein functions that may be necessary for complex formation or function. This is the first study to compare and contrast the curation of phosphatase-containing complexes between the two databases, and it shows how collaborative efforts can answer a variety of research questions and widen investigative avenues.

Phosphatases regulate intracellular signalling by catalysing the removal of phosphate groups from a diverse range of substrates. Phosphatase dysregulation has been implicated in an increasing number of diseases, and because more than 50% of human phosphatases have a counterpart in C. elegans, this organism can complement human and mouse studies of disease processes. This study explores the dynamic, context-specific roles of particular phosphatases within complexes and discusses how users can access and use data comparing how complex function and regulation depend on cellular metal ion concentrations, subcellular location, and interactions with other proteins and complexes.

The examples used give a small insight into this intriguing area, and a study of this kind helps us address why, in many species, the phosphatome contains fewer phosphatases than the kinome contains kinases.

Organising data in a way that can accommodate growth is of significant importance. As research and data generation develop and become more complex, so too will data curation in databases such as UniProt and the Complex Portal. In this instance, accommodating the biologically diverse functions of phosphatases within complexes demonstrates that these databases are not limited in scope and shows how powerful they are as data analysis tools.

Friday, July 19, 2019

Become a contributor of literature and knowledge in UniProt

The UniProt Knowledgebase (UniProtKB) contains a vast amount of protein sequence and function information. Expert curation in UniProtKB includes a critical review of experimental data from the literature as well as predicted data from sequence analysis tools. Because only a representative set of publications is selected as evidence for the data, many articles with potentially relevant content may not be associated with a protein entry.

We have developed a prototype for literature submission that lets you add publications you consider relevant to a protein entry and, optionally, classify the article and add annotations.

Contribute in a few simple steps:
  1. Find your UniProt protein entry of interest
  2. Sign in with your ORCID ID (you can create one during the submission process if you do not already have one)
  3. Fill in the submission form (retrieve the publication and add annotations)
  4. Submit

Bibliography submission prototype: https://uuw.dbi.udel.edu/bbsub/bbsub.html

The publication and annotations will be included in the publications section of the UniProt entry in a future UniProt release. ORCID iDs are used to validate and credit your contribution. The publications section currently provides all expert-curated literature as well as an additional set of computationally mapped literature.

Why should you contribute?
  • You are the expert
  • It will help scale up curation
  • It will provide a comprehensive set of articles related to a given protein entry

Benefits to you
  • You will be credited for the papers and annotations contributed
  • Your contribution will be citable and can be used to broaden the impact of your research
  • You can play a role in improving the database
  • A better database will better support the whole research community

With contributions from community experts, UniProt will provide access to a more comprehensive set of articles and annotations, enabling discovery and benefiting the wider research community.

Try it!

Access to prototype:

Tuesday, May 21, 2019

Highlights from ‘UniProt genomic mapping for deciphering functional effects of missense variants’

Understanding the association of genetic variation with its functional consequences in proteins is essential for interpreting genomic data and identifying causal variants in diseases. It is crucial to integrate knowledge from genomic annotation with known protein function. UniProt is mapping genomic and protein data to build a better understanding of the functional effects of variants. UniProt’s recent publication titled ‘UniProt genomic mapping for deciphering functional effects of missense variants’ describes how UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants, are mapped to the human genome (GRCh38). This mapping allowed the creation of public genome track hubs for viewing the locations of variants within protein functional domains on genome browsers (Fig 1), and it also enables data integration and comparison with other resources that map their data to the genome.

The genome track hubs and related UniProtKB files are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.
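At the heart of this kind of mapping is translating a residue position in a protein into the genomic positions of its codon. The Python sketch below illustrates that general idea only; it is not UniProt's actual pipeline. The function name `residue_to_genomic` is hypothetical, and the transcript model is a deliberate simplification: coding exons are given as 1-based inclusive genomic intervals in transcript (5' to 3') order, and there are no frame shifts or differences between the UniProtKB sequence and the translated genome.

```python
from typing import List, Tuple

# Hypothetical, simplified model: each coding exon is (genomic_start, genomic_end),
# 1-based inclusive, listed in transcript order (5' to 3').
Exon = Tuple[int, int]


def residue_to_genomic(residue: int, cds_exons: List[Exon], strand: str) -> List[int]:
    """Return the three genomic positions of the codon encoding `residue` (1-based)."""
    positions = []
    # 0-based offsets of the codon's three nucleotides within the coding sequence
    for offset in (3 * (residue - 1) + i for i in range(3)):
        remaining = offset
        for start, end in cds_exons:
            length = end - start + 1
            if remaining < length:
                # On the minus strand, transcription runs from high to low coordinates
                positions.append(start + remaining if strand == "+" else end - remaining)
                break
            remaining -= length  # skip this exon and continue into the next one
        else:
            raise ValueError(f"residue {residue} lies beyond the coding sequence")
    return positions
```

For example, with coding exons `[(100, 104), (200, 210)]` on the plus strand, residue 2 maps to positions 103, 104 and 200, so its codon is split across a splice junction. Handling such split codons is one reason a protein feature can span several genomic intervals in a track.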

Fig 1: Protein functional domains in genome browsers

The paper compared ClinVar’s clinically annotated single nucleotide polymorphism (SNP) data to UniProt features and variant annotation. To get an overview of variants in different functional features, we examined ClinVar gold-star-rated SNPs that overlap selected protein features. As illustrated in Fig 2, a missense variant in a key functional feature may alter a protein’s structure and function and, if severe enough, may be classified as harmful. This suggests that functional features could be a useful attribute to include in variant prediction algorithms, including machine-learning approaches.

Fig 2: Percentage of ClinVar SNPs in each annotation category that fall within each feature type (the underlying data table is in the supplemental methods).
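The kind of tabulation summarised in Fig 2 can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's code: the `Feature` and `Variant` record types and the function `feature_overlap_percentages` are hypothetical, and both features and variants are assumed to be expressed in protein residue coordinates for the same UniProtKB accession. For each ClinVar significance category, it reports the percentage of variants whose residue falls inside each feature type.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Feature:
    accession: str     # UniProtKB accession of the protein
    feature_type: str  # e.g. "Active site", "Binding site", "Domain"
    start: int         # first residue, 1-based inclusive
    end: int           # last residue, 1-based inclusive


@dataclass
class Variant:
    accession: str
    position: int      # affected residue
    significance: str  # ClinVar category, e.g. "Pathogenic", "Benign"


def feature_overlap_percentages(
    variants: List[Variant], features: List[Feature]
) -> Dict[str, Dict[str, float]]:
    """Percentage of variants per significance category that fall in each feature type."""
    by_protein: Dict[str, List[Feature]] = defaultdict(list)
    for f in features:
        by_protein[f.accession].append(f)

    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for v in variants:
        totals[v.significance] += 1
        # Count each feature type at most once per variant, even if features overlap
        types = {
            f.feature_type
            for f in by_protein[v.accession]
            if f.start <= v.position <= f.end
        }
        for t in types:
            hits[v.significance][t] += 1

    return {
        sig: {t: 100.0 * n / totals[sig] for t, n in hits[sig].items()}
        for sig in totals
    }
```

Counting each feature type once per variant keeps the percentages interpretable when, for instance, an active site lies inside an annotated domain; a variant at that residue then contributes to both feature types, so the percentages for one category need not sum to 100.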

The paper also presents a direct comparison of ClinVar SNP annotation with UniProtKB natural variant annotation that affects the same amino acid. In general, the annotations agree: ~86% of co-located UniProtKB disease-associated variants map to 'pathogenic' ClinVar SNPs. The reasons for disagreement were examined and discussed.
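A concordance figure like this one is a simple ratio over co-located variants. The Python sketch below shows one way to compute it; it is an assumption-laden illustration, not the paper's method. The function name `concordance`, the choice to key variants on (accession, residue position, alternative amino acid), and the default set of labels treated as pathogenic are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical key identifying a variant: (UniProtKB accession, residue position, alt amino acid)
VariantKey = Tuple[str, int, str]


def concordance(
    uniprot_disease_variants: Set[VariantKey],
    clinvar_significance: Dict[VariantKey, str],
    pathogenic_labels: Iterable[str] = ("Pathogenic",),
) -> float:
    """Percentage of co-located UniProtKB disease variants that ClinVar calls pathogenic."""
    labels = set(pathogenic_labels)
    # Only variants present in both resources are compared
    co_located = [k for k in uniprot_disease_variants if k in clinvar_significance]
    if not co_located:
        return 0.0
    agree = sum(1 for k in co_located if clinvar_significance[k] in labels)
    return 100.0 * agree / len(co_located)
```

Whether categories such as "Likely pathogenic" count as agreement changes the result, which is why the label set is a parameter here rather than fixed.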

A related publication by UniProtKB/Swiss-Prot curators looked in more detail at the concordance of variant interpretations from UniProtKB/Swiss-Prot with those of ClinVar. This publication also looked at the effect of re-curating UniProtKB/Swiss-Prot variants using the guidelines of the American College of Medical Genetics and Genomics (ACMG) and tools from ClinGen. See “An enhanced workflow for variant interpretation in UniProtKB/Swiss-Prot improves consistency and reuse in ClinVar”.

The work described in these papers provides a basis for better integration and standardization of UniProtKB annotation with ClinVar and ClinGen.

UniProt hopes to investigate these and related topics in the future and, as a publicly funded resource, encourages others to analyze the data further as well.