Thursday, October 8, 2020

Introducing the UniProt Alzheimer’s disease portal


Alzheimer's disease (AD), the most common subtype of dementia, is the most prevalent neurodegenerative disorder, with an estimated 30-35 million people living with the disease worldwide. It is characterized by progressive memory loss and cognitive decline, and eventually leads to the loss of bodily functions and ultimately death.

Although much is known about this complex disease, the underlying cause remains unclear. Current research suggests that the risk of developing AD is influenced by genetic and environmental factors as well as age, although AD is not a normal part of ageing.

Despite considerable global scientific efforts to develop drugs, vaccines and other medical treatments, there are currently no effective medications for the prevention or treatment of AD. Since 1998, 146 drugs have been tested and rejected, and the four drugs that have been approved for therapeutic use have only modest symptom-reducing effects and do not alter the eventual progression of AD.

It is therefore critical that the plethora of data generated by this research is collected, organized, and made freely available and accessible to researchers, in order to increase the pace of discovery and innovation.


To better serve the needs of the AD research community and to facilitate discoverability, UniProt has developed the Alzheimer’s disease portal, which helps researchers explore and access current AD genomic-based data from the UniProtKB database in a single centralized UniProt disease portal. In this first beta release, it is linked from the UniProt Alzheimer Disease page.

Developed with the help of AD researchers, the portal incorporates UniProt functional annotations and protein network visualizations, and integrates genomic and drug-related data from other resources. This allows users to easily visualize and compare data and to identify similarities using variants, protein interactions, diseases, and drug data. The portal follows a card-based approach to allow exploration of interconnected data. For example, the navigation takes you from a given disease to an associated protein, and from there to another disease that the protein may be involved in.
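
The disease-to-protein-to-disease navigation described above can be pictured as a walk over a two-way index. The following is only a minimal sketch of that idea, not the portal's actual implementation: the `CardGraph` class, its method names, and the `disease-A` / `protein-1` labels are all hypothetical placeholders introduced for illustration.

```python
from collections import defaultdict


class CardGraph:
    """Toy two-way index between disease cards and protein cards (hypothetical)."""

    def __init__(self):
        self._proteins_by_disease = defaultdict(set)
        self._diseases_by_protein = defaultdict(set)

    def link(self, disease, protein):
        """Record that a protein is associated with a disease, in both directions."""
        self._proteins_by_disease[disease].add(protein)
        self._diseases_by_protein[protein].add(disease)

    def proteins_for(self, disease):
        """Proteins shown on a disease card, sorted for stable display."""
        return sorted(self._proteins_by_disease.get(disease, ()))

    def diseases_for(self, protein):
        """Diseases shown on a protein card, sorted for stable display."""
        return sorted(self._diseases_by_protein.get(protein, ()))

    def related_diseases(self, disease):
        """Other diseases reachable in two steps: disease -> protein -> disease."""
        related = set()
        for protein in self._proteins_by_disease.get(disease, ()):
            related |= self._diseases_by_protein[protein]
        related.discard(disease)  # do not report the starting disease itself
        return sorted(related)


# Placeholder data only; these are not real UniProt associations.
graph = CardGraph()
graph.link("disease-A", "protein-1")
graph.link("disease-A", "protein-2")
graph.link("disease-B", "protein-2")
graph.link("disease-C", "protein-3")
```

In this toy data, `graph.related_diseases("disease-A")` follows `protein-2` to reach `disease-B`, which mirrors how a card for one disease can lead to another disease through a shared protein.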

The landing page shows a disease card with a disease description and a dropdown to select sub-types for the disease. Further, it presents three tabs with associated proteins, drug candidates and UniProt curated sequence variants.

The ‘Proteins’ tab allows quick filtering and download of the entire protein dataset. For each protein, you can view all its interactions, all pathways it is involved in, all sequence variants it is known to have (including UniProt curated variants as well as predicted variants from large-scale studies), all diseases that the protein is associated with according to UniProt curation, and any drugs linked to the disease according to ChEMBL.

The ‘Drug candidates’ tab shows all drug candidates associated with the disease from ChEMBL and Open Targets, with the max phase, mechanism of action, links to ChEMBL, links to literature, associated diseases and associated UniProt proteins for each drug.

The ‘Sequence variants’ tab presents all UniProt curated sequence variants associated with the disease.

Try out the UniProt Disease Portal and help us develop it further for your requirements by sending us feedback through the ‘Feedback’ link in the header.


Cummings J, Lee G, Ritter A, Sabbagh M, Zhong K. Alzheimer's disease drug development pipeline: 2020. Alzheimers Dement. (2020);6(1):e12050. DOI: 10.1002/trc2.12050.
UniProt Consortium. A Coordinated Approach by Public Domain Bioinformatics Resources to Aid the Fight Against Alzheimer's Disease Through Expert Curation of Key Protein Targets. J Alzheimers Dis. (2020);77(1):257-273. DOI: 10.3233/JAD-200206.

Monday, September 7, 2020

Association-Rule-Based Annotator (ARBA) in UniProt

UniProt has developed an automatic annotation system to enhance unreviewed TrEMBL entries in the UniProt Knowledgebase (UniProtKB) by enriching them with automatically predicted annotations. In release 2020_04 of August 2020, a powerful new automated system called ARBA replaced the previous Statistical Automatic Annotation System (SAAS). ARBA is a multiclass learning system trained on expertly annotated entries in UniProtKB/Swiss-Prot. It uses rule mining techniques to generate concise annotation models with the highest representativeness and coverage for annotation, based on the properties of InterPro group membership and taxonomy.

ARBA currently generates around 23,000 models, resulting in annotations for more than 85 million proteins, including 35 million that previously lacked any annotation. Consequently, automatic annotation coverage in UniProtKB increased from 35% to 50%. All ARBA rules can be browsed on the UniProt website, and relevant rules are also tagged as evidence for annotations in UniProtKB entries.

ARBA-based evidence for UniProtKB annotation

When an annotation is added to an entry based on an ARBA rule, the evidence tag indicates this and links to the rule itself. For example, the protein entry Q4SML2 derives annotation from ARBA rule ARBA00000621.

Browsing ARBA rules

To browse the dataset for rules of interest, click the dropdown next to the search box on the UniProt website and select ‘ARBA’, then enter a query and click the search button.

Exploring ARBA rule pages

Conditions are listed on the left-hand side of the rule page and annotations are on the right-hand side. If a condition holds true, then the corresponding annotation is applied.
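
To make the condition/annotation relationship concrete, here is a minimal sketch of how a rule of this shape could be applied to an entry. It is an illustration under stated assumptions, not ARBA's actual data model: the `Rule` and `Entry` classes, their fields, and all identifiers such as `IPR_EXAMPLE_1` and `RULE_EXAMPLE_1` are hypothetical, and real rule conditions may be richer than the simple InterPro-plus-taxonomy test used here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a condition (left-hand side) and an annotation (right-hand side)."""
    rule_id: str
    required_interpro: frozenset  # InterPro groups the entry must belong to
    required_taxon: str           # taxon that must appear in the entry's lineage
    annotation: str               # annotation applied when the condition holds


@dataclass
class Entry:
    """Hypothetical unreviewed entry with the two properties the rules inspect."""
    accession: str
    interpro: set
    lineage: list
    annotations: list = field(default_factory=list)


def condition_holds(rule, entry):
    """True when the entry has every required InterPro group and the required taxon."""
    has_groups = rule.required_interpro <= entry.interpro
    in_taxon = rule.required_taxon in entry.lineage
    return has_groups and in_taxon


def apply_rules(rules, entry):
    """Attach each matching rule's annotation, keeping the rule ID as evidence."""
    for rule in rules:
        if condition_holds(rule, entry):
            entry.annotations.append((rule.annotation, rule.rule_id))
    return entry


# Placeholder rule and entries; none of these identifiers are real.
rule = Rule(
    rule_id="RULE_EXAMPLE_1",
    required_interpro=frozenset({"IPR_EXAMPLE_1", "IPR_EXAMPLE_2"}),
    required_taxon="Eukaryota",
    annotation="example function annotation",
)
matching = Entry("ENTRY_1", {"IPR_EXAMPLE_1", "IPR_EXAMPLE_2", "IPR_EXAMPLE_3"},
                 ["Eukaryota", "Metazoa"])
non_matching = Entry("ENTRY_2", {"IPR_EXAMPLE_1"}, ["Eukaryota", "Metazoa"])

apply_rules([rule], matching)
apply_rules([rule], non_matching)
```

Storing the rule ID alongside each annotation mirrors the evidence tagging described earlier, where an annotation points back to the rule that produced it.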

ARBA annotation data is recalculated for every UniProt release to ensure that the annotations are accurate and up-to-date.

Tuesday, August 4, 2020

UniProt COVID-19 portal: Supporting research during the pandemic


Responding to the urgency of the pandemic, UniProt created and is continuing to develop a dedicated portal to provide access to the latest pre-release annotations and sequences for proteins related to COVID-19. It is released independently of UniProt’s 8-weekly release schedule. The portal is accessible online, and all sequences can also be downloaded directly from our FTP site.

An integrated source of sequence, function and links to specialist resources


The portal provides SARS-CoV-2 annotated protein sequences, closest SARS-CoV 2003 sequences and human sequences relevant to the biology of viral infection. The SARS-CoV-2 proteome is annotated based on expert curation of literature and the knowledge extracted from the well-studied SARS-CoV virus. Rule-based automatic annotation also allows us to add information from a broader taxonomic range of viruses. Links to structures, drugs, interactions, molecular pathways as well as many other resources provide integrated information to help understand the biology and investigate routes to treatment.


The annotated UniProtKB entries include functional and positional annotations. Information on microbial infection, together with the positions and structures essential for viral infection, is also documented in these records. Each protein entry provides annotations such as the catalytic activity and function, Gene Ontology terms, 3D structures, interactions, external links to resources like IntAct, ChEMBL, DrugBank and PDBe-KB, and the ProtVista visualisation of positional annotations on the sequence. Within entries, the mature products that result from proteolytic cleavage of precursor proteins can be identified with UniProt product identifiers.

Contribute and explore literature about COVID-19

The portal provides access to the latest literature related to the virus and host protein through a link to LitCovid and a link to UniProt’s community literature submissions. Users can also contribute relevant publications through the ‘Add a publication’ link present in each entry.


Tuesday, April 14, 2020

Scientists at home, UniProt to the rescue!

Many of you who work in the lab have switched to working remotely. Though your daily routine and the continuity of your research might have been impacted, your contribution to knowledge can continue in new ways.
Are you at home itching to contribute to science?  UniProt to the rescue!
Improve our resource for the community and receive credit for it.
We have the proteins and you have the expertise. You can now use that expertise by adding publications to protein entries.

What you need:

1. An ORCID iD, your personal researcher ID (used for validation and for credit)
2. A protein of interest
3. A publication with a PubMed ID (PMID) about the protein of interest. You don’t have to be the author of the publication

What to do (Figure 1):

1. Identify the protein of interest in UniProt (note that this also includes proteins from the special UniProt COVID-19 website)
2. Select the “Add a publication” link on the top menu of the entry page
3. Log in with ORCID
4. Fill in the submission form
   a. Enter the PubMed ID (PMID) to retrieve the publication
   b. Confirm that the publication is correct and that it is about the protein of interest
   c. Select the topics the paper is about
   d. Add short statements about protein name, function, disease, or other, as described in the publication
   e. Submit
5. Reply to review questions, if any
6. After review, check your publication on the website in the next release
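
Before starting a submission, it can help to check that the two identifiers you need are well formed. The sketch below is not part of UniProt's submission form, which handles login and publication lookup itself; the function names are hypothetical. It uses the ISO 7064 MOD 11-2 check digit that ORCID documents for its iDs, and a simple assumption that a PMID is a positive integer without leading zeros.

```python
import re

# ORCID iDs are 16 characters in four hyphenated groups; the last may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")
# Assumption for this sketch: a PMID is a positive integer with no leading zeros.
PMID_PATTERN = re.compile(r"[1-9][0-9]*")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True when the iD has the right shape and its final check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def is_plausible_pmid(pmid):
    """True when the string looks like a PMID; this does not check that it exists."""
    return PMID_PATTERN.fullmatch(pmid) is not None


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check digit
```

A check like this only catches typing mistakes. Whether the PMID points to the right paper is still confirmed in step 4b of the form.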

Figure 1. From publication to UniProtKB entry.

Your publication will be displayed on the UniProt entry publication page under ‘Community’, with your ORCID as the contributing source for the publication and information.

Publications submitted can be tracked here

Friday, March 13, 2020

To be or not to be an enzyme: pseudoenzymes in UniProt

Enzymes are essential for many biological processes. Without them, common tasks such as digesting food or replicating DNA would not be possible.
In recent years, partly triggered by the expansion of the analysis and annotation of complete genomes, it has become apparent that several enzyme families in a wide range of species contain members that look like enzymes but fail to behave like them. For example, in humans, such enzyme-like proteins make up between 5 and 10% of several of these families. Whilst these proteins have sequence and 3D structure features similar to those of active enzymes, they tend to lack essential amino acid residues, such as those involved in catalysis and/or substrate binding, making them incapable of catalysing chemical reactions. Based on these characteristics, scientists decided to call them pseudoenzymes.
Why are genes coding for pseudoenzymes maintained in the genome? It turns out that, despite their lack of enzymatic activity, this group of proteins carries out essential functions in cells. For example, they help assemble signalling cascades by acting as scaffolds, they regulate the activity of other enzymes and ensure that proteins are localized to the right cellular compartment. Consequently, they have become potential targets for the design of therapeutic treatments.
To support the growing interest in pseudoenzyme biology, UniProt recently revisited this important group of proteins. In collaboration with the pseudoenzyme community, we implemented changes to enhance their identification and discoverability. The outcome of this project was published in two articles, in Science Signaling and The FEBS Journal.

Ultimately, this effort will provide the scientific community with a comprehensive resource for pseudoenzymes, which in turn will lead to a better understanding of the evolution of these molecules and their active counterparts and the aetiology of related diseases. It will also support the ongoing quest to target pseudoenzymes for therapeutic treatments and offer some insight into the expanding field of enzyme engineering.