Tuesday, November 24, 2015

UniRule automatic annotation system in UniProt

UniProt has developed two prediction systems, UniRule (Unified Rule system) and the Statistical Automatic Annotation System (SAAS), to automatically annotate unreviewed UniProtKB/TrEMBL entries in an efficient and scalable manner.

UniRule is a rule-based automatic annotation system whose rules are devised and tested by experienced curators using experimental data from expertly annotated entries. It annotates entries automatically with a high degree of accuracy. This leverages curators' knowledge and expertise to add annotation to a much larger set of protein entries than could be annotated through expert curation alone.

UniRule has been developed by merging existing curated rule-based systems (HAMAP, PIR name and site rules, and RuleBase rules) into one system which stores, applies, and evaluates all rules. 


What is a rule and how does it work to annotate proteins?

Let us look at a fictitious rule to see how this concept works in a basic case.



Could you make this rule even more granular and specific by adding more conditions?



In this example, the main conditions delineate the space that can be annotated as a 'purple quadrilateral', and the further conditions add the more specific annotation 'square' to a subset of that space.
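A minimal Python sketch of this fictitious rule follows. It assumes the main conditions are "is purple" and "has four sides", and that the further conditions for 'square' are "all sides are equal" and "all angles are right angles". The `Shape` class and `annotate` function are hypothetical illustrations, not part of UniRule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shape:
    """A hypothetical object to be annotated by the fictitious rule."""
    colour: str
    side_lengths: List[float]
    angles: List[float]  # interior angles in degrees


def annotate(shape: Shape) -> List[str]:
    """Apply the fictitious two-level rule and return the annotations it adds."""
    annotations: List[str] = []
    # Main conditions (assumed): the shape is purple and has exactly four sides.
    if shape.colour == "purple" and len(shape.side_lengths) == 4:
        annotations.append("purple quadrilateral")
        # Further conditions (assumed), tested only inside the main-condition space:
        # all four sides have the same length and every angle is a right angle.
        if len(set(shape.side_lengths)) == 1 and all(a == 90 for a in shape.angles):
            annotations.append("square")
    return annotations


print(annotate(Shape("purple", [2, 2, 2, 2], [90, 90, 90, 90])))   # ['purple quadrilateral', 'square']
print(annotate(Shape("purple", [2, 2, 2, 2], [60, 120, 60, 120])))  # ['purple quadrilateral']
print(annotate(Shape("blue", [2, 2, 2, 2], [90, 90, 90, 90])))     # []
```

The 'square' check sits inside the main-condition branch, so a shape can only receive the more specific annotation if it already qualifies for the general one.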

This is essentially how rules are built: main conditions and additional conditions identify the sequences to which a given annotation can be applied with confidence. The quality of the rules is maintained by expert curators, who create and check each rule before it is applied.
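The same structure can be sketched for protein entries. The Python below is a hypothetical model, not UniRule's actual rule format: a rule holds a list of main conditions and the annotations they license, plus optional cases whose additional conditions add more specific annotations. Each added annotation is paired with the rule ID as its evidence. The rule ID `RULE-0001`, the signature `FAMILY-X`, the residue check and all annotation texts are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# A condition is any test on an entry; entries are plain dicts in this sketch.
Condition = Callable[[Dict[str, Any]], bool]


@dataclass
class Rule:
    rule_id: str
    main_conditions: List[Condition]
    annotations: List[str]
    # Each case pairs additional conditions with the extra annotations they add.
    cases: List[Tuple[List[Condition], List[str]]] = field(default_factory=list)

    def apply(self, entry: Dict[str, Any]) -> List[Tuple[str, str]]:
        """Return (annotation, evidence rule ID) pairs this rule adds to an entry."""
        # Every main condition must hold before anything is annotated.
        if not all(cond(entry) for cond in self.main_conditions):
            return []
        added = list(self.annotations)
        # Additional conditions refine the annotation for a subset of matches.
        for extra_conditions, extra_annotations in self.cases:
            if all(cond(entry) for cond in extra_conditions):
                added.extend(extra_annotations)
        return [(annotation, self.rule_id) for annotation in added]


# Hypothetical rule: every identifier and annotation below is invented.
rule = Rule(
    rule_id="RULE-0001",
    main_conditions=[
        lambda e: "FAMILY-X" in e["signatures"],  # matches a family signature
        lambda e: "Bacteria" in e["lineage"],     # taxonomic restriction
    ],
    annotations=["Protein name: Example synthase"],
    cases=[([lambda e: e["residues"].get(42) == "C"], ["Active site: Cys-42"])],
)

entry = {
    "signatures": {"FAMILY-X"},
    "lineage": ["cellular organisms", "Bacteria"],
    "residues": {42: "C"},
}
print(rule.apply(entry))
# [('Protein name: Example synthase', 'RULE-0001'), ('Active site: Cys-42', 'RULE-0001')]
```

Keeping conditions as plain predicates makes the sketch independent of any particular signature database or taxonomy source; a real system would evaluate them against actual sequence-analysis results.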


UniRule annotation in protein entries

If a protein entry contains annotations from the UniRule system, this is indicated in the entry, as seen below.


Clicking on the evidence will take you to the rule that is the source of that annotation. There you can click on an annotation to see how it is applied through the rule, or click on a condition to see which annotations it applies.




If you are interested in exploring rules for particular proteins, taxonomic groups, etc., you can also search the UniRule set directly. Just click on the dropdown to the left of the search box to change the focus from 'UniProtKB' to 'UniRule', then enter your query.



So now you can explore the rules that UniProt has built to annotate the sequence space you are interested in! We always love to hear feedback, so please let us know how you plan to use this functionality and whether there is any additional functionality you would find useful. You can also always email us at help@uniprot.org with queries and feedback.