Showing posts with label UniProtKB. Show all posts

Monday, October 10, 2016

Automatic learning based annotation in UniProt

Have you ever wondered how data mining and machine learning techniques might help in knowledge curation? Let us introduce you to the Statistical Automatic Annotation System (SAAS) in UniProt!

UniProt has an automatic annotation project that enhances unreviewed TrEMBL entries in the UniProt Knowledgebase (UniProtKB) by enriching them with automatically predicted annotations. SAAS is one of the systems that contribute to this project.

SAAS is an automatic system with quality validation input from curators, such as the exclusion of data types that are not appropriate for propagation. It learns from the properties present in reviewed UniProtKB (Swiss-Prot) entries and uses the following attribute types to define the learning entries: InterPro protein family, taxonomy and sequence length. This combination allows SAAS to generate rules to annotate protein properties such as function, catalytic activity, pathway membership, subcellular location, protein names and feature predictions.

SAAS-based evidence for UniProtKB annotation
When an annotation is added to an entry by a SAAS rule, the evidence tag indicates this and links to the rule itself.


Browsing SAAS rules
To browse the dataset and view rules of interest, click on the dropdown next to the search box on the UniProt website and select ‘SAAS’. Then enter a query and click the search button.



Exploring SAAS rule pages
Conditions are listed on the left-hand side of the rule page and annotations on the right-hand side. If a condition holds true, the corresponding annotation is applied.
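The condition-to-annotation logic of a rule can be illustrated with a minimal Python sketch. This is not the SAAS implementation: the class names, the field names and the example values (including the placeholder identifier IPR000000) are hypothetical, and the only assumption taken from the description above is that a rule combines an InterPro family, a taxonomic group and a sequence length range.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Protein:
    """Hypothetical, simplified view of an unreviewed entry."""
    accession: str
    interpro_ids: FrozenSet[str]
    lineage: Tuple[str, ...]
    length: int


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: conditions on the left, annotations on the right."""
    interpro_id: str
    taxon: str
    min_length: int
    max_length: int
    annotations: Tuple[str, ...]

    def matches(self, protein: Protein) -> bool:
        # All three conditions must hold for the rule to apply.
        return (
            self.interpro_id in protein.interpro_ids
            and self.taxon in protein.lineage
            and self.min_length <= protein.length <= self.max_length
        )


def annotate(protein: Protein, rules: List[Rule]) -> List[str]:
    """Collect the annotations of every rule whose conditions hold."""
    predicted: List[str] = []
    for rule in rules:
        if rule.matches(protein):
            predicted.extend(rule.annotations)
    return predicted


# Made-up example data, for illustration only.
example_rule = Rule(
    interpro_id="IPR000000",
    taxon="Bacteria",
    min_length=200,
    max_length=400,
    annotations=("Subcellular location: Cytoplasm",),
)
example_entry = Protein(
    accession="EXAMPLE1",
    interpro_ids=frozenset({"IPR000000"}),
    lineage=("Bacteria", "Proteobacteria"),
    length=312,
)
print(annotate(example_entry, [example_rule]))
```

An entry that fails any one of the conditions, for example by falling outside the length range, receives no annotation from that rule.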


SAAS annotation data is recalculated for every UniProt release to ensure that the annotations are accurate and up-to-date. 

Friday, September 9, 2016

How can you increase the impact of your research papers and contribute to UniProt?

Have you ever wondered how to get your life science data into public resources like UniProt with due credit and citation of your papers? Would you like to broaden the reach and impact of your research papers?

Biomedical literature is vast, with over one million papers being added to PubMed every year. Our team of curators triages these and selects relevant papers to create and update our protein entries. This herculean task is sometimes made more difficult when we cannot easily identify exactly which protein(s) a paper is about due to the lack of species, strain and even sequence information! The simple addition of a UniProt accession number in a paper could go a long way towards helping both UniProt and other resources to use your work, add knowledge to our databases and give you due credit.




If the protein you are writing about is not in UniProt, you can get an accession number for it by submitting it to us through http://www.uniprot.org/help/submissions. Accession numbers are the alphanumeric identifiers that typically look like P12346 or A0A167SS16. Here are a couple of examples of a protein being referenced in this way within the text of a paper:





Remember, accession numbers are different to Entry names and are stable from release to release. Hence, accession numbers are the best identifier for referring to your protein in a manuscript. 

We recommend using the format “UniProtKB P12346” in the body of a manuscript, while “UniProtKB accession number” would be a suitable column title for a table. We do have procedures in place to identify accession numbers that appear alone in the text, but stating explicitly that they are from UniProt helps us and the readers of your papers to understand and use your work.
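To show why the explicit prefix makes accessions easy to pick up automatically, here is a minimal Python sketch that finds mentions written in the recommended format. It is an illustration only and not the procedure UniProt uses; the function name and the sample sentence are hypothetical, and the accession pattern follows the format documented in the UniProt help pages.

```python
import re

# Accession number pattern as documented in the UniProt help pages
# (6 or 10 characters), written here with non-capturing groups.
ACCESSION = (
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)

# Only accept accessions introduced by the "UniProtKB" prefix.
CITED = re.compile(r"\bUniProtKB\s+(" + ACCESSION + r")\b")


def find_cited_accessions(text):
    """Return the accessions cited as 'UniProtKB <accession>' in text."""
    return CITED.findall(text)


# Hypothetical sentence from a manuscript.
sentence = "We expressed UniProtKB P12346 together with UniProtKB A0A167SS16."
print(find_cited_accessions(sentence))
```

A bare string such as P12346 without the prefix is deliberately ignored by this sketch, since on its own it could be mistaken for some other kind of identifier.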

Although we have discussed UniProt accessions here, adding accession numbers for other resources will also help to link up the research literature, helping researchers to search for and disseminate your work and hence increasing its impact further. We are looking forward to working with you to achieve that goal!