Thursday, December 18, 2014

An insight into expert annotation with RS3_HUMAN

UniProt's expert curation consists of manual annotation based on literature and curation tools. You may know that previously unreviewed and automatically generated entries (from TrEMBL) go through expert curation to become reviewed entries in Swiss-Prot. However, the curator's role doesn't end at annotating an unreviewed protein entry and making it part of Swiss-Prot. Did you know that reviewed protein entries also undergo revisions by curators? Even well-characterised proteins with reviewed Swiss-Prot entries are considered for revision to include information from the latest publications. These revisions can be very valuable, as a lot can be learnt from well-characterised proteins through important updates such as newly identified enzymatic activities. One such example is the ribosomal protein S3, found in the UniProt entry RS3_HUMAN.

RS3_HUMAN was picked up for revision as it was originally missing a Function annotation. This was the beginning of a comprehensive review during which a UniProt curator read through a number of new publications about this protein. To provide high-quality in-depth experimental annotation, the choice of publications to use is critical. We prioritise publications with (i) a high impact in the scientific community that contain functional data for previously uncharacterized proteins, (ii) new 3D-structural information, (iii) enzymatic reactions that may complete the annotation of known metabolic pathways or networks, (iv) PTMs and their consequences, (v) novel splice variants and (vi) disease-causing variants as well as polymorphisms.

This review resulted in the addition of 26 new publications as sources for the RS3_HUMAN entry. Updates included annotations about function, enzyme activity, interactions, subcellular locations and post-translational modifications, among others. You can see the latest function annotation in the entry, now tagged with 16 publications:


To see the full list of updates on any entry, you can use the 'History' button displayed towards the top of the page. In the case of RS3_HUMAN, the History button shows that it was last updated on the 26th of November 2014. To view the exact changes, simply click on the 'Previous versions' link in the dropdown as shown below:


This will bring you to the full history page, where you can see the version numbers and dates of all changes. To view the exact updates, select the versions that you want to compare. For example, select version 170 with one radio button and version 168 with another, then click 'Compare'.


You will then see all the changes, with removed information coloured in red and added information coloured in green. RS3_HUMAN here has a long scrolling page full of green additions, including the Function annotation where it all began!
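
If you have saved the text of two versions of an entry, a similar comparison can be sketched offline. The snippet below is a minimal illustration using Python's standard difflib module. It is not how the UniProt history page is implemented, and the two version texts are made-up placeholders, not real RS3_HUMAN content.

```python
import difflib


def compare_versions(old_text, new_text):
    """Return (removed, added) lines between two versions of an entry text."""
    removed, added = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])   # only in the older version ("red")
        elif line.startswith("+ "):
            added.append(line[2:])     # only in the newer version ("green")
    return removed, added


# Hypothetical placeholder texts, invented for this example.
OLD_VERSION = "ID   EXAMPLE\nCC   -!- SUBUNIT: placeholder text"
NEW_VERSION = (
    "ID   EXAMPLE\n"
    "CC   -!- FUNCTION: placeholder text\n"
    "CC   -!- SUBUNIT: placeholder text"
)

removed, added = compare_versions(OLD_VERSION, NEW_VERSION)
for line in removed:
    print("removed:", line)
for line in added:
    print("added:  ", line)
```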





Wednesday, November 12, 2014

Saving proteins with the UniProt basket

Have you ever browsed through different UniProt proteins, wishing you could save them somewhere for later? That's exactly what the UniProt basket allows you to do. It remembers your saved proteins so you can build your selection over time or simply come back to a saved protein later on. Here's a quick guide to your UniProt basket.

UniProt provides several tools and action buttons you can use directly on the search results page (i.e. Align, Blast, Download and Add to Basket). The basket provides you with the same tools for your saved proteins. You can select proteins within the basket to align them, run a Blast search or download them in various formats. You can delete the entries one by one from the 'Remove' column or use the 'Clear' button to delete all saved entries from the tab you're in (UniProtKB, UniRef or UniParc). You can also use the 'Full View' button to transfer the entries to a full results page for additional functionality such as filters and customisable columns.


You can add proteins to your basket from search results pages for the UniProtKB, UniRef or UniParc datasets. Simply select them in the results table and click on the 'Add to basket' button.


Once you add entries to your basket, you will see a small basket icon appear under the entry ID. You will also see the number on the basket changing to show that new proteins have been added. Clicking on the basket button will show you your saved proteins.


You can also add proteins to your basket from the protein entry pages, using the 'Add to basket' button towards the top of the page.


There is a limit of 400 entries in the basket, so be careful when trying to add large datasets. The contents of the basket will remain there until you delete your browser's cookies or clear the basket yourself. 

Try the UniProt basket next time you're looking to save your proteins somewhere. Is there more functionality you would like to see in the basket? Write in and let us know!




Wednesday, October 15, 2014

Introducing Annotation Scores in UniProt

We are pleased to introduce you to annotation scores on the UniProt website! We have recently started providing annotation scores for all UniProtKB entries. The annotation score is a heuristic measure on a five-point scale. A score of 5 points is associated with the best-annotated entries, and a score of 1 point denotes an entry with rather basic annotation. A 5-point annotation score would look like:



Annotation scores can help you quickly gauge the annotation content in a protein entry. For example, you could see which is the best-annotated protein in a family. We hope the scores will help you narrow down your results to the entries of interest.

You can view annotation scores in the ‘Status’ line on all UniProtKB protein entry pages, as shown below.



You can also add annotation scores to your search results table through the ‘Columns’ button.


How are they used?

There are several contexts in which annotation scores can be used:
  • UniProtKB
    The annotation scores can help you to get a quick idea of the relative level of annotation of the entries in your search results. Please note that search results are ranked not by the annotation score but by a query score. The query score considers not only the annotation scores of the entries that match your query, but also how often (and where) your query term(s) appear in a matching entry and across the whole database, and the importance of a term according to the total number of terms. For this reason, the best-ranked entries are not necessarily those with the highest annotation scores.
  • UniRef
    We will be using annotation scores to select the representative member of a UniRef cluster.

How are they computed?

  • Different UniProtKB annotation types (e.g. protein names, gene names, functional annotations (comments) and sequence annotations (features), GO annotations, cross-references) are scored either by presence or by number of occurrences. Annotations with experimental evidence score higher than equivalent predicted/inferred annotations, thereby favoring expert literature-based curation over automatic annotation.
  • The score of an individual entry is the sum of the scores of its annotations.
  • The score of a proteome is the sum of the scores of the entries that are part of the proteome.
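
The summation described above can be sketched in a few lines of Python. This is only an illustration of the idea: the weights, the Annotation class and the function names are hypothetical, and the real UniProt weights and the mapping of the raw sum onto the five-point scale are not described here.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only so that experimental evidence
# outscores predicted/inferred evidence. Not UniProt's actual values.
EXPERIMENTAL_WEIGHT = 3
PREDICTED_WEIGHT = 1


@dataclass
class Annotation:
    kind: str                       # e.g. "function", "GO", "cross-reference"
    experimental: bool              # True if backed by experimental evidence
    occurrences: int = 1
    score_by_presence: bool = False  # if True, count once regardless of occurrences


def annotation_points(annotation):
    """Score one annotation type, by presence or by number of occurrences."""
    weight = EXPERIMENTAL_WEIGHT if annotation.experimental else PREDICTED_WEIGHT
    count = 1 if annotation.score_by_presence else annotation.occurrences
    return weight * count


def entry_score(annotations):
    """The score of an entry is the sum of the scores of its annotations."""
    return sum(annotation_points(a) for a in annotations)


def proteome_score(entries):
    """The score of a proteome is the sum of the scores of its entries."""
    return sum(entry_score(annotations) for annotations in entries)


curated_entry = [
    Annotation("function", experimental=True, occurrences=2),
    Annotation("GO", experimental=False, occurrences=3),
]
predicted_entry = [
    Annotation("function", experimental=False, occurrences=2),
    Annotation("protein name", experimental=False, occurrences=4, score_by_presence=True),
]

print(entry_score(curated_entry), entry_score(predicted_entry))
print(proteome_score([curated_entry, predicted_entry]))
```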

Next time you’re looking at a UniProt protein, look out for annotation scores. We welcome your feedback. Would you apply these scores in your work? Would you like to see them in your UniProtKB search results by default? Write in and let us know!

Monday, August 18, 2014

Have you tried UniProt RDF?

RDF is a core technology for the World Wide Web Consortium’s Semantic Web activities (http://www.w3.org/2001/sw/) and is well suited to work in a distributed and decentralized environment. The RDF data model represents arbitrary information as a set of simple statements of the form subject-predicate-object. 
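
To make the subject-predicate-object form concrete, here is a minimal sketch in Python that stores statements as plain tuples. The "ex:" identifiers and predicate names are invented for illustration and are not terms from the UniProt RDF vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements. The object of one statement can be the
# subject of another, which is how statements link into a graph.
triples = [
    Triple("ex:protein1", "ex:hasName", "Example protein"),
    Triple("ex:protein1", "ex:foundIn", "ex:organism1"),
    Triple("ex:organism1", "ex:hasName", "Example organism"),
]


def objects_of(statements, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in statements
            if t.subject == subject and t.predicate == predicate]


# Follow the link from the protein to its organism, then to the organism's name.
organism = objects_of(triples, "ex:protein1", "ex:foundIn")[0]
print(objects_of(triples, organism, "ex:hasName"))
```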

Why RDF?

UniProt collects information from the scientific literature and other databases and provides links to over one hundred and fifty biological resources. Such links between different databases are an important basis for data integration, but the lack of a common standard to represent and link information makes data integration an expensive business. One way to tackle this problem at UniProt is by using the Resource Description Framework (http://www.w3.org/RDF/) to represent our data.

Using the UniProt RDF

The UniProt SPARQL endpoint is available in its beta form at http://beta.sparql.uniprot.org/. This SPARQL endpoint contains all UniProt data and is freely accessible. RDF provides the foundation for publishing Linked Data and the UniProt Consortium has been publishing its data in RDF since 2008, both on its web and FTP sites. Since 2013, the EMBL - European Bioinformatics Institute RDF platform also links to the UniProt RDF (http://www.ebi.ac.uk/rdf/).
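
As a rough sketch of how such an endpoint is typically queried, the Python snippet below builds a request URL following the standard SPARQL protocol convention of passing the query text in a "query" parameter. It assumes the endpoint accepts GET requests at the address given above. The query is a generic one that lists a few statements and does not rely on any UniProt-specific terms. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Endpoint address as given in this post; assumed to accept GET requests.
ENDPOINT = "http://beta.sparql.uniprot.org/"

# Generic query: return any five subject-predicate-object statements.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint, query):
    """Encode the SPARQL query text into the 'query' URL parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, EXAMPLE_QUERY)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client.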

Information about the UniProt data concepts and relationships in our RDF is available at http://beta.uniprot.org/core/. Additionally, we use some general-purpose relationships such as those provided by SKOS (http://www.w3.org/2004/02/skos/), OWL (http://www.w3.org/TR/owl-ref/) and RDFS (http://www.w3.org/TR/rdf-schema/). As an example of data concepts and relationships, the following figure shows the UniProt taxonomy data as linked in our RDF.


Friday, July 4, 2014

Visit us at ISMB 2014

Are you planning to go to ISMB this year? We will be there and we're looking forward to enjoying the talks and meeting all the participants. We will be presenting a technology track and a couple of posters.

Come say hello to us at any of the following:

We will be presenting the new UniProt website in a technology track titled TT30 UniProt: New website and latest developments on Tuesday, July 15: 2:00 p.m. - 2:25 p.m. in room 309.

If you miss the tech track, we also have a poster about the new website. It's Poster L59 - 'Redesigning the UniProt Web Interface' on Sunday, July 13: 5:00 p.m. - 7:00 p.m.

We will also be showcasing Poster G24 - Reference Proteomes in UniProtKB – Responding to Challenges in the Post Genomic Era on Monday, July 14: 5:45 p.m. - 7:30 p.m.

Drop by our event or posters, or leave us a comment if you'd like to arrange to meet us with any questions, comments or feedback. We look forward to seeing you in Boston!

Thursday, June 5, 2014

Introducing cross-references for isoforms

UniProtKB protein entries contain cross-references that link to many other resources, such as sequence databases, genome annotation databases, 3D structure databases and so on. These links are there to help you find more information about your protein of interest and some of the cross-references link to gene and transcript data provided by various genome annotation resources.

Making links more fine-grained 


Many UniProtKB protein entries provide information about the various isoforms produced by a single gene. For example, the human p53 entry http://beta.uniprot.org/uniprot/P04637 describes nine isoform sequences produced by a combination of alternative splicing and alternative promoter usage. We have often received requests to indicate the specific isoform that a cross-reference links to. With improved mapping between our data and that of the cross-referenced resources, we can now provide cross-references specific to isoforms. We have begun this effort with the cross-references for the UCSC and Ensembl genome annotation databases. We recently did the same for the RefSeq sequence database and will soon add new cross-references to the CCDS project that will also indicate isoforms.

For example, if you go to http://beta.uniprot.org/uniprot/P04637 and then click on 'Cross-references' in the 'Display' tab on the left, you can see the RefSeq cross-references under 'Sequence databases'. The relevant isoform is listed in square brackets next to the cross-reference link. When the mapping only goes as far as the level of the entry, no isoform is mentioned. 
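
If you copy such cross-references into your own scripts, the optional isoform can be separated from the identifier with a small amount of parsing. The sketch below assumes a simplified text form, "identifier [isoform]", modelled on the display described above. The exact string layout and the cross-reference identifiers are hypothetical placeholders.

```python
import re

# Assumed layout: an identifier, optionally followed by an isoform in
# square brackets. This mirrors the on-page display, not an official format.
XREF_PATTERN = re.compile(r"^(?P<xref>\S+)(?:\s+\[(?P<isoform>[^\]]+)\])?$")


def parse_cross_reference(text):
    """Return (identifier, isoform); isoform is None for entry-level mappings."""
    match = XREF_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised cross-reference: {text!r}")
    return match.group("xref"), match.group("isoform")


# Placeholder identifiers, invented for this example.
print(parse_cross_reference("EXAMPLE_ID.1 [P04637-1]"))
print(parse_cross_reference("EXAMPLE_ID.2"))
```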


This improved mapping will allow you to easily identify which UniProtKB isoform is linked to a particular cross-reference so that you can access related information for that isoform in other resources. Any questions or feedback on this topic can be sent to help@uniprot.org.

Wednesday, May 7, 2014

Have you tried the new UniProt beta site?

We would like to introduce you to the new UniProt beta site http://beta.uniprot.org! We have been working on this site behind the scenes for a while and we're glad it's finally time to share it with you. The current UniProt website (uniprot.org) will still be available while we continue working on the beta site.

How it all started


At UniProt, we are keen to understand the scientific community that uses our services and to better understand your requirements. As part of this effort, we conducted two user workshops a couple of years ago, one in Washington DC and one in Hinxton, UK. The idea was to understand the gaps in our users' experience with our site and also to brainstorm ideas for future developments. We tested the uniprot.org website with users and identified various areas for improvement, which ultimately led to the decision to redesign the interface.

User-centred design


We chose a user-centred design approach to ensure that the new interface is intuitive and helps users get the most out of our data and functionality. This approach involved user feedback from the very early stages, so we could iterate on the design rapidly in response. It broadly consisted of the following stages:



Key highlights


Some highlights of the changes and improvements:
- A new homepage and advanced search functionality
- A new results page interface with easy-to-use filters
- A basket to store your favourite proteins and build up your own set
- New protein entry page content classification and navigation bar
- New tool output interfaces (check out the Blast results!)

Please try it out and let us know what you think through the 'Send Feedback' button (which you'll see on the left-hand side of all pages). You can send feedback as many times as you like. To thank you for your feedback, we will enter you into a lucky draw to win a Kindle Fire HD tablet. We look forward to hearing from you!