Tuesday, January 27, 2015

Online training for UniProt

Would you like to learn more about using UniProt but aren't able to access any training courses locally? Perhaps you would like a course that you can read and watch in your own time? Or even something to share with your students to introduce them to UniProt? We provide several online learning resources that can help! Here's a summary of everything you need to know to get started with using the UniProt website.

For an introduction to using pages on the UniProt website, there is a series of short videos on the UniProt YouTube Channel that demonstrate how to use UniProt pages in one to two minutes. Videos available so far:




If you are looking for a basic overview with further links to detailed help pages, the EBI Train Online portal provides a quick tour of UniProt.


We also provide a full-length course through the EBI training portal titled 'UniProt: Exploring protein sequence and functional information'. This interactive, in-depth course covers many areas, including:
  • An introduction to what UniProt is and when to use it
  • Where the data and annotation come from
  • How to track provenance of information
  • How to use UniProt datasets and tools
  • How to download data and submit your own data
  • Guided examples
  • Exercises
  • A quiz to check your learning

We are planning to launch a series of webinars in 2015 to provide more interactive online training. Are there any topics you would like us to cover in particular? Write in and let us know!

Thursday, December 18, 2014

An insight into expert annotation with RS3_HUMAN

UniProt's expert curation consists of manual annotation based on literature and curation tools. You may know that previously unreviewed, automatically generated entries (from TrEMBL) go through expert curation to become reviewed entries in Swiss-Prot. However, the curator's role doesn't end at annotating an unreviewed protein entry and making it part of Swiss-Prot. Did you know that reviewed protein entries also undergo revisions by curators? Even well characterised proteins with reviewed Swiss-Prot entries are considered for revision to include information from the latest publications. These revisions can be very valuable, because important updates such as newly identified enzymatic activities can teach us a lot even about well characterised proteins. One such example is the ribosomal protein S3, found in the UniProt entry RS3_HUMAN.

RS3_HUMAN was picked up for revision as it was originally missing a Function annotation. This was the beginning of a comprehensive review during which a UniProt curator read through a number of new publications about this protein. To provide high-quality in-depth experimental annotation, the choice of publications to use is critical. We prioritise publications with (i) a high impact in the scientific community that contain functional data for previously uncharacterized proteins, (ii) new 3D-structural information, (iii) enzymatic reactions that may complete the annotation of known metabolic pathways or networks, (iv) PTMs and their consequences, (v) novel splice variants and (vi) disease-causing variants as well as polymorphisms.

This review resulted in the addition of 26 new publications as sources for the RS3_HUMAN entry. Updates included annotations about function, enzyme activity, interactions, subcellular locations, post-translational modifications and more. You can see the latest function annotation in the entry, now tagged with 16 publications:


To see the full list of updates on any entry, you can use the 'History' button displayed towards the top of the page. In the case of RS3_HUMAN, the History button shows that it was last updated on the 26th of November 2014. To view the exact changes, simply click on the 'Previous versions' link in the dropdown as shown below:


This will bring you to the full history page, where you can see the version numbers and dates of all changes. To view the exact updates, select the versions that you want to compare. For example, select version 170 with one radio button and version 168 with the other, and click 'Compare'.


You will then see all the changes, with removed information coloured in red and added information coloured in green. RS3_HUMAN here has a long scrolling page full of green additions, including the Function annotation where it all began!
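
If you download two versions of an entry as text, you can reproduce the idea of the red/green comparison yourself. The sketch below is only an illustration and not how the UniProt website computes its comparison: it uses Python's standard difflib module to split line changes into removed ("red") and added ("green") lines. The two versions are hypothetical, heavily shortened flat-file snippets, with a placeholder instead of the real function text.

```python
import difflib


def compare_versions(old_lines, new_lines):
    """Split line-level changes between two entry versions into removed and added lines."""
    removed, added = [], []
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("- "):
            removed.append(line[2:])  # present only in the older version ("red")
        elif line.startswith("+ "):
            added.append(line[2:])  # present only in the newer version ("green")
        # lines starting with "  " are unchanged; "? " lines are ndiff hints
    return removed, added


# Hypothetical, shortened text for two versions of an entry
v168 = ["ID   RS3_HUMAN", "DE   RecName: Full=40S ribosomal protein S3;"]
v170 = [
    "ID   RS3_HUMAN",
    "DE   RecName: Full=40S ribosomal protein S3;",
    "CC   -!- FUNCTION: (new function text)",
]

removed, added = compare_versions(v168, v170)
print("removed:", removed)
print("added:", added)
```

A plain line diff is the simplest choice here; the website's own view is richer and organises changes by entry section.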





Wednesday, November 12, 2014

Saving proteins with the UniProt basket

Have you ever browsed through different UniProt proteins, wishing you could save them somewhere for later? That's exactly what the UniProt basket allows you to do. It remembers your saved proteins so you can build your selection over time or simply come back to a saved protein later on. Here's a quick guide to your UniProt basket.

UniProt provides several tools and action buttons you can use directly on the search results page (i.e. Align, Blast, Download and Add to Basket). The basket provides you with the same tools for your saved proteins. You can select proteins within the basket to align them, run a Blast search or download them in various formats. You can delete the entries one by one from the 'Remove' column or use the 'Clear' button to delete all saved entries from the tab you're in (UniProtKB, UniRef or UniParc). You can also use the 'Full View' button to transfer the entries to a full results page for additional functionality such as filters and customisable columns.


You can add proteins to your basket from search results pages for the UniProtKB, UniRef or UniParc datasets. Simply select them in the results table and click on the 'Add to basket' button.


Once you add entries to your basket, you will see a small basket icon appear under the entry ID. You will also see the number on the basket changing to show that new proteins have been added. Clicking on the basket button will show you your saved proteins.


You can also add proteins to your basket from the protein entry pages, using the 'Add to basket' button towards the top of the page.


There is a limit of 400 entries in the basket, so be careful when trying to add large datasets. The contents of the basket will remain there until you delete your browser's cookies or clear the basket yourself. 
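
To make the basket's behaviour concrete, here is a small hypothetical model of a capped basket with one list per dataset tab. This is not UniProt's implementation. It assumes that the 400-entry limit applies to the basket as a whole and that a batch which would exceed the limit is refused entirely; both are assumptions for illustration.

```python
MAX_BASKET_SIZE = 400  # the entry limit quoted for the UniProt basket


class Basket:
    """Hypothetical capped basket; assumes the limit covers all tabs together."""

    def __init__(self, limit=MAX_BASKET_SIZE):
        self.limit = limit
        self.entries = {"UniProtKB": [], "UniRef": [], "UniParc": []}

    def size(self):
        return sum(len(ids) for ids in self.entries.values())

    def add(self, dataset, accessions):
        """Add new accessions to one tab; refuse the whole batch if it would exceed the limit."""
        current = self.entries[dataset]
        # dict.fromkeys drops duplicates within the batch while keeping order
        new = [acc for acc in dict.fromkeys(accessions) if acc not in current]
        if self.size() + len(new) > self.limit:
            raise ValueError(f"basket limit of {self.limit} entries exceeded")
        current.extend(new)
        return len(new)

    def remove(self, dataset, accession):
        """Like the 'Remove' column: delete a single entry."""
        self.entries[dataset].remove(accession)

    def clear(self, dataset):
        """Like the 'Clear' button: empty only the tab you're in."""
        self.entries[dataset].clear()


basket = Basket()
basket.add("UniProtKB", ["P04637", "P23396"])
print(basket.size())
```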

Try the UniProt basket next time you're looking to save your proteins somewhere. Is there more functionality you would like to see in the basket? Write in and let us know!




Wednesday, October 15, 2014

Introducing Annotation Scores in UniProt

We are pleased to introduce you to annotation scores on the UniProt website! We have recently started providing annotation scores for all UniProtKB entries. The annotation score is a five-point heuristic score. An annotation score of 5 points is associated with the best-annotated entries, and a 1-point score denotes an entry with rather basic annotation. A 5-point annotation score would look like:



Annotation scores can help you quickly gauge the annotation content in a protein entry. For example, you could see which is the best-annotated protein in a family. We hope the scores will be useful in helping you narrow down to your entries of interest.

You can view annotation scores in the ‘Status’ line on all UniProtKB protein entry pages, as shown below.



You can also add annotation scores to your search results table through the ‘Columns’ button.


How are they used?

There are several contexts in which annotation scores can be used:
  • UniProtKB
    The annotation scores can help you to get a quick idea of the relative level of annotation of the entries in your search results. Please note that search results are not ranked by the annotation score, but by a query score that considers not only the annotation scores of the entries that match your query, but also how often (and where) your query term(s) appear in a matching entry and across the whole database, and the importance of a term according to the total number of terms. For this reason, the best-ranked entries are not necessarily those with the highest annotation scores.
  • UniRef
    We will be using annotation scores to select the representative member of a UniRef cluster.
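
As a rough illustration of that UniRef idea, the sketch below picks the cluster member with the highest annotation score. The accessions are placeholders, and the tie-breaking rule (keep the first member listed) is an assumption; the actual UniRef selection procedure may weigh other factors too.

```python
def pick_representative(members):
    """Return the accession of the member with the highest annotation score.

    members: list of (accession, annotation_score) pairs.
    On ties, max() keeps the first member listed (an assumed tie-break).
    """
    accession, _score = max(members, key=lambda member: member[1])
    return accession


cluster = [("ACC1", 2), ("ACC2", 5), ("ACC3", 5)]  # hypothetical cluster
print(pick_representative(cluster))
```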

How are they computed?

  • Different UniProtKB annotation types (e.g. protein names, gene names, functional annotations (comments) and sequence annotations (features), GO annotations, cross-references) are scored either by presence or by number of occurrences. Annotations with experimental evidence score higher than equivalent predicted/inferred annotations, thereby favoring expert literature-based curation over automatic annotation.
  • The score of an individual entry is the sum of the scores of its annotations.
  • The score of a proteome is the sum of the scores of the entries that are part of the proteome.
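
The three rules above can be sketched in a few lines. Everything numeric here is hypothetical: the per-type weights, the doubling for experimental evidence and the choice of which types are scored by presence are illustrative assumptions, not UniProt's real scheme. The sketch also stops at the raw sum, because the mapping from that sum to the displayed 1 to 5 points is not described here.

```python
# Hypothetical weights per annotation type (not UniProt's real values)
WEIGHTS = {
    "protein_name": 1.0,
    "gene_name": 1.0,
    "function": 2.0,
    "feature": 0.5,
    "go": 0.5,
    "xref": 0.25,
}
PRESENCE_ONLY = {"protein_name", "gene_name"}  # assumed: scored once, by presence
EXPERIMENTAL_FACTOR = 2.0  # assumed boost for experimentally supported annotations


def annotation_score(kind, experimental):
    base = WEIGHTS[kind]
    return base * EXPERIMENTAL_FACTOR if experimental else base


def entry_score(annotations):
    """Sum of annotation scores; annotations is a list of (type, has_experimental_evidence)."""
    total = 0.0
    best_presence = {}
    for kind, experimental in annotations:
        score = annotation_score(kind, experimental)
        if kind in PRESENCE_ONLY:
            # presence-scored: count this type once, using its best occurrence
            best_presence[kind] = max(best_presence.get(kind, 0.0), score)
        else:
            total += score  # occurrence-scored: every occurrence adds to the score
    return total + sum(best_presence.values())


def proteome_score(entries):
    """Sum of the scores of the entries that are part of the proteome."""
    return sum(entry_score(annotations) for annotations in entries)


entry_a = [("protein_name", True), ("protein_name", False), ("function", True),
           ("feature", False), ("feature", False), ("xref", False)]
entry_b = [("gene_name", False), ("go", False)]
print(entry_score(entry_a), proteome_score([entry_a, entry_b]))
```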

Next time you’re looking at a UniProt protein, look out for annotation scores. We welcome your feedback. Would you apply these scores in your work? Would you like to see them in your UniProtKB search results by default? Write in and let us know!

Monday, August 18, 2014

Have you tried UniProt RDF?

RDF is a core technology for the World Wide Web Consortium’s Semantic Web activities (http://www.w3.org/2001/sw/) and is well suited to work in a distributed and decentralized environment. The RDF data model represents arbitrary information as a set of simple statements of the form subject-predicate-object. 
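
To make the subject-predicate-object idea concrete, the sketch below stores a handful of statements as plain Python tuples and matches them against a pattern, with None acting as a wildcard. This mimics what a single SPARQL triple pattern does, without any RDF library. The URIs are modelled on the purl.uniprot.org naming used in UniProt's RDF, but treat them and the chosen predicates as illustrative.

```python
UNIPROT = "http://purl.uniprot.org/uniprot/"
TAXON = "http://purl.uniprot.org/taxonomy/"
CORE = "http://purl.uniprot.org/core/"

# Each statement is one (subject, predicate, object) triple
triples = {
    (UNIPROT + "P04637", CORE + "mnemonic", "P53_HUMAN"),
    (UNIPROT + "P04637", CORE + "organism", TAXON + "9606"),
    (UNIPROT + "P23396", CORE + "mnemonic", "RS3_HUMAN"),
    (UNIPROT + "P23396", CORE + "organism", TAXON + "9606"),
}


def match(pattern, store):
    """Return the triples matching a (s, p, o) pattern; None matches anything."""
    return sorted(
        triple for triple in store
        if all(wanted is None or wanted == value for wanted, value in zip(pattern, triple))
    )


# "Which entries have organism taxon 9606?"
for subject, _, _ in match((None, CORE + "organism", TAXON + "9606"), triples):
    print(subject)
```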

Why RDF?

UniProt collects information from the scientific literature and other databases and provides links to over one hundred and fifty biological resources. Such links between different databases are an important basis for data integration, but the lack of a common standard to represent and link information makes data integration an expensive business. One way to tackle this problem at UniProt is by using the Resource Description Framework (http://www.w3.org/RDF/) to represent our data.

Using the UniProt RDF

The UniProt SPARQL endpoint is available in its beta form at http://beta.sparql.uniprot.org/. This SPARQL endpoint contains all UniProt data and is freely accessible. RDF provides the foundation for publishing Linked Data, and the UniProt Consortium has been publishing its data in RDF since 2008, on both its web and FTP sites. Since 2013, the RDF platform of the EMBL-European Bioinformatics Institute (EMBL-EBI) has also linked to the UniProt RDF (http://www.ebi.ac.uk/rdf/).

Information about the UniProt data concepts and relationships in our RDF is available at http://beta.uniprot.org/core/. Additionally, we use some general-purpose relationships such as those provided by SKOS (http://www.w3.org/2004/02/skos/), OWL (http://www.w3.org/TR/owl-ref/) and RDFS (http://www.w3.org/TR/rdf-schema/). As an example of data concepts and relationships, the following figure shows the UniProt taxonomy data as linked in our RDF.
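
Taxonomy is a good example of a general-purpose relationship at work. The sketch below assumes that each taxon points to its parent with rdfs:subClassOf and walks those links to recover a lineage. The small taxonomy fragment is illustrative; in practice you would fetch such links from the SPARQL endpoint rather than hard-code them.

```python
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
TAXON = "http://purl.uniprot.org/taxonomy/"

# Illustrative fragment: each taxon points to its parent taxon
taxonomy = [
    (TAXON + "9606", RDFS_SUBCLASS, TAXON + "9605"),      # Homo sapiens -> Homo
    (TAXON + "9605", RDFS_SUBCLASS, TAXON + "207598"),    # Homo -> Homininae
    (TAXON + "207598", RDFS_SUBCLASS, TAXON + "9604"),    # Homininae -> Hominidae
]


def lineage(taxon, triples):
    """Follow subClassOf links upwards from a taxon, stopping at the root or on a cycle."""
    parents = {s: o for s, p, o in triples if p == RDFS_SUBCLASS}
    path = [taxon]
    while path[-1] in parents and parents[path[-1]] not in path:
        path.append(parents[path[-1]])
    return path


for uri in lineage(TAXON + "9606", taxonomy):
    print(uri)
```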


Friday, July 4, 2014

Visit us at ISMB 2014

Are you planning to go to ISMB this year? We will be there and we're looking forward to enjoying the talks and meeting all the participants. We will be presenting a technology track and a couple of posters.

Come say hello to us at any of the following:

We will be presenting the new UniProt website in a technology track titled TT30 UniProt: New website and latest developments on Tuesday, July 15: 2:00 p.m. - 2:25 p.m. in room 309.

If you miss the tech track, we also have a poster about the new website. It's Poster L59 - 'Redesigning the UniProt Web Interface' on Sunday, July 13: 5:00 p.m. - 7:00 p.m.

We will also be showcasing Poster G24 - Reference Proteomes in UniProtKB – Responding to Challenges in the Post Genomic Era on Monday, July 14: 5:45 p.m. - 7:30 p.m.

Drop by our talk or posters, or leave us a comment if you'd like to arrange to meet us with any questions, comments or feedback. We look forward to seeing you in Boston!

Thursday, June 5, 2014

Introducing cross-references for isoforms

UniProtKB protein entries contain cross-references that link to many other resources, such as sequence databases, genome annotation databases, 3D structure databases and so on. These links are there to help you find more information about your protein of interest. Some of the cross-references link to gene and transcript data provided by various genome annotation resources.

Making links more fine-grained 


Many UniProtKB protein entries provide information about the various isoforms produced by a single gene. For example, the human p53 entry http://beta.uniprot.org/uniprot/P04637 describes nine isoform sequences produced by a combination of alternative splicing and alternative promoter usage. In the past we have often received requests to indicate the specific isoform that a cross-reference links to. With improved mapping between our data and that of the cross-referenced resources, we can now provide cross-references specific to isoforms. We began this effort with the cross-references for the UCSC and Ensembl genome annotation databases. We recently did the same for the RefSeq sequence database, and we will soon add new cross-references to the CCDS project that will also indicate isoforms.

For example, if you go to http://beta.uniprot.org/uniprot/P04637 and then click on 'Cross-references' in the 'Display' tab on the left, you can see the RefSeq cross-references under 'Sequence databases'. The relevant isoform is listed in square brackets next to the cross-reference link. When the mapping only goes as far as the level of the entry, no isoform is mentioned. 
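
If you copy these cross-references into your own analysis, a small parser can separate the cross-reference ID from the optional isoform. This sketch assumes the displayed text looks like "<cross-reference ID> [<isoform ID>]", with the bracket omitted for entry-level mappings; the exact format and the identifiers in the example are illustrative assumptions.

```python
import re

# Assumed format: "<xref ID>" optionally followed by " [<accession>-<isoform number>]"
XREF_PATTERN = re.compile(r"^(?P<xref>\S+)(?:\s+\[(?P<isoform>[A-Z0-9]+-\d+)\])?$")


def parse_xref(text):
    """Return (cross-reference ID, isoform ID or None for entry-level mappings)."""
    m = XREF_PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised cross-reference: {text!r}")
    return m.group("xref"), m.group("isoform")


print(parse_xref("NP_000537.3 [P04637-1]"))  # isoform-level mapping
print(parse_xref("NP_000537.3"))             # entry-level mapping, no isoform
```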


This improved mapping will allow you to easily identify which UniProtKB isoform is linked to a particular cross-reference so that you can access related information for that isoform in other resources. Any questions or feedback on this topic can be sent to help@uniprot.org.