Thursday, June 5, 2014

Introducing cross-references for isoforms

UniProtKB protein entries contain cross-references that link to many other resources, such as sequence databases, genome annotation databases and 3D structure databases. These links help you find more information about your protein of interest. Some of them link to gene and transcript data provided by various genome annotation resources.

Making links more fine-grained 


Many UniProtKB protein entries provide information about the various isoforms produced by a single gene. For example, the human p53 entry http://beta.uniprot.org/uniprot/P04637 describes nine isoform sequences produced by a combination of alternative splicing and alternative promoter usage. Users have often asked us to indicate the specific isoform that a cross-reference links to. With improved mapping between our data and that of the cross-referenced resources, we can now provide isoform-specific cross-references. We began this effort with the cross-references to the UCSC and Ensembl genome annotation databases and recently did the same for the RefSeq sequence database. We will soon add new cross-references to the CCDS project that will also indicate isoforms.

For example, if you go to http://beta.uniprot.org/uniprot/P04637 and click on 'Cross-references' in the 'Display' tab on the left, you can see the RefSeq cross-references under 'Sequence databases'. The relevant isoform is listed in square brackets next to the cross-reference link. When the mapping is only available at the level of the entry, no isoform is listed.
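
If you process cross-references programmatically, the square-bracket convention described above is straightforward to handle. The following is a minimal Python sketch, not an official UniProt tool. It assumes a text export in which each cross-reference is a line of the form "DR   <database>; <identifier>; <identifier>. [<isoform>]", with the bracketed isoform omitted when the mapping is only at the entry level. The function name parse_cross_reference, the CrossRef type and the identifiers in the sample lines are made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class CrossRef(NamedTuple):
    """One cross-reference line, split into its parts (hypothetical type)."""
    database: str
    identifiers: List[str]
    isoform: Optional[str]  # None when the mapping is entry-level only


# Assumed layout: an optional "[isoform]" suffix at the very end of the line.
_ISOFORM_SUFFIX = re.compile(r"\s*\[([^\]]+)\]\s*$")


def parse_cross_reference(line: str) -> CrossRef:
    """Parse a line such as 'DR   RefSeq; ID1; ID2. [P04637-1]'."""
    # Drop the assumed "DR" line code and surrounding whitespace.
    body = re.sub(r"^DR\s+", "", line.strip())

    # Extract the isoform in square brackets, if one is present.
    isoform = None
    match = _ISOFORM_SUFFIX.search(body)
    if match:
        isoform = match.group(1)
        body = body[:match.start()]

    # Remove the terminating full stop, then split the fields on ';'.
    fields = [field.strip() for field in body.rstrip(". ").split(";")]
    return CrossRef(fields[0], fields[1:], isoform)


if __name__ == "__main__":
    # Sample lines with placeholder identifiers, not real database records.
    samples = [
        "DR   RefSeq; NP_EXAMPLE.1; NM_EXAMPLE.1. [P04637-1]",
        "DR   RefSeq; NP_EXAMPLE.2; NM_EXAMPLE.2.",
    ]
    for sample in samples:
        print(parse_cross_reference(sample))
```

A cross-reference whose isoform field is None corresponds to the case where no isoform is shown next to the link, so it can only be attributed to the entry as a whole.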


This improved mapping lets you easily identify which UniProtKB isoform is linked to a particular cross-reference, so that you can access related information for that isoform in other resources. Please send any questions or feedback on this topic to help@uniprot.org.