Monday, August 18, 2014

Have you tried UniProt RDF?

RDF (Resource Description Framework) is a core technology for the World Wide Web Consortium’s Semantic Web activities (http://www.w3.org/2001/sw/) and is well suited to distributed and decentralized environments. The RDF data model represents arbitrary information as a set of simple statements, each of the form subject-predicate-object.
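
To make the subject-predicate-object form concrete, here is a minimal sketch in plain Python that models statements as three-part tuples. The `Triple` type and the `objects_of` helper are hypothetical names introduced only for this illustration, and the URIs are assumptions about how a taxon might be identified and described; treat them as illustrative rather than as a definitive excerpt of the UniProt data.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Assumed, illustrative URIs (not confirmed by this post).
TAXON = "http://purl.uniprot.org/taxonomy/9606"
UP = "http://purl.uniprot.org/core/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Two statements about the same subject.
statements = [
    Triple(TAXON, RDF_TYPE, UP + "Taxon"),
    Triple(TAXON, UP + "scientificName", "Homo sapiens"),
]


def objects_of(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [
        t.obj
        for t in triples
        if t.subject == subject and t.predicate == predicate
    ]


print(objects_of(statements, TAXON, UP + "scientificName"))
```

Because every statement has the same shape, data from different sources can be merged simply by concatenating their statement lists, which is what makes the model attractive for integration.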

Why RDF?

UniProt collects information from the scientific literature and other databases and provides links to over one hundred and fifty biological resources. Such links between different databases are an important basis for data integration, but the lack of a common standard to represent and link information makes data integration an expensive business. At UniProt, one way we tackle this problem is by representing our data in the Resource Description Framework (http://www.w3.org/RDF/).

Using the UniProt RDF

The UniProt SPARQL endpoint is available in beta at http://beta.sparql.uniprot.org/. It contains all UniProt data and is freely accessible. RDF provides the foundation for publishing Linked Data, and the UniProt Consortium has been publishing its data in RDF since 2008 on both its website and its FTP site. Since 2013, the EMBL-EBI (European Bioinformatics Institute) RDF platform has also linked to the UniProt RDF (http://www.ebi.ac.uk/rdf/).
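
As a rough sketch of how the endpoint could be queried from a script, the following Python uses only the standard library and assumes the endpoint accepts standard SPARQL protocol GET requests with a `query` parameter and can return JSON results. The `up:` prefix URI and the `up:Taxon` class in the example query are assumptions, and `build_query_url` and `run_select` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Beta endpoint address as given in this post.
ENDPOINT = "http://beta.sparql.uniprot.org/"

# Example query; the prefix URI and class name are assumptions.
QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?taxon
WHERE { ?taxon a up:Taxon }
LIMIT 5
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode the SPARQL text as the 'query' parameter of a GET URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_select(endpoint: str, query: str) -> list:
    """Send a SELECT query and return the list of result bindings."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        # Ask for the standard SPARQL JSON results format.
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Calling `run_select(ENDPOINT, QUERY)` would then return one dictionary per result row, keyed by variable name (here `taxon`), provided the endpoint responds as assumed.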

Information about the data concepts and relationships used in the UniProt RDF is available at http://beta.uniprot.org/core/. Additionally, we use some general-purpose relationships such as those provided by SKOS (http://www.w3.org/2004/02/skos/), OWL (http://www.w3.org/TR/owl-ref/) and RDFS (http://www.w3.org/TR/rdf-schema/). As an example of data concepts and relationships, the following figure shows the UniProt taxonomy data as linked in our RDF.