Tuesday, December 5, 2017

How can UniProtKB help the gene regulation community?

This question was asked at a recent meeting of a group discussing the availability of information about the regulation of gene expression (http://greekc.org/). 

The first thing most researchers in this field ask for is simply a list of known transcriptional regulators. These can easily be retrieved from, for example, the human proteome by using the Advanced Search to specify the Keyword as “Transcription regulation” and Organism as “Homo sapiens”.

Adding a second keyword, “DNA-binding”, limits the search to entries annotated as DNA-binding transcription factors. To complete the search, select ‘Reviewed’ in the filters on the left-hand sidebar, which restricts the results to entries in UniProtKB/Swiss-Prot.

If you are only interested in the list of UniProtKB accessions or protein names, you can export it using the download functionality and selecting your preferred format (select “List” to get just the accession numbers).
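The same search can also be expressed programmatically. The following is a minimal sketch that only builds the request URL; it assumes the current UniProt REST API at rest.uniprot.org and its query fields (`keyword`, `organism_id`, `reviewed`), none of which are mentioned in this post. The keyword identifiers KW-0805 (Transcription regulation) and KW-0238 (DNA-binding) and the taxon identifier 9606 (Homo sapiens) are likewise assumptions to check against the website before use. The function names are illustrative.

```python
from urllib.parse import urlencode

# Assumption: base URL of the UniProt REST search endpoint (not from the post).
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_query(keyword_ids, taxon_id, reviewed_only=True):
    """Combine keyword, organism and review-status clauses with AND."""
    clauses = [f"(keyword:{kw})" for kw in keyword_ids]
    clauses.append(f"(organism_id:{taxon_id})")
    if reviewed_only:
        # Corresponds to the 'Reviewed' (UniProtKB/Swiss-Prot) filter.
        clauses.append("(reviewed:true)")
    return " AND ".join(clauses)


def build_search_url(query, fmt="list", size=500):
    """Return a URL asking for results in the given format.

    fmt="list" mirrors the "List" download option (accessions only).
    """
    return SEARCH_URL + "?" + urlencode({"query": query, "format": fmt, "size": size})


if __name__ == "__main__":
    # Assumed identifiers: KW-0805 Transcription regulation, KW-0238 DNA-binding.
    query = build_query(["KW-0805", "KW-0238"], 9606)
    print(build_search_url(query))
```

The printed URL can be opened in a browser or fetched with any HTTP client. Large result sets come back in pages, so a complete download would need to follow the pagination rather than rely on a single request.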

However, if you want to review information about any of these entries, for example human TP63 (UniProt Accession Q9H3D4), clicking on the accession number will enable you to access a wealth of protein information. For example, you may wish to identify the DNA-binding region of the protein. The “Display” menu on the left-hand side of the UniProtKB entry offers options to see the protein sequence features in a tabular view via the Feature table, or in a graphical view with the ProtVista feature viewer (accessible via the ‘Feature viewer’ link).
In the feature viewer, the “Variants” track can be expanded to show the individual single nucleotide polymorphisms and the diseases they are associated with (see figure below). It can be seen that many single amino-acid variants fall within the DNA-binding region. The data can be filtered to reveal only the disease-related variants and/or those reviewed by UniProtKB. Clicking on a given variant position will show the annotation available for the variant.
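The check for which variants fall within the DNA-binding region can also be sketched in code. The example below assumes the JSON form of a UniProtKB entry (for instance, as served at https://rest.uniprot.org/uniprotkb/Q9H3D4.json), in which each item of a `features` list has a `type`, a `description` and a `location` with `start` and `end` values. That layout and the type labels "DNA binding" and "Natural variant" are assumptions, not taken from this post. The entry used here is a toy with invented positions, not real TP63 data.

```python
def loc(start, end):
    """Build a location dict in the assumed UniProtKB JSON shape."""
    return {"start": {"value": start}, "end": {"value": end}}


# Toy entry with invented coordinates; NOT real annotation data.
TOY_ENTRY = {
    "primaryAccession": "TOY001",
    "features": [
        {"type": "DNA binding", "description": "", "location": loc(100, 200)},
        {"type": "Natural variant", "description": "toy variant A", "location": loc(150, 150)},
        {"type": "Natural variant", "description": "toy variant B", "location": loc(250, 250)},
    ],
}


def features_of_type(entry, feature_type):
    """Return all features of an entry whose type matches exactly."""
    return [f for f in entry.get("features", []) if f.get("type") == feature_type]


def span(feature):
    """Return the (start, end) positions of a feature."""
    location = feature["location"]
    return location["start"]["value"], location["end"]["value"]


def features_within(features, start, end):
    """Keep only features lying entirely inside the interval [start, end]."""
    return [f for f in features if start <= span(f)[0] and span(f)[1] <= end]


if __name__ == "__main__":
    region_start, region_end = span(features_of_type(TOY_ENTRY, "DNA binding")[0])
    variants = features_of_type(TOY_ENTRY, "Natural variant")
    for variant in features_within(variants, region_start, region_end):
        print(span(variant)[0], variant["description"])
```

The same containment check would work for any other pair of feature types, such as mutagenesis sites inside an annotated region.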


This information is also detailed in the Pathology & Biotech section of the entry, sorted by disease type.

The feature table summarizes all the sequence features that have been annotated. From it, you can find the position of a transcription activation domain in this protein, and see that the activity of this domain has been confirmed by mutagenesis studies.

The same information is available in the Feature viewer: focusing on the transcription activation domain, you can review the mutagenesis track.

More information on the regulation of this particular transcription factor, and on the cellular processes it regulates, can be found in the Function section of the entry, both as free text and as the more structured Gene Ontology annotations.
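Because the Gene Ontology annotations are structured, they lend themselves to programmatic use. The sketch below groups GO terms by aspect. It assumes that the JSON form of an entry lists them under `uniProtKBCrossReferences` with `database` set to "GO" and a `GoTerm` property whose value starts with F, P or C followed by a colon; this layout is an assumption, not something described in this post. The entry is a toy: the GO identifiers are real terms, but their attachment to this entry is purely illustrative.

```python
# Assumed one-letter aspect prefixes used in the GoTerm property.
ASPECTS = {
    "F": "molecular function",
    "P": "biological process",
    "C": "cellular component",
}

# Toy entry for illustration; NOT the real annotation of any protein.
TOY_ENTRY = {
    "primaryAccession": "TOY001",
    "uniProtKBCrossReferences": [
        {"database": "GO", "id": "GO:0003700",
         "properties": [{"key": "GoTerm", "value": "F:DNA-binding transcription factor activity"}]},
        {"database": "GO", "id": "GO:0005634",
         "properties": [{"key": "GoTerm", "value": "C:nucleus"}]},
        {"database": "PDB", "id": "XXXX", "properties": []},
    ],
}


def go_terms_by_aspect(entry):
    """Group (GO id, label) pairs of an entry by GO aspect."""
    grouped = {name: [] for name in ASPECTS.values()}
    for xref in entry.get("uniProtKBCrossReferences", []):
        if xref.get("database") != "GO":
            continue  # skip cross-references to other databases
        props = {p["key"]: p["value"] for p in xref.get("properties", [])}
        code, _, label = props.get("GoTerm", "").partition(":")
        if code in ASPECTS:
            grouped[ASPECTS[code]].append((xref["id"], label))
    return grouped


if __name__ == "__main__":
    for aspect, terms in go_terms_by_aspect(TOY_ENTRY).items():
        print(aspect, terms)
```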