Wednesday, May 9, 2018

Learning from our closest living relatives


In the March 2018 release, UniProt added the proteins of 10 new primate species to our collection of complete proteomes. A complete proteome is the set of proteins believed to be expressed by an organism, typically obtained by translating a fully sequenced, annotated genome. As of release 2018_04, the total number of primate complete proteomes in UniProtKB stands at 24, a set which of course includes the human proteome. So why is this important?



Studying non-human primates enables scientists to understand how genomes have changed during evolution and how these changes have affected protein expression patterns. Comparisons across proteome sets show which proteins are shared by all primates and which are species-specific, and how differences in protein expression have shaped our evolutionary development.
For example, multiple copies of a domain of unknown function, the recently renamed Olduvai domain (IPR010630), are found in neuroblastoma breakpoint family (NBPF) proteins. The copy number is highest in humans, lower in African great apes, and lower still in orangutans and Old World monkeys. It has been speculated that copy number may be directly related to the size of the primate brain, more specifically to the volume of the neocortex. Links have also been identified between domain copy number and both cognitive function and the severity of autism.
Another study looked at why gene duplication has led to a marked expansion of HLA proteins in macaque monkeys in comparison to other primates. The HLA cell-surface proteins are responsible for the regulation of the immune system, so does this mean that this genus of monkeys is more resistant to infection than other closely related species?



Primate proteomes are also studied as models of human disease. Non-human primates appear to be resistant to developing Alzheimer’s disease, despite depositing misfolded Aβ protein in the brain as they age. Aged brains from gorillas, orangutans, chimpanzees, green monkeys, baboons, guenons, mangabeys, squirrel monkeys, marmosets, tamarins, and lemurs have all been examined, but the combination of Aβ plaques and neurofibrillary (tau) tangles has not been identified, nor do any of these species develop the behavioural and pathological changes associated with the disease. Studying why Alzheimer’s disease fails to develop in species that are closely related to humans could help us to identify new therapeutic targets and treatments for this condition.


How do I find these data?



To find the full list of primate proteomes, click on the ‘Proteomes’ button on the UniProt website Home Page, then search for ‘Primates’.
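
The same search can also be expressed as a URL for scripted access. The sketch below only builds such a URL and does not contact any server. It assumes the UniProt REST proteomes search endpoint at rest.uniprot.org and a `taxonomy_id` query field, neither of which is described in this post, so both should be checked against the current UniProt API documentation. The value 9443 is the NCBI taxonomy identifier for the order Primates; the function and constant names are purely illustrative.

```python
from urllib.parse import urlencode

# Assumption: base URL of the UniProt REST proteomes search (not given in the post).
PROTEOMES_SEARCH_URL = "https://rest.uniprot.org/proteomes/search"

# NCBI taxonomy identifier for the order Primates.
PRIMATES_TAXON_ID = 9443


def build_proteome_search_url(taxon_id: int, result_format: str = "tsv") -> str:
    """Return a search URL for all proteomes that fall under the given taxon."""
    params = {
        # Assumption: 'taxonomy_id' is the query field that matches a taxon
        # and all of its descendants.
        "query": f"(taxonomy_id:{taxon_id})",
        "format": result_format,
    }
    return f"{PROTEOMES_SEARCH_URL}?{urlencode(params)}"


primate_url = build_proteome_search_url(PRIMATES_TAXON_ID)
print(primate_url)
```

The printed URL can be opened in a browser or fetched with any HTTP client. The number of proteomes returned today is likely to differ from the 24 reported for release 2018_04.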

The proteomes portal provides two types of proteomes:


References 

Demuth JP, et al. (2006). The evolution of mammalian gene families. PLoS One 1:e85. doi:10.1371/journal.pone.0000085
Hahn MW, et al. (2007). Accelerated rate of gene gain and loss in primates. Genetics 177(3):1941–1949. doi:10.1534/genetics.107.080077
Sikela JM, van Roy F. (2018). A proposal to change the name of the NBPF/DUF1220 domain to the Olduvai domain. https://f1000research.com/articles/6-2185/v1
Walker LC, Jucker M. (2017). The exceptional vulnerability of humans to Alzheimer's disease. Trends Mol Med 23(6):534–545. doi:10.1016/j.molmed.2017.04.001. PMID: 28483344.
Sikela JM, Searles Quick VB. (2018). Genomic trade-offs: are autism and schizophrenia the steep price of the human brain? Human Genetics 137(1):1–13. https://doi.org/10.1007/s00439-017-1865-9

Thursday, March 1, 2018

Would you like to annotate function with UniProt's annotation systems?

Register your interest here: https://goo.gl/forms/IFo28dAOa5HEwfSk1

One of the core activities at UniProt is to develop computational methods for the functional annotation of protein sequences. UniProt has developed two prediction systems, UniRule and the Statistical Automatic Annotation System (SAAS), to automatically annotate the unreviewed records in UniProtKB/TrEMBL with high coverage and a high degree of accuracy.

These prediction systems can annotate protein properties such as protein names, function, catalytic activity, pathway membership, and subcellular location, along with sequence-specific information, such as the positions of post-translational modifications and active sites.

As a result of discussions with researchers and genome sequencing centres interested in functional annotation, we plan to make our annotation rules publicly available for download. We would like to engage with users in the development of a standardised format for sharing these annotation rules, to help you use the rules for functional annotation of your own data.

Apply the UniProt rules on your own proteins


We also plan to provide a standalone tool to execute the UniProt annotation rules and enrich your own data with high-quality annotations. We invite user feedback towards the provision of such a tool for functional annotation of coding sequences.






Given input data such as protein sequences, taxonomy data and InterProScan signatures, together with the rules, a rule engine will be able to reason over the rules to infer new protein annotations.
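
To make the idea concrete, here is a minimal sketch of how a rule engine might combine these inputs. It is a toy illustration, not the UniRule or SAAS implementation or rule format: the `Protein` and `Rule` classes, the rule identifier, the sequence, and the annotation text are all hypothetical. Each rule here fires only when a protein carries all of the required InterPro signatures and belongs to the required taxon; real rules are likely to support richer conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    accession: str         # hypothetical identifier
    sequence: str          # amino-acid sequence (unused by this toy engine)
    lineage: frozenset     # taxon names from the organism's lineage
    signatures: frozenset  # InterPro accessions reported by InterProScan


@dataclass(frozen=True)
class Rule:
    rule_id: str                    # hypothetical rule identifier
    required_signatures: frozenset  # all must be present on the protein
    required_taxon: str             # must appear in the protein's lineage
    annotations: tuple              # (annotation_type, value) pairs to infer


def apply_rules(protein, rules):
    """Return (rule_id, annotation_type, value) for every rule that matches."""
    inferred = []
    for rule in rules:
        has_signatures = rule.required_signatures <= protein.signatures
        in_taxon = rule.required_taxon in protein.lineage
        if has_signatures and in_taxon:
            for annotation_type, value in rule.annotations:
                inferred.append((rule.rule_id, annotation_type, value))
    return inferred


# Made-up example data; only the InterPro accession IPR010630 is a real identifier.
demo_rule = Rule(
    rule_id="DEMO_0001",
    required_signatures=frozenset({"IPR010630"}),
    required_taxon="Primates",
    annotations=(("family", "Example family annotation"),),
)
demo_protein = Protein(
    accession="DEMO_P1",
    sequence="MKTAYIAKQR",
    lineage=frozenset({"Eukaryota", "Mammalia", "Primates"}),
    signatures=frozenset({"IPR010630"}),
)

demo_annotations = apply_rules(demo_protein, [demo_rule])
print(demo_annotations)
```

Keeping the rules as plain data, separate from the engine that evaluates them, is what would allow the same rule set to be shared and executed against other people's proteins.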

Get involved


Would you like to try out the UniProt rules to annotate your own data? Would you like an early peek at the systems, formats and functionality we plan to make available and provide valuable feedback? Are you interested in integrating the UniProt rules (UniRule and SAAS) in your annotation pipeline? We would love to have your feedback and give you the opportunity to beta-test our latest developments.


Register your interest here: https://goo.gl/forms/IFo28dAOa5HEwfSk1