UniProt loves life in all its forms, but we especially
love its complement of proteins. We want
to bring you the protein sequences from the massive diversity of organisms
across the whole planet. We have been
closely following how the Tree of Life is expanding and being increasingly
accurately resolved. Here's a look at a couple of the most exciting discoveries and how they are reshaping
what we do. Below is a revised Tree of
Life presented by Laura Hug et al.,
which is based upon an alignment of 16 ribosomal proteins.
Figure 1. The revised
Tree of Life from Hug et al. 2016. Lineages
lacking an isolated representative are highlighted with non-italicized names
and red dots.
We can see that the
large majority of organisms are microbial, and as yet we are unable to grow a
large fraction of them in the lab. Red dots in the figure show phyla for which
not even a single organism has been cultured.
However, thanks to the power of next-generation sequencing and improving metagenomic
assembly and binning tools, we now have access to thousands of complete or
near-complete genomes assembled from metagenomic data (Anantharaman et al. 2016, Parks et al. 2017). These genomes
are known as metagenome-assembled genomes (MAGs).
Probably the most exciting MAG to have been
assembled is that of an enigmatic archaeon that lives in deep-sea
sediments. Lokiarchaeum is named after the location at which it was
first identified (Spang et al. 2015):
Loki’s Castle, a field of active hydrothermal vents, or black smokers, found in
the mid-Atlantic. This microbe possesses many protein families that were
previously considered to be characteristic of eukaryotes.
It was suggested that an ancestor of Lokiarchaeum was the origin
of all eukaryotic cells. The
identification of more distantly related archaea led to the definition of the Asgard
superphylum of archaea (Zaremba-Niedzwiedzka et al. 2017). Phylogenetic
analysis showed that eukaryotes could be considered an ingroup of these archaea. This finding and the growing number of MAGs from these
organisms give us an unprecedented opportunity to study the earliest events in
the emergence of the eukaryotic lineage.
A huge yet mysterious
branch on the tree of life has become known as the Candidate Phyla Radiation
(CPR). This branch of bacteria contains
vast numbers of uncultured organisms.
Analysis of the MAGs of these organisms suggests that they lack
some of the machinery necessary for a free-living lifestyle and are likely to exist in
symbiotic associations. These cells are
extremely small, yet offer a glimpse into hitherto unknown protein functions
and diversity.
The authors of the
influential papers mentioned here have submitted their data to the DNA sequence
databanks, from which it flows into UniProt. The sets of protein sequences (proteomes) for these complete
genomes can be searched in the UniProt Proteome database. Some of these
organisms have been computationally selected as part of the UniProt Reference
Proteome collection (Chen et al. 2011),
which aims to provide a selection of key proteomes and their proteins that cover
the diversity of life. Reference proteomes are marked with a dedicated icon in UniProt.
You can investigate
the protein sequences in these organisms in UniProt by following the links
below to relevant proteomes:
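For readers who prefer to script the proteome search rather than browse, here is a minimal sketch in Python. It only builds search URLs and makes no network request. The endpoint `https://rest.uniprot.org/proteomes/search` and its `query`, `format` and `size` parameters are assumptions based on the public UniProt REST service and should be checked against the current UniProt documentation. The helper name `build_proteome_search_url` and the example search terms are purely illustrative.

```python
from urllib.parse import urlencode

# Assumption: base URL of the UniProt REST proteome search endpoint.
UNIPROT_PROTEOMES_SEARCH = "https://rest.uniprot.org/proteomes/search"


def build_proteome_search_url(organism_term: str, size: int = 10) -> str:
    """Return a proteome search URL for a free-text organism term.

    `organism_term` is passed as a plain free-text query; `size` limits
    the number of results requested. Both parameter names are assumed
    to match the UniProt REST service.
    """
    params = {"query": organism_term, "format": "json", "size": size}
    # urlencode escapes spaces and other special characters in the term.
    return f"{UNIPROT_PROTEOMES_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative search terms for lineages discussed in this article.
    for term in ("Lokiarchaeota", "Asgard group"):
        print(build_proteome_search_url(term, size=5))
```

The printed URLs can be opened in a browser or fetched with any HTTP client. Keeping URL construction separate from fetching makes the query easy to inspect before any request is sent.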
References
Anantharaman K, Brown CT, Hug LA, Sharon I, Castelle CJ, Probst AJ, Thomas BC, Singh A, Wilkins MJ, Karaoz U, Brodie EL, Williams KH, Hubbard SS, Banfield JF. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat Commun. 24 Oct 2016, 7:13219.
Parks DH, Rinke C, Chuvochina M, Chaumeil PA, Woodcroft BJ, Evans PN, Hugenholtz P, Tyson GW. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat Microbiol. 11 Sep 2017, 2(11):1533-1542.
Spang A, Saw JH, Jørgensen SL, Zaremba-Niedzwiedzka K, Martijn J, Lind AE, van Eijk R, Schleper C, Guy L, Ettema TJG. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature. 6 May 2015, 521(7551):173-179.
Zaremba-Niedzwiedzka K, Caceres EF, Saw JH, Bäckström D, Juzokaite L, Vancaester E, Seitz KW, Anantharaman K, Starnawski P, Kjeldsen KU, Stott MB, Nunoura T, Banfield JF, Schramm A, Baker BJ, Spang A, Ettema TJ. Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature. 11 Jan 2017, 541(7637):353-358.
Chen C, Natale DA, Finn RD, Huang H, Zhang J, Wu CH, Mazumder R. Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation. PLoS One. 27 Apr 2011, 6(4):e18910.