With advances in structural biology, protein structures are becoming larger and more complex than ever. How do we navigate these complex structures?
PDB:8F2U captures the human COMMD–CCDC22–CCDC93 (CCC) complex, which is part of the Commander complex that plays a vital role in sorting transmembrane cargo for endosomal recycling. The CCC complex has 12 different protein components, but how do we identify the functional role of each protein?
A handy tool is UniProt's advanced search function, where you can find all the protein components using the PDB entry ID (xref:pdb-8F2U). This directs you to the individual entry pages, where you can learn about the biological role of each protein.
You can also use the ‘Customize columns’ option to include 3D structures as an additional column. This provides a glimpse of the available structures for each protein.
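The same search can be scripted. Below is a minimal sketch, assuming the UniProt REST search endpoint at rest.uniprot.org and the return-field names `accession`, `gene_names`, `protein_name` and `xref_pdb`; these names are not given in this article, so check them against the UniProt API documentation. The helper names are made up for illustration.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the UniProtKB search API.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(pdb_id, fields=("accession", "gene_names", "protein_name", "xref_pdb")):
    """Build a search URL for all UniProtKB entries cross-referenced to a PDB ID."""
    params = {
        "query": f"xref:pdb-{pdb_id.upper()}",  # same query as in the advanced search
        "fields": ",".join(fields),             # assumed column names, see lead-in
        "format": "tsv",
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def parse_tsv(text):
    """Turn a TSV response (header row plus data rows) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_components(pdb_id):
    """Query UniProt over the network and return one dict per matching entry."""
    with urlopen(build_search_url(pdb_id)) as response:
        return parse_tsv(response.read().decode("utf-8"))


# Building the URL needs no network access; fetch_components("8F2U") does.
print(build_search_url("8F2U"))
```

Calling `fetch_components("8F2U")` should return one row per protein component, with the PDB cross-references playing the role of the extra 3D-structure column.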
Which structure is the ‘best’ structure?
There is often more than one experimentally determined structure available for a protein, and each may provide different information. On the UniProt entry page, you can go to the Structure section for a list of all the available structures. Let’s take a look at COMMD7 (AC:Q86VX2). Four structures are available for this protein, solved using different methods at varying resolutions. X-ray crystallography often provides high-resolution structures for smaller proteins, while cryo-EM can capture the conformation of larger and more dynamic protein complexes. Understanding how the structures were solved can help us select the most informative structure for our study.
Another useful piece of information is the ‘POSITIONS’ column. The positions here indicate the construct used to determine the structure, which may not cover the full-length protein. In this example, if you are interested in the N-terminal 130 residues of COMMD7, they are found in PDB:8F2R, 8F2U and 8P0W, but not in PDB:8ESD. In some cases, even if a full-length construct is used, some regions may not be observable in the structure. This is usually due to technical limitations in resolving highly dynamic regions of a protein.
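If you have many structures to compare, the coverage check can be automated. The sketch below assumes the POSITIONS values are copied as plain `start-end` ranges, optionally comma-separated and optionally prefixed with chain labels such as `A/B=`. The function names and the example ranges are placeholders for illustration, not values taken from the COMMD7 entry.

```python
def parse_positions(positions):
    """Parse a POSITIONS-style string such as '1-130' or 'A/B=1-50, 70-130'."""
    ranges = []
    for part in positions.split(","):
        span = part.strip().split("=")[-1]  # drop an optional 'chains=' prefix
        start, end = span.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def covers_region(positions, region_start, region_end):
    """True if one listed range spans the whole region (inclusive bounds)."""
    return any(
        start <= region_start and region_end <= end
        for start, end in parse_positions(positions)
    )


def covers_residue(positions, residue):
    """True if the given residue number falls inside a listed range."""
    return covers_region(positions, residue, residue)


# Placeholder ranges only; read the real ones from the POSITIONS column.
examples = {"structure_A": "1-200", "structure_B": "131-200"}
for name, positions in examples.items():
    print(name, covers_region(positions, 1, 130))
```

Note that this only tests the construct boundaries. As described above, residues inside the construct can still be missing from the deposited model.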
If you are interested in the conformation of the full-length protein, you can refer to the AlphaFold prediction. UniProt provides an AlphaFold model for most protein entries, predicted based on the canonical sequence. These full-length models can provide insights for regions that are absent in experimentally determined structures.
Where is my residue of interest?
A high-resolution protein structure can provide residue-level information about a protein. For example, post-translational modification (PTM) analysis by the PTMeXchange project showed that residue Lys90 in human COMMD7 is ubiquitinated. But where is it in the structures?
UniProt and PDB entries provide a consistent residue-level mapping through the SIFTS project. This means for all the PDB structures mapped to the same UniProt entry, you can expect to see the same Lys90 in the structures covering this region of the protein.
Using the UniProt Feature viewer, you can identify where Lys90 is.
Clicking on Lys90 will highlight this residue across all the feature viewer tracks. It will also zoom in and highlight Lys90 in the structure viewer. But don't forget to select a structure that covers residue 90! (Hint: not PDB:8ESD)
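The SIFTS mapping can also be queried programmatically. The following is a rough sketch, assuming the PDBe REST endpoint `/pdbe/api/mappings/uniprot/` and a JSON response in which each mapped segment carries `chain_id`, `unp_start` and `unp_end` keys. Neither the endpoint nor the response layout is described in this article, so verify both against the PDBe API documentation before relying on them. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed PDBe endpoint for SIFTS PDB-to-UniProt mappings.
SIFTS_URL = "https://www.ebi.ac.uk/pdbe/api/mappings/uniprot/{pdb_id}"


def fetch_mappings(pdb_id):
    """Download the SIFTS mapping for one PDB entry (needs network access)."""
    with urlopen(SIFTS_URL.format(pdb_id=pdb_id.lower())) as response:
        return json.load(response)


def chain_segments(payload, pdb_id, accession):
    """Return the mapped segments for one UniProt accession in one PDB entry."""
    entry = payload.get(pdb_id.lower(), {}).get("UniProt", {}).get(accession, {})
    return entry.get("mappings", [])


def chains_covering(segments, uniprot_pos):
    """List chain IDs whose mapped UniProt range includes the given position."""
    return sorted(
        {
            seg["chain_id"]
            for seg in segments
            if seg["unp_start"] <= uniprot_pos <= seg["unp_end"]
        }
    )
```

For example, `chains_covering(chain_segments(fetch_mappings("8F2U"), "8F2U", "Q86VX2"), 90)` would list the chains whose mapped range includes position 90 of COMMD7. An empty list means the structure does not cover that residue.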
How can you use this information for further studies?
If you want to study these structures in detail, you can directly download them from the UniProt entry page.
PyMOL and UCSF Chimera are among the most common molecular visualisation tools in bioinformatics and computational chemistry. You can use them for more complex analysis, such as measuring bond lengths, docking small molecules and simulating conformational changes.
We can use the information from UniProt to get us started. For example, knowing that chain G in PDB:8F2U is COMMD7, you can highlight COMMD7 in the structure of the 12-subunit CCC complex. You can also identify Lys90 and check if it is accessible for ubiquitination in the complex.
Try these commands in PyMOL:
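# Download PDB:8F2U and name chain G (COMMD7) as a selection
fetch 8f2u
select COMMD7, 8f2u and chain G
# Colour COMMD7 by element with yellow carbons
util.cbay COMMD7
# Show the side chain of residue 90 as sticks (backbone N, C and O hidden)
show sticks, (COMMD7 and resi 90 and not name N+C+O)
# Label the alpha carbon of residue 90 with residue name and number
label n. CA and i. 90 and COMMD7, '%s%s' % (resn, resi)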
Have fun navigating the world of protein structures!
Learn more about structural annotations in UniProt: https://www.uniprot.org/help/structure_section
Learn more about multiple structures for the same protein: https://www.uniprot.org/help/multiple_pdb_xrefs
Learn more about sequence coverage in structures: