research advances

Annotating proteins

PSI-SGKB [doi:10.1038/th_psisgkb.2010.36]

Now that efficient high-throughput production of protein structures is established, the challenge is to identify the functions of those proteins.

Flow chart of the Functional Annotation Screening Technology by NMR (FAST-NMR). FAST-NMR uses a combination of NMR ligand affinity screens, structural biology and bioinformatics (CPASS) to infer a functional annotation based on ligand-binding site similarities.

Towards the end of 2009, the sequence of the 1000th prokaryotic genome was released. With genomes becoming available increasingly rapidly, the rate at which new proteins are being identified is outstripping the rate at which researchers can characterize them, leaving many proteins with little or no functional information.

The Protein Structure Initiative (PSI) and other structural genomics projects have produced thousands of protein structures, many for proteins with no previous experimental information on their function. The rapid progress made by the PSI in developing a high-throughput structure-solution pipeline has removed most bottlenecks in structure determination. The major challenge now is to develop computational methods that can predict function and so provide annotations for the many structures that lack characterization.

The standard way to produce annotations for gene products is to search for homologous proteins that have already been experimentally characterized. Distant homology recognition can, however, produce errors, especially for proteins with divergent sequences. For such proteins, searches for structurally similar proteins using servers such as DALI or FATCAT can help, either by verifying marginal similarities found with other methods or by suggesting new, extremely distant homologs. For many other proteins, though, neither approach helps because no characterized protein shows detectable sequence or structural similarity.

One way to tackle functional prediction in such circumstances is to analyze functional linkages. Functional linkages are pairs or groups of proteins that function together, and they can be identified by analyzing the distribution of homologs across genomes [1]. The approach involves identifying linked genes through bioinformatic methods and then using this information to deduce molecular function. For instance, a function for the protein TM0449 from Thermotoga maritima, which was uncharacterized at the time, was suggested by gene complementation assays because its homologs were found solely in genomes that lacked the 'classical' thymidylate synthase and yet were known to synthesize thymidylate.
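The idea of reading function from the distribution of homologs across genomes can be sketched in a few lines of Python. This is a minimal illustration only, not the method used in the cited work: the genome contents, gene names and helper functions (`profile`, `jaccard`, `is_complementary`) are all hypothetical. It shows two signals: genes that tend to occur together, and a candidate gene that appears only where a known enzyme is missing, as in the thymidylate synthase example.

```python
def profile(gene, genomes):
    """Presence/absence profile of a gene across genomes, as a tuple of 0/1."""
    return tuple(1 if gene in content else 0 for content in genomes.values())


def jaccard(p, q):
    """Fraction of genomes containing both genes among those containing either."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def is_complementary(p, q):
    """True if every genome contains exactly one of the two genes."""
    return all(a != b for a, b in zip(p, q))


# Hypothetical toy data: which gene families occur in which genome.
genomes = {
    "genome_A": {"classical_TS", "geneX"},
    "genome_B": {"candidate", "geneX"},
    "genome_C": {"classical_TS"},
    "genome_D": {"candidate", "geneX", "geneY"},
}

candidate = profile("candidate", genomes)
classical = profile("classical_TS", genomes)

# A complementary pattern hints that the candidate may substitute
# for the missing classical enzyme.
print(is_complementary(candidate, classical))
# Co-occurrence of the candidate with another gene family.
print(jaccard(candidate, profile("geneY", genomes)))
```

A complementary profile is only a hypothesis generator; in the example above, the suggestion still had to be tested experimentally.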

One PSI team has automated this search for functional linkages and developed the ProKnow server. The result is an improvement of more than 8% in the accuracy of functional assignments and enhanced descriptions for a third of proteins with preliminary assignments. This approach has been further developed to combine generalized functional linkage information with nonhomology-based methods, using an algorithm known as zorch [2]. The algorithm quantifies connections in protein–protein interaction networks and works even with proteins that are indirectly linked or that have been shown to interact only in high-throughput experiments.
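To make the notion of quantifying direct and indirect connections concrete, here is a deliberately simple Python sketch. It is not the published algorithm, whose details are not given here. It assumes a hypothetical scoring rule (the direct edge weight plus the product of weights along each two-step path) and hypothetical protein names and confidence values.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (protein_a, protein_b, weight) triples."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def linkage_score(graph, source, target):
    """Illustrative score: direct weight plus products of weights on two-step paths."""
    neighbours = graph.get(source, {})
    score = neighbours.get(target, 0.0)
    for middle, w1 in neighbours.items():
        if middle == target:
            continue
        w2 = graph.get(middle, {}).get(target, 0.0)
        score += w1 * w2
    return score


# Hypothetical interaction confidences between four proteins.
edges = [
    ("P1", "P2", 0.9),
    ("P2", "P3", 0.5),
    ("P1", "P4", 0.2),
    ("P4", "P3", 0.5),
]
graph = build_graph(edges)

# P1 and P3 never interact directly, yet they share two intermediaries.
print(linkage_score(graph, "P1", "P3"))
```

Under this toy rule, two proteins with no direct interaction can still receive a non-zero score through shared partners, which is the property the text describes.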

Alternatively, rather than looking for global topological similarities, local similarities between proteins can be examined for functional clues. Many algorithms struggle to do this, but the MarkUs server is able to detect geometrically similar regions of proteins even when their overall topologies differ. MarkUs combines several sequence- and structure-based analysis methods, such as DALI, PSI-BLAST and DelPhi. By analyzing the biophysical and biochemical properties of a protein structure, it can suggest a nearest neighbor in the general 'functional space' [3].

Another method for spotting local similarities, for example similar active sites or similar ligand-binding interactions, is to use rapid NMR screening [4]. Functional annotation screening technology using NMR spectroscopy (FAST-NMR) can screen a library of compounds with known biological activity to see whether any of them bind the new structure. Once a ligand is found to bind, a structure of the complex can be rapidly determined by combining chemical shift perturbation data with protein–ligand docking using the program AutoDock. The active site is then compared with those of proteins of known function using the software Comparison of Protein Active Site Structures (CPASS).
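The chemical shift perturbation step can be illustrated with a short Python sketch. It is not taken from the FAST-NMR pipeline. It assumes the widely used combined 1H/15N perturbation, sqrt(dH^2 + (alpha * dN)^2), with an assumed scaling factor alpha of 0.14, and an assumed cutoff of one standard deviation above the mean. The residue numbers and shift values are invented for illustration.

```python
import math
from statistics import mean, pstdev


def combined_csp(delta_h, delta_n, alpha=0.14):
    """Combined 1H/15N chemical shift perturbation in ppm (alpha is an assumed scaling)."""
    return math.sqrt(delta_h ** 2 + (alpha * delta_n) ** 2)


def perturbed_residues(free, bound, alpha=0.14, n_sd=1.0):
    """Residues whose perturbation exceeds mean + n_sd * standard deviation."""
    csp = {}
    for residue, (h_free, n_free) in free.items():
        h_bound, n_bound = bound[residue]
        csp[residue] = combined_csp(h_bound - h_free, n_bound - n_free, alpha)
    values = list(csp.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    return sorted(residue for residue, value in csp.items() if value > cutoff)


# Hypothetical (1H, 15N) shifts in ppm, keyed by residue number.
free = {10: (8.10, 120.0), 11: (7.95, 118.5), 12: (8.20, 121.0),
        13: (8.40, 119.2), 14: (7.80, 122.3)}
bound = {10: (8.10, 120.0), 11: (7.95, 118.5), 12: (8.50, 122.0),
         13: (8.40, 119.2), 14: (7.80, 122.3)}

# Residues flagged here would be candidate restraints for the docking step.
print(perturbed_residues(free, bound))
```

In a real screen the flagged residues would be mapped onto the structure to define the likely binding site before docking; the threshold and scaling factor vary between laboratories.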

But annotations should come with a word of warning: while analyzing annotations of cytosolic sulfotransferase structures in the Protein Data Bank (PDB), a PSI team noted that many of the structures were incorrectly annotated and that, in many cases, the correct physiological interface was not indicated [5]. Software capable of analyzing the whole PDB is being developed and should help researchers identify the biologically relevant interface.

An important aim for PSI structures is to make them accessible to the wider biological community. One PSI center has developed The Open Protein Structure Annotation Network (TOPSAN), a Wikipedia-type portal for the scientific community. Researchers are invited to annotate the structures with functional information, and the portal promises to be a broad collaborative project that will assist annotation.

Maria Hodges

References:
  1. R. Llewellyn and D. S. Eisenberg. Annotating proteins with generalized functional linkages.
     Proc. Natl Acad. Sci. USA 105, 17700-17705 (2008). doi:10.1073/pnas.0809583105

  2. A. Medrano-Soto, D. Pal and D. Eisenberg. Inferring molecular function: contributions from functional linkages.
     Trends Genet. 24, 587-590 (2008). doi:10.1016/j.tig.2008.10.001

  3. D. Petrey, M. Fischer and B. Honig. Structural relationships among proteins with different topologies and their implications for function annotation strategies.
     Proc. Natl Acad. Sci. USA 106, 17377-17382 (2009). doi:10.1073/pnas.0907971106

  4. K. A. Mercier, M. Baran, V. Ramanathan, P. Revesz, R. Xiao et al. FAST-NMR – Functional annotation screening technology using NMR spectroscopy.
     J. Am. Chem. Soc. 128, 15292-15299 (2006). doi:10.1021/ja0651759

  5. B. Weitzner, T. Meehan, Q. Xu and R. L. Dunbrack, Jr. An unusually small dimer interface is observed in all available crystal structures of cytosolic sulfotransferases.
     Proteins 75, 289-295 (2009). doi:10.1002/prot.22347
