Computational Approaches to Peptide Identification via Tandem MS

Hubbard, Simon J.

doi:10.1007/978-1-60761-444-9_3

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 604)

Abstract

The peptide identification problem lies at the heart of modern proteomic methodology: it is from peptide identifications that the presence of a particular protein or proteins in a sample may be inferred. The challenge is to find the most likely amino acid sequence corresponding to each tandem mass spectrum that has been collected, and to produce a score and an associated statistical measure of confidence that the putative identification is correct. Database searching assumes that the peptide (and parent protein) sequence in question is known and is present in the database to be searched, as opposed to de novo methods, which seek to identify the peptide ab initio. This chapter provides an overview of the methods that popular software tools employ to search protein sequence databases, giving the non-expert reader sufficient background to appreciate the choices they can make. It covers the approaches used to compare experimental and theoretical spectra, and some of the methods used to validate assignments and provide higher confidence in them.
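
As a rough illustration of what comparing experimental and theoretical spectra can mean in practice, the sketch below generates singly charged b- and y-ion m/z values for candidate peptides and ranks the candidates by a simple shared peak count. It is a minimal sketch under stated assumptions, not the scoring function of any particular search engine: the function names, the 0.5 Da fragment tolerance, and the restriction to unmodified peptides and singly charged fragments are all illustrative choices that do not come from the chapter.

```python
from typing import Iterable, List

# Standard monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def fragment_mz(peptide: str) -> List[float]:
    """Theoretical singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion: N-terminal prefix
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion: C-terminal suffix
    return ions


def shared_peak_count(observed: Iterable[float], peptide: str,
                      tol: float = 0.5) -> int:
    """Number of theoretical ions with an observed peak within tol (assumed, Da)."""
    peaks = list(observed)
    return sum(
        any(abs(mz - peak) <= tol for peak in peaks)
        for mz in fragment_mz(peptide)
    )


def best_match(observed: Iterable[float], candidates: Iterable[str],
               tol: float = 0.5) -> str:
    """Candidate peptide with the highest shared peak count."""
    peaks = list(observed)
    return max(candidates, key=lambda p: shared_peak_count(peaks, p, tol))


if __name__ == "__main__":
    # Hypothetical "experimental" spectrum: the ideal fragments of PEPTIDE.
    spectrum = fragment_mz("PEPTIDE")
    for candidate in ["PEPTIDE", "PEPTIDA", "EDITPEP"]:
        print(candidate, shared_peak_count(spectrum, candidate))
    print("best:", best_match(spectrum, ["PEPTIDE", "PEPTIDA", "EDITPEP"]))
```

A raw shared peak count says nothing on its own about whether the top-ranked candidate is correct, which is why a score of this kind needs to be paired with a statistical measure of confidence.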

Acknowledgments

The author would like to thank Jenny Siepen, Julian Selley and Andrew Jones for useful comments on the manuscript, and BBSRC for support from various research grants (BB/E024912/1, BB/F004605/1), as well as the EU ProDaC grant (European Commission project, 6th framework programme, project number LSHG-CT-2006-036814).

Author information

Authors and Affiliations

Faculty of Life Sciences, University of Manchester, Michael Smith Building, Manchester, UK
Simon J. Hubbard

Corresponding author

Correspondence to Simon J. Hubbard.

Editor information

Editors and Affiliations

Fac. Life Sciences, University of Manchester, Oxford Rd., Manchester, M13 9PT, United Kingdom
Simon J. Hubbard
Fac. Veterinary Science, University of Liverpool, Liverpool, L69 7ZJ, United Kingdom
Andrew R. Jones

Cite this protocol

Hubbard, S.J. (2010). Computational Approaches to Peptide Identification via Tandem MS. In: Hubbard, S., Jones, A. (eds) Proteome Bioinformatics. Methods in Molecular Biology™, vol 604. Humana Press. https://doi.org/10.1007/978-1-60761-444-9_3

DOI: https://doi.org/10.1007/978-1-60761-444-9_3
Published: 05 December 2009
Publisher Name: Humana Press
Print ISBN: 978-1-60761-443-2
Online ISBN: 978-1-60761-444-9
eBook Packages: Springer Protocols
