Abstract
In recent years, the generation and interpretation of MS/MS spectra for the identification of peptides and proteins has matured into a widely used automated workflow in proteomics. Several software solutions for the automated analysis of MS/MS spectra allow high-throughput, high-performance analyses of complex samples. In MS/MS database searching, target–decoy approaches have become increasingly popular: a “decoy” portion of the search database contains nonexistent sequences that mimic the real (“target”) sequences. Because any match to a decoy sequence is necessarily false, the decoy hits allow the number of falsely identified peptides and proteins to be estimated after a search, and the resulting protein list can then be cut at a specified false discovery rate (FDR). This is an essential prerequisite for all quantitative approaches, as they rely on correct identifications. In particular, the label-free approach of spectral counting, which is gaining popularity because of its low cost and simplicity, depends directly on the correctness of peptide–spectrum matches (PSMs). This chapter describes five popular search engines, focusing on their general properties regarding protein identification but also covering their quantification abilities where these go beyond spectral counting. This should enable proteomics researchers to compare the engines’ features and choose an appropriate solution for their specific question. Furthermore, the search engines are applied to a spectrum data set generated from a complex sample with a Thermo LTQ Velos OrbiTrap (Thermo Fisher Scientific, Waltham, MA, USA). The results are compared with respect to, e.g., time requirements, the peptides and proteins found, and the engines’ behavior under the decoy approach.
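The dependency between decoy-based FDR filtering and spectral counting can be illustrated with a minimal Python sketch. It is not the procedure of any of the search engines discussed and rests on simplifying assumptions. Each spectrum contributes one top-scoring PSM from a search against a combined target and decoy database, and a higher score means a better match. Each PSM maps to exactly one protein, so shared peptides are ignored. The FDR at a score cutoff is estimated as decoy hits divided by target hits at or above that cutoff. Other estimators are also common, such as 2 × decoys / (targets + decoys) for concatenated databases. The `PSM` record, `fdr_threshold`, and `spectral_counts` are hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical minimal record: the top-scoring match for one spectrum."""
    spectrum_id: str
    protein: str      # assumption: each PSM maps to exactly one protein
    score: float      # assumption: higher score = better match
    is_decoy: bool    # True if the matched sequence came from the decoy part


def fdr_threshold(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    FDR at a cutoff is estimated as (#decoy PSMs) / (#target PSMs) among all
    PSMs scoring at or above that cutoff. Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for i, psm in enumerate(ranked):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate after the last PSM sharing this score, so that tied
        # scores are accepted or rejected together.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1].score != psm.score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = psm.score
    return best_cutoff


def spectral_counts(psms, cutoff):
    """Count accepted target PSMs (i.e. spectra) per protein."""
    counts = Counter()
    if cutoff is None:
        return counts
    for psm in psms:
        if not psm.is_decoy and psm.score >= cutoff:
            counts[psm.protein] += 1
    return counts


# Toy input; protein names and scores are invented for illustration only.
psms = [
    PSM("s1", "P1", 95.0, False),
    PSM("s2", "P1", 90.0, False),
    PSM("s3", "P2", 85.0, False),
    PSM("s4", "DECOY_P9", 80.0, True),
    PSM("s5", "P2", 75.0, False),
    PSM("s6", "P3", 70.0, False),
    PSM("s7", "DECOY_P8", 65.0, True),
]

cutoff = fdr_threshold(psms, max_fdr=0.25)
print(cutoff, dict(spectral_counts(psms, cutoff)))  # 70.0 {'P1': 2, 'P2': 2, 'P3': 1}
```

The 25% bound only makes the toy example visible. With the more realistic `max_fdr=0.01`, the cutoff rises to 85.0, and the spectral counts shrink to P1: 2 and P2: 1. This shows directly how the FDR cut changes the quantitative result. Choosing the lowest cutoff that still satisfies the bound keeps as many PSMs as possible.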
Acknowledgments
Martin Eisenacher and Christian Stephan are funded by P.U.R.E. (Protein Unit for Research in Europe), a project of Nordrhein-Westfalen, a federal state of Germany. Michael Kohl is funded by “NGFN-Plus, Verbundprojekt: Funktionelle Genomik der Parkinson-Erkrankung” (contract number 01GS08143). Markus-Hermann Koch and Julian Uszkoreit are part of CLIB (“Cluster Industrielle Biotechnologie”) within the QProM project (contract number 616 40003 0315413B). Michael Turewicz is funded by “Hightech.NRW” in the project “Entwicklung eines Biomarker-Chips für das Parkinson-Syndrom (ParkCHIP).” The authors thank Heiner Falkenberg and Hanna Diehl for fruitful discussions about samples and instruments, and Maike Ahrens and Jesse Goering for proofreading.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Eisenacher, M., Kohl, M., Turewicz, M., Koch, MH., Uszkoreit, J., Stephan, C. (2012). Search and Decoy: The Automatic Identification of Mass Spectra. In: Marcus, K. (eds) Quantitative Methods in Proteomics. Methods in Molecular Biology, vol 893. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-885-6_28
DOI: https://doi.org/10.1007/978-1-61779-885-6_28
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-884-9
Online ISBN: 978-1-61779-885-6
eBook Packages: Springer Protocols