Search and Decoy: The Automatic Identification of Mass Spectra

Eisenacher, Martin; Kohl, Michael; Turewicz, Michael; Koch, Markus-Hermann; Uszkoreit, Julian; Stephan, Christian

doi:10.1007/978-1-61779-885-6_28

Part of the book series: Methods in Molecular Biology (MIMB, volume 893)

Abstract

In recent years, the generation and interpretation of MS/MS spectra for the identification of peptides and proteins has matured into a frequently used automatic workflow in Proteomics. Several software solutions for the automated analysis of MS/MS spectra allow for high-throughput/high-performance analyses of complex samples. In connection with MS/MS searches, target–decoy approaches have become increasingly popular: in a “decoy” part of the search database, nonexistent sequences mimic real sequences (the “target” sequences). With their help, the number of falsely identified peptides/proteins can be estimated after a search, and the resulting protein list can be cut at a specified false discovery rate (FDR). This is an essential prerequisite for all quantitative approaches, as they rely on correct identifications. In particular, the label-free approach “spectral counting” (increasingly popular due to its low cost and simplicity) depends directly on the correctness of peptide–spectrum matches (PSMs). The aim of this work is to describe five popular search engines, especially their general properties regarding protein identification, but also their quantification abilities where those go beyond spectral counting. This enables Proteomics researchers to compare the engines' features and to choose an appropriate solution for their specific question. Furthermore, the search engines are applied to a spectrum data set generated from a complex sample with a Thermo LTQ Velos OrbiTrap (Thermo Fisher Scientific, Waltham, MA, USA). The results of the search engines are compared, e.g., regarding time requirements, peptides and proteins found, and the search engines’ behavior using the decoy approach.
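
The target–decoy cut described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes a concatenated target+decoy search, scores where higher is better, and the simple estimator FDR ≈ (decoy hits) / (target hits) above a score threshold, none of which the abstract specifies. It also works at the PSM level rather than on a protein list. The names `PSM` and `filter_at_fdr` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (hypothetical minimal record)."""
    spectrum_id: str
    peptide: str
    score: float    # assumption: higher score means a better match
    is_decoy: bool  # True if the matched sequence comes from the decoy part


def filter_at_fdr(psms, max_fdr=0.01):
    """Keep target PSMs from the longest score-ranked prefix whose
    estimated FDR (decoys / targets) does not exceed max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = 0
    decoys = 0
    cutoff = 0  # number of ranked PSMs that pass the FDR cut
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumed estimator: each decoy hit stands for one false target hit.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


if __name__ == "__main__":
    example = [
        PSM("s1", "PEPTIDEA", 10.0, False),
        PSM("s2", "PEPTIDEB", 9.0, False),
        PSM("s3", "PEPTIDEC", 8.0, False),
        PSM("s4", "AEDITPEP", 7.0, True),
        PSM("s5", "PEPTIDED", 6.0, False),
    ]
    accepted = filter_at_fdr(example, max_fdr=0.25)
    print([p.spectrum_id for p in accepted])
```

Because spectral counting simply counts accepted PSMs per protein, any false match that survives this cut feeds directly into the quantitative result, which is why the abstract calls the FDR cut a prerequisite. Other estimators (for example, ones that account for separate target and decoy searches) would change only the line computing the ratio.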

Acknowledgments

Martin Eisenacher and Christian Stephan are funded by P.U.R.E. (Protein Unit for Research in Europe), a project of Nordrhein-Westfalen, a federal state of Germany. Michael Kohl is paid by “NGFN-Plus, Verbundprojekt: Funktionelle Genomik der Parkinson-Erkrankung” (contract number 01GS08143). Markus-Hermann Koch and Julian Uszkoreit are part of CLIB (“Cluster Industrielle Biotechnologie”) within the QProM project (contract number 616 40003 0315413B). Michael Turewicz is funded by “Hightech.NRW” in the project “Entwicklung eines Biomarker-Chips für das Parkinson-Syndrom (ParkCHIP).” The authors thank Heiner Falkenberg and Hanna Diehl for fruitful discussions about samples and instruments, and Maike Ahrens and Jesse Goering for proofreading.

Author information

Authors and Affiliations

Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany
Martin Eisenacher & Christian Stephan
Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany
Michael Kohl, Michael Turewicz, Markus-Hermann Koch & Julian Uszkoreit

Corresponding author

Correspondence to Martin Eisenacher.

Editor information

Editors and Affiliations

Fak. Medizin, Medizinische Proteom Center, Universität Bochum, Universitätsstraße 150, Bochum, 44801, Germany
Katrin Marcus

Cite this protocol

Eisenacher, M., Kohl, M., Turewicz, M., Koch, MH., Uszkoreit, J., Stephan, C. (2012). Search and Decoy: The Automatic Identification of Mass Spectra. In: Marcus, K. (eds) Quantitative Methods in Proteomics. Methods in Molecular Biology, vol 893. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-885-6_28

DOI: https://doi.org/10.1007/978-1-61779-885-6_28
Published: 07 May 2012
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-884-9
Online ISBN: 978-1-61779-885-6
eBook Packages: Springer Protocols
