Search and Decoy: The Automatic Identification of Mass Spectra

  • Protocol
Quantitative Methods in Proteomics

Abstract

In recent years, the generation and interpretation of MS/MS spectra for the identification of peptides and proteins has matured into a routinely used automated workflow in proteomics. Several software solutions for the automated analysis of MS/MS spectra allow high-throughput, high-performance analyses of complex samples. Alongside MS/MS searches, target–decoy approaches have become increasingly popular: in a “decoy” part of the search database, nonexistent sequences mimic real sequences (the “target” sequences). With their help, the number of falsely identified peptides or proteins can be estimated after a search, and the resulting protein list can be cut at a specified false discovery rate (FDR). This is an essential prerequisite for all quantitative approaches, as they rely on correct identifications. In particular, the label-free approach “spectral counting”, which is gaining popularity because of its low cost and simplicity, depends directly on the correctness of peptide–spectrum matches (PSMs). This work describes five popular search engines, focusing on their general properties regarding protein identification and also on their quantification abilities where these go beyond spectral counting. This enables proteomics researchers to compare the engines’ features and to choose an appropriate solution for their specific question. Furthermore, the search engines are applied to a spectrum data set generated from a complex sample with a Thermo LTQ Velos Orbitrap (Thermo Fisher Scientific, Waltham, MA, USA). The results of the search engines are compared with regard to, for example, time requirements, peptides and proteins found, and the search engines’ behavior under the decoy approach.
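
The target–decoy idea summarized above can be illustrated with a minimal Python sketch. It is not taken from the chapter or from any of the search engines it describes. It assumes a concatenated target–decoy search, a score where higher means better, and the common estimator FDR ≈ (decoy hits) / (target hits) above a score threshold. The names `PSM`, `estimate_fdr`, and `score_cutoff_for_fdr` are hypothetical.

```python
from itertools import groupby
from typing import Iterable, NamedTuple, Optional


class PSM(NamedTuple):
    """One peptide-spectrum match (hypothetical minimal record)."""
    score: float    # assumption: higher score = better match
    is_decoy: bool  # True if the matched sequence comes from the decoy part


def estimate_fdr(psms: Iterable[PSM], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Assumption: each accepted decoy hit stands for one false target hit,
    so FDR is approximated as decoys / targets.
    """
    accepted = [p for p in psms if p.score >= threshold]
    targets = sum(1 for p in accepted if not p.is_decoy)
    decoys = sum(1 for p in accepted if p.is_decoy)
    if targets == 0:
        # No target hits accepted: report 0.0 if nothing was accepted at all,
        # otherwise 1.0 because only decoys passed the threshold.
        return 0.0 if decoys == 0 else 1.0
    return decoys / targets


def score_cutoff_for_fdr(psms: Iterable[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= `max_fdr`.

    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = None
    # Process tied scores together so a cutoff never splits a group of ties.
    for score, group in groupby(ranked, key=lambda p: p.score):
        for p in group:
            if p.is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Made-up example data, for illustration only.
example_psms = [
    PSM(10.0, False), PSM(9.0, False), PSM(8.0, True), PSM(7.0, False),
    PSM(6.0, False), PSM(5.0, True), PSM(4.0, True),
]
cutoff = score_cutoff_for_fdr(example_psms, max_fdr=0.25)
accepted_targets = [
    p for p in example_psms
    if cutoff is not None and p.score >= cutoff and not p.is_decoy
]
```

Other estimators exist, for example doubling the decoy count and dividing by all accepted hits. Which one is appropriate depends on how the decoy database was built and searched, so the formula used here should be read as one possible choice.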

Acknowledgments

Martin Eisenacher and Christian Stephan are funded by P.U.R.E. (Protein Unit for Research in Europe), a project of Nordrhein-Westfalen, a federal state of Germany. Michael Kohl is paid by “NGFN-Plus, Verbundprojekt: Funktionelle Genomik der Parkinson-Erkrankung” (contract number 01GS08143). Markus-Hermann Koch and Julian Uszkoreit are part of CLIB (“Cluster Industrielle Biotechnologie”) within the QProM project (contract number 616 40003 0315413B). Michael Turewicz is funded by “Hightech.NRW” in the project “Entwicklung eines Biomarker-Chips für das Parkinson-Syndrom (ParkCHIP).” The authors thank Heiner Falkenberg and Hanna Diehl for fruitful discussions about samples and instruments, and Maike Ahrens and Jesse Goering for proofreading.

Author information

Corresponding author

Correspondence to Martin Eisenacher.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Eisenacher, M., Kohl, M., Turewicz, M., Koch, MH., Uszkoreit, J., Stephan, C. (2012). Search and Decoy: The Automatic Identification of Mass Spectra. In: Marcus, K. (eds) Quantitative Methods in Proteomics. Methods in Molecular Biology, vol 893. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-885-6_28

  • DOI: https://doi.org/10.1007/978-1-61779-885-6_28

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-884-9

  • Online ISBN: 978-1-61779-885-6

  • eBook Packages: Springer Protocols
