MS-DPR: An Algorithm for Computing Statistical Significance of Spectral Identifications of Non-linear Peptides

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7534)

Abstract

While non-linear peptide natural products such as Vancomycin and Daptomycin are among the most effective antibiotics, the computational techniques for sequencing such peptides are still in their infancy. Previous methods for sequencing peptide natural products are based on Nuclear Magnetic Resonance spectroscopy and require large amounts (milligrams) of purified material. Recently, the development of mass spectrometry based methods has enabled accurate sequencing of non-linear peptidic natural products using picograms of material, but the question of evaluating the statistical significance of Peptide Spectrum Matches (PSMs) for these peptides remains open. Moreover, it is unclear how to decide whether a given spectrum is produced by a linear, cyclic, or branch-cyclic peptide. Surprisingly, all previous mass spectrometry studies overlooked the fact that a very similar problem was successfully addressed in particle physics in 1951. In this paper we develop a method for estimating the statistical significance of PSMs defined by non-linear peptides, which makes it possible to identify whether a peptide is linear, cyclic, or branch-cyclic, an important step toward the identification of peptidic natural products.
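
To make the notion of PSM significance concrete, here is a minimal Python sketch. It is not the paper's method: it simply estimates a p-value as the fraction of random peptides whose score against a spectrum is at least the observed score. The residue-mass list, the prefix-mass scoring function, and all function names are illustrative assumptions. A realistic null model would also constrain random peptides to the precursor mass and account for cyclic and branch-cyclic fragmentation.

```python
import random

# Hypothetical toy alphabet of integer residue masses (assumption, not from the paper).
MASSES = [57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
          128, 129, 131, 137, 147, 156, 163, 186]


def prefix_masses(peptide):
    """Return the set of cumulative prefix masses of a linear peptide."""
    total = 0
    masses = set()
    for residue in peptide:
        total += residue
        masses.add(total)
    return masses


def score(peptide, spectrum):
    """Toy PSM score: number of prefix masses present in the spectrum."""
    return len(prefix_masses(peptide) & spectrum)


def monte_carlo_p_value(spectrum, observed_score, length, trials, rng):
    """Estimate P(score >= observed_score) over uniformly random peptides."""
    hits = 0
    for _ in range(trials):
        candidate = [rng.choice(MASSES) for _ in range(length)]
        if score(candidate, spectrum) >= observed_score:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    peptide = [57, 71, 87, 113]
    spectrum = prefix_masses(peptide)
    observed = score(peptide, spectrum)
    estimate = monte_carlo_p_value(spectrum, observed, len(peptide), 10000, rng)
    print("observed score:", observed)
    print("estimated p-value:", estimate)
```

Because the estimate is a count divided by `trials`, this naive sampler cannot resolve p-values much smaller than 1/`trials`, which is why rare-event estimation techniques are relevant for highly significant matches.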

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mohimani, H., Kim, S., Pevzner, P.A. (2012). MS-DPR: An Algorithm for Computing Statistical Significance of Spectral Identifications of Non-linear Peptides. In: Raphael, B., Tang, J. (eds) Algorithms in Bioinformatics. WABI 2012. Lecture Notes in Computer Science, vol 7534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33122-0_24

  • DOI: https://doi.org/10.1007/978-3-642-33122-0_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33121-3

  • Online ISBN: 978-3-642-33122-0

  • eBook Packages: Computer Science, Computer Science (R0)
