Gapped Spectral Dictionaries and Their Applications for Database Searches of Tandem Mass Spectra

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2010)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6044)

Abstract

Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (the Spectral Dictionary) and quickly matching them against a database is a recently emerged alternative approach to peptide identification. However, the size of a Spectral Dictionary grows quickly with peptide length, making its generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps), which can be easily generated for any peptide length, thus addressing this shortcoming of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, which opens the possibility of using them to speed up MS/MS database searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications that are prohibitively time-consuming with existing approaches. We further introduce gapped tags, which have advantages over conventional peptide sequence tags in filtration-based MS/MS database searches.
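
To make the idea of an interpretation "with gaps" concrete, the following is a minimal sketch and not the MS-GappedDictionary algorithm itself. It assumes a gapped peptide can be represented as a list of integer masses, where each mass is the total of one or more consecutive amino acid residues, and it uses nominal (integer) residue masses. The function name `match_gapped_peptide` and the example protein string are hypothetical and chosen only for illustration.

```python
# Nominal (integer) residue masses of the 20 standard amino acids, in Da.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def match_gapped_peptide(gapped, protein):
    """Return the start indices at which `protein` is consistent with `gapped`.

    `gapped` is a list of integer masses (assumed representation); each
    entry must equal the summed mass of one or more consecutive residues.
    """
    masses = [RESIDUE_MASS[aa] for aa in protein]
    hits = []
    for start in range(len(masses)):
        pos = start
        matched = True
        for target in gapped:
            total = 0
            # Residue masses are positive, so accumulate until we reach
            # or overshoot the target mass.
            while pos < len(masses) and total < target:
                total += masses[pos]
                pos += 1
            if total != target:
                matched = False
                break
        if matched:
            hits.append(start)
    return hits


# Gapped form of PEPTIDE: P, [E+P], T, [I+D], E
gapped = [97, 226, 101, 228, 129]
print(match_gapped_peptide(gapped, "MKPEPTIDER"))
```

In this sketch a single gapped peptide stands in for every full-length sequence that agrees with its mass list, which is one intuition for why a dictionary of gapped interpretations can be much smaller than a dictionary of complete sequences. Note that with nominal masses I/L and K/Q are indistinguishable.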

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jeong, K., Kim, S., Bandeira, N., Pevzner, P.A. (2010). Gapped Spectral Dictionaries and Their Applications for Database Searches of Tandem Mass Spectra. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_14

  • DOI: https://doi.org/10.1007/978-3-642-12683-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12682-6

  • Online ISBN: 978-3-642-12683-3

  • eBook Packages: Computer Science, Computer Science (R0)
