Abstract
Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (its Spectral Dictionary) and quickly matching them against a database is a recently emerged alternative approach to peptide identification. However, Spectral Dictionaries grow rapidly with peptide length, making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps), which can be easily generated for any peptide length, thus addressing this shortcoming of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, which opens the possibility of using them to speed up MS/MS database searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications that are prohibitively time-consuming with existing approaches. We further introduce gapped tags, which have advantages over conventional peptide sequence tags in filtration-based MS/MS database searches.
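To make the idea of matching an interpretation "with gaps" against a database concrete, the following is a minimal sketch and not the MS-GappedDictionary algorithm itself. It assumes a gapped peptide can be represented as a list of integer masses, where each entry is either a single amino acid mass or a gap equal to the summed mass of several consecutive residues. The names `NOMINAL_MASS`, `match_at`, and `find_gapped` are hypothetical, and nominal (integer) residue masses are used to sidestep mass tolerance handling.

```python
# Nominal (integer) residue masses in Da. Using integers is a simplifying
# assumption; real searches work with accurate masses and tolerances.
NOMINAL_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def match_at(masses, pattern, start):
    """Return the end index if `pattern` matches `masses` from `start`, else None.

    Each pattern entry must equal the total mass of one or more consecutive
    residues. Because all residue masses are positive, the run of residues
    that can reach a given target from a fixed position is unique.
    """
    pos = start
    for target in pattern:
        total = 0
        while pos < len(masses) and total < target:
            total += masses[pos]
            pos += 1
        if total != target:
            return None
    return pos


def find_gapped(pattern, protein):
    """Return every substring of `protein` that the gapped pattern matches."""
    masses = [NOMINAL_MASS[aa] for aa in protein]
    hits = []
    for start in range(len(masses)):
        end = match_at(masses, pattern, start)
        if end is not None:
            hits.append(protein[start:end])
    return hits


# P (97), a gap of 226 Da, T (101), I (113)
print(find_gapped([97, 226, 101, 113], "MKPEPTIDEKR"))
```

In this toy example the 226 Da gap is satisfied by the residues E and P together (129 + 97), so a single gapped pattern stands in for every fully specified sequence whose residues sum to the same gap mass. This is one intuition for why a dictionary of gapped interpretations can be far smaller than a dictionary of complete ones.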
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jeong, K., Kim, S., Bandeira, N., Pevzner, P.A. (2010). Gapped Spectral Dictionaries and Their Applications for Database Searches of Tandem Mass Spectra. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_14
DOI: https://doi.org/10.1007/978-3-642-12683-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12682-6
Online ISBN: 978-3-642-12683-3
eBook Packages: Computer Science, Computer Science (R0)