Abstract
We study the Submass Finding Problem: given a string s over a weighted alphabet, i.e., an alphabet Σ with a weight function \(\mu:\Sigma \to {\mathbb N}\), decide for an input mass M whether s has a substring whose weights sum up to M. If M is indeed a submass, we also want to find one or all occurrences of such substrings. We present efficient algorithms for both the decision and the search problem. Furthermore, our approach allows us to efficiently compute the number of distinct submasses of s.
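To make the problem statement concrete, here is a naive baseline (explicitly not the paper's algorithm), written as a minimal Python sketch. It assumes the weight function μ is given as a dict from characters to non-negative integers; the function names are hypothetical. With prefix masses p[0] = 0 and p[k] = μ(s[0]) + ... + μ(s[k-1]), the substring s[i:j] has mass p[j] - p[i]. Checking all pairs therefore takes O(n²) time for a string of length n.

```python
from itertools import accumulate


def prefix_masses(s, mu):
    """Hypothetical helper: [0, mu(s[0]), mu(s[0]) + mu(s[1]), ..., total mass]."""
    return [0] + list(accumulate(mu[c] for c in s))


def find_submass(s, mu, M):
    """Naive O(n^2) search: all (i, j) with i < j such that s[i:j] has mass M."""
    p = prefix_masses(s, mu)
    n = len(s)
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)
            if p[j] - p[i] == M]


def is_submass(s, mu, M):
    """Decision version: does some non-empty substring have mass M?"""
    return bool(find_submass(s, mu, M))


def count_submasses(s, mu):
    """Number of distinct masses of non-empty substrings of s."""
    p = prefix_masses(s, mu)
    n = len(s)
    return len({p[j] - p[i] for i in range(n) for j in range(i + 1, n + 1)})


mu = {"A": 1, "B": 3}
print(find_submass("ABBA", mu, 4))   # (start, end) pairs, end exclusive
print(is_submass("ABBA", mu, 5))
print(count_submasses("ABBA", mu))
```

For s = ABBA with μ(A) = 1 and μ(B) = 3, the submasses are 1, 3, 4, 6, 7 and 8. Mass 4 occurs as AB and as BA, while 2 and 5 are not submasses.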
The main idea of our algorithms is to define appropriate polynomials such that the solution to the Submass Finding Problem can be read off from the coefficients of their product. We obtain very efficient running times by computing this product with the Fast Fourier Transform. Our main algorithm for the decision problem runs in time \({\mathcal O}({\mu_s} \log {\mu_s})\), where \(\mu_s\) is the total mass of string s. Employing standard methods for compressing sparse polynomials, this running time can also be expressed as \({\mathcal O}({\sigma}(s)\log^2 {\sigma}(s))\), where σ(s) denotes the number of distinct submasses of s. In this case, the running time is independent of the sizes of the individual character masses.
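The polynomial idea can be illustrated with one natural encoding. This is a sketch under our own assumptions and not necessarily the paper's exact construction. Take the prefix masses \(p_0 = 0, \dots, p_n = \mu_s\) as exponents and define \(P(x) = \sum_i x^{p_i}\) and \(Q(x) = \sum_j x^{\mu_s - p_j}\). The coefficient of \(x^{\mu_s + M}\) in \(P(x)Q(x)\) then counts the pairs with \(p_i - p_j = M\). For M > 0 each such pair has i > j and corresponds to the substring s[j:i] of mass M. The product is computed here with a plain recursive radix-2 Cooley-Tukey FFT, which takes \({\mathcal O}(\mu_s \log \mu_s)\) time. All helper names are hypothetical, and floating-point rounding limits this sketch to moderate values of \(\mu_s\).

```python
import cmath
from itertools import accumulate


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    With invert=True the result is the unnormalised inverse transform."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def multiply(f, g):
    """Multiply two integer coefficient lists via FFT (rounded back to ints)."""
    m = len(f) + len(g) - 1
    size = 1
    while size < m:
        size *= 2
    fa = fft([complex(x) for x in f] + [0j] * (size - len(f)))
    fb = fft([complex(x) for x in g] + [0j] * (size - len(g)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [round(v.real / size) for v in prod[:m]]


def submass_table(s, mu):
    """Return (total mass, C) with C[total + M] = #{(i, j): p[i] - p[j] = M}."""
    p = [0] + list(accumulate(mu[c] for c in s))
    total = p[-1]
    P = [0] * (total + 1)   # P(x) = sum_i x^{p_i}
    Q = [0] * (total + 1)   # Q(x) = sum_j x^{total - p_j}
    for v in p:
        P[v] += 1
        Q[total - v] += 1
    return total, multiply(P, Q)


def count_occurrences(total, C, M):
    """Number of substrings with mass M (M > 0); non-zero iff M is a submass."""
    return C[total + M] if 0 < M <= total else 0


def count_distinct_submasses(total, C):
    """Number of M in 1..total whose coefficient is non-zero."""
    return sum(1 for M in range(1, total + 1) if C[total + M] > 0)


total, C = submass_table("ABBA", {"A": 1, "B": 3})
print(count_occurrences(total, C, 4), count_occurrences(total, C, 5))
print(count_distinct_submasses(total, C))
```

Once the product has been built, every decision query is a single table lookup, and counting the non-zero coefficients \(C[\mu_s + M]\) for \(1 \le M \le \mu_s\) gives the number of distinct submasses.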
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bansal, N., Cieliebak, M., Lipták, Z. (2004). Efficient Algorithms for Finding Submasses in Weighted Strings. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_14
DOI: https://doi.org/10.1007/978-3-540-27801-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive