Abstract
The problem of extracting a basis of irredundant motifs from a sequence is considered. In previous work such bases were built incrementally for all suffixes of the input string s in $O(n^3)$ time, where n is the length of s. Faster, non-incremental algorithms have been based on the landmark approach to string searching due to Fischer and Paterson, and exhibit respective time bounds of $O(n^2 \log n \log |\Sigma|)$ and $O(|\Sigma| n^2 \log^2 n \log\log n)$, with Σ denoting the alphabet. The algorithm by Fischer and Paterson makes crucial use of the FFT, which is impractical with long sequences.
The algorithm presented in this paper does not resort to the FFT and yet is asymptotically faster than previously available ones. Specifically, an off-line algorithm is presented that takes time $O(|\Sigma| n^2)$, which is optimal for finite Σ.
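As an illustration of the kind of string product that the Fischer and Paterson approach evaluates with the FFT, the sketch below counts, for every shift d, the positions at which s agrees with a copy of itself shifted by d. It is not the algorithm of this paper and not the authors' code: the function name `match_counts` is hypothetical, and the per-symbol correlations are computed by direct summation rather than by FFT, so the sketch only shows what is being computed, not how either method achieves its bound.

```python
def match_counts(s):
    """For each shift d in 1..n-1, count positions i with s[i] == s[i + d].

    Illustrative only: one 0/1 indicator vector per alphabet symbol is
    correlated with itself by direct summation. An FFT-based method would
    replace the inner sum with a fast convolution.
    """
    n = len(s)
    counts = [0] * n  # counts[0] is left at 0; shift 0 is not evaluated
    for c in sorted(set(s)):
        # Indicator vector of symbol c: 1 where s has c, 0 elsewhere.
        x = [1 if ch == c else 0 for ch in s]
        for d in range(1, n):
            # Number of positions where c occurs in both s and its shift by d.
            counts[d] += sum(x[i] * x[i + d] for i in range(n - d))
    return counts


if __name__ == "__main__":
    print(match_counts("abab"))
```

For the string "abab" only the shift by 2 aligns equal symbols, at two positions, so the entry for d = 2 is the only nonzero count.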
References
Apostolico, A., Galil, Z.: Pattern matching algorithms. Oxford University Press, New York (1997)
Apostolico, A., Parida, L.: Incremental paradigms of motif discovery. Journal of Computational Biology 11(1), 15–25 (2004)
Apostolico, A.: Pattern discovery and the algorithmics of surprise. Artificial Intelligence and Heuristic Methods for Bioinformatics, pp. 111–127 (2003)
Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: STOC 2002. Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pp. 592–601 (2002)
Fischer, M.J., Paterson, M.S.: String matching and other products. In: Karp, R. (ed.) Proceedings of the SIAM-AMS Complexity of Computation, Providence, R.I. American Mathematical Society, pp. 113–125 (1974)
Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33(1), 31–88 (2001)
Pelfrêne, J., Abdeddaïm, S., Alexandre, J.: Extracting approximate patterns. Journal of Discrete Algorithms 3(2-4), 293–320 (2005)
Parida, L.: Algorithmic Techniques in Computational Genomics. PhD thesis, Department of Computer Science, New York University (1998)
Pisanti, N., Crochemore, M., Grossi, R., Sagot, M.-F.: Bases of motifs for generating repeated patterns with wild cards. IEEE/ACM Trans. Comput. Biol. Bioinformatics 2(1), 40–50 (2005)
Parida, L., Rigoutsos, I., Floratos, A., Platt, D., Gao, Y.: Pattern discovery on character sets and real-valued data: linear bound on irredundant motifs and an efficient polynomial time algorithm. In: Symposium on Discrete Algorithms, pp. 297–308 (2000)
Wang, J.T.L., Shapiro, B.A., Shasha, D.E.: Pattern Discovery in Biomolecular Data: Tools, Techniques and Applications. Oxford University Press, Oxford (1999)
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Apostolico, A., Tagliacollo, C. (2007). Optimal Offline Extraction of Irredundant Motif Bases. In: Lin, G. (ed.) Computing and Combinatorics. COCOON 2007. Lecture Notes in Computer Science, vol 4598. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73545-8_36
DOI: https://doi.org/10.1007/978-3-540-73545-8_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73544-1
Online ISBN: 978-3-540-73545-8
eBook Packages: Computer Science, Computer Science (R0)