Abstract
We consider the problem of enumerating all maximal flexible patterns in an input sequence database, where a maximal pattern (also called a closed pattern) is the most specific pattern in the equivalence class of patterns that have the same list of occurrences in the input. Since our notion of maximal patterns is based on position occurrences, it is weaker than the traditional notion of maximal patterns based on document occurrences. Based on the framework of reverse search, we present an efficient depth-first search algorithm, MaxFlex, that enumerates all maximal flexible patterns in a given sequence database without duplicates in \(O(\|\mathcal{T}\| \times |\Sigma|)\) time per pattern and \(O(\|\mathcal{T}\|)\) space, where \(\|\mathcal{T}\|\) is the size of the input sequence database \(\mathcal{T}\) and \(|\Sigma|\) is the size of the alphabet over which the sequences are defined. This shows that the enumeration problem for maximal flexible patterns is solvable with polynomial delay and polynomial space.
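The difference between position occurrences and document occurrences can be illustrated with a small Python sketch. The abstract does not reproduce the formal definitions, so everything here is an assumption made for illustration: a flexible pattern is modeled as a list of non-empty string segments separated by gaps of arbitrary length (including zero), a position occurrence is a pair (sequence index, start position), and the names `occurrences` and `documents` are hypothetical. This is a brute-force illustration of occurrence lists, not the MaxFlex algorithm.

```python
def occurrences(pattern, database):
    """Return the set of (sequence index, start position) pairs where `pattern` matches.

    Assumption (not from the paper): `pattern` is a non-empty list of non-empty
    strings, and consecutive segments may be separated by a gap of any length,
    including zero.
    """
    found = set()
    for i, text in enumerate(database):
        for start in range(len(text)):
            # The first segment must begin exactly at `start`.
            if not text.startswith(pattern[0], start):
                continue
            pos = start + len(pattern[0])
            matched = True
            for segment in pattern[1:]:
                # Taking the leftmost placement of each later segment is
                # enough to decide whether some placement exists.
                nxt = text.find(segment, pos)
                if nxt < 0:
                    matched = False
                    break
                pos = nxt + len(segment)
            if matched:
                found.add((i, start))
    return found


def documents(occ):
    """Project position occurrences onto the set of sequence indices."""
    return {i for i, _ in occ}


database = ["ABCAB"]
occ_ab = occurrences(["AB"], database)    # "AB" occurs at two positions
occ_abc = occurrences(["ABC"], database)  # "ABC" occurs at one position

# Same document list, different position lists.
print(documents(occ_ab) == documents(occ_abc))
print(occ_ab == occ_abc)
```

Under these assumptions, "AB" and "ABC" occur in the same document but at different sets of positions, so they are equivalent with respect to document occurrences and not equivalent with respect to position occurrences.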
This research was partly supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Specially Promoted Research, 17002008, 2007 on “semi-structured data mining”, and 18017015, 2007 on “developing high-speed high-quality algorithms for analyzing huge genome database”.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arimura, H., Uno, T. (2008). Mining Maximal Flexible Patterns in a Sequence. In: Satoh, K., Inokuchi, A., Nagao, K., Kawamura, T. (eds) New Frontiers in Artificial Intelligence. JSAI 2007. Lecture Notes in Computer Science, vol 4914. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78197-4_29
DOI: https://doi.org/10.1007/978-3-540-78197-4_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78196-7
Online ISBN: 978-3-540-78197-4
eBook Packages: Computer Science, Computer Science (R0)