Mining Maximal Flexible Patterns in a Sequence

Arimura, Hiroki; Uno, Takeaki

doi:10.1007/978-3-540-78197-4_29

Hiroki Arimura¹ and Takeaki Uno²

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4914)

Included in the following conference series:

Annual Conference of the Japanese Society for Artificial Intelligence

Abstract

We consider the problem of enumerating all maximal flexible patterns in an input sequence database. A maximal pattern (also called a closed pattern) is the most specific pattern in the equivalence class of patterns that have the same list of occurrences in the input. Since our notion of maximal patterns is based on position occurrences, it is weaker than the traditional notion of maximal patterns based on document occurrences. Based on the framework of reverse search, we present an efficient depth-first search algorithm, MaxFlex, that enumerates all maximal flexible patterns in a given sequence database without duplicates in \(O(||{\mathcal T}||\times|\Sigma|)\) time per pattern and \(O(||{\mathcal T}||)\) space, where \(||{\mathcal T}||\) is the size of the input sequence database \(\mathcal T\) and \(|\Sigma|\) is the size of the alphabet over which the sequences are defined. Hence the enumeration problem for maximal flexible patterns is solvable with polynomial delay and in polynomial space.
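
To make the notion of position occurrences concrete, the following minimal Python sketch computes the set of start positions at which a flexible pattern matches a single text, and checks whether two patterns fall into the same equivalence class. It is not the MaxFlex algorithm and does not use reverse search. The pattern representation (a list of non-empty constant segments separated by gaps that match any, possibly empty, string), the anchoring of an occurrence at the start of the first segment, and the names `occurrences` and `same_occurrences` are all assumptions made for illustration; the chapter's formal definitions may differ.

```python
from typing import List, Set


def occurrences(segments: List[str], text: str) -> Set[int]:
    """Return the start positions where the flexible pattern matches `text`.

    Assumption (not taken from the chapter): the pattern is
    seg_0 * seg_1 * ... * seg_k, where each * matches any string,
    including the empty one, and an occurrence is identified by the
    position at which seg_0 begins.
    """
    if not segments or any(not seg for seg in segments):
        raise ValueError("segments must be a non-empty list of non-empty strings")

    first, rest = segments[0], segments[1:]
    found: Set[int] = set()
    start = text.find(first)
    while start != -1:
        pos = start + len(first)
        matched = True
        # Gaps are unbounded, so taking the leftmost match of each
        # later segment is enough to decide whether a match exists.
        for seg in rest:
            idx = text.find(seg, pos)
            if idx == -1:
                matched = False
                break
            pos = idx + len(seg)
        if matched:
            found.add(start)
        start = text.find(first, start + 1)
    return found


def same_occurrences(p: List[str], q: List[str], text: str) -> bool:
    """True if patterns p and q have identical occurrence sets in `text`."""
    return occurrences(p, text) == occurrences(q, text)


if __name__ == "__main__":
    text = "xabcyabc"
    print(sorted(occurrences(["ab"], text)))        # [1, 5]
    print(sorted(occurrences(["abc"], text)))       # [1, 5]
    print(sorted(occurrences(["b", "a"], text)))    # [2]
    print(same_occurrences(["ab"], ["abc"], text))  # True
```

In this toy text, `ab` and `abc` occur at exactly the same positions, so under the assumptions above `ab` would not be maximal: the more specific pattern `abc` belongs to the same equivalence class.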

This research was partly supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Specially Promoted Research, 17002008, 2007 on “semi-structured data mining”, and 18017015, 2007 on “developing high-speed high-quality algorithms for analyzing huge genome database”.

Author information

Authors and Affiliations

Graduate School of Information Science and Technology, Hokkaido University, Kita 14 Nishi 9, Sapporo, 060-0814, Japan
Hiroki Arimura
National Institute of Informatics, 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
Takeaki Uno

Editor information

Ken Satoh Akihiro Inokuchi Katashi Nagao Takahiro Kawamura

About this paper

Cite this paper

Arimura, H., Uno, T. (2008). Mining Maximal Flexible Patterns in a Sequence. In: Satoh, K., Inokuchi, A., Nagao, K., Kawamura, T. (eds) New Frontiers in Artificial Intelligence. JSAI 2007. Lecture Notes in Computer Science, vol 4914. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78197-4_29

DOI: https://doi.org/10.1007/978-3-540-78197-4_29
Published: 27 July 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78196-7
Online ISBN: 978-3-540-78197-4
eBook Packages: Computer Science, Computer Science (R0)
