Abstract
In this poster, we present a demonstration of a prototype system for the efficient discovery of combinatorial patterns, called proximity word-association patterns, from a collection of texts. The algorithm computes the best k-proximity d-word patterns in almost linear expected time in the total input length n, which is drastically faster than a straightforward algorithm with $O(n^{2d+1})$ time complexity.
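To make the problem statement concrete, the following is a minimal brute-force sketch, not the authors' algorithm and not necessarily the straightforward baseline the abstract refers to. The abstract does not define its terms, so the sketch rests on assumptions: a k-proximity d-word pattern is taken to be an ordered tuple of d words that matches a text when the words occur in that order with consecutive matched words at most k tokens apart, and "best" is taken to mean maximizing agreement with a labeling of the texts as positive or negative. The names `matches` and `best_pattern` are hypothetical.

```python
from itertools import product


def matches(words, pattern, k):
    """Return True if the pattern words occur in order in `words`, with
    each consecutive pair of matched positions at most k tokens apart.
    (Assumed matching semantics; the abstract does not spell them out.)"""
    # frontier: every position where the pattern prefix seen so far can end
    frontier = [i for i, w in enumerate(words) if w == pattern[0]]
    for target in pattern[1:]:
        frontier = [
            j for j, w in enumerate(words)
            if w == target and any(0 < j - i <= k for i in frontier)
        ]
    return bool(frontier)


def best_pattern(positives, negatives, d, k):
    """Try every d-tuple over the vocabulary (d >= 1) and keep the one whose
    match / no-match split agrees with the labels most often.
    The agreement score is an assumed objective, chosen for illustration."""
    pos = [text.split() for text in positives]
    neg = [text.split() for text in negatives]
    vocab = sorted({w for doc in pos + neg for w in doc})
    best, best_score = None, -1
    for pattern in product(vocab, repeat=d):
        hits_pos = sum(matches(doc, pattern, k) for doc in pos)
        hits_neg = sum(matches(doc, pattern, k) for doc in neg)
        # correct decisions: matched positives plus unmatched negatives
        score = hits_pos + (len(neg) - hits_neg)
        if score > best_score:
            best, best_score = pattern, score
    return best, best_score


if __name__ == "__main__":
    positives = ["data mining finds patterns", "fast data stream mining"]
    negatives = ["mining data is slow", "data about coal and deep mining"]
    print(best_pattern(positives, negatives, d=2, k=2))
```

The enumeration grows exponentially with d in the vocabulary size, which illustrates why an almost linear expected-time algorithm matters for large text collections.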
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arimura, H., Wataki, A., Fujino, R., Shimozono, S., Arikawa, S. (1998). An Efficient Tool for Discovering Simple Combinatorial Patterns from Large Text Databases. In: Arikawa, S., Motoda, H. (eds) Discovery Science. DS 1998. Lecture Notes in Computer Science, vol 1532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49292-5_37
DOI: https://doi.org/10.1007/3-540-49292-5_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65390-5
Online ISBN: 978-3-540-49292-4
eBook Packages: Springer Book Archive