An Efficient Tool for Discovering Simple Combinatorial Patterns from Large Text Databases

Arimura, Hiroki; Wataki, Atsushi; Fujino, Ryoichi; Shimozono, Shinichi; Arikawa, Setsuo

doi:10.1007/3-540-49292-5_37

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1532)

Included in the following conference series:

International Conference on Discovery Science

Abstract

In this poster, we present a demonstration of a prototype system for efficient discovery of combinatorial patterns, called proximity word-association patterns, from a collection of texts. The algorithm computes the best k-proximity d-word patterns in almost linear expected time in the total input length n, which is drastically faster than a straightforward algorithm of O(n^{2d+1}) time complexity.
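
The abstract does not define a k-proximity d-word pattern, so the following is only an illustrative sketch under an assumed definition: a pattern is an ordered list of d words plus a bound k, and it matches a text if the words occur in that order with the start positions of consecutive words more than 0 and at most k characters apart. The function names `occurrences` and `matches` are hypothetical, and this naive matcher is not the authors' algorithm; it only shows what it means for a single pattern to match a single text.

```python
def occurrences(text: str, word: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of word in text."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        start = text.find(word, start + 1)
    return positions


def matches(text: str, words: list[str], k: int) -> bool:
    """Check an assumed k-proximity d-word pattern against one text.

    Assumption (not stated in the abstract): the words must occur in the
    given order, and consecutive occurrences must start more than 0 and
    at most k characters apart.
    """
    if not words:
        raise ValueError("a pattern needs at least one word")
    # Start positions at which the prefix of the pattern seen so far can end.
    reachable = occurrences(text, words[0])
    for word in words[1:]:
        reachable = [
            p for p in occurrences(text, word)
            if any(0 < p - q <= k for q in reachable)
        ]
        if not reachable:
            return False
    return bool(reachable)
```

For instance, under this assumed definition `matches("the quick brown fox", ["quick", "fox"], 12)` holds because the two words start 12 characters apart, while the same pattern with k = 11 does not match.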

Author information

Authors and Affiliations

Department of Informatics, Kyushu University, Hakozaki 6-10-1, Fukuoka, 812-8581, Japan
Hiroki Arimura, Atsushi Wataki & Setsuo Arikawa
Dept. of Artificial Intelligence, Kyushu Inst. of Tech., Iizuka, 820-8502, Japan
Ryoichi Fujino
Fujitsu LTD, Japan
Shinichi Shimozono

Editor information

Editors and Affiliations

Department of Informatics, Kyushu University, Fukuoka, 812-8581, Japan
Setsuo Arikawa
Institute of Scientific and Industrial Research, Division of Intelligent Systems Science, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka, 567-0047, Japan
Hiroshi Motoda

About this paper

Cite this paper

Arimura, H., Wataki, A., Fujino, R., Shimozono, S., Arikawa, S. (1998). An Efficient Tool for Discovering Simple Combinatorial Patterns from Large Text Databases. In: Arikawa, S., Motoda, H. (eds) Discovery Science. DS 1998. Lecture Notes in Computer Science, vol 1532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49292-5_37

DOI: https://doi.org/10.1007/3-540-49292-5_37
Published: 14 January 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65390-5
Online ISBN: 978-3-540-49292-4
eBook Packages: Springer Book Archive
