Abstract
In this paper, we propose a new method of n-gram analysis using Zero-suppressed BDDs (ZBDDs). ZBDDs are known to be a compact representation of combinatorial item sets. Here, we extend ZBDD-based techniques to the efficient handling of sets of sequences. Using the algebraic operations defined over ZBDDs, such as union, intersection, and difference, we can carry out various kinds of processing and analysis on large-scale sequence data. We conducted experiments that generate n-gram statistics for real document files. The results show the potential of the ZBDD-based method for sequence database analysis.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Kurai, R., Minato, Si., Zeugmann, T. (2007). N-Gram Analysis Based on Zero-Suppressed BDDs. In: Washio, T., Satoh, K., Takeda, H., Inokuchi, A. (eds) New Frontiers in Artificial Intelligence. JSAI 2006. Lecture Notes in Computer Science, vol 4384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69902-6_25
DOI: https://doi.org/10.1007/978-3-540-69902-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69901-9
Online ISBN: 978-3-540-69902-6
eBook Packages: Computer Science, Computer Science (R0)