Abstract
This paper surveys our recent studies of text mining from literary works, especially classical Japanese poems (Waka). We present methods for finding characteristic patterns in anthologies of Waka poems, as well as methods for finding similar poem pairs. Our aim is not just to develop efficient algorithms but to obtain results that are of interest to Waka researchers. We report successful results in finding patterns and similar poem pairs, some of which led to new discoveries.
© 2002 Springer-Verlag Berlin Heidelberg
Cite this chapter
Takeda, M., Fukuda, T., Nanri, I. (2002). Mining from Literary Texts: Pattern Discovery and Similarity Computation. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science, vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_39
DOI: https://doi.org/10.1007/3-540-45884-0_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43338-5
Online ISBN: 978-3-540-45884-5
eBook Packages: Springer Book Archive