Mining from Literary Texts: Pattern Discovery and Similarity Computation

  • Chapter
Progress in Discovery Science

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2281)

Abstract

This paper surveys our recent studies of text mining from literary works, especially classical Japanese poems, Waka. We present methods for finding characteristic patterns in anthologies of Waka poems, as well as those for finding similar poem pairs. Our aim is to obtain good results that are of interest to Waka researchers, not just to develop efficient algorithms. We report successful results in finding patterns and similar poem pairs, some of which led to new discoveries.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Takeda, M., Fukuda, T., Nanri, I. (2002). Mining from Literary Texts: Pattern Discovery and Similarity Computation. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science, vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_39

  • DOI: https://doi.org/10.1007/3-540-45884-0_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43338-5

  • Online ISBN: 978-3-540-45884-5

  • eBook Packages: Springer Book Archive
