
Readability Factors of Japanese Text Classification

  • Conference paper
Databases in Networked Information Systems (DNIS 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4777)

Abstract

Languages with rich written character sets, such as Japanese with its adoption of the Chinese ideographic system, offer specific combinatorial potential for text summarization and categorization. Modern Japanese text is composed of strings over the Roman alphabet, the two phonetic syllabaries hiragana and katakana, and Chinese characters. Compared with most other languages, this richness of expression makes it easy to create synonyms and paraphrases. Whether a given variant is justified by its context is not fixed: it depends not only on the circumstances but also on the reader of the text. The readability of Japanese text is therefore largely individual; it depends on education and reflects lifelong experience. This work presents a quantitative study of common readability factors in Japanese text, for which thirteen text markers were developed. Our statistical analysis, expressed as a numerical readability index, is accompanied by a categorization of the text contents, visualized as a specific location on a self-organizing map computed over a reference text corpus.
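Because modern Japanese text mixes these character sets, one simple way to measure its composition is to classify each character by script. The minimal sketch below does this by Unicode block and reports the share of each script. The block ranges, the category names, and the helper names `char_class` and `script_profile` are illustrative assumptions. The paper's thirteen text markers are not listed on this page, and this sketch is not claimed to reproduce any of them.

```python
from collections import Counter

# Hypothetical helper: classify one character by Unicode block (an assumption,
# not the paper's method).
def char_class(ch: str) -> str:
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:  # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:  # Katakana block (includes the long-vowel mark)
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:  # CJK Unified Ideographs + Ext. A
        return "kanji"
    if ("A" <= ch <= "Z") or ("a" <= ch <= "z") \
            or 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A:  # ASCII + full-width Latin
        return "roman"
    return "other"  # punctuation, digits, symbols, etc.


# Hypothetical helper: share of each script among non-whitespace characters.
def script_profile(text: str) -> dict[str, float]:
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return {}
    counts = Counter(char_class(c) for c in chars)
    return {k: counts[k] / len(chars)
            for k in ("hiragana", "katakana", "kanji", "roman", "other")}
```

For example, `script_profile("日本語のテキストです")` gives 0.3 for kanji, 0.3 for hiragana and 0.4 for katakana. Counting by Unicode block is a deliberate simplification: it ignores compatibility ideographs, rarer CJK extensions and the role of punctuation.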
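The abstract places each text at a location on a self-organizing map (SOM) built over a reference corpus. The minimal sketch below is a textbook Kohonen SOM in plain Python. It assumes that every text has already been reduced to a fixed-length numeric feature vector. The grid size, the linearly decaying learning rate, the Gaussian neighbourhood and the names `train_som` and `best_matching_unit` are assumptions made for illustration, not settings reported for this paper. A text's "location" is taken to be the grid cell of its best-matching unit.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x (Euclidean)."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Kohonen map on equal-length vectors (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2  # initial neighbourhood radius (assumption)
    # Each node starts with a random weight vector in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit samples in shuffled order
            frac = t / steps
            lr = lr0 * (1 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3  # shrinking neighbourhood radius
            bi, bj = best_matching_unit(weights, x)
            for i in range(rows):
                for j in range(cols):
                    d2 = (i - bi) ** 2 + (j - bj) ** 2        # grid distance to the BMU
                    h = math.exp(-d2 / (2 * sigma ** 2))      # Gaussian neighbourhood
                    w = weights[i][j]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])        # pull node toward x
            t += 1
    return weights
```

After `weights = train_som(corpus, 4, 4)`, calling `best_matching_unit(weights, v)` maps a feature vector `v` to a (row, column) cell. Texts with similar feature vectors tend to land in nearby cells, which is what makes the map usable for visual categorization.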

Editor information

Subhash Bhalla

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pichl, L., Narita, J. (2007). Readability Factors of Japanese Text Classification. In: Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2007. Lecture Notes in Computer Science, vol 4777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75512-8_10

  • DOI: https://doi.org/10.1007/978-3-540-75512-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75511-1

  • Online ISBN: 978-3-540-75512-8

  • eBook Packages: Computer Science, Computer Science (R0)
