An Ordinal Multi-class Classification Method for Readability Assessment of Chinese Documents

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8793)

Abstract

Readability assessment is valuable for recommending suitable documents to readers. In this paper, we propose an Ordinal Multi-class Classification with Voting (OMCV) method for estimating the reading levels of Chinese documents. Building on recent advances in natural language processing, we also design five groups of text features that capture the peculiarities of Chinese. We collect a dataset of Chinese primary school language textbooks and conduct experiments to demonstrate the effectiveness of both the method and the features. Experimental results show that our method has the potential to improve the performance of state-of-the-art classification and regression models, and that the designed features are valuable for readability assessment of Chinese documents.
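The abstract does not spell out how OMCV works, so the following is only a generic sketch of one common way to turn ordered reading levels into binary problems and combine them by voting. It is not the authors' method. The single feature (an average sentence length), the toy data, and the names `fit_stump`, `fit_ordinal`, and `predict_level` are all hypothetical.

```python
from typing import List, Sequence


def fit_stump(xs: Sequence[float], ys: Sequence[bool]) -> float:
    """Return the cut on a 1-D feature that minimises training errors
    for the rule 'x > cut means True' (first best cut wins on ties)."""
    values = sorted(set(xs))
    # Candidate cuts: below the minimum, between neighbours, above the maximum.
    cuts = [values[0] - 1.0]
    cuts += [(a + b) / 2 for a, b in zip(values, values[1:])]
    cuts.append(values[-1] + 1.0)
    best_cut, best_err = cuts[0], len(xs) + 1
    for cut in cuts:
        err = sum((x > cut) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_cut, best_err = cut, err
    return best_cut


def fit_ordinal(xs: Sequence[float], levels: Sequence[int], num_levels: int) -> List[float]:
    """Train one binary stump per threshold k, each answering 'is level > k?'."""
    return [fit_stump(xs, [lv > k for lv in levels]) for k in range(1, num_levels)]


def predict_level(cuts: Sequence[float], x: float) -> int:
    """Each binary stump casts one vote; the level is 1 plus the number of 'yes' votes."""
    return 1 + sum(x > cut for cut in cuts)


# Toy data (made up): hypothetical average sentence lengths and grade levels 1..3.
xs = [5, 6, 9, 10, 14, 15]
levels = [1, 1, 2, 2, 3, 3]
cuts = fit_ordinal(xs, levels, num_levels=3)
print(cuts, predict_level(cuts, 8))
```

Decomposing the task along the level ordering lets each binary model use the fact that level 3 is "further" from level 1 than level 2 is, which a flat multi-class classifier ignores. A real system would replace the one-feature stump with a stronger binary learner over many features.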

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Jiang, Z., Sun, G., Gu, Q., Chen, D. (2014). An Ordinal Multi-class Classification Method for Readability Assessment of Chinese Documents. In: Buchmann, R., Kifor, C.V., Yu, J. (eds) Knowledge Science, Engineering and Management. KSEM 2014. Lecture Notes in Computer Science, vol 8793. Springer, Cham. https://doi.org/10.1007/978-3-319-12096-6_6

  • DOI: https://doi.org/10.1007/978-3-319-12096-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12095-9

  • Online ISBN: 978-3-319-12096-6

  • eBook Packages: Computer Science, Computer Science (R0)
