Abstract
Readability assessment helps recommend documents suited to a reader's level. In this paper, we propose an Ordinal Multi-class Classification with Voting (OMCV) method for estimating the reading levels of Chinese documents. Building on current advances in natural language processing, we also design five groups of text features that capture the peculiarities of Chinese. We collect a dataset of Chinese primary school language textbooks and conduct experiments to demonstrate the effectiveness of both the method and the features. Experimental results show that our method has the potential to improve the performance of state-of-the-art classification and regression models, and that the designed features are valuable for readability assessment of Chinese documents.
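The abstract names the method but does not describe how the classifiers are combined. As a rough illustration only, and not the authors' OMCV algorithm, the sketch below shows one generic way to treat ordered reading levels: train one binary classifier per level boundary ("is the document above level k?") and let the classifiers vote, so the predicted level is one plus the number of positive votes. The single numeric feature, the threshold-based "stump" classifier, the toy data, and all function names are assumptions made for this example.

```python
from typing import List, Sequence


def fit_stump(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Return the threshold t that minimises errors of the rule 'x > t => 1'."""
    values = sorted(set(xs))
    # Candidate thresholds: below all data, between neighbours, above all data.
    thresholds = (
        [values[0] - 1.0]
        + [(a + b) / 2.0 for a, b in zip(values, values[1:])]
        + [values[-1] + 1.0]
    )

    def errors(t: float) -> int:
        return sum((x > t) != bool(y) for x, y in zip(xs, ys))

    return min(thresholds, key=errors)


def fit_ordinal(xs: Sequence[float], levels: Sequence[int], num_levels: int) -> List[float]:
    """Train one binary stump per boundary k, answering 'is the level above k?'."""
    return [
        fit_stump(xs, [int(lv > k) for lv in levels])
        for k in range(1, num_levels)
    ]


def predict_ordinal(thresholds: Sequence[float], x: float) -> int:
    """Each stump votes 'above level k'; the level is 1 plus the vote count."""
    return 1 + sum(x > t for t in thresholds)


# Toy data (hypothetical): one feature per document, e.g. average sentence length.
xs = [5.0, 6.0, 9.0, 10.0, 14.0, 15.0]
levels = [1, 1, 2, 2, 3, 3]

model = fit_ordinal(xs, levels, num_levels=3)
print([predict_ordinal(model, x) for x in xs])
```

Counting votes across the boundary classifiers keeps the ordering of the levels in play, which a flat multi-class classifier ignores; a real system would replace the stump with a stronger binary learner over many features.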
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jiang, Z., Sun, G., Gu, Q., Chen, D. (2014). An Ordinal Multi-class Classification Method for Readability Assessment of Chinese Documents. In: Buchmann, R., Kifor, C.V., Yu, J. (eds) Knowledge Science, Engineering and Management. KSEM 2014. Lecture Notes in Computer Science, vol 8793. Springer, Cham. https://doi.org/10.1007/978-3-319-12096-6_6
DOI: https://doi.org/10.1007/978-3-319-12096-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12095-9
Online ISBN: 978-3-319-12096-6
eBook Packages: Computer Science, Computer Science (R0)