An Ordinal Multi-class Classification Method for Readability Assessment of Chinese Documents

Jiang, Zhiwei; Sun, Gang; Gu, Qing; Chen, Daoxu

doi:10.1007/978-3-319-12096-6_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8793)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

Readability assessment is useful for recommending suitable documents to readers. In this paper, we propose an Ordinal Multi-class Classification with Voting (OMCV) method for estimating the reading levels of Chinese documents. Building on current achievements in natural language processing, we also design five groups of text features to capture the peculiarities of Chinese. We collect a dataset of Chinese primary school language textbooks and conduct experiments to demonstrate the effectiveness of both the method and the features. Experimental results show that our method has the potential to improve on the performance of state-of-the-art classification and regression models, and that the designed features are valuable for readability assessment of Chinese documents.
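
The abstract does not spell out how OMCV works, so the following is only a minimal sketch of the general idea behind ordinal multi-class classification with voting, not the authors' method. It assumes reading levels 1..K, trains one binary "level > k" classifier per boundary, and predicts a level by counting the positive votes. The single feature (imagined here as an average sentence length), the threshold-stump classifier, and the function names `fit_stump`, `fit_ordinal`, and `predict_level` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def fit_stump(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Pick the threshold on one feature that best separates binary labels.

    Predicts 1 when x > threshold. Candidate thresholds are midpoints
    between consecutive distinct feature values.
    """
    candidates = sorted(set(xs))
    best_t, best_err = candidates[0], len(xs) + 1
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def fit_ordinal(xs: Sequence[float], levels: Sequence[int], num_levels: int) -> List[float]:
    """Train one binary 'level > k' stump for each boundary k = 1..K-1."""
    thresholds = []
    for k in range(1, num_levels):
        binary = [int(lv > k) for lv in levels]
        thresholds.append(fit_stump(xs, binary))
    return thresholds


def predict_level(x: float, thresholds: Sequence[float]) -> int:
    """Each boundary classifier votes; the level is 1 plus the number of 'yes' votes."""
    votes = sum(x > t for t in thresholds)
    return 1 + votes


# Toy data (made up): one feature value per document and its reading level.
features = [5, 6, 10, 11, 15, 16]
levels = [1, 1, 2, 2, 3, 3]
thresholds = fit_ordinal(features, levels, num_levels=3)
print(thresholds, [predict_level(x, thresholds) for x in (7, 12, 20)])
```

Decomposing along level boundaries lets each binary classifier exploit the ordering of the levels, which a flat multi-class classifier ignores; a real system would replace the stump with a stronger classifier over many features.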

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Zhiwei Jiang, Gang Sun, Qing Gu & Daoxu Chen

Editor information

Editors and Affiliations

Faculty of Computer Science, Knowledge Engineering Research Group, University of Vienna, Währingerstr. 29, 1090, Vienna, Austria
Robert Buchmann
Faculty of Engineering, Research Center for Sustainable Products and Processes, Lucian Blaga University of Sibiu, 10 Victoriei Blv., 550024, Sibiu, Romania
Claudiu Vasile Kifor
Department of Computer Science, Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Haidian District, 100044, Beijing, China
Jian Yu

Cite this paper

Jiang, Z., Sun, G., Gu, Q., Chen, D. (2014). An Ordinal Multi-class Classification Method for Readability Assessment of Chinese Documents. In: Buchmann, R., Kifor, C.V., Yu, J. (eds) Knowledge Science, Engineering and Management. KSEM 2014. Lecture Notes in Computer Science, vol 8793. Springer, Cham. https://doi.org/10.1007/978-3-319-12096-6_6

DOI: https://doi.org/10.1007/978-3-319-12096-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12095-9
Online ISBN: 978-3-319-12096-6
eBook Packages: Computer Science, Computer Science (R0)