Abstract
Traditional NLP models for readability assessment represent a document as a vector of words or a vector of linguistic features. Such representations can be sparse and discrete, and they ignore the latent relations among features. We observe from data and linguistic theory that a document’s linguistic features are not necessarily conditionally independent. To capture the latent relations among linguistic features, we propose to build feature graphs and learn distributed representations with Statistical Relational Learning. We then project the document vectors onto the linguistic feature embedding space to produce document representations enriched with linguistic feature knowledge. We showcase this idea with Chinese L1 readability classification experiments and achieve positive results. On the current data set, our proposed model performs better than traditional vector space models and other embedding-based models, and it deserves further exploration.
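The projection step can be illustrated with a minimal sketch. It assumes that each linguistic feature already has a learned embedding and that projection means a weighted sum of those embeddings, with the document's feature values as weights. The function name `project`, the toy feature names, and all numbers are hypothetical and are not taken from the paper.

```python
def project(doc_vector, feature_embeddings):
    """Project a document's feature vector onto the feature embedding space.

    doc_vector: list of n feature values for one document.
    feature_embeddings: n rows, each a k-dimensional embedding of one feature.
    Returns a k-dimensional list: the weighted sum of the feature embeddings.
    """
    if len(doc_vector) != len(feature_embeddings):
        raise ValueError("one embedding row is required per feature")
    dim = len(feature_embeddings[0])
    projected = [0.0] * dim
    for value, embedding in zip(doc_vector, feature_embeddings):
        for j in range(dim):
            projected[j] += value * embedding[j]
    return projected


# Hypothetical toy data: 3 linguistic features, 2-dimensional embeddings.
feature_embeddings = [
    [1.0, 0.0],  # assumed embedding for "average sentence length"
    [0.0, 1.0],  # assumed embedding for "proportion of rare words"
    [0.5, 0.5],  # assumed embedding for "average clause depth"
]
doc_vector = [2.0, 1.0, 4.0]
doc_representation = project(doc_vector, feature_embeddings)
print(doc_representation)  # [4.0, 3.0]
```

Under these assumptions, the resulting dense vector would replace the sparse feature vector as input to a readability classifier.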
Acknowledgements
This work was supported by the National Social Science Fund (Grant No. 17BGL068). We thank Prof. Jianyun Nie and the anonymous reviewers for their valuable suggestions and thoughtful feedback. We thank undergraduate students Zhiwei Wu, Yuansheng Wang, Xu Zhang, Yuan Chen, Hanwu Chen, Licong Tan, and Hao Zhang for their helpful assistance and support.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Qiu, X., Lu, D., Shen, Y., Cai, Y. (2019). Linguistic Feature Representation with Statistical Relational Learning for Readability Assessment. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_32
DOI: https://doi.org/10.1007/978-3-030-32236-6_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science, Computer Science (R0)