Linguistic Feature Representation with Statistical Relational Learning for Readability Assessment

Qiu, Xinying; Lu, Dawei; Shen, Yuming; Cai, Yi

doi:10.1007/978-3-030-32236-6_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

Traditional NLP models for readability assessment represent a document as a vector of words or a vector of linguistic features that may be sparse and discrete and that ignores the latent relations among features. We observe from data and linguistic theory that a document’s linguistic features are not necessarily conditionally independent. To capture the latent relations among linguistic features, we propose to build feature graphs and learn distributed representations with Statistical Relational Learning. We then project the document vectors onto the linguistic feature embedding space to produce document representations enriched with linguistic feature knowledge. We showcase this idea with Chinese L1 readability classification experiments and achieve positive results. Our proposed model performs better than traditional vector space models and other embedding-based models on the current data set and deserves further exploration.
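
The projection step can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this assumes the simplest reading: each linguistic feature has a learned embedding, and a document's feature vector is projected by taking the feature-weighted sum of those embeddings (a vector-matrix product). The feature names, the embedding values, and the function `project_document` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embeddings for three linguistic features.
# In the paper these would be learned from a feature graph; the values
# here are made up purely for illustration.
FEATURE_EMBEDDINGS: Dict[str, List[float]] = {
    "avg_sentence_length": [1.0, 0.0, 0.5],
    "rare_word_ratio": [0.0, 1.0, 0.5],
    "clause_depth": [0.5, 0.5, 0.0],
}


def project_document(features: Dict[str, float],
                     embeddings: Dict[str, List[float]]) -> List[float]:
    """Return the feature-weighted sum of the feature embeddings."""
    dim = len(next(iter(embeddings.values())))
    projected = [0.0] * dim
    for name, value in features.items():
        vector = embeddings.get(name)
        if vector is None:
            continue  # skip features that have no embedding
        for i, component in enumerate(vector):
            projected[i] += value * component
    return projected


# A toy document described by (assumed) feature values.
doc = {"avg_sentence_length": 2.0, "rare_word_ratio": 1.0, "clause_depth": 4.0}
doc_vector = project_document(doc, FEATURE_EMBEDDINGS)
print(doc_vector)  # [4.0, 3.0, 1.5]
```

Under this assumption the resulting dense vector has the dimensionality of the embedding space rather than of the feature set, and it could then be passed to any standard classifier for readability levels.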

Acknowledgements

This work was supported by the National Social Science Fund (Grant No. 17BGL068). We thank Prof. Jianyun Nie and the anonymous reviewers for their valuable suggestions and thoughtful feedback. We thank undergraduate students Zhiwei Wu, Yuansheng Wang, Xu Zhang, Yuan Chen, Hanwu Chen, Licong Tan, and Hao Zhang for their helpful assistance and support.

Author information

Authors and Affiliations

School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
Xinying Qiu & Yuming Shen
School of Liberal Arts, Renmin University of China, Beijing, China
Dawei Lu
School of Software Engineering, South China University of Technology, Guangzhou, China
Yi Cai

Corresponding author

Correspondence to Yuming Shen.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Tang
National University of Singapore, Singapore, Singapore
Min-Yen Kan
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Sujian Li
Zhengzhou University, Zhengzhou, China
Hongying Zan

About this paper

Cite this paper

Qiu, X., Lu, D., Shen, Y., Cai, Y. (2019). Linguistic Feature Representation with Statistical Relational Learning for Readability Assessment. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_32

DOI: https://doi.org/10.1007/978-3-030-32236-6_32
Published: 30 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science, Computer Science (R0)
