Linguistic Feature Representation with Statistical Relational Learning for Readability Assessment

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

Traditional NLP models for readability assessment represent a document as a vector of words or a vector of linguistic features. Such vectors may be sparse and discrete, and they ignore the latent relations among features. We observe from data and linguistic theory that a document's linguistic features are not necessarily conditionally independent. To capture the latent relations among linguistic features, we propose to build feature graphs and learn distributed representations with Statistical Relational Learning. We then project the document vectors onto the linguistic feature embedding space to produce document representations enriched with linguistic feature knowledge. We showcase this idea with Chinese L1 readability classification experiments and achieve positive results. On the current data set, our proposed model performs better than traditional vector space models and other embedding-based models, and it deserves further exploration.
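The projection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each linguistic feature already has a learned embedding and that a document is projected by summing those embeddings weighted by the document's feature values. The abstract does not specify the actual projection, so the function name `project_document`, the feature names, and all numbers below are illustrative assumptions rather than the authors' implementation.

```python
from typing import Dict, List


def project_document(feature_values: Dict[str, float],
                     feature_embeddings: Dict[str, List[float]]) -> List[float]:
    """Return the sum of feature embeddings weighted by the document's feature values.

    Assumption: projection is a plain linear combination; the paper may use
    a different operation.
    """
    dim = len(next(iter(feature_embeddings.values())))
    doc = [0.0] * dim
    for name, value in feature_values.items():
        emb = feature_embeddings.get(name)
        if emb is None:
            continue  # features without a learned embedding are skipped
        for i, component in enumerate(emb):
            doc[i] += value * component
    return doc


# Hypothetical 2-dimensional embeddings for three made-up features.
feature_embeddings = {
    "avg_sentence_length": [1.0, 0.0],
    "avg_word_length": [0.5, 0.5],
    "rare_word_ratio": [0.0, 2.0],
}

# Hypothetical feature values for one document.
doc_features = {
    "avg_sentence_length": 2.0,
    "avg_word_length": 4.0,
    "rare_word_ratio": 0.5,
}

doc_vector = project_document(doc_features, feature_embeddings)
print(doc_vector)  # [4.0, 3.0]
```

Under this assumption, the resulting dense vector replaces the sparse, discrete feature vector as input to a downstream readability classifier, and related features contribute along shared embedding dimensions.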

Notes

  1. http://ictclas.nlpir.org/.

  2. http://www.ltp-cloud.com/.

  3. http://www.niuparser.com/.

Acknowledgements

This work was supported by the National Social Science Fund (Grant No. 17BGL068). We thank Prof. Jianyun Nie and the anonymous reviewers for their valuable suggestions and thoughtful feedback. We thank undergraduate students Zhiwei Wu, Yuansheng Wang, Xu Zhang, Yuan Chen, Hanwu Chen, Licong Tan, and Hao Zhang for their helpful assistance and support.

Author information

Corresponding author

Correspondence to Yuming Shen.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Qiu, X., Lu, D., Shen, Y., Cai, Y. (2019). Linguistic Feature Representation with Statistical Relational Learning for Readability Assessment. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_32

  • DOI: https://doi.org/10.1007/978-3-030-32236-6_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32235-9

  • Online ISBN: 978-3-030-32236-6

  • eBook Packages: Computer Science, Computer Science (R0)
