Sentence-Level Readability Assessment for L2 Chinese Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11831)

Abstract

Automatic assessment of sentence readability can help educators select example sentences suited to different learning levels to complement teaching materials. Although there is extensive research on document-level and passage-level Chinese readability assessment, sentence-level evaluation remains little explored. We bridge this gap by providing a research framework and a large corpus of nearly 40,000 sentences with ten-level readability annotation. We design experiments to analyze the influence of 88 linguistic features on sentence complexity. The results suggest that the linguistic features can significantly improve predictive performance, with a highest distance-1 adjacent accuracy of 70.78%. Model comparison also confirms that our proposed feature set can reduce bias in prediction without adding variance. We hope that our corpus, feature sets, and experimental validation can provide educators and linguists with more language resources, insights, and automatic tools for future related research.
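The abstract reports distance-1 adjacent accuracy on a ten-level scale. A minimal sketch of that metric follows, assuming the usual definition: a prediction counts as correct when it lies within one level of the annotated level. The function name `adjacent_accuracy` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def adjacent_accuracy(y_true: Sequence[int], y_pred: Sequence[int], distance: int = 1) -> float:
    """Fraction of predictions within `distance` levels of the gold level.

    Assumes levels are integers on an ordinal scale (e.g. 1 to 10).
    With distance=0 this reduces to ordinary exact-match accuracy.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sentence is required")
    hits = sum(abs(t - p) <= distance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy labels, invented for illustration only.
gold = [1, 4, 7, 10, 5]
pred = [2, 4, 9, 10, 3]

print(adjacent_accuracy(gold, pred))      # 3 of 5 predictions are within one level
print(adjacent_accuracy(gold, pred, 0))   # 2 of 5 predictions match exactly
```

Because adjacent accuracy tolerates near misses, it is always at least as high as exact accuracy on the same predictions.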

Notes

  1. http://www.aihanyu.org/

  2. The seven universities participating in the textbook survey are Renmin University of China, Beijing Language and Culture University, Sun Yat-sen University, Jinan University, South China Normal University, Huaqiao University, and Fujian Normal University.

  3. http://www.ltp-cloud.com/

  4. http://www.niuparser.com/

Acknowledgements

This work was supported by National Social Science Fund (Grant No. 17BGL068). We thank undergraduate students Zhiwei Wu, Yuansheng Wang, Xu Zhang, Yuan Chen, Hanwu Chen, Licong Tan, and Hao Zhang for their helpful assistance and support.

Author information

Corresponding author

Correspondence to Xinying Qiu.

Appendix

Linguistic Features

Features are grouped by category, then sub-category; each numbered entry is a feature definition.

Shallow features

Character features

1. Percentage of most-common characters per sentence

2. Percentage of second-most-common characters per sentence

3. Percentage of all common characters per sentence

4. Percentage of low-stroke-count characters per sentence

5. Percentage of medium-stroke-count characters per sentence

6. Percentage of high-stroke-count characters per sentence

7. Average number of strokes per word per sentence

8. Percentage of HSK1 to HSK3 characters per sentence

9. Percentage of HSK4 to HSK5 characters per sentence

10. Percentage of HSK6 characters per sentence

11. Percentage of non-HSK characters per sentence

Word features

12. Average number of characters per word per sentence

13. Average number of characters per unique word per sentence

14. Number of two-character words per sentence

15. Percentage of two-character words per sentence

16. Number of three-character words per sentence

17. Percentage of three-character words per sentence

18. Number of four-character words per sentence

19. Percentage of four-character words per sentence

20. Number of words with five or more characters per sentence

21. Percentage of words with five or more characters per sentence

22. Percentage of HSK1 to HSK3 words per sentence

23. Percentage of HSK4 to HSK5 words per sentence

24. Percentage of HSK6 words per sentence

25. Percentage of non-HSK words per sentence

Sentence features

26. Number of multi-character words per sentence

27. Number of words per sentence

28. Number of characters per sentence

29. Number of characters (including punctuation, numerals, and symbols) per sentence
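The word-length and sentence-length features above (roughly features 12 to 21 and 26 to 28) can be sketched as follows. This is only an illustration under stated assumptions: the sentence is already word-segmented with punctuation removed, percentages are taken over the number of words in the sentence, and the function name `shallow_word_features` and the dictionary keys are hypothetical. The paper's exact definitions may differ.

```python
from collections import Counter
from typing import Dict, List


def shallow_word_features(words: List[str]) -> Dict[str, float]:
    """Word-length features for one pre-segmented sentence.

    Assumption: `words` holds word tokens only (no punctuation), and
    percentages are computed relative to the number of words.
    """
    if not words:
        raise ValueError("sentence has no words")
    n_words = len(words)
    lengths = [len(w) for w in words]            # characters per word
    unique_lengths = [len(w) for w in set(words)]
    # Bucket word lengths; 5 stands for "five or more characters".
    by_length = Counter(min(n, 5) for n in lengths)

    feats: Dict[str, float] = {
        "num_words": n_words,
        "num_chars": sum(lengths),
        "avg_chars_per_word": sum(lengths) / n_words,
        "avg_chars_per_unique_word": sum(unique_lengths) / len(unique_lengths),
        "num_multi_char_words": sum(1 for n in lengths if n > 1),
    }
    for size, name in ((2, "two"), (3, "three"), (4, "four"), (5, "five_up")):
        feats[f"num_{name}_char_words"] = by_length[size]
        feats[f"pct_{name}_char_words"] = by_length[size] / n_words
    return feats


# Hand-segmented toy sentence: 我 / 喜欢 / 学习 / 汉语
features = shallow_word_features(["我", "喜欢", "学习", "汉语"])
print(features["num_words"], features["num_chars"])   # 4 words, 7 characters
print(features["avg_chars_per_word"])                 # 7 / 4
print(features["pct_two_char_words"])                 # 3 of 4 words
```

The HSK-level and stroke-count features would follow the same pattern, with a lookup table supplying the level or stroke count of each character or word.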

POS Features

Adjectives

30. Percentage of adjectives per sentence

31. Percentage of unique adjectives per sentence

32. Number of unique adjectives per sentence

33. Number of adjectives per sentence

Functional words

34. Percentage of functional words per sentence

35. Percentage of unique functional words per sentence

36. Number of unique functional words per sentence

37. Number of functional words per sentence

Verbs

38. Percentage of verbs per sentence

39. Number of unique verbs per sentence

40. Percentage of unique verbs per sentence

41. Number of verbs per sentence

Nouns

42. Percentage of nouns per sentence

43. Number of unique nouns per sentence

44. Percentage of unique nouns per sentence

45. Number of nouns per sentence

46. Percentage of All-Nouns per sentence

47. Number of unique All-Nouns per sentence

48. Percentage of unique All-Nouns per sentence

49. Number of All-Nouns per sentence

Content words

50. Percentage of content words per sentence

51. Number of unique content words per sentence

52. Percentage of unique content words per sentence

53. Number of content words per sentence

Idioms

54. Percentage of idioms per sentence

55. Number of unique idioms per sentence

56. Percentage of unique idioms per sentence

57. Number of idioms per sentence

Adverbs

58. Percentage of adverbs per sentence

59. Percentage of unique adverbs per sentence

60. Number of unique adverbs per sentence

61. Number of adverbs per sentence

Syntactic features

Phrases

62. Total number of noun phrases per sentence

63. Total number of verbal phrases per sentence

64. Total number of prepositional phrases per sentence

65. Average length of noun phrases per sentence

66. Average length of verbal phrases per sentence

67. Average length of prepositional phrases per sentence

Clauses

68. Number of punctuation-clauses per sentence

69. Average dependency distance per sentence

70. Maximum dependency distance per sentence

Sentences

71. Height of parse tree per sentence

72. Total number of dependency distances per sentence

73. Average number of dependency distances per sentence
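The average and maximum dependency distance (features 69 and 70) can be illustrated with a short sketch. It assumes the common definition in which the dependency distance of a word is the absolute difference between its position and the position of its head, with the root excluded. The head list below is invented for illustration and is not parser output; the function names are hypothetical.

```python
from typing import List, Tuple


def dependency_distances(heads: List[int]) -> List[int]:
    """Distances |position - head| for every non-root token.

    `heads[i]` is the 1-based position of the head of token i + 1,
    and 0 marks the root (an assumed, CoNLL-like convention).
    """
    distances = []
    for position, head in enumerate(heads, start=1):
        if head == 0:
            continue  # the root has no governor, so no distance
        if not 1 <= head <= len(heads) or head == position:
            raise ValueError(f"invalid head {head} for token {position}")
        distances.append(abs(position - head))
    return distances


def dependency_distance_features(heads: List[int]) -> Tuple[float, int]:
    """Return (average dependency distance, maximum dependency distance)."""
    distances = dependency_distances(heads)
    if not distances:
        return 0.0, 0  # single-word sentence: only a root
    return sum(distances) / len(distances), max(distances)


# Made-up five-token sentence whose second token is the root.
heads = [2, 0, 4, 2, 2]
average, maximum = dependency_distance_features(heads)
print(average, maximum)   # distances are 1, 1, 2, 3
```

In practice the head positions would come from a dependency parser; any parser that exposes a head index per token can feed this computation.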

Discourse features

Entity density

74. Total number of entities per sentence

75. Total number of unique entities per sentence

76. Percentage of entities per sentence

77. Percentage of unique entities per sentence

78. Percentage of named entities per sentence

79. Percentage of named entities against total number of entities per sentence

80. Percentage of Not-NE nouns per sentence

81. Number of Not-NE nouns per sentence

82. Number of Not-Entity nouns per sentence

Cohesion

83. Percentage of conjunctions per sentence

84. Number of unique conjunctions per sentence

85. Percentage of unique conjunctions per sentence

86. Percentage of pronouns per sentence

87. Number of unique pronouns per sentence

88. Percentage of unique pronouns per sentence
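Many of the part-of-speech and cohesion features above share one pattern: the number, the number of unique items, and the corresponding percentages for a word class within a sentence. A minimal sketch of that pattern follows. It assumes percentages are taken over all tokens in the sentence (the original definitions may use a different denominator), and the tag labels `c` and `r`, the hand-assigned tags, and the function name `category_features` are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def category_features(tagged: List[Tuple[str, str]], tags: Set[str]) -> Dict[str, float]:
    """Count and ratio features for the words whose POS tag is in `tags`.

    Assumption: percentages are relative to the total number of tokens.
    """
    if not tagged:
        raise ValueError("sentence has no tokens")
    members = [word for word, tag in tagged if tag in tags]
    n_tokens = len(tagged)
    n_unique = len(set(members))
    return {
        "number": len(members),
        "number_unique": n_unique,
        "percentage": len(members) / n_tokens,
        "percentage_unique": n_unique / n_tokens,
    }


# Toy sentence with hand-assigned tags: c = conjunction, r = pronoun.
sentence = [
    ("因为", "c"), ("他", "r"), ("病", "v"), ("了", "u"),
    ("所以", "c"), ("他", "r"), ("没", "d"), ("来", "v"),
]
conjunctions = category_features(sentence, {"c"})
pronouns = category_features(sentence, {"r"})
print(conjunctions["percentage"], conjunctions["number_unique"])
print(pronouns["percentage"], pronouns["percentage_unique"])
```

Passing a different tag set (for example adjective, verb, or noun tags) reuses the same function for the other word classes in the table.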

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Lu, D., Qiu, X., Cai, Y. (2020). Sentence-Level Readability Assessment for L2 Chinese Learning. In: Hong, JF., Zhang, Y., Liu, P. (eds) Chinese Lexical Semantics. CLSW 2019. Lecture Notes in Computer Science, vol 11831. Springer, Cham. https://doi.org/10.1007/978-3-030-38189-9_40

  • DOI: https://doi.org/10.1007/978-3-030-38189-9_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-38188-2

  • Online ISBN: 978-3-030-38189-9

  • eBook Packages: Computer Science, Computer Science (R0)
