
Exploiting Large Unlabeled Data in Automatic Evaluation of Coherence in Czech

  • Conference paper
  • Part of the book: Text, Speech, and Dialogue (TSD 2019)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11697)


Abstract

The paper contributes to the research on automatic evaluation of surface coherence in student essays. We look into the possibilities of using large unlabeled data to improve the quality of such evaluation. In particular, we propose two approaches to benefit from the large data: (i) an n-gram language model, and (ii) density estimates of the features used by the evaluation system. In our experiments, we integrate these approaches, which exploit data from the Czech National Corpus, into the evaluator of surface coherence for Czech, the EVALD system, and test its performance on two datasets: essays written by native speakers (L1) as well as by foreign learners of Czech (L2). The system implementing these approaches together with other new features significantly outperforms the original EVALD system, with an especially large margin on L1.


Notes

  1. In the text ordering ranking task, the original text is compared with a text created from it by a random permutation of its sentences; it is assumed that the original text is always more coherent than the shuffled one. This allows raw data with no annotation of grades to be used for training and testing the models (a minimal sketch of generating such pairs follows these notes).

  2. http://lindat.cz.

  3. Unlike n-gram models, word-based [18, 19] or character-based [12] recurrent neural language models are able to handle longer dependencies. However, we leave their incorporation into this task for future work. (A sketch of querying an n-gram model for essay-level features follows these notes.)

  4. The Gaussian smoothing kernel corresponds to the radial basis function kernel frequently used in Support Vector Machines (SVM). (A sketch of Gaussian kernel density estimation follows these notes.)

  5. Note that whenever we combine LM and DE features, density estimates are calculated from all but the LM features.

  6. Regression models require the labels in the L2 dataset (CEFR proficiency levels) to be converted to numbers. Furthermore, all (possibly non-integer) predictions must be discretized. (A sketch of this conversion follows these notes.)

  7. Implemented in the Weka toolkit [8].

  8. Statistical significance was calculated by paired bootstrap resampling [14] at p-level \(p \le 0.05\). (A sketch of this test follows these notes.)

  9. Note that the scores of EVALD 3.0 presented here slightly differ from those reported in [23] due to a modification in the cross-validation procedure (see Sect. 6).
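
The following is a minimal sketch, in Python, of how original/shuffled pairs for the text ordering ranking task (note 1) can be generated. The naive sentence splitting and the helper name make_ranking_pair are illustrative assumptions, not part of the EVALD pipeline.

```python
import random

def make_ranking_pair(text, seed=0):
    """Return the original sentence order and a random permutation of it.
    The naive split on '. ' stands in for a proper sentence segmenter."""
    sentences = [s.strip() for s in text.split('. ') if s.strip()]
    shuffled = sentences[:]
    random.Random(seed).shuffle(shuffled)  # for very short texts the permutation
    return sentences, shuffled             # may coincide with the original

original, shuffled = make_ranking_pair(
    "První věta. Druhá věta. Třetí věta. Čtvrtá věta.")
# A coherence model is expected to rank `original` above `shuffled`.
```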
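
Note 3 contrasts the n-gram model used here with neural alternatives. As a rough illustration of how n-gram language-model features can be extracted, the sketch below queries a KenLM model [10] through its Python bindings; the model path and the two aggregate features are assumptions made for illustration, since the paper does not specify this interface.

```python
import kenlm  # Python bindings for KenLM [10]

# Hypothetical path to an n-gram model trained on large unlabeled Czech data
# (e.g. the SYN-series corpora [11]); not the authors' actual model file.
model = kenlm.Model("czech_ngram.arpa")

def lm_features(sentences):
    """Aggregate language-model features for one essay: the average
    per-sentence log10 probability and the average perplexity."""
    scores = [model.score(s, bos=True, eos=True) for s in sentences]
    perplexities = [model.perplexity(s) for s in sentences]
    return {
        "avg_logprob": sum(scores) / len(scores),
        "avg_perplexity": sum(perplexities) / len(perplexities),
    }
```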
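
Notes 4 and 5 concern density estimates (DE) of the evaluator's features computed from large unlabeled data. Below is a minimal sketch of Gaussian kernel density estimation over a feature matrix; the random feature matrix, the bandwidth value, and the function names are hypothetical and illustrate only the general technique, not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical stand-in for the evaluator's (non-LM) feature vectors computed
# on a large unlabeled corpus; in practice these come from real documents.
rng = np.random.default_rng(0)
corpus_features = rng.random((1000, 20))

# Gaussian kernel, i.e. the RBF kernel mentioned in note 4; the bandwidth is a guess.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(corpus_features)

def density_feature(essay_features):
    """Log-density of one essay's feature vector under the corpus estimate,
    usable as an additional feature for the classifier."""
    return kde.score_samples(np.asarray(essay_features).reshape(1, -1))[0]
```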
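
For note 6, a minimal sketch of converting CEFR levels to numbers for regression and discretizing the predictions back to levels; the full A1–C2 scale shown here is an assumption, as the L2 dataset may cover only a subset of these levels.

```python
# The full CEFR scale; the L2 dataset may use only a subset of these levels.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
LEVEL_TO_NUM = {level: i for i, level in enumerate(CEFR_LEVELS)}

def to_number(label):
    """Map a CEFR label to an integer so a regression model can be trained."""
    return LEVEL_TO_NUM[label]

def to_label(prediction):
    """Discretize a (possibly non-integer) regression output back to a level."""
    index = int(round(prediction))
    index = max(0, min(index, len(CEFR_LEVELS) - 1))  # clip to the valid range
    return CEFR_LEVELS[index]

assert to_label(2.4) == "B1"
```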
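
For note 8, a sketch of paired bootstrap resampling in the spirit of [14], comparing two classifiers by accuracy; the number of resamples and the 0/1 correctness encoding are illustrative choices, not details taken from the paper.

```python
import random

def paired_bootstrap(correct_a, correct_b, samples=1000, seed=0):
    """Resample documents with replacement and count how often system A
    beats system B in accuracy.

    correct_a, correct_b: 0/1 lists saying whether each system classified
    the corresponding document correctly (same document order in both)."""
    rng = random.Random(seed)
    n, wins_a = len(correct_a), 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(correct_a[i] for i in idx) > sum(correct_b[i] for i in idx):
            wins_a += 1
    # A is considered significantly better at p <= 0.05 if it wins in at
    # least 95% of the resamples.
    return wins_a / samples
```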

References

  1. Boyd, A., et al.: The MERLIN corpus: learner language and the CEFR. In: Proceedings of LREC 2014, Reykjavík, Iceland, pp. 1281–1288. ELRA (2014)

  2. Chen, Y.Y., Liu, C.L., Lee, C.H., Chang, T.H., et al.: An unsupervised automated essay-scoring system. IEEE Intell. Syst. 25(5), 61–67 (2010)

  3. Coleman, M., Liau, T.L.: A computer readability formula designed for machine scoring. J. Appl. Psychol. 60(2), 283–284 (1975)

  4. Cui, B., Li, Y., Zhang, Y., Zhang, Z.: Text coherence analysis based on deep neural network. In: Proceedings of CIKM 2017, pp. 2027–2030. ACM (2017)

  5. Cummins, R., Yannakoudakis, H., Briscoe, T.: Unsupervised modeling of topical relevance in L2 learner text. In: Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA, pp. 95–104. ACL (2016)

  6. Farag, Y., Yannakoudakis, H., Briscoe, T.: Neural automated essay scoring and coherence modeling for adversarially crafted input. In: Proceedings of NAACL-HLT 2018, New Orleans, LA, Volume 1 (Long Papers), pp. 263–271. ACL (2018)

  7. Feng, V.W., Lin, Z., Hirst, G.: The impact of deep hierarchical discourse structures in the evaluation of text coherence. In: Proceedings of COLING 2014: Technical Papers, Dublin, Ireland, pp. 940–949. Dublin City University and ACL (2014)

  8. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)

  9. Hancke, J., Meurers, D.: Exploring CEFR classification for German based on rich linguistic modeling. In: Learner Corpus Research 2013, Book of Abstracts, pp. 54–56. Bergen, Norway (2013)

  10. Heafield, K.: KenLM: faster and smaller language model queries. In: Proceedings of WMT 2011, Edinburgh, Scotland, pp. 187–197. ACL (2011)

  11. Hnátková, M., Křen, M., Procházka, P., Skoumalová, H.: The SYN-series corpora of written Czech. In: Proceedings of LREC 2014, Reykjavík, Iceland, pp. 160–164. ELRA (2014)

  12. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: Proceedings of AAAI 2016, Phoenix, AZ, pp. 2741–2749. AAAI Press (2016)

  13. Kincaid, J.P., Fishburne, R.P., Jr., Rogers, R.L., Chissom, B.S.: Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Technical report 8-75, Institute for Simulation and Training (1975)

  14. Koehn, P.: Statistical significance tests for machine translation evaluation. In: Proceedings of EMNLP 2004, Barcelona, Spain. ACL (2004)

  15. Lin, Z., Ng, H.T., Kan, M.Y.: Automatically evaluating text coherence using discourse relations. In: Proceedings of ACL-HLT 2011, Portland, OR, vol. 1, pp. 997–1006. ACL (2011)

  16. Mann, W.C., Thompson, S.A.: Rhetorical structure theory: toward a functional theory of text organization. Text – Interdisc. J. Study Discourse 8(3), 243–281 (1988)

  17. McLaughlin, H.G.: SMOG grading – a new readability formula. J. Reading 12(8), 639–646 (1969)

  18. Melis, G., Dyer, C., Blunsom, P.: On the state of the art of evaluation in neural language models. In: Proceedings of ICLR 2018, Vancouver, Canada (2018)

  19. Merity, S., Keskar, N.S., Socher, R.: Regularizing and optimizing LSTM language models. In: Proceedings of ICLR 2018, Vancouver, Canada (2018)

  20. Mesgar, M., Strube, M.: A neural local coherence model for text quality assessment. In: Proceedings of EMNLP 2018, Brussels, Belgium, pp. 4328–4339. ACL (2018)

  21. Mírovský, J., Novák, M., Rysová, K., Rysová, M., Hajičová, E.: EVALD 3.0 – Evaluator of Discourse. Charles University, Prague, Czech Republic (2018)

  22. Mírovský, J., Novák, M., Rysová, K., Rysová, M., Hajičová, E.: EVALD 3.0 for Foreigners – Evaluator of Discourse. Charles University, Prague, Czech Republic (2018)

  23. Novák, M., Mírovský, J., Rysová, K., Rysová, M.: Topic–focus articulation: a third pillar of automatic evaluation of text coherence. In: Batyrshin, I., Martínez-Villaseñor, M.L., Ponce Espinosa, H.E. (eds.) MICAI 2018. LNCS (LNAI), vol. 11289, pp. 96–108. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04497-8_8

  24. Novák, M., Rysová, K., Rysová, M., Mírovský, J.: Incorporating coreference to automatic evaluation of coherence in essays. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds.) SLSP 2017. LNCS (LNAI), vol. 10583, pp. 58–69. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68456-7_5

  25. Östling, R., Smolentzov, A., Hinnerich, B.T., Höglin, E.: Automated essay scoring for Swedish. In: Proceedings of the 8th Workshop on Innovative Use of NLP for Building Educational Applications, Atlanta, GA, pp. 42–47. ACL (2013)

  26. Persing, I., Ng, V.: Modeling prompt adherence in student essays. In: Proceedings of ACL 2014, Baltimore, MD, Volume 1 (Long Papers), pp. 1534–1543. ACL (2014)

  27. Popel, M., Žabokrtský, Z.: TectoMT: modular NLP framework. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds.) NLP 2010. LNCS (LNAI), vol. 6233, pp. 293–304. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14770-8_33

  28. Rysová, K., Rysová, M., Mírovský, J.: Automatic evaluation of surface coherence in L2 texts in Czech. In: Proceedings of ROCLING 2016, Taipei, Taiwan, pp. 214–228. ACLCLP (2016)

  29. Rysová, K., Rysová, M., Mírovský, J., Novák, M.: Introducing EVALD – software applications for automatic evaluation of discourse in Czech. In: Proceedings of RANLP 2017, Varna, Bulgaria, pp. 634–641. INCOMA Ltd. (2017)

  30. Šebesta, K., Bedřichová, Z., Šormová, K., et al.: AKCES 5 (CzeSL-SGT). Data/software, LINDAT/CLARIN digital library at ÚFAL MFF UK, Prague, Czech Republic (2014)

  31. Šebesta, K., Goláňová, H., Letafková, J., et al.: AKCES 1. Data/software, LINDAT/CLARIN digital library at ÚFAL MFF UK, Prague, Czech Republic (2016)

  32. Vajjala, S.: Automated assessment of non-native learner essays: investigating the role of linguistic features. Int. J. Artif. Intell. Educ. 28(1), 79–105 (2018)

  33. Vajjala, S., Lõo, K.: Automatic CEFR level prediction for Estonian learner text. In: Proceedings of the Third Workshop on NLP for Computer-assisted Language Learning, no. 107, pp. 113–127. Linköping University Electronic Press, Linköping (2014)

  34. Volodina, E., Pilán, I., Alfter, D.: Classification of Swedish learner essays by CEFR levels. In: Proceedings of EuroCALL 2016, Limassol, Cyprus, pp. 456–461. Research-publishing.net (2016)

  35. Zesch, T., Wojatzki, M., Scholten-Akoun, D.: Task-independent features for automated essay grading. In: Proceedings of the 10th Workshop on Innovative Use of NLP for Building Educational Applications, Denver, CO, pp. 224–232. ACL (2015)


Acknowledgment

The authors acknowledge support from the Ministry of Culture of the Czech Republic (project No. DG16P02B016 Automatic Evaluation of Text Coherence in Czech). This work has used language resources developed, stored, and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071). Many thanks to our colleagues Milan Straka and Jakub Náplava for providing us with a pre-trained n-gram model.

Author information

Correspondence to Michal Novák.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Novák, M., Mírovský, J., Rysová, K., Rysová, M. (2019). Exploiting Large Unlabeled Data in Automatic Evaluation of Coherence in Czech. In: Ekštein, K. (ed.) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science (LNAI), vol. 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_17


  • DOI: https://doi.org/10.1007/978-3-030-27947-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27946-2

  • Online ISBN: 978-3-030-27947-9

  • eBook Packages: Computer Science, Computer Science (R0)
