Abstract
This paper contributes to research on the automatic evaluation of surface coherence in student essays. We investigate how large unlabeled data can be used to improve the quality of such evaluation. In particular, we propose two ways to benefit from large data: (i) an n-gram language model, and (ii) density estimates of the features used by the evaluation system. In our experiments, we integrate both approaches, which exploit data from the Czech National Corpus, into EVALD, an evaluator of surface coherence for Czech, and test its performance on two datasets: essays written by native speakers (L1) and by foreign learners of Czech (L2). The system implementing these approaches, together with other new features, significantly outperforms the original EVALD system, with an especially large margin on L1.
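To illustrate approach (ii), the following is a minimal sketch of a density-estimate feature, not the EVALD implementation. It assumes a Gaussian smoothing kernel, as mentioned in the notes; the function name `gaussian_kde`, the bandwidth value, and the sample feature values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from reference samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Normalisation so that the estimate integrates to 1.
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian kernels centred on each reference value.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical values of one feature measured on a large unlabeled corpus
# (made-up numbers, not taken from the Czech National Corpus).
corpus_values = [0.8, 1.0, 1.1, 1.2, 1.3, 1.5, 1.6]
density = gaussian_kde(corpus_values, bandwidth=0.2)

typical = density(1.2)  # feature value close to what the corpus exhibits
unusual = density(3.0)  # feature value rarely seen in the corpus
```

Under these assumptions, the density of an essay's feature value could be passed to the evaluator as an additional feature: a value that is common in the large corpus receives a higher density than a value that is rare.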
Notes
- 1.
In the task of text ordering ranking, the original text is compared with a text created from the original one by a random permutation of its sentences; it is assumed that the original text is always more coherent than the shuffled one. This allows using raw data with no annotation of grades for training and testing the models.
- 2.
- 3.
- 4.
The Gaussian smoothing kernel corresponds to the radial basis function kernel frequently used in Support Vector Machines (SVM).
- 5.
Note that anytime we combine LM and DE features, density estimates are calculated from all but the LM features.
- 6.
Regression models require the labels in the L2 dataset (CEFR proficiency levels) to be converted to numbers. Furthermore, all (possibly non-integer) predictions must be discretized.
- 7.
Implemented in the Weka toolkit [8].
- 8.
Statistical significance was calculated by paired bootstrap resampling [14] at p-level \(p \le 0.05\).
- 9.
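Note 8 refers to paired bootstrap resampling [14]. A minimal sketch of one common formulation of that test follows. It assumes that each system is scored by per-document correctness (accuracy) on the same test documents; the metric, the function name `paired_bootstrap`, the number of resamples, and the example data are assumptions made for illustration, not details taken from the paper.

```python
import random


def paired_bootstrap(correct_a, correct_b, n_resamples=1000, seed=0):
    """Fraction of resampled test sets on which system A does NOT beat system B.

    correct_a, correct_b: 0/1 correctness of each system on the same documents.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same documents")
    if not correct_a:
        raise ValueError("the test set must not be empty")
    rng = random.Random(seed)
    n = len(correct_a)
    not_better = 0
    for _ in range(n_resamples):
        # Draw document indices with replacement; the same indices are used
        # for both systems, which is what makes the test paired.
        idx = [rng.randrange(n) for _ in range(n)]
        score_a = sum(correct_a[i] for i in idx)
        score_b = sum(correct_b[i] for i in idx)
        if score_a <= score_b:
            not_better += 1
    return not_better / n_resamples


# Made-up correctness vectors for two hypothetical systems on ten documents.
new_system = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
old_system = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]
p_value = paired_bootstrap(new_system, old_system)
```

With this formulation, system A would be reported as significantly better than system B at the threshold used in Note 8 when the returned value is at most 0.05.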
References
Boyd, A., et al.: The MERLIN corpus: learner language and the CEFR. In: Proceedings of LREC 2014, Reykjavík, Iceland, pp. 1281–1288. ELRA (2014)
Chen, Y.Y., Liu, C.L., Lee, C.H., Chang, T.H., et al.: An unsupervised automated essay-scoring system. IEEE Intell. Syst. 25(5), 61–67 (2010)
Coleman, M., Liau, T.L.: A computer readability formula designed for machine scoring. J. Appl. Psychol. 60(2), 283–284 (1975)
Cui, B., Li, Y., Zhang, Y., Zhang, Z.: Text coherence analysis based on deep neural network. In: Proceedings of CIKM 2017, pp. 2027–2030. ACM (2017)
Cummins, R., Yannakoudakis, H., Briscoe, T.: Unsupervised modeling of topical relevance in L2 learner text. In: Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA, pp. 95–104. ACL (2016)
Farag, Y., Yannakoudakis, H., Briscoe, T.: Neural automated essay scoring and coherence modeling for adversarially crafted input. In: Proceedings of the NAACL:HLT 2018, New Orleans, Louisiana, Volume 1 (Long Papers), pp. 263–271. ACL (2018)
Feng, V.W., Lin, Z., Hirst, G.: The impact of deep hierarchical discourse structures in the evaluation of text coherence. In: Proceedings of COLING 2014: Technical Papers, Dublin, Ireland, pp. 940–949. Dublin City University and ACL (2014)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
Hancke, J., Meurers, D.: Exploring CEFR classification for German based on rich linguistic modeling. In: Learner Corpus Research 2013. Book of Abstracts, pp. 54–56. Bergen, Norway (2013)
Heafield, K.: KenLM: faster and smaller language model queries. In: Proceedings of WMT 2011, Edinburgh, Scotland, pp. 187–197. ACL (2011)
Hnátková, M., Křen, M., Procházka, P., Skoumalová, H.: The SYN-series corpora of written Czech. In: Proceedings of LREC 2014, Reykjavík, Iceland, pp. 160–164. ELRA (2014)
Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: Proceedings of AAAI 2016, pp. 2741–2749. AAAI Press, Phoenix (2016)
Kincaid, J.P., Fishburne, Jr., R.P., Rogers, R.L., Chissom, B.S.: Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Technical report, pp. 8–75, Institute for Simulation and Training (1975)
Koehn, P.: Statistical significance tests for machine translation evaluation. In: Proceedings of EMNLP 2004, Barcelona, Spain. ACL (2004)
Lin, Z., Ng, H.T., Kan, M.Y.: Automatically evaluating text coherence using discourse relations. In: Proceedings of ACL:HLT 2011, Portland, OR, vol. 1, pp. 997–1006. ACL (2011)
Mann, W.C., Thompson, S.A.: Rhetorical structure theory: toward a functional theory of text organization. Text-Interdisc. J. Study Discourse 8(3), 243–281 (1988)
McLaughlin, H.G.: SMOG grading - a new readability formula. J. Reading 12(8), 639–646 (1969)
Melis, G., Dyer, C., Blunsom, P.: On the state of the art of evaluation in neural language models. In: Proceedings of ICLR 2018. Vancouver, Canada (2018)
Merity, S., Keskar, N.S., Socher, R.: Regularizing and optimizing LSTM language models. In: Proceedings of ICLR 2018. Vancouver, Canada (2018)
Mesgar, M., Strube, M.: A neural local coherence model for text quality assessment. In: Proceedings of EMNLP 2018, Brussels, Belgium, pp. 4328–4339. ACL (2018)
Mírovský, J., Novák, M., Rysová, K., Rysová, M., Hajičová, E.: EVALD 3.0 – Evaluator of Discourse, Charles University, Prague, Czech Republic (2018)
Mírovský, J., Novák, M., Rysová, K., Rysová, M., Hajičová, E.: EVALD 3.0 for Foreigners – Evaluator of Discourse, Charles University, Prague, Czech Republic (2018)
Novák, M., Mírovský, J., Rysová, K., Rysová, M.: Topic–focus articulation: a third pillar of automatic evaluation of text coherence. In: Batyrshin, I., Martínez-Villaseñor, M.L., Ponce Espinosa, H.E. (eds.) MICAI 2018. LNCS (LNAI), vol. 11289, pp. 96–108. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04497-8_8
Novák, M., Rysová, K., Rysová, M., Mírovský, J.: Incorporating coreference to automatic evaluation of coherence in essays. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds.) SLSP 2017. LNCS (LNAI), vol. 10583, pp. 58–69. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68456-7_5
Östling, R., Smolentzov, A., Hinnerich, B.T., Höglin, E.: Automated essay scoring for Swedish. In: Proceedings of the 8th Workshop on Innovative Use of NLP for Building Educational Applications, Atlanta, GA, pp. 42–47. ACL (2013)
Persing, I., Ng, V.: Modeling prompt adherence in student essays. In: Proceedings of ACL 2014, Baltimore, MD, (Volume 1: Long Papers), pp. 1534–1543. ACL (2014)
Popel, M., Žabokrtský, Z.: TectoMT: modular NLP framework. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds.) NLP 2010. LNCS (LNAI), vol. 6233, pp. 293–304. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14770-8_33
Rysová, K., Rysová, M., Mírovský, J.: Automatic evaluation of surface coherence in L2 texts in Czech. In: Proceedings of ROCLING 2016, Taipei, Taiwan, pp. 214–228. ACLCLP (2016)
Rysová, K., Rysová, M., Mírovský, J., Novák, M.: Introducing EVALD - software applications for automatic evaluation of discourse in Czech. In: Proceedings of RANLP 2017, Varna, Bulgaria, pp. 634–641. INCOMA Ltd. (2017)
Šebesta, K., Bedřichová, Z., Šormová, K., et al.: AKCES 5 (CzeSL-SGT) data/software, LINDAT/CLARIN digital library at ÚFAL MFF UK, Prague, Czech Republic (2014)
Šebesta, K., Goláňová, H., Letafková, J., et al.: AKCES 1, data/software, LINDAT/CLARIN digital library at ÚFAL MFF UK, Prague, Czech Republic (2016)
Vajjala, S.: Automated assessment of non-native learner essays: investigating the role of linguistic features. Int. J. Artif. Intell. Educ. 28(1), 79–105 (2018)
Vajjala, S., Lõo, K.: Automatic CEFR level prediction for Estonian learner text. In: Proceedings of the Third Workshop on NLP for Computer-assisted Language Learning, no. 107, pp. 113–127. Linköping University Electronic Press, Linköping (2014)
Volodina, E., Pilán, I., Alfter, D.: Classification of Swedish learner essays by CEFR levels. In: Proceedings of EuroCALL 2016, Limassol, Cyprus, pp. 456–461. Research-publishing.net (2016)
Zesch, T., Wojatzki, M., Scholten-Akoun, D.: Task-independent features for automated essay grading. In: Proceedings of the 10th Workshop on Innovative Use of NLP for Building Educational Applications, Denver, CO, pp. 224–232. ACL (2015)
Acknowledgment
The authors acknowledge support from the Ministry of Culture of the Czech Republic (project No. DG16P02B016 Automatic Evaluation of Text Coherence in Czech). This work has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071). Many thanks to our colleagues Milan Straka and Jakub Náplava for providing us with a pre-trained n-gram model.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Novák, M., Mírovský, J., Rysová, K., Rysová, M. (2019). Exploiting Large Unlabeled Data in Automatic Evaluation of Coherence in Czech. In: Ekštein, K. (eds) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science, vol 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_17
DOI: https://doi.org/10.1007/978-3-030-27947-9_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27946-2
Online ISBN: 978-3-030-27947-9
eBook Packages: Computer Science, Computer Science (R0)