
Exploiting Large Unlabeled Data in Automatic Evaluation of Coherence in Czech

  • Conference paper
  • Part of the book: Text, Speech, and Dialogue (TSD 2019)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11697)


Abstract

The paper contributes to the research on automatic evaluation of surface coherence in student essays. We look into the possibilities of using large unlabeled data to improve the quality of such evaluation. In particular, we propose two approaches to benefit from the large data: (i) an n-gram language model, and (ii) density estimates of the features used by the evaluation system. In our experiments, we integrate these approaches, which exploit data from the Czech National Corpus, into the evaluator of surface coherence for Czech, the EVALD system, and test its performance on two datasets: essays written by native speakers (L1) as well as by foreign learners of Czech (L2). The system implementing these approaches together with other new features significantly outperforms the original EVALD system, with an especially large margin on L1.


Notes

  1. In the text ordering ranking task, the original text is compared with a text created from it by a random permutation of its sentences; it is assumed that the original text is always more coherent than the shuffled one. This allows raw data with no annotation of grades to be used for training and testing the models (a minimal sketch of generating such pairs follows these notes).

  2. http://lindat.cz.

  3. Unlike n-gram models, word-based [18, 19] or character-based [12] recurrent neural language models are able to handle longer dependencies. However, we leave their incorporation into this task for future work. (A sketch of querying an n-gram model for essay-level features follows these notes.)

  4. The Gaussian smoothing kernel corresponds to the radial basis function kernel frequently used in Support Vector Machines (SVM). (A sketch of Gaussian kernel density estimation follows these notes.)

  5. Note that whenever we combine LM and DE features, density estimates are calculated from all but the LM features.

  6. Regression models require the labels in the L2 dataset (CEFR proficiency levels) to be converted to numbers. Furthermore, all (possibly non-integer) predictions must be discretized. (A sketch of this conversion follows these notes.)

  7. Implemented in the Weka toolkit [8].

  8. Statistical significance was calculated by paired bootstrap resampling [14] at p-level \(p \le 0.05\). (A sketch of this test follows these notes.)

  9. Note that the scores of EVALD 3.0 presented here slightly differ from those reported in [23] due to a modification in the cross-validation procedure (see Sect. 6).
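
The following is a minimal sketch, in Python, of how original/shuffled pairs for the text ordering ranking task (note 1) can be generated. The naive sentence splitting and the helper name make_ranking_pair are illustrative assumptions, not part of the EVALD pipeline.

```python
import random

def make_ranking_pair(text, seed=0):
    """Return the original sentence order and a random permutation of it.
    The naive split on '. ' stands in for a proper sentence segmenter."""
    sentences = [s.strip() for s in text.split('. ') if s.strip()]
    shuffled = sentences[:]
    random.Random(seed).shuffle(shuffled)  # for very short texts the permutation
    return sentences, shuffled             # may coincide with the original

original, shuffled = make_ranking_pair(
    "První věta. Druhá věta. Třetí věta. Čtvrtá věta.")
# A coherence model is expected to rank `original` above `shuffled`.
```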
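
Note 3 contrasts the n-gram model used here with neural alternatives. As a rough illustration of how n-gram language-model features can be extracted, the sketch below queries a KenLM model [10] through its Python bindings; the model path and the two aggregate features are assumptions made for illustration, since the paper does not specify this interface.

```python
import kenlm  # Python bindings for KenLM [10]

# Hypothetical path to an n-gram model trained on large unlabeled Czech data
# (e.g. the SYN-series corpora [11]); not the authors' actual model file.
model = kenlm.Model("czech_ngram.arpa")

def lm_features(sentences):
    """Aggregate language-model features for one essay: the average
    per-sentence log10 probability and the average perplexity."""
    scores = [model.score(s, bos=True, eos=True) for s in sentences]
    perplexities = [model.perplexity(s) for s in sentences]
    return {
        "avg_logprob": sum(scores) / len(scores),
        "avg_perplexity": sum(perplexities) / len(perplexities),
    }
```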
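
Notes 4 and 5 concern density estimates (DE) of the evaluator's features computed from large unlabeled data. Below is a minimal sketch of Gaussian kernel density estimation over a feature matrix; the random feature matrix, the bandwidth value, and the function names are hypothetical and illustrate only the general technique, not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical stand-in for the evaluator's (non-LM) feature vectors computed
# on a large unlabeled corpus; in practice these come from real documents.
rng = np.random.default_rng(0)
corpus_features = rng.random((1000, 20))

# Gaussian kernel, i.e. the RBF kernel mentioned in note 4; the bandwidth is a guess.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(corpus_features)

def density_feature(essay_features):
    """Log-density of one essay's feature vector under the corpus estimate,
    usable as an additional feature for the classifier."""
    return kde.score_samples(np.asarray(essay_features).reshape(1, -1))[0]
```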
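
For note 6, a minimal sketch of converting CEFR levels to numbers for regression and discretizing the predictions back to levels; the full A1–C2 scale shown here is an assumption, as the L2 dataset may cover only a subset of these levels.

```python
# The full CEFR scale; the L2 dataset may use only a subset of these levels.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
LEVEL_TO_NUM = {level: i for i, level in enumerate(CEFR_LEVELS)}

def to_number(label):
    """Map a CEFR label to an integer so a regression model can be trained."""
    return LEVEL_TO_NUM[label]

def to_label(prediction):
    """Discretize a (possibly non-integer) regression output back to a level."""
    index = int(round(prediction))
    index = max(0, min(index, len(CEFR_LEVELS) - 1))  # clip to the valid range
    return CEFR_LEVELS[index]

assert to_label(2.4) == "B1"
```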
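
For note 8, a sketch of paired bootstrap resampling in the spirit of [14], comparing two classifiers by accuracy; the number of resamples and the 0/1 correctness encoding are illustrative choices, not details taken from the paper.

```python
import random

def paired_bootstrap(correct_a, correct_b, samples=1000, seed=0):
    """Resample documents with replacement and count how often system A
    beats system B in accuracy.

    correct_a, correct_b: 0/1 lists saying whether each system classified
    the corresponding document correctly (same document order in both)."""
    rng = random.Random(seed)
    n, wins_a = len(correct_a), 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(correct_a[i] for i in idx) > sum(correct_b[i] for i in idx):
            wins_a += 1
    # A is considered significantly better at p <= 0.05 if it wins in at
    # least 95% of the resamples.
    return wins_a / samples
```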

References

  1. Boyd, A., et al.: The MERLIN corpus: learner language and the CEFR. In: Proceedings of LREC 2014, Reykjavík, Iceland, pp. 1281–1288. ELRA (2014)

  2. Chen, Y.Y., Liu, C.L., Lee, C.H., Chang, T.H., et al.: An unsupervised automated essay-scoring system. IEEE Intell. Syst. 25(5), 61–67 (2010)

  3. Coleman, M., Liau, T.L.: A computer readability formula designed for machine scoring. J. Appl. Psychol. 60(2), 283–284 (1975)

  4. Cui, B., Li, Y., Zhang, Y., Zhang, Z.: Text coherence analysis based on deep neural network. In: Proceedings of CIKM 2017, pp. 2027–2030. ACM (2017)

  5. Cummins, R., Yannakoudakis, H., Briscoe, T.: Unsupervised modeling of topical relevance in L2 learner text. In: Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA, pp. 95–104. ACL (2016)

  6. Farag, Y., Yannakoudakis, H., Briscoe, T.: Neural automated essay scoring and coherence modeling for adversarially crafted input. In: Proceedings of NAACL-HLT 2018, New Orleans, LA, Volume 1 (Long Papers), pp. 263–271. ACL (2018)

  7. Feng, V.W., Lin, Z., Hirst, G.: The impact of deep hierarchical discourse structures in the evaluation of text coherence. In: Proceedings of COLING 2014: Technical Papers, Dublin, Ireland, pp. 940–949. Dublin City University and ACL (2014)

  8. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)

  9. Hancke, J., Meurers, D.: Exploring CEFR classification for German based on rich linguistic modeling. In: Learner Corpus Research 2013, Book of Abstracts, pp. 54–56. Bergen, Norway (2013)

  10. Heafield, K.: KenLM: faster and smaller language model queries. In: Proceedings of WMT 2011, Edinburgh, Scotland, pp. 187–197. ACL (2011)

  11. Hnátková, M., Křen, M., Procházka, P., Skoumalová, H.: The SYN-series corpora of written Czech. In: Proceedings of LREC 2014, Reykjavík, Iceland, pp. 160–164. ELRA (2014)

  12. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: Proceedings of AAAI 2016, Phoenix, AZ, pp. 2741–2749. AAAI Press (2016)

  13. Kincaid, J.P., Fishburne, R.P., Jr., Rogers, R.L., Chissom, B.S.: Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Technical report 8-75, Institute for Simulation and Training (1975)

  14. Koehn, P.: Statistical significance tests for machine translation evaluation. In: Proceedings of EMNLP 2004, Barcelona, Spain. ACL (2004)

  15. Lin, Z., Ng, H.T., Kan, M.Y.: Automatically evaluating text coherence using discourse relations. In: Proceedings of ACL-HLT 2011, Portland, OR, vol. 1, pp. 997–1006. ACL (2011)

  16. Mann, W.C., Thompson, S.A.: Rhetorical structure theory: toward a functional theory of text organization. Text – Interdisc. J. Study Discourse 8(3), 243–281 (1988)

  17. McLaughlin, H.G.: SMOG grading – a new readability formula. J. Reading 12(8), 639–646 (1969)

  18. Melis, G., Dyer, C., Blunsom, P.: On the state of the art of evaluation in neural language models. In: Proceedings of ICLR 2018, Vancouver, Canada (2018)

  19. Merity, S., Keskar, N.S., Socher, R.: Regularizing and optimizing LSTM language models. In: Proceedings of ICLR 2018, Vancouver, Canada (2018)

  20. Mesgar, M., Strube, M.: A neural local coherence model for text quality assessment. In: Proceedings of EMNLP 2018, Brussels, Belgium, pp. 4328–4339. ACL (2018)

  21. Mírovský, J., Novák, M., Rysová, K., Rysová, M., Hajičová, E.: EVALD 3.0 – Evaluator of Discourse. Charles University, Prague, Czech Republic (2018)

  22. Mírovský, J., Novák, M., Rysová, K., Rysová, M., Hajičová, E.: EVALD 3.0 for Foreigners – Evaluator of Discourse. Charles University, Prague, Czech Republic (2018)

  23. Novák, M., Mírovský, J., Rysová, K., Rysová, M.: Topic–focus articulation: a third pillar of automatic evaluation of text coherence. In: Batyrshin, I., Martínez-Villaseñor, M.L., Ponce Espinosa, H.E. (eds.) MICAI 2018. LNCS (LNAI), vol. 11289, pp. 96–108. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04497-8_8

  24. Novák, M., Rysová, K., Rysová, M., Mírovský, J.: Incorporating coreference to automatic evaluation of coherence in essays. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds.) SLSP 2017. LNCS (LNAI), vol. 10583, pp. 58–69. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68456-7_5

  25. Östling, R., Smolentzov, A., Hinnerich, B.T., Höglin, E.: Automated essay scoring for Swedish. In: Proceedings of the 8th Workshop on Innovative Use of NLP for Building Educational Applications, Atlanta, GA, pp. 42–47. ACL (2013)

  26. Persing, I., Ng, V.: Modeling prompt adherence in student essays. In: Proceedings of ACL 2014, Baltimore, MD, Volume 1 (Long Papers), pp. 1534–1543. ACL (2014)

  27. Popel, M., Žabokrtský, Z.: TectoMT: modular NLP framework. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds.) NLP 2010. LNCS (LNAI), vol. 6233, pp. 293–304. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14770-8_33

  28. Rysová, K., Rysová, M., Mírovský, J.: Automatic evaluation of surface coherence in L2 texts in Czech. In: Proceedings of ROCLING 2016, Taipei, Taiwan, pp. 214–228. ACLCLP (2016)

  29. Rysová, K., Rysová, M., Mírovský, J., Novák, M.: Introducing EVALD – software applications for automatic evaluation of discourse in Czech. In: Proceedings of RANLP 2017, Varna, Bulgaria, pp. 634–641. INCOMA Ltd. (2017)

  30. Šebesta, K., Bedřichová, Z., Šormová, K., et al.: AKCES 5 (CzeSL-SGT). Data/software, LINDAT/CLARIN digital library at ÚFAL MFF UK, Prague, Czech Republic (2014)

  31. Šebesta, K., Goláňová, H., Letafková, J., et al.: AKCES 1. Data/software, LINDAT/CLARIN digital library at ÚFAL MFF UK, Prague, Czech Republic (2016)

  32. Vajjala, S.: Automated assessment of non-native learner essays: investigating the role of linguistic features. Int. J. Artif. Intell. Educ. 28(1), 79–105 (2018)

  33. Vajjala, S., Lõo, K.: Automatic CEFR level prediction for Estonian learner text. In: Proceedings of the Third Workshop on NLP for Computer-assisted Language Learning, no. 107, pp. 113–127. Linköping University Electronic Press, Linköping (2014)

  34. Volodina, E., Pilán, I., Alfter, D.: Classification of Swedish learner essays by CEFR levels. In: Proceedings of EuroCALL 2016, Limassol, Cyprus, pp. 456–461. Research-publishing.net (2016)

  35. Zesch, T., Wojatzki, M., Scholten-Akoun, D.: Task-independent features for automated essay grading. In: Proceedings of the 10th Workshop on Innovative Use of NLP for Building Educational Applications, Denver, CO, pp. 224–232. ACL (2015)


Acknowledgment

The authors acknowledge support from the Ministry of Culture of the Czech Republic (project No. DG16P02B016 Automatic Evaluation of Text Coherence in Czech). This work has used language resources developed, stored, and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071). Many thanks to our colleagues Milan Straka and Jakub Náplava for providing us with a pre-trained n-gram model.

Author information

Correspondence to Michal Novák.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Novák, M., Mírovský, J., Rysová, K., Rysová, M. (2019). Exploiting Large Unlabeled Data in Automatic Evaluation of Coherence in Czech. In: Ekštein, K. (ed.) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science (LNAI), vol. 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_17


  • DOI: https://doi.org/10.1007/978-3-030-27947-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27946-2

  • Online ISBN: 978-3-030-27947-9

  • eBook Packages: Computer Science, Computer Science (R0)
