Multi-modal indicators for estimating perceived cognitive load in post-editing of machine translation

Herbig, Nico; Pal, Santanu; Vela, Mihaela; Krüger, Antonio; van Genabith, Josef

doi:10.1007/s10590-019-09227-8

Published: 15 March 2019

Volume 33, pages 91–115, (2019)
Machine Translation

Nico Herbig¹ (ORCID: orcid.org/0000-0001-5710-954X), Santanu Pal², Mihaela Vela², Antonio Krüger¹ & Josef van Genabith¹,²

Abstract

In this paper, we develop a model that uses a wide range of physiological and behavioral sensor data to estimate perceived cognitive load (CL) during post-editing (PE) of machine-translated (MT) text. By predicting the subjectively reported perceived CL, we aim to quantify the extent of demands placed on the mental resources available during PE. This could, for example, be used to better capture the usefulness of MT proposals for PE, including the mental effort required, in contrast to the mere closeness-to-a-reference perspective that current MT evaluation focuses on. We compare the effectiveness of our physiological and behavioral features individually, in combination with each other, and in combination with the more traditional text and time features relevant to the task. Many of the physiological and behavioral features have not previously been applied to PE. Based on data gathered from ten participants, we show that our multi-modal measurement approach outperforms all baseline measures in predicting the perceived level of CL as measured by a psychological scale. Combinations of eye-, skin-, and heart-based indicators improve the results over each individual measure. Adding PE time improves the regression results further. An investigation of correlations between the best-performing features, including sensor features previously unexplored in PE, and the corresponding subjective ratings indicates that the multi-modal approach combines several weakly to moderately correlated features into a stronger model.
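The abstract's closing observation, that several weakly to moderately correlated features can be combined into a stronger predictor, can be illustrated with a minimal sketch on synthetic data. Nothing here reproduces the authors' features, data, or regression model: the latent "load" variable, the noise level, the number of features, and the equal-weight averaging are all assumptions chosen only to show the statistical effect.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_segments=2000, n_features=4, noise_sd=2.0, seed=0):
    """Synthetic data only: a latent 'load' value plus independent noise per feature."""
    rng = random.Random(seed)
    load = [rng.gauss(0.0, 1.0) for _ in range(n_segments)]
    features = [
        [value + rng.gauss(0.0, noise_sd) for value in load]
        for _ in range(n_features)
    ]
    return load, features


load, features = simulate()

# Correlation of each noisy feature with the latent load, taken on its own.
individual = [pearson(f, load) for f in features]

# Equal-weight average of the features; a fitted regressor would learn weights instead.
combined = [sum(values) / len(values) for values in zip(*features)]
combined_r = pearson(combined, load)

print("individual r:", [round(r, 2) for r in individual])
print("combined r:  ", round(combined_r, 2))
```

Because the noise in each simulated feature is independent, averaging cancels part of it, so the combined signal correlates more strongly with the latent load than any single feature does. Real sensor features are unlikely to be independent, so the gain in practice would be smaller than in this idealized setting.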

Notes

https://azure.microsoft.com/services/cognitive-services.
The study was approved by the university’s ethical review board and the data protection officer.
As TER intervals we used [35–50], [60–70], and [80–95].
http://scikit-learn.org/.
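
The TER intervals given in the notes above can be expressed as a small lookup, sketched here under stated assumptions: the bounds are read as inclusive TER percentages, and scores falling in the gaps between intervals are treated as unassigned. The names `TER_INTERVALS` and `ter_interval` are hypothetical and do not come from the article.

```python
# Intervals from the note above, read as inclusive TER percentages (an assumption).
TER_INTERVALS = [(35, 50), (60, 70), (80, 95)]


def ter_interval(ter_percent):
    """Return the (low, high) interval containing ter_percent, or None if it falls in a gap."""
    for low, high in TER_INTERVALS:
        if low <= ter_percent <= high:
            return (low, high)
    return None


for score in (42.0, 55.0, 65.0, 95.0, 20.0):
    print(score, ter_interval(score))
```

Leaving gaps between the intervals keeps the three groups clearly separated, which is one plausible reason for choosing non-adjacent ranges.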

Author information

Authors and Affiliations

German Research Center for Artificial Intelligence (DFKI), Saarland Informatics Campus, Saarbrücken, Germany
Nico Herbig, Antonio Krüger & Josef van Genabith
Saarland University, Saarbrücken, Germany
Santanu Pal, Mihaela Vela & Josef van Genabith

Corresponding author

Correspondence to Nico Herbig.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was funded in part by the Deutsche Forschungsgemeinschaft (DFG) under Grant No. GE 2819/2-1 / AOBJ: 636684. The responsibility lies with the authors. We further want to thank the reviewers and editors for their very valuable feedback.

About this article

Cite this article

Herbig, N., Pal, S., Vela, M. et al. Multi-modal indicators for estimating perceived cognitive load in post-editing of machine translation. Machine Translation 33, 91–115 (2019). https://doi.org/10.1007/s10590-019-09227-8

Received: 15 July 2018
Accepted: 09 February 2019
Published: 15 March 2019
Issue Date: 15 June 2019
DOI: https://doi.org/10.1007/s10590-019-09227-8
