
Multi-modal indicators for estimating perceived cognitive load in post-editing of machine translation

Machine Translation

Abstract

In this paper, we develop a model that uses a wide range of physiological and behavioral sensor data to estimate perceived cognitive load (CL) during post-editing (PE) of machine translated (MT) text. By predicting the subjectively reported perceived CL, we aim to quantify the demands that PE places on the available mental resources. This could, for example, be used to better capture the usefulness of MT proposals for PE, including the mental effort they require, rather than only their closeness to a reference, which is what current MT evaluation focuses on. We compare the effectiveness of our physiological and behavioral features individually, in combination with each other, and in combination with the more traditional text and time features relevant to the task. Many of the physiological and behavioral features have not previously been applied to PE. Based on data gathered from ten participants, we show that our multi-modal measurement approach outperforms all baseline measures in predicting the perceived level of CL as measured by a psychological scale. Combining eye-, skin-, and heart-based indicators improves the results over each individual measure, and adding PE time improves the regression results further. An investigation of the correlations between the best-performing features (including sensor features previously unexplored in PE) and the corresponding subjective ratings indicates that the multi-modal approach combines several weakly to moderately correlated features into a stronger model.
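The abstract's closing claim, that several weakly to moderately correlated features can be combined into a stronger model, can be illustrated with a small simulation. This is not the authors' pipeline (footnote 4 points to scikit-learn); it is a dependency-free sketch that assumes a linear regression model, synthetic data, and three hypothetical features named after the eye, skin, and heart modalities, each simulated as a noisy copy of one latent load value. It fits ordinary least squares on each feature alone and on all three together, then reports the Pearson correlation between predictions and the simulated CL ratings.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def fit_ols(X, y):
    """Least-squares weights (intercept first) via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # prepend intercept column
    k = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on [X^T X | X^T y].
    M = [row + [b] for row, b in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(w, X):
    """Apply intercept-first weights to each feature row."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]


def simulate(n=200, noise=2.0, seed=0):
    """Synthetic data: three noisy views of one latent load, plus a rating."""
    rng = random.Random(seed)
    rows, ratings = [], []
    for _ in range(n):
        load = rng.gauss(0, 1)
        # Hypothetical stand-ins for eye-, skin- and heart-based features.
        rows.append([load + rng.gauss(0, noise) for _ in range(3)])
        ratings.append(load + rng.gauss(0, 0.5))  # simulated subjective CL rating
    return rows, ratings


X, y = simulate()
single_r = []
for j, name in enumerate(["eye", "skin", "heart"]):
    col = [[row[j]] for row in X]
    r = pearson(predict(fit_ols(col, y), col), y)
    single_r.append(r)
    print(f"{name:>5} only: r = {r:.2f}")
combined_r = pearson(predict(fit_ols(X, y), X), y)
print(f"combined:   r = {combined_r:.2f}")
```

Because this sketch evaluates each model on the data it was fitted to, the combined model can never score below the best single feature; a faithful replication would report held-out performance instead, for example via cross-validation.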

Fig. 1
Fig. 2

Notes

  1. https://azure.microsoft.com/services/cognitive-services.

  2. The study was approved by the university’s ethical review board and the data protection officer.

  3. We used the \(\mathrm{TER}\) intervals [35–50], [60–70], and [80–95].

  4. http://scikit-learn.org/.
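Footnote 3 lists three TER bands separated by gaps. A minimal sketch of how MT segments could be sorted into those bands follows. It assumes (none of this is stated in the notes) that the bands served to select MT proposals of differing quality, that TER is expressed as a percentage, and that both bounds are inclusive; `TER_BANDS`, `ter_band`, and `select_segments` are hypothetical names.

```python
# TER bands from footnote 3, as (low, high) percentages.
# Assumption: bounds are inclusive; values in the gaps are discarded.
TER_BANDS = [(35, 50), (60, 70), (80, 95)]


def ter_band(ter_percent):
    """Return the index of the band containing ter_percent, or None if it falls outside all bands."""
    for i, (lo, hi) in enumerate(TER_BANDS):
        if lo <= ter_percent <= hi:
            return i
    return None


def select_segments(segments):
    """Group (segment_id, ter_percent) pairs by band, dropping segments outside every band."""
    groups = {i: [] for i in range(len(TER_BANDS))}
    for seg_id, ter in segments:
        band = ter_band(ter)
        if band is not None:
            groups[band].append(seg_id)
    return groups


print(select_segments([("s1", 42.0), ("s2", 55.0), ("s3", 65.5), ("s4", 90.0)]))
# -> {0: ['s1'], 1: ['s3'], 2: ['s4']}
```

Leaving gaps between the bands keeps the groups clearly distinct, so segments just above 50 or just below 60 are excluded rather than forced into a neighboring band.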

Author information

Correspondence to Nico Herbig.

Additional information

This research was funded in part by the Deutsche Forschungsgemeinschaft (DFG) under Grant No. GE 2819/2-1 / AOBJ: 636684. The responsibility lies with the authors. We further want to thank the reviewers and editors for their very valuable feedback.

About this article

Cite this article

Herbig, N., Pal, S., Vela, M. et al. Multi-modal indicators for estimating perceived cognitive load in post-editing of machine translation. Machine Translation 33, 91–115 (2019). https://doi.org/10.1007/s10590-019-09227-8

  • DOI: https://doi.org/10.1007/s10590-019-09227-8
