Abstract
In this paper, we develop a model that uses a wide range of physiological and behavioral sensor data to estimate perceived cognitive load (CL) during post-editing (PE) of machine-translated (MT) text. By predicting the subjectively reported perceived CL, we aim to quantify the extent of the demands placed on the mental resources available during PE. Such estimates could, for example, be used to better capture the usefulness of MT proposals for PE, including the mental effort they require, in contrast to current MT evaluation, which focuses merely on closeness to a reference. We compare the effectiveness of our physiological and behavioral features individually, in combination with each other, and in combination with the more traditional text and time features relevant to the task. Many of the physiological and behavioral features have not previously been applied to PE. Based on the data gathered from ten participants, we show that our multi-modal measurement approach outperforms all baseline measures in terms of predicting the perceived level of CL as measured by a psychological scale. Combinations of eye-, skin-, and heart-based indicators improve the results over each individual measure. Additionally, adding PE time improves the regression results further. An investigation of correlations between the best-performing features, including sensor features previously unexplored in PE, and the corresponding subjective ratings indicates that the multi-modal approach takes advantage of several weakly to moderately correlated features and combines them into a stronger model.
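The intuition behind combining several weakly to moderately correlated indicators can be illustrated with a minimal sketch on purely synthetic data. It does not reproduce the regression model or the sensor features of the study: the number of features, the `loading` value, the sample size, and the simple averaging of standardized features are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def zscore(xs):
    """Standardize a sequence to zero mean and unit variance."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def simulate(n_segments=2000, n_features=6, loading=0.4, seed=0):
    """Draw synthetic ratings and noisy features (all values hypothetical).

    Each feature is the rating scaled by `loading` plus independent
    Gaussian noise, so each one is only moderately correlated with it.
    """
    rng = random.Random(seed)
    ratings = [rng.gauss(0.0, 1.0) for _ in range(n_segments)]
    features = [
        [loading * r + rng.gauss(0.0, 1.0) for r in ratings]
        for _ in range(n_features)
    ]
    return ratings, features


ratings, features = simulate()

# Correlation of each single synthetic feature with the synthetic rating.
individual_r = [pearson(f, ratings) for f in features]

# Combine the features by averaging their standardized values per segment.
standardized = [zscore(f) for f in features]
combined = [sum(column) / len(column) for column in zip(*standardized)]
combined_r = pearson(combined, ratings)

print("individual:", [round(r, 2) for r in individual_r])
print("combined:  ", round(combined_r, 2))
```

Because the noise in each synthetic feature is independent, averaging cancels part of it while the shared signal remains, so the combined score tracks the rating more closely than any single feature. A learned regression model would weight the features rather than average them equally; equal weighting is used here only to keep the sketch short.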
Notes
The study was approved by the university’s ethical review board and the data protection officer.
As \(\mathrm{TER}\) intervals we used [35–50], [60–70], and [80–95].
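A minimal sketch of how segments could be assigned to these TER intervals is given below. It assumes that TER is expressed in percent and that both interval bounds are inclusive; neither detail is stated in the note, and the names `TER_INTERVALS` and `ter_interval` are hypothetical.

```python
# Interval bounds taken from the note above; inclusive bounds are an assumption.
TER_INTERVALS = [(35.0, 50.0), (60.0, 70.0), (80.0, 95.0)]


def ter_interval(score):
    """Return the index of the interval containing `score`, or None.

    `score` is assumed to be a TER value in percent. Scores that fall
    into the gaps between intervals (for example 55) are not assigned.
    """
    for index, (low, high) in enumerate(TER_INTERVALS):
        if low <= score <= high:
            return index
    return None


for example_score in (40.0, 55.0, 65.0, 95.0):
    print(example_score, ter_interval(example_score))
```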
This research was funded in part by the Deutsche Forschungsgemeinschaft (DFG) under Grant No. GE 2819/2-1 / AOBJ: 636684. The responsibility lies with the authors. We further want to thank the reviewers and editors for their very valuable feedback.
About this article
Cite this article
Herbig, N., Pal, S., Vela, M. et al. Multi-modal indicators for estimating perceived cognitive load in post-editing of machine translation. Machine Translation 33, 91–115 (2019). https://doi.org/10.1007/s10590-019-09227-8
DOI: https://doi.org/10.1007/s10590-019-09227-8