Abstract
The aim of this study is to investigate how the language technologies of Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS) synthesis affect users during an interlingual interaction. In this paper, we describe the prototype system used for data collection, give details of the collected data, and report the results of a usability test run to assess how users of the interlingual system evaluate their interactions in a collaborative map task. We use widely adopted usability evaluation measures (ease of use, effectiveness, and user satisfaction) and consider both qualitative and quantitative measures. Results indicate that both users taking part in the dialogues (instruction giver and follower) found the system similarly satisfactory in terms of ease of learning, ease of use, and pleasantness, even though they were less satisfied with its effectiveness in supporting the task. Users employed different strategies to adapt to the shortcomings of the technology, such as hyper-articulation and rewording of utterances in response to ASR errors. We also report the results of a comparison of the map task in two different settings: one that includes a constant video stream ("video-on") and one that does not ("no-video"). Surprisingly, users rated the no-video setting consistently better.
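The ASR errors that users adapted to are conventionally quantified by word-level edit distance (the Levenshtein measure cited in the references), normalised by reference length to give a word error rate. A minimal illustrative sketch follows; it is not taken from the paper's own tooling, and the example utterances are hypothetical:

```python
def word_error_rate(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum number of edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match / substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution over five reference words -> WER of 0.2
print(word_error_rate("turn left at the mill", "turn left at the hill"))
```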
Notes
1. Biosignal data is not used in this study.
2. Eye tracking data is not used in this study.
3. It is interesting to note that, as indicated later in this paper under Sect. 3.1, the participants use the word "translation" to describe the ASR results or TTS output.
4. Reduced to 17% when the outlier dialogue pair (giver : follower = 199 : 60) was removed.
5. Calculated using the modified kappa feature of ELAN 4.9.0's "Inter-Annotator Reliability..." function.
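The modified kappa mentioned in note 5 is ELAN's own variant; for orientation, a minimal sketch of the standard Cohen's kappa on which such chance-corrected agreement measures are based (illustrative only, not ELAN's exact computation):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```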
References
Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., Paggio, P.: The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. Lang. Resour. Eval. 41(3–4), 273–287 (2007)
Anderson, A.H., Bader, M., Bard, E.G., Boyle, E., Doherty, G., Garrod, S., Isard, S., Kowtko, J., McAllister, J., Miller, J., et al.: The HCRC map task corpus. Lang. Speech 34(4), 351–366 (1991)
Carletta, J., Isard, S., Doherty-Sneddon, G., Isard, A., Kowtko, J.C., Anderson, A.H.: The reliability of a dialogue structure coding scheme. Comput. Linguist. 23(1), 13–31 (1997)
Finn, K.E., Sellen, A.J., Wilbur, S.B. (eds.): Video-Mediated Communication. Lawrence Erlbaum Associates Inc., Hillsdale (1997)
Hayakawa, A., Cerrato, L., Campbell, N., Luz, S.: Detection of cognitive states and their correlation to speech recognition performance in speech-to-speech machine translation systems. In: Proceedings of INTERSPEECH 2015, pp. 2539–2543. ISCA, Dresden (2015)
Hayakawa, A., Cerrato, L., Campbell, N., Luz, S.: A study of prosodic alignment in interlingual map-task dialogues. In: Proceedings of ICPhS XVIII. No. 0760, University of Glasgow, Glasgow, United Kingdom (2015)
Henrichsen, P.J., Allwood, J.: Predicting the attitude flow in dialogue based on multi-modal speech cues. In: NEALT PROCEEDINGS SERIES (2012)
Kane, B., Luz, S.: Probing the use and value of video for multi-disciplinary medical teams in teleconference. In: Proceedings of CBMS 2006, pp. 518–523. IEEE Computer Society, Salt Lake City (2006)
Lavie, A., Metze, F., Cattoni, R., Costantini, E.: A multi-perspective evaluation of the NESPOLE!: speech-to-speech translation system. In: Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems, vol. 7, pp. 121–128. Association for Computational Linguistics, Philadelphia (2002)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10, 707–710 (1966)
Mariani, J.: Spoken language processing and multimodal communication: a view from Europe. In: Plenary Talk, NSF Workshop on Human-centered Systems: Information, Interactivity, and Intelligence (HCS), Arlington, VA, USA (1997)
Newlands, A., Anderson, A.H., Mullin, J.: Adapting communicative strategies to computer-mediated communication: an analysis of task performance and dialogue structure. Appl. Cogn. Psychol. 17(3), 325–348 (2003)
Popescu-Belis, A.: Dialogue acts: one or more dimensions. ISSCO WorkingPaper 62 (2005)
Sjölander, K., Beskow, J.: Wavesurfer - an open source speech tool. In: Proceedings of INTERSPEECH 2000, pp. 464–467. ISCA, Beijing (2000)
The AMI Emotion Annotation Subgroup: Coding guidelines for affect annotation of the AMI corpus. http://groups.inf.ed.ac.uk/ami/corpus/Guidelines/EmotionAnnotationManual-v1.0.pdf
Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., Sloetjes, H.: ELAN: a professional framework for multimodality research. In: Proceedings of LREC 2006, Genoa, Italy, pp. 1556–1559 (2006)
Acknowledgments
This research is supported by Science Foundation Ireland through the CNGL Programme (Grant 12/CE/I2267) in the ADAPT Centre (www.adaptcentre.ie) at Trinity College, Dublin.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Cerrato, L., Hayakawa, A., Campbell, N., Luz, S. (2016). A Speech-to-Speech, Machine Translation Mediated Map Task: An Exploratory Study. In: Quesada, J., Martín Mateos, FJ., Lopez-Soto, T. (eds) Future and Emergent Trends in Language Technology. FETLT 2015. Lecture Notes in Computer Science, vol. 9577. Springer, Cham. https://doi.org/10.1007/978-3-319-33500-1_5
DOI: https://doi.org/10.1007/978-3-319-33500-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33499-8
Online ISBN: 978-3-319-33500-1