Voice interaction on TV: analysis of natural language interaction models and recommendations for voice user interfaces

Santos, Rita; Abreu, Jorge; Beça, Pedro; Rodrigues, Ana; Fernandes, Sílvia

doi:10.1007/s11042-020-08710-2

Published: 18 February 2020

Volume 79, pages 35689–35716, (2020)

Multimedia Tools and Applications

Rita Santos¹ (ORCID: orcid.org/0000-0001-9741-6210), Jorge Abreu², Pedro Beça², Ana Rodrigues² & Sílvia Fernandes²

Abstract

The goal of this study was to evaluate a set of voice interaction models (a hands-free solution activated by a wake-up word, a mobile app, and a TV remote control with a microphone) to identify the most appropriate solution for interactive television. The research addressed issues associated with natural language systems, such as usability, interaction and privacy perception, and aimed to analyze the strengths and limitations of each voice interaction model. The first evaluation used a prototype based on a Wizard-of-Oz methodology, while the second used a functional prototype. The preferred interaction model was the hands-free solution activated by a wake-up word, because it was easy to use and raised the fewest difficulties during task execution. Despite this result, the other two models were not ruled out for a future voice interaction system for television. The TV remote control was the most natural way of interacting for the study’s participants. The control provided by the remote and by the app made participants feel that these models grant more privacy. Participants considered that a voice-operated system for TV would be very useful, and almost all were receptive to having such a system at home. Lastly, based on commercial standards and guidelines, solutions to issues that participants identified in the visual interface of the TV system were proposed and considered for the next phase of prototype development; these may also benefit other research in the field.

Notes

All the TV images used in this paper come from our partner and TV operator Altice Labs.
The original interfaces were designed in Portuguese for testing purposes. However, to make this section easier to follow, the interfaces are presented here with English placeholders.

Acknowledgements

This paper is a result of the CHIC – Cooperative Holistic for Internet and Content project (grant agreement number 24498), funded by COMPETE 2020 and Portugal 2020 through the European Regional Development Fund (FEDER).

Author information

Authors and Affiliations

Digimedia, Águeda School of Technology and Management, University of Aveiro, 3754-909, Águeda, Portugal
Rita Santos
Digimedia, Communication and Art Department, University of Aveiro, Campus Universitário de Santiago, 3810-193, Aveiro, Portugal
Jorge Abreu, Pedro Beça, Ana Rodrigues & Sílvia Fernandes

Corresponding author

Correspondence to Rita Santos.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Santos, R., Abreu, J., Beça, P. et al. Voice interaction on TV: analysis of natural language interaction models and recommendations for voice user interfaces. Multimed Tools Appl 79, 35689–35716 (2020). https://doi.org/10.1007/s11042-020-08710-2

Received: 15 July 2019
Revised: 20 December 2019
Accepted: 28 January 2020
Published: 18 February 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11042-020-08710-2
