Voice interaction on TV: analysis of natural language interaction models and recommendations for voice user interfaces

Multimedia Tools and Applications

Abstract

This study evaluated three voice interaction models for interactive television (a hands-free solution activated by a wake-up word, a mobile app, and a TV remote control with a microphone) to identify the most appropriate one. The research addressed issues associated with natural language systems, such as usability, interaction and privacy perception, and aimed to analyze the strengths and limitations of each model. The first evaluation used a prototype based on a Wizard-of-Oz methodology, while the second used a functional prototype. The preferred interaction model was the hands-free solution activated by a wake-up word, because it was easy to use and raised the fewest difficulties across tasks. Despite this result, the other two models are not ruled out for a future voice interaction system for television. Participants regarded the TV remote control as the most natural way of interacting. The control afforded by the remote and by the app made participants feel that these models grant more privacy. Participants considered that a voice-operated system for TV would be very useful, and almost all were receptive to having such a system at home. Lastly, based on commercial standards and guidelines, solutions to the issues participants identified in the visual interface of the TV system were proposed and considered for the next phase of prototype development; they may also benefit other research in the field.

Notes

  1. All the TV images used in this paper come from our partner and TV operator Altice Labs.

  2. The original interfaces were designed in Portuguese for testing purposes. However, to improve the understanding of this section, new interfaces are presented with placeholders in English.

Acknowledgements

This paper is a result of the CHIC – Cooperative Holistic for Internet and Content project (grant agreement number 24498), funded by COMPETE 2020 and Portugal 2020 through the European Regional Development Fund (FEDER).

Author information

Corresponding author

Correspondence to Rita Santos.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Santos, R., Abreu, J., Beça, P. et al. Voice interaction on TV: analysis of natural language interaction models and recommendations for voice user interfaces. Multimed Tools Appl 79, 35689–35716 (2020). https://doi.org/10.1007/s11042-020-08710-2

  • DOI: https://doi.org/10.1007/s11042-020-08710-2
