DOI: 10.1145/3448018.3458011
ETRA '21 Short Paper

Gaze+Lip: Rapid, Precise and Expressive Interactions Combining Gaze Input and Silent Speech Commands for Hands-free Smart TV Control

Published: 25 May 2021

ABSTRACT

As eye-tracking technologies mature, gaze is becoming increasingly popular as an input modality. However, its limited accuracy makes gaze hard to use in situations that require fast and precise object selection. We present Gaze+Lip, a hands-free interface that combines gaze and lip reading to enable rapid and precise remote control of large displays. Gaze+Lip uses gaze for target selection and leverages silent speech to ensure accurate and reliable command execution in noisy scenarios such as watching TV or playing videos on a computer. For evaluation, we implemented the system on a TV and conducted an experiment comparing our method with a dwell-based, gaze-only input method. Results showed that Gaze+Lip outperformed the gaze-only approach in both accuracy and input speed. Furthermore, subjective evaluations indicated that Gaze+Lip is easy to understand, easy to use, and perceived as faster than the gaze-only approach.
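To make the division of labor described above concrete, the following is a minimal, hypothetical Python sketch (not the authors' implementation): gaze resolves which on-screen target is being looked at, and a silently mouthed command, as it might be produced by a lip-reading classifier, determines what happens to that target. The Target type, command vocabulary, and screen coordinates are illustrative assumptions.

# Hypothetical sketch of a Gaze+Lip-style dispatch loop (illustrative only).
# Gaze picks the on-screen target; a lip-read command triggers the action.

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Target:
    name: str
    bounds: Tuple[int, int, int, int]  # x, y, width, height in screen pixels
    actions: Tuple[str, ...]           # commands this target accepts

def target_under_gaze(gaze: Tuple[int, int], targets: List[Target]) -> Optional[Target]:
    """Return the target whose bounding box contains the current gaze point."""
    gx, gy = gaze
    for t in targets:
        x, y, w, h = t.bounds
        if x <= gx <= x + w and y <= gy <= y + h:
            return t
    return None

def dispatch(gaze: Tuple[int, int], command: str, targets: List[Target]) -> str:
    """Fuse the gaze-selected target with the lip-read command at execution time."""
    target = target_under_gaze(gaze, targets)
    if target is None:
        return "no target under gaze"
    if command not in target.actions:
        return f"'{command}' is not a valid command for {target.name}"
    return f"execute '{command}' on {target.name}"

if __name__ == "__main__":
    ui = [
        Target("volume slider", (1500, 900, 300, 60), ("up", "down", "mute")),
        Target("video player", (100, 100, 1280, 720), ("play", "pause", "forward")),
    ]
    # Stand-ins for a real eye-tracker sample and a lip-reading classifier output.
    print(dispatch((400, 500), "pause", ui))   # -> execute 'pause' on video player
    print(dispatch((1600, 920), "mute", ui))   # -> execute 'mute' on volume slider

In the actual system, the gaze point would come from an eye tracker and the command label from a lip-reading model run on mouth-region video frames; the sketch only illustrates how the two signals could be fused at dispatch time.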


Published in

ETRA '21 Short Papers: ACM Symposium on Eye Tracking Research and Applications
May 2021, 232 pages
ISBN: 9781450383455
DOI: 10.1145/3448018

Copyright © 2021 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 69 of 137 submissions, 50%
