ABSTRACT
Acoustic signals have recently been adopted for contact-free hand gesture recognition because of their fine-grained sensing granularity and the wide availability of microphones and speakers in consumer-grade electronic devices such as smartphones. However, a very limited sensing range confines acoustic sensing to scenarios where users interact with devices in close proximity. In this paper, we extend the range of acoustic sensing and demonstrate the feasibility of room-scale hand gesture recognition using commodity smart speakers. We develop a series of novel signal processing techniques and implement our system on two commodity smart speaker prototypes with different numbers of microphones. Extensive evaluations are performed in three different environments with 1440 gestures collected from 16 participants. Experimental results show that our system significantly increases the sensing range from 1 m to 4--5 m. Even in the challenging scenario where the user is 4 m away from the smart speaker and strong interference is present, the gesture recognition accuracy remains above 90%.
Index Terms
- Room-Scale Hand Gesture Recognition Using Smart Speakers