ABSTRACT
Data-driven machine learning approaches are increasingly used in human-computer interaction (HCI) tasks. However, unlike traditional machine learning tasks, for which large datasets are available and maintained, each HCI project needs to collect a new dataset because HCI systems usually propose new sensing methods or use cases. Such datasets tend to be small, which leads to low performance, or they place a burden on participants in user studies. In this paper, taking hand gesture recognition using wrist-worn devices as a typical HCI task, I propose a self-supervised approach that achieves high performance with little burden on the user. The experimental results showed that hand gesture recognition was achieved with a very small number of labeled training samples (95% accuracy with five samples for 5 gestures, and 95% accuracy with 10 samples for 10 gestures). The results suggest that a user who wants to design 5 new gestures can activate the feature in less than 2 minutes. I discuss the potential of this self-supervised framework for the HCI community.
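To make the few-shot setting concrete, the following is a minimal sketch of classifying gestures from a handful of labeled samples on top of fixed embeddings. It is not the method of this paper: `embed` is a hypothetical stand-in for a frozen, self-supervised encoder (here it only computes per-axis mean and standard deviation), the nearest-centroid classifier is a substitute chosen for brevity, and the sensor windows are synthetic. The choice of five labeled samples per gesture is likewise an assumption of the sketch.

```python
import math
import random


def embed(window):
    # Hypothetical stand-in for a frozen self-supervised encoder:
    # per-axis mean and standard deviation of one sensor window.
    axes = list(zip(*window))
    means = [sum(a) / len(a) for a in axes]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in a) / len(a))
        for a, m in zip(axes, means)
    ]
    return means + stds


def fit_centroids(labeled):
    # labeled: list of (embedding, label) pairs; returns label -> mean embedding.
    sums, counts = {}, {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    # Assign the label whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def synthetic_window(gesture, rng, length=50):
    # Synthetic 3-axis data (an assumption): each gesture has its own offset.
    offset = (gesture, -gesture, 2 * gesture)
    return [tuple(o + rng.gauss(0, 0.1) for o in offset) for _ in range(length)]


rng = random.Random(0)
gestures = range(5)
shots = 5  # labeled samples per gesture; an assumption for this sketch

train = [(embed(synthetic_window(g, rng)), g) for g in gestures for _ in range(shots)]
test = [(embed(synthetic_window(g, rng)), g) for g in gestures for _ in range(20)]

centroids = fit_centroids(train)
accuracy = sum(predict(centroids, v) == g for v, g in test) / len(test)
print(f"few-shot accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only the easily separable synthetic data and says nothing about the results reported above. In a real pipeline, `embed` would be replaced by an encoder pre-trained on unlabeled sensor data, so that only the small labeled set is needed to fit the classifier.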
Index Terms
- Self-Supervised Approach for Few-shot Hand Gesture Recognition