ABSTRACT
Text entry plays an important role in conveying users' intentions to computers, and physical and soft keyboards have been widely used for this purpose. However, with the recent rise of technologies such as augmented reality and the growth of contactless services due to COVID-19, a more advanced form of text entry is required. To address this, we propose Air-Text, an intuitive system for writing in the air using fingertips as a pen. Unlike previously proposed air-writing systems, Air-Text provides a range of functionalities through the seamless integration of air-writing and text-recognition modules. Specifically, the air-writing module takes a sequence of RGB images as input and tracks, frame by frame, both the location of the fingertips (5.33-pixel error in a 640x480 image) and the current hand gesture class (98.29% classification accuracy). Users can perform writing operations, such as writing or deleting text, by changing hand gestures, and the tracked fingertip locations can be stored as a binary image. The text-recognition module, which is compatible with any pre-trained recognition model, then predicts the text written in the binary image. In this paper, we provide examples of single-digit recognition with an MNIST classifier (96.0% accuracy) and word-level recognition with a text recognition model (79.36% character recognition rate).
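To make the step from tracked fingertips to a binary image concrete, the following is a minimal sketch and not the authors' implementation. It assumes each frame has already been reduced to a gesture label and a fingertip coordinate. The gesture names (`write`, `move`, `delete`), the `render_strokes` helper, and the rule that a delete gesture clears the whole canvas are all hypothetical choices made for illustration. Consecutive writing positions are joined with Bresenham's line algorithm.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical gesture labels; the abstract does not list the actual classes.
WRITE, MOVE, DELETE = "write", "move", "delete"

Frame = Tuple[str, int, int]  # (gesture, x, y) for one video frame
Canvas = List[List[int]]      # canvas[y][x] is 1 for ink, 0 for background


def draw_line(canvas: Canvas, x0: int, y0: int, x1: int, y1: int) -> None:
    """Set the pixels on the segment (x0, y0)-(x1, y1) using Bresenham's algorithm."""
    height, width = len(canvas), len(canvas[0])
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        # Skip points that fall outside the image instead of raising.
        if 0 <= x0 < width and 0 <= y0 < height:
            canvas[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_strokes(frames: Iterable[Frame], width: int = 640, height: int = 480) -> Canvas:
    """Turn per-frame (gesture, x, y) observations into a binary image."""
    canvas: Canvas = [[0] * width for _ in range(height)]
    prev: Optional[Tuple[int, int]] = None  # last fingertip position of the current stroke
    for gesture, x, y in frames:
        if gesture == WRITE:
            start = prev if prev is not None else (x, y)
            draw_line(canvas, start[0], start[1], x, y)
            prev = (x, y)
        elif gesture == DELETE:
            # Assumption: a delete gesture clears everything written so far.
            canvas = [[0] * width for _ in range(height)]
            prev = None
        else:
            # Any other gesture lifts the "pen", so the next stroke starts fresh.
            prev = None
    return canvas


frames = [(WRITE, 1, 1), (WRITE, 4, 1), (MOVE, 6, 6), (WRITE, 2, 3), (WRITE, 2, 5)]
image = render_strokes(frames, width=8, height=8)
```

The resulting `image` could then be resized and passed to whichever pre-trained recognition model is in use. Joining consecutive points with line segments, rather than plotting isolated pixels, keeps strokes connected when the fingertip moves quickly between frames.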
Index Terms
- Air-Text: Air-Writing and Recognition System