DOI: 10.1145/3474085.3475694
research-article

Air-Text: Air-Writing and Recognition System

Published: 17 October 2021

ABSTRACT

Text entry plays an important role in effectively conveying users' intentions to computers, and physical and soft keyboards have been widely used for this purpose. However, with recent trends in technologies such as augmented reality and the growth of contactless services due to COVID-19, more advanced forms of text entry are required. To tackle this issue, we propose Air-Text, an intuitive system for writing in the air using a fingertip as a pen. Unlike previously proposed air-writing systems, Air-Text provides various functionalities through the seamless integration of air-writing and text-recognition modules. Specifically, the air-writing module takes a sequence of RGB images as input and tracks, frame by frame, both the fingertip locations (5.33-pixel error in a 640x480 image) and the current hand gesture class (98.29% classification accuracy). Users can easily perform writing operations such as writing or deleting text by changing hand gestures, and the tracked fingertip locations can be stored as a binary image. The text-recognition module, which is compatible with any pre-trained recognition model, then predicts the text written in the binary image. In this paper, we provide examples of single-digit recognition with an MNIST classifier (96.0% accuracy) and word-level recognition with a text recognition model (79.36% character recognition rate).
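The abstract states that tracked fingertip locations are stored as a binary image whose contents depend on the current hand gesture. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: the gesture labels `write`, `move`, and `delete`, the rule that `delete` clears the whole canvas, and the helper names `draw_segment` and `rasterize` are all hypothetical. Consecutive fingertip points recorded under the writing gesture are joined by straight segments on a 640x480 canvas, matching the image size quoted above.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[int, int]  # (x, y) fingertip location in pixels

# Hypothetical gesture labels; the abstract does not name the gesture classes.
WRITE, MOVE, DELETE = "write", "move", "delete"


def draw_segment(canvas: List[List[int]], p0: Point, p1: Point) -> None:
    """Set to 1 every pixel sampled at unit steps along the segment p0 -> p1."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0))
    height, width = len(canvas), len(canvas[0])
    for i in range(steps + 1):
        t = i / steps if steps else 0.0
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        if 0 <= x < width and 0 <= y < height:  # ignore points outside the frame
            canvas[y][x] = 1


def rasterize(frames: Iterable[Tuple[str, Point]],
              height: int = 480, width: int = 640) -> List[List[int]]:
    """Turn per-frame (gesture, fingertip) pairs into a 0/1 image (rows = y)."""
    canvas = [[0] * width for _ in range(height)]
    prev: Optional[Point] = None  # last point drawn while the "pen" was down
    for gesture, point in frames:
        if gesture == DELETE:
            # Assumption: deleting clears everything written so far.
            canvas = [[0] * width for _ in range(height)]
            prev = None
        elif gesture == WRITE:
            # Connect to the previous point so fast motion leaves no gaps.
            draw_segment(canvas, prev if prev is not None else point, point)
            prev = point
        else:
            # Any other gesture lifts the pen: the next stroke starts fresh.
            prev = None
    return canvas
```

For example, `rasterize([("write", (10, 10)), ("write", (20, 10)), ("move", (30, 30)), ("write", (40, 40))])` yields a horizontal stroke of 11 pixels plus an isolated dot, with nothing drawn at the `move` position. The resulting 0/1 canvas is the kind of input a pre-trained recognizer would consume; for an MNIST-style classifier it would typically be cropped and resized to 28x28 first, although the abstract does not describe that preprocessing.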


Supplemental Material

MM21-fp2918.mp4 (MP4 video, 225.4 MB)


Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions, 24%
