Air-Text: Air-Writing and Recognition System

Authors:
Sun-Kyung Lee, KAIST, Daejeon, Republic of Korea
Jong-Hwan Kim, KAIST, Daejeon, Republic of Korea

MM '21: Proceedings of the 29th ACM International Conference on Multimedia, October 2021, Pages 1267–1274. https://doi.org/10.1145/3474085.3475694

Published: 17 October 2021

ABSTRACT

Text entry plays an important role in conveying users' intentions to computers, and physical and soft keyboards have been widely used for this purpose. However, with emerging technologies such as augmented reality and the growth of contactless services due to COVID-19, a more advanced form of text entry is required. To address this need, we propose Air-Text, an intuitive system for writing in the air using fingertips as a pen. Unlike previously proposed air-writing systems, Air-Text provides various functionalities through the seamless integration of air-writing and text-recognition modules. Specifically, the air-writing module takes a sequence of RGB images as input and tracks, frame by frame, both the fingertip location (5.33-pixel error in a 640x480 image) and the current hand gesture class (98.29% classification accuracy). Users can perform writing operations, such as writing or deleting text, by changing hand gestures, and the tracked fingertip locations can be stored as a binary image. The text-recognition module, which is compatible with any pre-trained recognition model, then predicts the text written in the binary image. In this paper, we provide examples of single-digit recognition with an MNIST classifier (96.0% accuracy) and word-level recognition with a text recognition model (79.36% character recognition rate).
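The step in which tracked fingertip locations are stored as a binary image can be illustrated with a minimal sketch. It is not the authors' implementation: the gesture labels ("write", "move", "delete"), the function names, and the straight-line interpolation between consecutive frames are all assumptions made for illustration. Only the 640x480 frame size comes from the abstract.

```python
from typing import Iterable, List, Optional, Tuple

WIDTH, HEIGHT = 640, 480  # frame size quoted in the abstract

Point = Tuple[int, int]  # (x, y) fingertip location in pixels


def blank_canvas(width: int = WIDTH, height: int = HEIGHT) -> List[List[int]]:
    """Return an all-zero binary image indexed as canvas[y][x]."""
    return [[0] * width for _ in range(height)]


def draw_segment(canvas: List[List[int]], p0: Point, p1: Point) -> None:
    """Set the pixels along the straight line from p0 to p1 to 1."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        if 0 <= y < len(canvas) and 0 <= x < len(canvas[0]):
            canvas[y][x] = 1


def render(frames: Iterable[Tuple[str, Point]],
           width: int = WIDTH, height: int = HEIGHT) -> List[List[int]]:
    """Turn per-frame (gesture, fingertip) predictions into a binary image.

    The gesture labels are hypothetical: "write" draws, "delete" clears
    the canvas, and any other label lifts the pen.
    """
    canvas = blank_canvas(width, height)
    prev: Optional[Point] = None
    for gesture, tip in frames:
        if gesture == "write":
            # Connect to the previous point of the same stroke, if any.
            draw_segment(canvas, prev if prev is not None else tip, tip)
            prev = tip
        elif gesture == "delete":
            canvas = blank_canvas(width, height)
            prev = None
        else:
            prev = None  # pen up: the next stroke starts fresh
    return canvas


# Two horizontal strokes separated by a pen-up ("move") frame.
frames = [("write", (2, 2)), ("write", (6, 2)), ("move", (6, 6)),
          ("write", (2, 6)), ("write", (6, 6))]
image = render(frames, width=10, height=10)
```

The resulting `image` could then be resized and passed to any pre-trained recognizer, which is the role the abstract assigns to the text-recognition module. A real system would more likely draw strokes with an image library and a thicker pen; this sketch uses plain lists only so that it stays self-contained.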

Supplemental Material

MM21-fp2918.mp4 (mp4, 225.4 MB)


Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction techniques
      1. Text input

Published in
MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA
Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
air-writing
fingertip detection
gesture recognition
human-computer interaction
text recognition
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%