ABSTRACT
Can a reasonably robust speech recognition engine improve text entry speeds in Indian languages despite the time users spend correcting errors? This paper investigates that question. We conducted a within-subject longitudinal study with 20 novice users to compare keyboard-only input against keyboard+speech input for Hindi. We found that keyboard+speech input was 2.5 times faster than keyboard-only input. The difference in performance was smaller for phrases taken from poems and songs, and for phrases containing less frequently used words. To the best of our knowledge, this is the first study to compare the performance of these two input modalities in an Indian language.
Index Terms
- How Much Faster Can You Type by Speaking in Hindi?: Comparing Keyboard-Only and Keyboard+Speech Text Entry