DOI: 10.1145/3098279.3098556
research-article

Finger tracking: facilitating non-commercial content production for mobile e-reading applications

Published: 04 September 2017

Abstract

Limited literacy and visual impairment reduce the ability of many people to read on their own. Current e-reader solutions rely on either unnatural synthetic voices or professionally produced audio e-books. Neither provides the same enjoyment as having a family member read to a user, especially when the user requires assistive reading (following printed text while listening to it being read). Unfortunately, support for the non-commercial production of such e-books is limited, and producing them requires significant effort. We evaluate a novel, assistive mobile interaction technique that facilitates the recording of audio e-books and their synchronization with the read text. We show that a technique based on a finger tracking metaphor provides optimal support with respect to reading speed. These human-in-the-loop, adaptive techniques can now be used to reduce the content-creation burden associated with supporting those who cannot read on their own.




      Published In

      MobileHCI '17: Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services
      September 2017
      874 pages
      ISBN: 9781450350754
      DOI: 10.1145/3098279
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. assistive technology
      2. education
      3. literacy
      4. mobile e-readers
      5. reading
      6. visual impairment

      Qualifiers

      • Research-article

      Funding Sources

      • AGE-WELL

      Conference

      MobileHCI '17

      Acceptance Rates

      MobileHCI '17 Paper Acceptance Rate: 45 of 224 submissions, 20%
      Overall Acceptance Rate: 202 of 906 submissions, 22%

      Article Metrics

      • Downloads (last 12 months): 9
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 17 Feb 2025

      Cited By

      • (2023) Characterizing the Technology Needs of Vulnerable Populations for Participation in Research and Design by Adopting Maslow’s Hierarchy of Needs. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-20. DOI: 10.1145/3544548.3581221. Online publication date: 19-Apr-2023
      • (2021) KinVoices: Using Voices of Friends and Family in Voice Interfaces. Proceedings of the ACM on Human-Computer Interaction 5, CSCW2: 1-25. DOI: 10.1145/3479590. Online publication date: 18-Oct-2021
      • (2020) Creating a Children-Friendly Reading Environment via Joint Learning of Content and Human Attention. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 279-288. DOI: 10.1145/3397271.3401062. Online publication date: 25-Jul-2020
      • (2020) Gestatten: Estimation of User's Attention in Mobile MOOCs From Eye Gaze and Gaze Gesture Tracking. Proceedings of the ACM on Human-Computer Interaction 4, EICS: 1-32. DOI: 10.1145/3394974. Online publication date: 18-Jun-2020
      • (2018) Touch-Supported Voice Recording to Facilitate Forced Alignment of Text and Speech in an E-Reading Interface. Proceedings of the 23rd International Conference on Intelligent User Interfaces, 129-140. DOI: 10.1145/3172944.3172984. Online publication date: 5-Mar-2018
