DOI: 10.1145/3677846.3677861
Research Article • Open Access

ImageExplorer Deployment: Understanding Text-Based and Touch-Based Image Exploration in the Wild

Published: 22 October 2024

Abstract

Blind and visually impaired (BVI) users often rely on alt-text to understand images. AI-generated alt-text can be scalable and efficient but may lack detail and is prone to errors. Multi-layered touch interfaces, on the other hand, can provide rich detail and spatial information, but may take longer to explore and impose a higher mental load. To understand how BVI users leverage these two methods, we deployed ImageExplorer, an iOS app on the Apple App Store that provides multi-layered image information through both text-based and touch-based interfaces with customizable levels of granularity. Over 12 months, 371 users uploaded 651 images and performed 694 explorations. Their activities were logged to help us understand how BVI users consume image captions in the wild. This work offers a holistic understanding of BVI users’ image exploration behavior and the factors that influence it. We provide design implications for future image captioning models and visual access tools.


Cited By

  • EditScribe: Non-Visual Image Editing with Natural Language Verification Loops. In Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS ’24), 1–19. https://doi.org/10.1145/3663548.3675599. Online publication date: 27 October 2024.


Published In

W4A '24: Proceedings of the 21st International Web for All Conference
May 2024
220 pages
ISBN:9798400710308
DOI:10.1145/3677846

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Accessibility
  2. Alt Text
  3. Blind
  4. Deployment
  5. Image Caption
  6. ImageExplorer
  7. Screen Reader
  8. Touch Exploration
  9. Visual Impairment

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

W4A '24
W4A '24: The 21st International Web for All Conference
May 13–14, 2024
Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 171 of 371 submissions, 46%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)106
  • Downloads (Last 6 weeks)36
Reflects downloads up to 05 Mar 2025
