DOI: 10.1145/2702123.2702437

RegionSpeak: Quick Comprehensive Spatial Descriptions of Complex Images for Blind Users

Published: 18 April 2015

Abstract

Blind people often seek answers to their visual questions from remote sources; however, the commonly adopted single-image, single-response model does not always provide enough bandwidth between users and those sources. This is especially true when questions concern large sets of information or spatial layout, e.g., where is there to sit in this area, what tools are on this workbench, or what do the buttons on this machine do? Our RegionSpeak system addresses this problem by providing an accessible way for blind users to (i) combine visual information across multiple photographs via image stitching, (ii) quickly collect labels from the crowd, in parallel, for all relevant objects contained within the resulting large visual area, and (iii) interactively explore the spatial layout of the labeled objects. The regions and descriptions are displayed on an accessible touchscreen interface that allows blind users to explore their spatial layout interactively. We demonstrate that workers from Amazon Mechanical Turk are able to quickly and accurately identify relevant regions, and that asking them to describe only one region at a time results in more comprehensive descriptions of complex images. RegionSpeak can be used to explore the spatial layout of the regions identified, and it demonstrates broad potential for helping blind users answer difficult spatial layout questions.
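The third step the abstract describes, interactively exploring labeled regions on a touchscreen, amounts to hit-testing a touch point against the crowd-labeled regions of the stitched image and speaking the matching description. A minimal sketch of that idea in Python follows; the `LabeledRegion` type, the simplification of crowd-drawn outlines to rectangles, the smallest-region tie-break, and the sample labels are illustrative assumptions, not the paper's actual data structures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledRegion:
    # Bounding box in stitched-image pixel coordinates (a crowd-drawn
    # outline simplified to a rectangle for this sketch); `label` holds
    # the crowd-written description.
    x: int
    y: int
    w: int
    h: int
    label: str


def region_under_touch(regions: List[LabeledRegion],
                       tx: int, ty: int) -> Optional[str]:
    """Return the label of the smallest region containing the touch point,
    so a nested region (e.g. a button on a panel) wins over its container."""
    hits = [r for r in regions
            if r.x <= tx < r.x + r.w and r.y <= ty < r.y + r.h]
    if not hits:
        return None
    return min(hits, key=lambda r: r.w * r.h).label


# Hypothetical labels for a stitched photo of a microwave.
regions = [
    LabeledRegion(0, 0, 400, 300, "microwave control panel"),
    LabeledRegion(50, 40, 60, 30, "start button"),
]
print(region_under_touch(regions, 70, 50))    # nested region wins
print(region_under_touch(regions, 300, 200))  # only the panel contains this point
```

In an actual interface the returned label would be passed to a screen reader rather than printed, and touches outside every region (a `None` result) could trigger a distinct "empty space" cue.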

Supplementary Material

suppl.mov (pn1638-file3.mp4)
Supplemental video



Published In

CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems
April 2015
4290 pages
ISBN:9781450331456
DOI:10.1145/2702123
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. accessibility
  2. crowdsourcing
  3. stitching
  4. visual questions

Qualifiers

  • Research-article

Conference

CHI '15: CHI Conference on Human Factors in Computing Systems
April 18 - 23, 2015
Seoul, Republic of Korea

Acceptance Rates

CHI '15 Paper Acceptance Rate 486 of 2,120 submissions, 23%;
Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

Upcoming Conference

CHI 2025
ACM CHI Conference on Human Factors in Computing Systems
April 26 - May 1, 2025
Yokohama , Japan

Article Metrics

  • Downloads (Last 12 months)51
  • Downloads (Last 6 weeks)6
Reflects downloads up to 16 Feb 2025


Cited By

  • (2024) Human–AI Collaboration for Remote Sighted Assistance: Perspectives from the LLM Era. Future Internet 16(7), 254. DOI: 10.3390/fi16070254. Online publication date: 18-Jul-2024.
  • (2024) AI-Vision: A Three-Layer Accessible Image Exploration System for People with Visual Impairments in China. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(3), 1-27. DOI: 10.1145/3678537. Online publication date: 9-Sep-2024.
  • (2024) AltCanvas: A Tile-Based Editor for Visual Content Creation with Generative AI for Blind or Visually Impaired People. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-22. DOI: 10.1145/3663548.3675600. Online publication date: 27-Oct-2024.
  • (2024) Memory Reviver: Supporting Photo-Collection Reminiscence for People with Visual Impairment via a Proactive Chatbot. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-17. DOI: 10.1145/3654777.3676336. Online publication date: 13-Oct-2024.
  • (2024) SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-18. DOI: 10.1145/3613904.3642632. Online publication date: 11-May-2024.
  • (2023) Building the Habit of Authoring Alt Text: Design for Making a Change. Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility, 1-5. DOI: 10.1145/3597638.3614495. Online publication date: 22-Oct-2023.
  • (2023) Crowdsourcing Thumbnail Captions: Data Collection and Validation. ACM Transactions on Interactive Intelligent Systems 13(3), 1-28. DOI: 10.1145/3589346. Online publication date: 11-Sep-2023.
  • (2023) TacNote: Tactile and Audio Note-Taking for Non-Visual Access. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-14. DOI: 10.1145/3586183.3606784. Online publication date: 29-Oct-2023.
  • (2023) GenAssist: Making Image Generation Accessible. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-17. DOI: 10.1145/3586183.3606735. Online publication date: 29-Oct-2023.
  • (2023) Advocacy as Access Work: How People with Visual Impairments Gain Access to Digital Banking in India. Proceedings of the ACM on Human-Computer Interaction 7(CSCW1), 1-23. DOI: 10.1145/3579596. Online publication date: 16-Apr-2023.
