DOI: 10.1145/2702123.2702437

RegionSpeak: Quick Comprehensive Spatial Descriptions of Complex Images for Blind Users

Published: 18 April 2015

Abstract

Blind people often seek answers to their visual questions from remote sources; however, the commonly adopted single-image, single-response model does not always provide enough bandwidth between users and those sources. This is especially true when questions concern large sets of information or spatial layout, e.g., where is there to sit in this area, what tools are on this workbench, or what do the buttons on this machine do? Our RegionSpeak system addresses this problem by providing an accessible way for blind users to (i) combine visual information across multiple photographs via image stitching, (ii) quickly collect labels from the crowd, in parallel, for all relevant objects contained within the resulting large visual area, and (iii) interactively explore the spatial layout of the labeled objects. The regions and descriptions are displayed on an accessible touchscreen interface that allows blind users to explore their spatial layout interactively. We demonstrate that workers from Amazon Mechanical Turk are able to quickly and accurately identify relevant regions, and that asking them to describe only one region at a time results in more comprehensive descriptions of complex images. RegionSpeak can be used to explore the spatial layout of the regions identified, and it demonstrates broad potential for helping blind users answer difficult spatial layout questions.
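The third step the abstract describes, interactively exploring labeled regions on a touchscreen, amounts to hit-testing a touch point against the crowd-labeled regions of the stitched image and speaking the matching description. A minimal sketch of that idea in Python follows; the `LabeledRegion` type, the simplification of crowd-drawn outlines to rectangles, the smallest-region tie-break, and the sample labels are illustrative assumptions, not the paper's actual data structures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledRegion:
    # Bounding box in stitched-image pixel coordinates (a crowd-drawn
    # outline simplified to a rectangle for this sketch); `label` holds
    # the crowd-written description.
    x: int
    y: int
    w: int
    h: int
    label: str


def region_under_touch(regions: List[LabeledRegion],
                       tx: int, ty: int) -> Optional[str]:
    """Return the label of the smallest region containing the touch point,
    so a nested region (e.g. a button on a panel) wins over its container."""
    hits = [r for r in regions
            if r.x <= tx < r.x + r.w and r.y <= ty < r.y + r.h]
    if not hits:
        return None
    return min(hits, key=lambda r: r.w * r.h).label


# Hypothetical labels for a stitched photo of a microwave.
regions = [
    LabeledRegion(0, 0, 400, 300, "microwave control panel"),
    LabeledRegion(50, 40, 60, 30, "start button"),
]
print(region_under_touch(regions, 70, 50))    # nested region wins
print(region_under_touch(regions, 300, 200))  # only the panel contains this point
```

In an actual interface the returned label would be passed to a screen reader rather than printed, and touches outside every region (a `None` result) could trigger a distinct "empty space" cue.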

Supplementary Material

suppl.mov (pn1638-file3.mp4)
Supplemental video



Published In

CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems
April 2015
4290 pages
ISBN:9781450331456
DOI:10.1145/2702123
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. accessibility
  2. crowdsourcing
  3. stitching
  4. visual questions

Qualifiers

  • Research-article

Conference

CHI '15: CHI Conference on Human Factors in Computing Systems
April 18 - 23, 2015
Seoul, Republic of Korea

Acceptance Rates

CHI '15 Paper Acceptance Rate 486 of 2,120 submissions, 23%;
Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

Upcoming Conference

CHI 2025
ACM CHI Conference on Human Factors in Computing Systems
April 26 - May 1, 2025
Yokohama , Japan

Article Metrics

  • Downloads (Last 12 months)51
  • Downloads (Last 6 weeks)6
Reflects downloads up to 16 Feb 2025


Cited By

  • (2024) Human–AI Collaboration for Remote Sighted Assistance: Perspectives from the LLM Era. Future Internet 16(7), 254. DOI: 10.3390/fi16070254. Online publication date: 18-Jul-2024.
  • (2024) AI-Vision: A Three-Layer Accessible Image Exploration System for People with Visual Impairments in China. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(3), 1-27. DOI: 10.1145/3678537. Online publication date: 9-Sep-2024.
  • (2024) AltCanvas: A Tile-Based Editor for Visual Content Creation with Generative AI for Blind or Visually Impaired People. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-22. DOI: 10.1145/3663548.3675600. Online publication date: 27-Oct-2024.
  • (2024) Memory Reviver: Supporting Photo-Collection Reminiscence for People with Visual Impairment via a Proactive Chatbot. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-17. DOI: 10.1145/3654777.3676336. Online publication date: 13-Oct-2024.
  • (2024) SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-18. DOI: 10.1145/3613904.3642632. Online publication date: 11-May-2024.
  • (2023) Building the Habit of Authoring Alt Text: Design for Making a Change. Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility, 1-5. DOI: 10.1145/3597638.3614495. Online publication date: 22-Oct-2023.
  • (2023) Crowdsourcing Thumbnail Captions: Data Collection and Validation. ACM Transactions on Interactive Intelligent Systems 13(3), 1-28. DOI: 10.1145/3589346. Online publication date: 11-Sep-2023.
  • (2023) TacNote: Tactile and Audio Note-Taking for Non-Visual Access. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-14. DOI: 10.1145/3586183.3606784. Online publication date: 29-Oct-2023.
  • (2023) GenAssist: Making Image Generation Accessible. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-17. DOI: 10.1145/3586183.3606735. Online publication date: 29-Oct-2023.
  • (2023) Advocacy as Access Work: How People with Visual Impairments Gain Access to Digital Banking in India. Proceedings of the ACM on Human-Computer Interaction 7(CSCW1), 1-23. DOI: 10.1145/3579596. Online publication date: 16-Apr-2023.
