
“Hands On” Visual Recognition for Visually Impaired Users

Published: 11 August 2017

Abstract

Blind or visually impaired (BVI) individuals can identify an object in their hands by combining manipulation with whatever visual cues remain available to them. It is harder for them to associate the object with a specific brand, model, or type. Starting from this observation, we propose a collaborative system designed to deliver visual feedback automatically and to help the user fill this semantic gap. Our visual recognition module is implemented as an image retrieval procedure that provides real-time feedback, performs its computation locally on the device, and scales to new categories and instances. We carry out a thorough experimental analysis of the visual recognition module, including a comparison with the state of the art. We also present two system implementations, which we test with BVI users to evaluate the technical soundness, usability, and effectiveness of the proposed concept.
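The retrieval-based recognition the abstract describes can be illustrated, purely schematically, as nearest-neighbor search over a gallery of labeled feature vectors: the query image's descriptor is compared against every stored instance, and the best match supplies the label. This is not the authors' implementation; the feature vectors, labels, and similarity measure below are invented for illustration. Note how adding a new instance only means appending to the gallery, which is the sense in which such a scheme scales to new categories and instances.

```python
# Schematic sketch of instance recognition via image retrieval.
# All features and labels here are hypothetical placeholders.
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query, gallery):
    """Return the label and score of the most similar gallery entry."""
    best_label, best_score = None, -1.0
    for label, feat in gallery:
        score = cosine(query, feat)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Toy gallery: one (invented) feature vector per known product instance.
gallery = [
    ("cereal_brand_A", [0.9, 0.1, 0.0]),
    ("cereal_brand_B", [0.1, 0.9, 0.2]),
]

# A new instance is supported simply by appending another labeled vector.
label, score = retrieve([0.85, 0.15, 0.05], gallery)
# label is "cereal_brand_A" for this toy query
```

In a real system of this kind, the vectors would come from an image descriptor (e.g., local features aggregated into a global code) and the linear scan would be replaced by an approximate nearest-neighbor index to keep the on-device lookup real-time.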




    Published In

    ACM Transactions on Accessible Computing  Volume 10, Issue 3
    August 2017
    76 pages
    ISSN:1936-7228
    EISSN:1936-7236
    DOI:10.1145/3132048
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 August 2017
    Accepted: 01 February 2017
    Revised: 01 February 2017
    Received: 01 August 2016
    Published in TACCESS Volume 10, Issue 3


    Author Tags

1. Single-instance object recognition
2. Systems for visually impaired users

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • GLASSENSE


    Cited By

    • (2024)AccessShare: Co-designing Data Access and Sharing with Blind PeopleProceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility10.1145/3663548.3675612(1-16)Online publication date: 27-Oct-2024
    • (2024)Exploring AI Problem Formulation with Children via Teachable MachinesProceedings of the 2024 CHI Conference on Human Factors in Computing Systems10.1145/3613904.3642692(1-18)Online publication date: 11-May-2024
    • (2024)Open Data Sources for Post-Consumer Plastic Sorting: What We Have and What We Still NeedProcedia CIRP10.1016/j.procir.2024.01.141122(1042-1047)Online publication date: 2024
    • (2024)Hierarchical waste detection with weakly supervised segmentation in images from recycling plantsEngineering Applications of Artificial Intelligence10.1016/j.engappai.2023.107542128(107542)Online publication date: Feb-2024
    • (2023)Contributing to Accessibility Datasets: Reflections on Sharing Study Data by Blind PeopleProceedings of the 2023 CHI Conference on Human Factors in Computing Systems10.1145/3544548.3581337(1-18)Online publication date: 19-Apr-2023
    • (2022)Blind Users Accessing Their Training Images in Teachable Object RecognizersProceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility10.1145/3517428.3544824(1-18)Online publication date: 23-Oct-2022
    • (2022)Computer Vision Extended Perception System for Blind People2022 IEEE 40th Central America and Panama Convention (CONCAPAN)10.1109/CONCAPAN48024.2022.9997641(1-4)Online publication date: 9-Nov-2022
    • (2022)Recent trends in computer vision-driven scene understanding for VI/blind users: a systematic mappingUniversal Access in the Information Society10.1007/s10209-022-00868-w22:3(983-1005)Online publication date: 6-Feb-2022
    • (2021)Recycling Waste Classification Using Vision Transformer on Portable DeviceSustainability10.3390/su13211157213:21(11572)Online publication date: 20-Oct-2021
    • (2021)An Overview of Machine Learning and 5G for People with DisabilitiesSensors10.3390/s2122757221:22(7572)Online publication date: 14-Nov-2021
