The development of an augmented reality audio application for visually impaired persons

Akın, Alper Tunga; Cömert, Çetin

doi:10.1007/s11042-022-14134-x

Track 4: Digital Games, Virtual Reality, and Augmented Reality
Published: 09 November 2022

Volume 82, pages 17493–17512, (2023)
Multimedia Tools and Applications

Abstract

In this study, an augmented reality audio application that runs on smartphones has been developed to assist visually impaired persons in daily life. The application provides object detection, obstacle notification, and navigation through online base maps with audio feedback. Several important issues had to be tackled in such an undertaking. Deep learning techniques have been employed for monocular depth extraction and object detection. A web services solution has been adopted to meet the need for real-time feedback, which is critical for visually impaired users. The deep learning monocular depth extraction model, selected on the basis of a literature review, has been validated with relevant metrics. For object detection, a well-proven and widely used deep learning model has been chosen. All the involved software components and the developed application are open source.
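The abstract does not spell out how the detection and depth results are combined into an obstacle notification, so the following is only a minimal sketch of one plausible approach, not the authors' implementation. It assumes the detector returns labelled pixel bounding boxes and the depth model returns a per-pixel depth map in metres. The names `Detection`, `object_distance`, `obstacle_messages` and the 2-metre warning threshold are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence


@dataclass
class Detection:
    """One detected object (hypothetical format): label plus pixel bounding box."""
    label: str
    x_min: int
    y_min: int
    x_max: int  # exclusive
    y_max: int  # exclusive


def object_distance(depth_map: Sequence[Sequence[float]], det: Detection) -> float:
    """Median depth in metres over the pixels inside the bounding box.

    The median is used instead of the mean so that background pixels
    inside the box distort the estimate less. An empty box raises
    statistics.StatisticsError.
    """
    values = [
        depth_map[y][x]
        for y in range(det.y_min, det.y_max)
        for x in range(det.x_min, det.x_max)
    ]
    return median(values)


def obstacle_messages(
    depth_map: Sequence[Sequence[float]],
    detections: Sequence[Detection],
    warn_within_m: float = 2.0,  # assumed threshold, not taken from the paper
) -> List[str]:
    """Return one sentence per object within warn_within_m, nearest first."""
    nearby = []
    for det in detections:
        distance = object_distance(depth_map, det)
        if distance <= warn_within_m:
            nearby.append((distance, det.label))
    nearby.sort()
    return [f"{label} ahead, {distance:.1f} metres" for distance, label in nearby]


# Toy 4x4 depth map (metres) with three detections.
toy_depth = [
    [5.0, 5.0, 1.5, 1.5],
    [5.0, 5.0, 1.5, 1.5],
    [0.8, 0.8, 5.0, 5.0],
    [0.8, 0.8, 5.0, 5.0],
]
toy_detections = [
    Detection("chair", 2, 0, 4, 2),
    Detection("person", 0, 2, 2, 4),
    Detection("car", 0, 0, 2, 2),
]
messages = obstacle_messages(toy_depth, toy_detections)
```

In a setup like the one the abstract describes, sentences of this kind would then be handed to a text-to-speech component on the phone. Running the two models behind a web service keeps the heavy inference off the device, at the cost of network latency.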

Data availability statement

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Funding

This work was supported by KTU Scientific Research Projects (KTÜ BAP) [FBA-2021–9488].

Author information

Authors and Affiliations

Karadeniz Technical University, Department of Geomatics Engineering, 61080, Trabzon, Turkey
Alper Tunga Akın & Çetin Cömert

Corresponding author

Correspondence to Alper Tunga Akın.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Akın, A.T., Cömert, Ç. The development of an augmented reality audio application for visually impaired persons. Multimed Tools Appl 82, 17493–17512 (2023). https://doi.org/10.1007/s11042-022-14134-x

Received: 17 May 2022
Revised: 15 August 2022
Accepted: 25 October 2022
Published: 09 November 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s11042-022-14134-x
