Abstract
In this study, an augmented reality audio application for smartphones was developed to assist visually impaired persons in daily life. The application provides object detection, obstacle notification, and navigation over online base maps, all with audio feedback. Several issues had to be tackled in this undertaking. Deep learning techniques were employed for monocular depth extraction and object detection. A web services solution was adopted to deliver real-time feedback, which is critical for visually impaired users. The deep learning model for monocular depth extraction, selected on the basis of a literature review, was validated with relevant metrics. For object detection, a well-proven and widely used deep learning model was chosen. All the software components involved and the developed application are open source.
Data availability statement
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Funding
This work was supported by KTU Scientific Research Projects (KTÜ BAP) [FBA-2021–9488].
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
About this article
Cite this article
Akın, A.T., Cömert, Ç. The development of an augmented reality audio application for visually impaired persons. Multimed Tools Appl 82, 17493–17512 (2023). https://doi.org/10.1007/s11042-022-14134-x
DOI: https://doi.org/10.1007/s11042-022-14134-x