
The development of an augmented reality audio application for visually impaired persons

  • Track 4: Digital Games, Virtual Reality, and Augmented Reality
  • Published in: Multimedia Tools and Applications

Abstract

In this study, an augmented reality audio application that runs on smartphones has been developed to assist visually impaired persons in their daily lives. The application provides object detection, obstacle notification, and navigation over online base maps, all delivered with audio feedback. Several important issues had to be tackled in such an undertaking. Deep learning techniques have been employed for monocular depth extraction and object detection. A web services solution has been adopted to provide real-time feedback, which is critical for the visually impaired. The monocular depth extraction model, selected on the basis of a literature review, has been validated with relevant metrics. For object detection, a well-proven and widely used deep learning model has been chosen. All of the involved software components and the developed application are open source.
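
The paper itself contains no code, but the pipeline described above (a smartphone client sends camera frames to a web service that runs deep-learning object detection and monocular depth estimation, and the results are spoken back to the user) can be illustrated with a minimal sketch. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a Flask service, the Ultralytics YOLOv5 detector loaded via torch.hub, and a placeholder estimate_depth helper standing in for whichever monocular depth model is chosen.

# Minimal sketch of a web-service endpoint of the kind described in the abstract:
# the smartphone posts a camera frame, the server runs object detection and
# monocular depth estimation, and returns labels with approximate distances that
# the client can read out via text-to-speech. Model choices here are assumptions.
import io

import numpy as np
import torch
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

# Pre-trained YOLOv5 detector loaded from the Ultralytics repository.
detector = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def estimate_depth(image):
    """Placeholder for a monocular depth model (e.g. a BTS-style network).

    A real implementation would return a dense H x W depth map in metres;
    here a constant map is returned so the sketch runs end to end.
    """
    w, h = image.size
    return np.full((h, w), 2.0)  # pretend every pixel is 2 m away

@app.route("/analyze", methods=["POST"])
def analyze():
    # The client uploads one camera frame per request.
    image = Image.open(io.BytesIO(request.files["frame"].read())).convert("RGB")

    detections = detector(image).pandas().xyxy[0]  # one row per detected object
    depth_map = estimate_depth(image)              # dense depth map (assumed, metres)

    results = []
    for _, det in detections.iterrows():
        # Approximate each object's distance by the depth at its bounding-box centre.
        cx = int((det["xmin"] + det["xmax"]) / 2)
        cy = int((det["ymin"] + det["ymax"]) / 2)
        results.append({
            "label": det["name"],
            "confidence": float(det["confidence"]),
            "distance_m": float(depth_map[cy, cx]),
        })

    # The smartphone client converts this JSON into spoken feedback.
    return jsonify(results)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

In a design of this kind the heavy models run on the server rather than on the phone, which is what makes near real-time feedback on ordinary smartphones feasible.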



Data availability statement

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.


Funding

This work was supported by KTU Scientific Research Projects (KTÜ BAP) [FBA-2021–9488].

Author information

Corresponding author

Correspondence to Alper Tunga Akın.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Akın, A.T., Cömert, Ç. The development of an augmented reality audio application for visually impaired persons. Multimed Tools Appl 82, 17493–17512 (2023). https://doi.org/10.1007/s11042-022-14134-x



  • DOI: https://doi.org/10.1007/s11042-022-14134-x

Keywords

Navigation