Real-Time 3D Object Detection, Recognition and Presentation Using a Mobile Device for Assistive Navigation

  • Original Research
  • Published in SN Computer Science

Abstract

This paper presents an integrated solution for 3D object detection, recognition, and presentation that uses a mobile application to make indoor areas more accessible to various user groups. The system has three major components: a 3D object detection module, an object tracking and update module, and a voice and AR-enhanced interface. The 3D object detection module combines pre-trained 2D object detectors with 3D bounding box estimation methods to detect the 3D pose and size of each object in every camera frame. The module adapts readily to different 2D object detectors (e.g., YOLO, SSD, Mask R-CNN), chosen according to the task, the run-time requirements, and the level of detail needed in the 3D detection result, and it can run either on a cloud server or in the mobile application. The object tracking and update module minimizes the computation required for long-term environment scanning by converting 2D tracking results into 3D results. The voice and AR-enhanced interface integrates ARKit and SiriKit to provide voice interaction and AR visualization, improving information delivery for different user groups. The system can be integrated with existing applications, especially assistive navigation, to increase travel safety for people who are blind or have low vision and to improve social interaction for individuals with autism spectrum disorder. It could also be used for 3D reconstruction of the environment in other applications. Preliminary results for the object detection evaluation and for real-time system performance are provided to validate the proposed system.
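The abstract does not specify how a 2D detection is lifted to a 3D box, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes a per-pixel depth map in metres (for example from a LiDAR-equipped device) and a pinhole camera model; the names `Intrinsics`, `backproject`, and `box_3d_from_detection`, and the `tolerance` parameter, are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass
from statistics import median_low


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (assumed known for the device)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, z, k):
    """Map pixel (u, v) at depth z (metres) to a 3D point in camera coordinates."""
    return ((u - k.cx) * z / k.fx, (v - k.cy) * z / k.fy, z)


def box_3d_from_detection(box, depth, k, tolerance=0.5):
    """Estimate an axis-aligned 3D box (center, size) from a 2D detection.

    box   -- (u_min, v_min, u_max, v_max) in pixels, max values exclusive
    depth -- depth[v][u] in metres; 0 marks a missing measurement
    Returns None when the 2D box contains no valid depth.
    """
    u0, v0, u1, v1 = box
    samples = [(u, v, depth[v][u])
               for v in range(v0, v1)
               for u in range(u0, u1)
               if depth[v][u] > 0]
    if not samples:
        return None

    # Keep only pixels near the median depth so that background seen
    # inside the 2D box does not stretch the 3D box.
    z_ref = median_low(z for _, _, z in samples)
    points = [backproject(u, v, z, k)
              for u, v, z in samples
              if abs(z - z_ref) <= tolerance]

    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return center, size
```

In such a sketch the 2D detector stays interchangeable: any detector that outputs pixel boxes can feed `box_3d_from_detection`, which mirrors the modularity described above. A tracker that follows the 2D box between frames could reuse the same function to refresh the 3D estimate without rerunning the detector.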

Data availability

The data that support the findings of this study are available on request from the corresponding author, JC. The data are not publicly available due to the potential compromise of personal privacy.

Notes

  1. https://developer.apple.com/documentation/arkit/arframe/2887449-rawfeaturepoints.

  2. https://www.cubi.casa/support/hardware/list-of-supported-lidar-and-tof-devices.

  3. https://developer.apple.com/documentation/vision/vntrackobjectrequest.

  4. https://developer.apple.com/documentation/sirikit.

Funding

This study was funded by the US National Science Foundation (#2131186, #2118006, #1827505 and #1737533), the ODNI Intelligence Community Center for Academic Excellence (IC CAE) at Rutgers (#HHM402-19-1-0003 and #HHM402-18-1-0007), and the US Air Force Office of Scientific Research (#FA9550-21-1-0082).

Author information

Corresponding author

Correspondence to Jin Chen.

Ethics declarations

Conflict of interest

Jin Chen is the CTO of Nearabl Inc. and owns stock in Nearabl Inc. Zhigang Zhu declares no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances on Image Processing and Vision Engineering” guest edited by Sebastiano Battiato, Francisco Imai and Cosimo Distante.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, J., Zhu, Z. Real-Time 3D Object Detection, Recognition and Presentation Using a Mobile Device for Assistive Navigation. SN COMPUT. SCI. 4, 543 (2023). https://doi.org/10.1007/s42979-023-01881-3

  • DOI: https://doi.org/10.1007/s42979-023-01881-3

Keywords

Navigation