Real-Time 3D Object Detection, Recognition and Presentation Using a Mobile Device for Assistive Navigation

Chen, Jin; Zhu, Zhigang

doi:10.1007/s42979-023-01881-3

Original Research
Published: 29 July 2023

Volume 4, article number 543, (2023)

SN Computer Science

Abstract

This paper presents an integrated solution for 3D object detection, recognition, and presentation, delivered through a mobile application, to increase accessibility for various user groups in indoor areas. The system has three major components: a 3D object detection module, an object tracking and update module, and a voice- and AR-enhanced interface. The 3D object detection module combines pre-trained 2D object detectors with 3D bounding box estimation methods to detect the 3D poses and sizes of objects in each camera frame. The module can easily be adapted to different 2D object detectors (e.g., YOLO, SSD, Mask R-CNN), depending on the task and on the required run time and level of detail of the 3D detection results. It can run either on a cloud server or in the mobile application. The object tracking and update module minimizes the computational cost of long-term environment scanning by converting 2D tracking results into 3D results. The voice- and AR-enhanced interface integrates ARKit and SiriKit to provide voice interaction and AR visualization, improving information delivery for different user groups. The system can be integrated with existing applications, especially assistive navigation, to increase travel safety for people who are blind or have low vision and to improve social interaction for individuals with autism spectrum disorder. It could also be used for 3D reconstruction of the environment in other applications. Preliminary results from the object detection evaluation and from real-time system performance tests are provided to validate the proposed system.
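The abstract describes lifting 2D detections into 3D sizes and positions but does not spell out the estimation method. The sketch below is a minimal illustration under stated assumptions. It assumes a metric depth map aligned pixel-for-pixel with the camera image, such as the LiDAR scene depth available through ARKit on supported iPhones, and pinhole intrinsics `fx, fy, cx, cy`. Under those assumptions, the pixels inside a 2D box can be back-projected into camera coordinates and summarized as an axis-aligned 3D box. The names `Intrinsics`, `Box3D`, `lift_box_to_3d` and the percentile trimming parameters `lo`/`hi` are hypothetical; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Hypothetical pinhole camera intrinsics, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


@dataclass
class Box3D:
    """Axis-aligned 3D box in camera coordinates, in meters."""
    center: Point3
    size: Point3


def backproject(u: int, v: int, z: float, k: Intrinsics) -> Point3:
    """Map pixel (u, v) with metric depth z to a 3D point (pinhole model)."""
    x = (u - k.cx) * z / k.fx
    y = (v - k.cy) * z / k.fy
    return (x, y, z)


def _rank_value(sorted_vals: List[float], q: float) -> float:
    """Return the element at the rounded fractional rank q of a sorted list."""
    idx = min(len(sorted_vals) - 1, max(0, round(q * (len(sorted_vals) - 1))))
    return sorted_vals[idx]


def lift_box_to_3d(
    box2d: Tuple[int, int, int, int],
    depth: Sequence[Sequence[float]],
    k: Intrinsics,
    lo: float = 0.1,
    hi: float = 0.9,
) -> Optional[Box3D]:
    """Lift a 2D box (u_min, v_min, u_max, v_max; max exclusive) to 3D.

    depth[v][u] holds metric depth; values <= 0 are treated as invalid.
    Trimming each axis to the [lo, hi] rank range is an assumed heuristic
    that discards background pixels caught inside the 2D box.
    """
    height, width = len(depth), len(depth[0]) if depth else 0
    u0, v0, u1, v1 = box2d
    # Clamp the box to the image so indexing stays in range.
    u0, v0 = max(0, u0), max(0, v0)
    u1, v1 = min(width, u1), min(height, v1)

    points = [
        backproject(u, v, depth[v][u], k)
        for v in range(v0, v1)
        for u in range(u0, u1)
        if depth[v][u] > 0
    ]
    if not points:
        return None  # no valid depth inside the box

    center, size = [], []
    for axis in range(3):
        vals = sorted(p[axis] for p in points)
        a, b = _rank_value(vals, lo), _rank_value(vals, hi)
        center.append((a + b) / 2.0)
        size.append(b - a)
    return Box3D(center=tuple(center), size=tuple(size))


if __name__ == "__main__":
    # Toy example: a flat surface 2 m in front of a 10x10 camera.
    cam = Intrinsics(fx=100.0, fy=100.0, cx=5.0, cy=5.0)
    flat = [[2.0] * 10 for _ in range(10)]
    print(lift_box_to_3d((2, 2, 8, 8), flat, cam))
```

This sketch yields only an axis-aligned box in camera coordinates. Estimating object orientation, as the abstract's "3D poses" implies, and transforming boxes into a world frame for tracking would require additional steps that are not shown here.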

Data availability

The data that support the findings of this study are available on request from the corresponding author, JC. The data are not publicly available due to the potential compromise of personal privacy.

Funding

This study was funded by the US National Science Foundation (#2131186, #2118006, #1827505 and #1737533), the ODNI Intelligence Community Center for Academic Excellence (IC CAE) at Rutgers (#HHM402-19-1-0003 and #HHM402-18-1-0007), and the US Air Force Office of Scientific Research (#FA9550-21-1-0082).

Author information

Authors and Affiliations

Visual Computing Laboratory, Computer Science Department, The City College-CUNY, New York, NY, 10031, USA
Jin Chen & Zhigang Zhu
Nearabl Inc., New York, NY, 10023, USA
Jin Chen
PhD Program in Computer Science, The Graduate Center-CUNY, New York, NY, 10016, USA
Zhigang Zhu

Corresponding author

Correspondence to Jin Chen.

Ethics declarations

Conflict of interest

Jin Chen is the CTO of Nearabl Inc. and owns stock in Nearabl Inc. Zhigang Zhu declares no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances on Image Processing and Vision Engineering” guest edited by Sebastiano Battiato, Francisco Imai and Cosimo Distante.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, J., Zhu, Z. Real-Time 3D Object Detection, Recognition and Presentation Using a Mobile Device for Assistive Navigation. SN COMPUT. SCI. 4, 543 (2023). https://doi.org/10.1007/s42979-023-01881-3

Received: 28 September 2022
Accepted: 01 May 2023
Published: 29 July 2023
DOI: https://doi.org/10.1007/s42979-023-01881-3