Deep learning based 3D target detection for indoor scenes

Liu, Ying; Jiang, Du; Xu, Chao; Sun, Ying; Jiang, Guozhang; Tao, Bo; Tong, Xiliang; Xu, Manman; Li, Gongfa; Yun, Juntong

doi:10.1007/s10489-022-03888-4

Published: 16 August 2022

Applied Intelligence, Volume 53, pages 10218–10231 (2023)

Ying Liu¹, Du Jiang¹, Chao Xu¹, Ying Sun¹,²,³, Guozhang Jiang³, Bo Tao¹,²,³, Xiliang Tong¹,², Manman Xu¹,², Gongfa Li¹,²,³ & Juntong Yun¹

Abstract

3D target detection has been a research hotspot in recent years. In autonomous driving, 3D target detection mainly addresses outdoor scenes in which the camera height is constant. In the few indoor scenes studied, 3D target detection is mostly performed at the category level. However, instance-level 3D target detection datasets are difficult to generate. This paper takes instance-level 3D target detection in complex indoor scenes as its research object. An indoor 3D target detection dataset is constructed using ArUco markers. A pixel-wise key point voting network with joint semantic segmentation of RGB images is established, and a new key point hypothesis strategy is proposed. Combined with depth images, key point detection is extended to three dimensions, and the pose is refined with the iterative closest point (ICP) algorithm. The evaluation metrics and visualizations of the model are analyzed and compared. The model is tested and visualized on the validation set, a truncated validation set, and unlabeled data. The results demonstrate the generalization of the proposed method, and 3D target detection in indoor scenes from RGB images and RGB-D images is achieved.
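The step from 2D key points to a 3D pose described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, assumes that each detected key point already has a valid depth value, and uses a closed-form SVD (Kabsch) rigid fit, which is the alignment step commonly solved inside each ICP iteration. The function names `backproject` and `rigid_fit` are illustrative only.

```python
import numpy as np


def backproject(uv, depth, fx, fy, cx, cy):
    """Lift pixel key points (u, v) with depth z to camera-frame 3D points.

    Assumes an ideal pinhole camera without lens distortion.
    """
    uv = np.asarray(uv, dtype=float)
    z = np.asarray(depth, dtype=float)
    x = (uv[:, 0] - cx) * z / fx
    y = (uv[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def rigid_fit(src, dst):
    """Least-squares rotation R and translation t with dst ~ R @ src + t.

    src and dst are (N, 3) arrays of corresponding points, N >= 3 and not
    collinear. Uses the SVD-based Kabsch solution.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection: force det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


if __name__ == "__main__":
    # Hypothetical object-frame key points and camera intrinsics.
    model_kps = np.array([[0.00, 0.00, 0.00],
                          [0.10, 0.00, 0.00],
                          [0.00, 0.10, 0.00],
                          [0.00, 0.00, 0.10]])
    fx = fy = 600.0
    cx, cy = 320.0, 240.0
    # Pretend these pixels and depths came from the detector and depth image.
    uv = np.array([[320.0, 240.0], [380.0, 240.0],
                   [320.0, 300.0], [320.0, 240.0]])
    depth = np.array([1.0, 1.0, 1.0, 1.1])
    cam_kps = backproject(uv, depth, fx, fy, cx, cy)
    R, t = rigid_fit(model_kps, cam_kps)
    print("R =\n", R)
    print("t =", t)
```

In a full ICP loop the correspondences would be re-estimated by nearest-neighbour search on dense point clouds before each call to `rigid_fit`; here the key point correspondences are assumed known, so a single fit is enough for the illustration.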

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

This work was supported by grants from the National Natural Science Foundation of China (Grant Nos. 52075530, 51575407, 51505349, 61733011, 41906177); the Hubei Provincial Department of Education (D20191105); the National Defense PreResearch Foundation of Wuhan University of Science and Technology (GF201705); and the Open Fund of the Key Laboratory for Metallurgical Equipment and Control of Ministry of Education in Wuhan University of Science and Technology (2018B07, 2019B13).

Author information

Authors and Affiliations

Key Laboratory of Metallurgical Equipment and Control Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan, 430081, China
Ying Liu, Du Jiang, Chao Xu, Ying Sun, Bo Tao, Xiliang Tong, Manman Xu, Gongfa Li & Juntong Yun
Research Center for Biomimetic Robot and Intelligent Measurement and Control, Wuhan University of Science and Technology, Wuhan, 430081, China
Ying Sun, Bo Tao, Xiliang Tong, Manman Xu & Gongfa Li
Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan, 430081, China
Ying Sun, Guozhang Jiang, Bo Tao & Gongfa Li

Corresponding authors

Correspondence to Du Jiang, Bo Tao or Gongfa Li.

Ethics declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, Y., Jiang, D., Xu, C. et al. Deep learning based 3D target detection for indoor scenes. Appl Intell 53, 10218–10231 (2023). https://doi.org/10.1007/s10489-022-03888-4

Accepted: 11 June 2022
Published: 16 August 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10489-022-03888-4
