Abstract
3D target detection has been a research hotspot in recent years. In autonomous driving, 3D target detection mainly targets outdoor scenes in which the camera height is constant. In the few indoor scenes studied, 3D target detection mostly operates at the category level, and instance-level 3D target detection datasets are difficult to generate. This paper takes instance-level 3D target detection in complex indoor scenes as its research object. An indoor 3D target detection dataset is constructed using ArUco markers. A pixel-wise key point voting network with joint semantic segmentation of RGB images is established, and a new key point hypothesis strategy is proposed. Combined with depth images, key point detection is extended to three dimensions, and the pose is refined by the iterative closest point (ICP) algorithm. The evaluation metrics and visualizations of the model are analyzed and compared. The method is tested and visualized on the validation set, a truncated validation set, and unlabeled data. The results demonstrate the generalization of the proposed method, and 3D target detection in indoor scenes is achieved from both RGB and RGB-D images.
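The abstract states only that depth images are used to extend key point detection to three dimensions; it does not give the formula. One common way to do this, shown here as a minimal sketch and not as the authors' implementation, is to back-project each detected 2D key point through a pinhole camera model using the depth value at that pixel. The function name `backproject`, the intrinsics `FX, FY, CX, CY`, and the sample key points and depths are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumes an ideal pinhole camera (no lens distortion) and a depth image
    that is already registered to the RGB image.
    """
    if depth_m <= 0:
        # Depth sensors commonly report 0 for missing readings.
        raise ValueError("depth must be positive")
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical intrinsics (pixels) and detections, for illustration only.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0
keypoints_2d: List[Tuple[float, float]] = [
    (320.0, 240.0),  # at the principal point
    (380.0, 240.0),  # 60 px to the right
    (320.0, 300.0),  # 60 px below
]
depths_m = [1.0, 2.0, 0.5]

keypoints_3d = [
    backproject(u, v, d, FX, FY, CX, CY)
    for (u, v), d in zip(keypoints_2d, depths_m)
]
```

The resulting 3D key points could then be matched against the corresponding key points on the object model to obtain an initial pose, which a refinement step such as ICP would improve; that pipeline ordering is an assumption based on the abstract's wording.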
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgments
This work was supported by grants from the National Natural Science Foundation of China (Grant Nos. 52075530, 51575407, 51505349, 61733011, 41906177); grants from the Hubei Provincial Department of Education (D20191105); grants from the National Defense Pre-Research Foundation of Wuhan University of Science and Technology (GF201705); and the Open Fund of the Key Laboratory for Metallurgical Equipment and Control of Ministry of Education in Wuhan University of Science and Technology (2018B07, 2019B13).
Ethics declarations
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
About this article
Cite this article
Liu, Y., Jiang, D., Xu, C. et al. Deep learning based 3D target detection for indoor scenes. Appl Intell 53, 10218–10231 (2023). https://doi.org/10.1007/s10489-022-03888-4
DOI: https://doi.org/10.1007/s10489-022-03888-4