
Deep learning based 3D target detection for indoor scenes

Applied Intelligence

Abstract

3D target detection has become a research hotspot in recent years. In autonomous driving, 3D target detection mainly targets outdoor scenes in which the camera height is constant. In the few studies of indoor scenes, 3D target detection is mostly performed at the category level, and instance-level 3D target detection datasets are difficult to generate. This paper takes instance-level 3D target detection in complex indoor scenes as its research object. An indoor 3D target detection dataset is constructed using ArUco markers. A pixel-wise key point voting network with joint semantic segmentation of RGB images is established, and a new key point hypothesis strategy is proposed. Combined with depth images, key point detection is extended to three dimensions, and the pose is refined with the ICP (Iterative Closest Point) algorithm. The model's evaluation metrics and visualizations are analyzed and compared, and the model is tested and visualized on the validation set, a truncated validation set and unlabeled data. The results demonstrate the generalization of the proposed method, which achieves 3D target detection in indoor scenes from both RGB and RGB-D images.
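
The abstract does not detail the voting procedure or the new key point hypothesis strategy. As a point of reference only, the sketch below shows the standard PVNet-style RANSAC voting that pixel-wise key point voting networks commonly build on: every foreground pixel predicts a unit vector pointing at a key point, pairs of pixels propose a hypothesis by intersecting their rays, and the hypothesis that the most pixels agree with wins. This is not the authors' proposed strategy; the function names, the cosine threshold of 0.99, the hypothesis count and the synthetic test field are all illustrative assumptions.

```python
import math
import random


def intersect_rays(p1, d1, p2, d2):
    """Intersect the 2D lines p1 + t1*d1 and p2 + t2*d2; None if (nearly) parallel."""
    # Solve t1*d1 - t2*d2 = p2 - p1 with Cramer's rule.
    det = d1[0] * (-d2[1]) - d1[1] * (-d2[0])
    if abs(det) < 1e-9:
        return None
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (bx * (-d2[1]) - by * (-d2[0])) / det
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


def ransac_vote(pixels, vectors, num_hypotheses=128, cos_thresh=0.99, seed=0):
    """Return the key point hypothesis supported by the most pixels, and its vote count.

    pixels:  list of (u, v) foreground pixel coordinates (e.g. from a segmentation mask)
    vectors: list of predicted unit vectors (vx, vy), one per pixel, pointing at the key point
    """
    rng = random.Random(seed)
    best, best_votes = None, -1
    for _ in range(num_hypotheses):
        # A hypothesis is the intersection of the rays predicted by two random pixels.
        i, j = rng.sample(range(len(pixels)), 2)
        h = intersect_rays(pixels[i], vectors[i], pixels[j], vectors[j])
        if h is None:
            continue
        # A pixel votes for h if its predicted direction agrees with the direction to h.
        votes = 0
        for (px, py), (vx, vy) in zip(pixels, vectors):
            dx, dy = h[0] - px, h[1] - py
            dist = math.hypot(dx, dy)
            if dist < 1e-9 or (dx * vx + dy * vy) / dist >= cos_thresh:
                votes += 1
        if votes > best_votes:
            best, best_votes = h, votes
    return best, best_votes


def make_synthetic_field(true_kp, outlier_every=5, seed=1):
    """Hypothetical test data: a 10x10 pixel patch whose vectors point at true_kp,
    with every `outlier_every`-th pixel given a random direction instead."""
    rng = random.Random(seed)
    pixels = [(x, y) for x in range(8, 18) for y in range(2, 12)]
    vectors = []
    for k, (x, y) in enumerate(pixels):
        if k % outlier_every == 0:
            a = rng.uniform(0.0, 2.0 * math.pi)
            vectors.append((math.cos(a), math.sin(a)))
        else:
            dx, dy = true_kp[0] - x, true_kp[1] - y
            n = math.hypot(dx, dy)
            vectors.append((dx / n, dy / n))
    return pixels, vectors


if __name__ == "__main__":
    pixels, vectors = make_synthetic_field((12.5, 7.0))
    kp, votes = ransac_vote(pixels, vectors)
    print(f"key point ~ ({kp[0]:.2f}, {kp[1]:.2f}) with {votes} of {len(pixels)} votes")
```

The sketch handles a single key point; a network predicting several key points would run the same vote once per key point channel. PVNet additionally summarizes all hypotheses by a vote-weighted mean and covariance, whereas this sketch keeps only the best hypothesis.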

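Extending 2D key points to three dimensions with a depth image can be illustrated with pinhole back-projection, x = (u − c_x)·z / f_x and y = (v − c_y)·z / f_y, followed by a least-squares rigid fit (the Kabsch/SVD method) between the object model's key points and the lifted camera-frame key points. This is a minimal sketch under stated assumptions: the intrinsics `fx, fy, cx, cy` and the helper names are hypothetical, and the depth read at a key point pixel is assumed to equal that key point's own depth, which only holds for key points lying on the visible object surface.

```python
import numpy as np


def backproject(uv, depth, fx, fy, cx, cy):
    """Lift pixels (u, v) with depth z to 3D camera-frame points via the pinhole model."""
    uv = np.asarray(uv, dtype=float)
    z = np.asarray(depth, dtype=float)
    x = (uv[:, 0] - cx) * z / fx
    y = (uv[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def kabsch(src, dst):
    """Least-squares rotation R and translation t such that dst ~ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance of the centred sets
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


if __name__ == "__main__":
    fx = fy = 600.0  # hypothetical focal lengths in pixels
    cx, cy = 320.0, 240.0  # hypothetical principal point
    # Hypothetical model key points: the 8 corners of a 10 x 6 x 4 cm box (metres).
    model = np.array([[x, y, z] for x in (-0.05, 0.05)
                      for y in (-0.03, 0.03) for z in (-0.02, 0.02)])
    a = np.radians(30.0)
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a), np.cos(a), 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([0.1, -0.05, 0.8])
    cam = model @ R_true.T + t_true
    # Simulated detections: projected pixels plus the depth read at each pixel.
    uv = np.stack([fx * cam[:, 0] / cam[:, 2] + cx,
                   fy * cam[:, 1] / cam[:, 2] + cy], axis=1)
    R, t = kabsch(model, backproject(uv, cam[:, 2], fx, fy, cx, cy))
    print(np.round(R, 4), np.round(t, 4))
```

Compared with solving PnP from 2D key points alone, using depth makes the translation along the viewing axis directly observable, which is one motivation for moving to RGB-D input.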

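The abstract states that the pose is refined with the ICP algorithm but does not give the variant or its stopping criteria. The following is a minimal point-to-point ICP sketch, assuming the coarse key point pose serves as the initial guess and that the object model and the observed point cloud are N×3 arrays in metres. Correspondences come from brute-force nearest neighbours, which is adequate for small clouds; a k-d tree would be the usual choice at depth-image scale. The helper names (`best_fit_transform`, `nearest_neighbors`, `icp`, `rot_z`) and the tolerance values are hypothetical.

```python
import numpy as np


def best_fit_transform(src, dst):
    """Kabsch/SVD: rotation R and translation t minimising ||src @ R.T + t - dst||."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - src_c).T @ (dst - dst_c))
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c


def nearest_neighbors(points, target):
    """Brute-force nearest neighbour in `target` for every row of `points`."""
    d2 = ((points[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return idx, float(np.sqrt(d2[np.arange(len(points)), idx]).mean())


def icp(source, target, R0, t0, max_iter=50, tol=1e-10):
    """Point-to-point ICP starting from the pose (R0, t0).

    Returns the refined (R, t) and the mean nearest-neighbour distance
    measured at the last correspondence step."""
    R, t = R0.copy(), t0.copy()
    prev_err = np.inf
    for _ in range(max_iter):
        idx, err = nearest_neighbors(source @ R.T + t, target)
        if prev_err - err < tol:  # stop once the error no longer improves
            break
        prev_err = err
        # Re-fit the whole transform from the model to its current matches.
        R, t = best_fit_transform(source, target[idx])
    return R, t, err


def rot_z(deg):
    """Rotation matrix about the z axis (angle in degrees)."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    # Hypothetical model cloud: a 6 x 4 x 3 grid with 2 cm spacing.
    model = np.array([[x, y, z] for x in np.linspace(-0.05, 0.05, 6)
                      for y in np.linspace(-0.03, 0.03, 4)
                      for z in np.linspace(-0.02, 0.02, 3)])
    R_true, t_true = rot_z(30.0), np.array([0.1, -0.05, 0.8])
    scene = model @ R_true.T + t_true
    # Coarse pose, e.g. from the key point stage: off by 2 degrees and 3 mm.
    R, t, err = icp(model, scene, rot_z(32.0), t_true + np.array([0.003, 0.0, 0.0]))
    print(np.round(t, 6), err)
```

Point-to-point ICP only converges to the correct pose when the initial guess is close enough for most nearest-neighbour matches to be right, which is why it suits a refinement step after key point based pose estimation rather than replacing it.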

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

This work was supported by grants from the National Natural Science Foundation of China (Grant Nos. 52075530, 51575407, 51505349, 61733011 and 41906177); the Hubei Provincial Department of Education (D20191105); the National Defense Pre-Research Foundation of Wuhan University of Science and Technology (GF201705); and the Open Fund of the Key Laboratory for Metallurgical Equipment and Control of the Ministry of Education at Wuhan University of Science and Technology (2018B07, 2019B13).

Author information

Corresponding authors

Correspondence to Du Jiang, Bo Tao or Gongfa Li.

Ethics declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, Y., Jiang, D., Xu, C. et al. Deep learning based 3D target detection for indoor scenes. Appl Intell 53, 10218–10231 (2023). https://doi.org/10.1007/s10489-022-03888-4

  • DOI: https://doi.org/10.1007/s10489-022-03888-4
