Overview of indoor scene recognition and representation methods based on multimodal knowledge graphs

Applied Intelligence

Abstract

This paper surveys multimodal knowledge graph technology and presents a three-layer framework for scene recognition. It discusses how integrating diverse 3D expertise into a deep neural network enhances scene cognition and knowledge representation, and it explores real-time 3D scene graph construction via feature matching, demonstrating the feasibility of effective scene knowledge representation. By combining multimodal knowledge graphs with scene recognition, the paper outlines a promising avenue for AI-driven scene cognition and construction. It also clarifies the potential of multimodal knowledge graph technology for addressing scene recognition challenges, along with the implications for future work. This interdisciplinary overview lays a foundation for intelligent scene analysis and interpretation.

Availability of data and materials

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Funding

This work was funded by the National Natural Science Foundation of China (61375084) and the Natural Science Foundation of Shandong Province (ZR2019MF064).

Author information

Corresponding author

Correspondence to Guannan Si.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, J., Si, G., Tian, P. et al. Overview of indoor scene recognition and representation methods based on multimodal knowledge graphs. Appl Intell 54, 899–923 (2024). https://doi.org/10.1007/s10489-023-05235-7

  • DOI: https://doi.org/10.1007/s10489-023-05235-7
