Overview of indoor scene recognition and representation methods based on multimodal knowledge graphs

Li, Jianxin; Si, Guannan; Tian, Pengxin; An, Zhaoliang; Zhou, Fengyu

doi:10.1007/s10489-023-05235-7

Published: 23 December 2023

Volume 54, pages 899–923, (2024)

Applied Intelligence

Jianxin Li¹,
Guannan Si¹,
Pengxin Tian¹,
Zhaoliang An¹ &
…
Fengyu Zhou²

Abstract

This paper provides a comprehensive overview of multimodal knowledge graph technology and a three-layer framework for scene recognition. Integrating diverse 3D expertise into a deep neural network enhances scene cognition and knowledge representation. Real-time 3D scene graph construction via feature matching is explored, demonstrating the feasibility of effective scene knowledge representation. By combining multimodal knowledge graphs with scene recognition, the paper presents a promising avenue for AI-driven scene cognition and construction. It contributes to understanding the potential of multimodal knowledge graph technology for addressing scene recognition challenges, and the implications for future advancements. This interdisciplinary work establishes a foundation for intelligent scene analysis and interpretation.

Availability of data and materials

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Funding

Funding projects: National Natural Science Foundation of China (61375084); Natural Science Foundation of Shandong Province (ZR2019MF064).

Author information

Authors and Affiliations

School of Information Science and Electrical Engineering, Shandong Jiaotong University, Changqing University Science Park, Jinan, 250357, Shandong, China
Jianxin Li, Guannan Si, Pengxin Tian & Zhaoliang An
School of Control Science and Engineering, Shandong University, 17923 Jingshi Road, Jinan, 250061, Shandong, China
Fengyu Zhou

Corresponding author

Correspondence to Guannan Si.

Ethics declarations

Conflicts of interest

The authors do not have any possible conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, J., Si, G., Tian, P. et al. Overview of indoor scene recognition and representation methods based on multimodal knowledge graphs. Appl Intell 54, 899–923 (2024). https://doi.org/10.1007/s10489-023-05235-7

Accepted: 10 December 2023
Published: 23 December 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s10489-023-05235-7
