Ontological Scene Graph Engineering and Reasoning Over YOLO Objects for Creating Panoramic VR Content

Abstract
Detecting objects in videos and harnessing their relationships for scene understanding is a challenging task in computer vision, commonly addressed through Scene Graph Generation (SGG). Recent YOLO models can detect and track objects and establish spatial relationships among them. However, these deep neural networks cannot explain the structural relationships among the objects, which impairs practical applications. Adopting visual transformers is also not prudent, as it increases the complexity of the overall model. In this paper, an ontology-based scene graph engineering and reasoning approach over the extracted objects is proposed as a solution to this problem. First, the ontological model takes the objects detected by YOLO and generates the corresponding entities and relationships. Then, Semantic Web Rule Language (SWRL) rules are written on top of this model to discover the image sequence; the rules also offer a machine-interpretable explanation for this sequence, from which a continuous audio-visual stream is constructed. Finally, this audio-visual stream is coupled with spatial media metadata to make it 360-degree panoramic Virtual Reality (VR) content. The ontological model is found to be a more versatile solution than end-to-end deep neural models. Overall, this methodology is helpful in various real-world scenarios for better learning and understanding of natural environments. For example, the audio explanation gadget helps specially-abled people navigate cluttered environments such as metro rail stations.
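To make the pipeline concrete, the following is a minimal sketch of its first two stages: mapping YOLO detections into ontology individuals and reasoning over them with a SWRL rule. It assumes the owlready2 library; the ontology IRI, the class and property names (SceneObject, Person, Train, leftOf, rightOf), the example rule, and the sample detections are all hypothetical illustrations, not the paper's actual model.

```python
# Sketch: YOLO detections -> ontology individuals -> SWRL inference (owlready2).
from owlready2 import (get_ontology, Thing, ObjectProperty,
                       Imp, sync_reasoner_pellet)

onto = get_ontology("http://example.org/scene.owl")  # hypothetical IRI

with onto:
    class SceneObject(Thing): pass
    class Person(SceneObject): pass
    class Train(SceneObject): pass

    class leftOf(ObjectProperty):
        domain = [SceneObject]; range = [SceneObject]

    class rightOf(ObjectProperty):
        domain = [SceneObject]; range = [SceneObject]

    # Example SWRL rule: if ?a is left of ?b, then ?b is right of ?a.
    rule = Imp()
    rule.set_as_rule("leftOf(?a, ?b) -> rightOf(?b, ?a)")

# Hypothetical YOLO output: (class label, bounding box x1, y1, x2, y2).
detections = [("person", (50, 120, 90, 260)), ("train", (300, 80, 640, 300))]
label_to_class = {"person": Person, "train": Train}

# Generate one ontology individual per detected object.
individuals = [label_to_class[label](f"{label}_{i}")
               for i, (label, _) in enumerate(detections)]

# Assert a spatial relation from bounding-box geometry (left-edge comparison).
a, b = individuals
if detections[0][1][0] < detections[1][1][0]:
    a.leftOf.append(b)

# Run the Pellet reasoner (requires Java) to apply the SWRL rule.
sync_reasoner_pellet(infer_property_values=True)
print(b.rightOf)  # expected to contain the person individual via the rule
```

The inferred axioms, unlike raw network outputs, carry the rule that produced them, which is the machine-interpretable explanation the abstract refers to. For the final step, a tool such as Google's Spatial Media Metadata Injector is commonly used to embed the spherical-video metadata that players need to render a stream as a 360-degree panorama.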