Ontological Scene Graph Engineering and Reasoning Over YOLO Objects for Creating Panoramic VR Content

Abstract
Detecting objects in videos and harnessing their relationships for scene understanding is a challenging task in computer vision, commonly addressed through Scene Graph Generation (SGG). Recent YOLO models can detect and track objects and establish spatial relationships among them. However, these deep neural networks cannot explain the structural relationships among the objects, which impairs practical applications. Adopting visual transformers is also not prudent, as it increases the complexity of the overall model. In this paper, an ontology-based scene graph engineering and reasoning approach over the extracted objects is proposed as a solution to this problem. First, the ontological model takes the objects detected by YOLO and generates the corresponding entities and relationships. Then, Semantic Web Rule Language (SWRL) rules are written on top of this model to discover the image sequence; the rules also offer a machine-interpretable explanation for this sequence, from which a continuous audio-visual stream is constructed. Finally, this audio-visual stream is coupled with spatial media metadata to make it 360-degree panoramic Virtual Reality (VR) content. The ontological model is found to be a more versatile solution than end-to-end deep neural models. Overall, this methodology is helpful in various real-world scenarios for better learning and understanding of natural environments. For example, the audio explanation gadget helps specially-abled people navigate cluttered environments such as metro rail stations.
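To make the pipeline concrete, the following is a minimal sketch of its first two stages: mapping YOLO detections into ontology individuals and reasoning over them with a SWRL rule. It assumes the owlready2 library; the ontology IRI, the class and property names (SceneObject, Person, Train, leftOf, rightOf), the example rule, and the sample detections are all hypothetical illustrations, not the paper's actual model.

```python
# Sketch: YOLO detections -> ontology individuals -> SWRL inference (owlready2).
from owlready2 import (get_ontology, Thing, ObjectProperty,
                       Imp, sync_reasoner_pellet)

onto = get_ontology("http://example.org/scene.owl")  # hypothetical IRI

with onto:
    class SceneObject(Thing): pass
    class Person(SceneObject): pass
    class Train(SceneObject): pass

    class leftOf(ObjectProperty):
        domain = [SceneObject]; range = [SceneObject]

    class rightOf(ObjectProperty):
        domain = [SceneObject]; range = [SceneObject]

    # Example SWRL rule: if ?a is left of ?b, then ?b is right of ?a.
    rule = Imp()
    rule.set_as_rule("leftOf(?a, ?b) -> rightOf(?b, ?a)")

# Hypothetical YOLO output: (class label, bounding box x1, y1, x2, y2).
detections = [("person", (50, 120, 90, 260)), ("train", (300, 80, 640, 300))]
label_to_class = {"person": Person, "train": Train}

# Generate one ontology individual per detected object.
individuals = [label_to_class[label](f"{label}_{i}")
               for i, (label, _) in enumerate(detections)]

# Assert a spatial relation from bounding-box geometry (left-edge comparison).
a, b = individuals
if detections[0][1][0] < detections[1][1][0]:
    a.leftOf.append(b)

# Run the Pellet reasoner (requires Java) to apply the SWRL rule.
sync_reasoner_pellet(infer_property_values=True)
print(b.rightOf)  # expected to contain the person individual via the rule
```

The inferred axioms, unlike raw network outputs, carry the rule that produced them, which is the machine-interpretable explanation the abstract refers to. For the final step, a tool such as Google's Spatial Media Metadata Injector is commonly used to embed the spherical-video metadata that players need to render a stream as a 360-degree panorama.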