Ontological Scene Graph Engineering and Reasoning Over YOLO Objects for Creating Panoramic VR Content

  • Conference paper
Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2023)

Abstract

Detecting objects in videos and harnessing their relationships for scene understanding is a challenging task in computer vision, commonly addressed as Scene Graph Generation (SGG). Recent YOLO models can track objects and establish spatial relationships among the detected objects. However, these deep neural networks cannot explain the structural relationships among the objects, which impairs practical applications. Adopting visual transformers is also not prudent, as it increases the complexity of the overall model. In this paper, an ontology-based scene graph engineering and reasoning approach over the extracted objects is proposed as a solution to this problem. First, the ontological model takes the objects detected by YOLO and generates the corresponding entities and relationships. Then, Semantic Web Rule Language (SWRL) rules are written on top of this model to discover the image sequence. The rules also offer a machine-interpretable explanation for this sequence when a continuous audio-visual stream is constructed. Finally, this audio-visual stream is coupled with spatial media metadata to make it 360-degree panoramic Virtual Reality (VR) content. The ontological model is found to be a more versatile solution than the deep neural models. This methodology is helpful in various real-world scenarios for better learning and understanding of natural environments. For example, the audio explanation gadget helps specially-abled people navigate through cluttered environments such as metro rail stations.
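
The first step of the pipeline, turning detected objects into entities and relationships, can be illustrated with a minimal sketch. This is not the authors' ontology or their SWRL rule set: the `Detection` class, the pixel-space box format, and the relation names (`above`, `below`, `leftOf`, `rightOf`, `overlaps`) are all hypothetical, chosen only to show how bounding boxes could yield subject-predicate-object triples of the kind a scene graph stores.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Detection:
    """One detected object: a class label and a pixel-space box (hypothetical format)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center_x(self) -> float:
        return (self.x1 + self.x2) / 2


def spatial_relation(a: Detection, b: Detection) -> str:
    """Return a coarse relation of a with respect to b (illustrative rules only)."""
    # Image y grows downward, so a is above b when a's bottom edge
    # is at or above b's top edge.
    if a.y2 <= b.y1:
        return "above"
    if a.y1 >= b.y2:
        return "below"
    # The boxes overlap vertically, so compare horizontal centers.
    if a.center_x < b.center_x:
        return "leftOf"
    if a.center_x > b.center_x:
        return "rightOf"
    return "overlaps"


def scene_triples(detections):
    """Build (subject, predicate, object) triples for every ordered pair."""
    return [(a.label, spatial_relation(a, b), b.label)
            for a, b in permutations(detections, 2)]


# Made-up detections standing in for detector output on one frame.
detections = [
    Detection("person", 100, 200, 180, 400),
    Detection("bench", 220, 300, 420, 400),
    Detection("sign", 240, 50, 380, 120),
]
for triple in scene_triples(detections):
    print(triple)
```

In an ontology-based setting, each triple would presumably become an assertion between individuals, with rules applied on top of those assertions; the abstract does not specify the vocabulary, so the names here are placeholders.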

Author information

Corresponding author

Correspondence to D. Teja Santosh.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Raj, N.P., Tarun, G., Santosh, D.T., Raghava, M. (2023). Ontological Scene Graph Engineering and Reasoning Over YOLO Objects for Creating Panoramic VR Content. In: Morusupalli, R., Dandibhotla, T.S., Atluri, V.V., Windridge, D., Lingras, P., Komati, V.R. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2023. Lecture Notes in Computer Science, vol 14078. Springer, Cham. https://doi.org/10.1007/978-3-031-36402-0_20

  • DOI: https://doi.org/10.1007/978-3-031-36402-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36401-3

  • Online ISBN: 978-3-031-36402-0

  • eBook Packages: Computer Science, Computer Science (R0)