An intelligent multimedia information system for multimodal content extraction and querying

Published in: Multimedia Tools and Applications

Abstract

This paper introduces an intelligent multimedia information system that exploits machine learning and database technologies. The system automatically extracts the semantic contents of videos using the visual, auditory, and textual modalities, and then stores the extracted contents in an appropriate format so that they can be retrieved efficiently by subsequent information requests. The semantic contents are first extracted from the three modalities separately; the outputs are then fused to increase the accuracy of the object extraction process. The semantic contents obtained through this information fusion are stored in an intelligent, fuzzy object-oriented database system. To answer user queries efficiently, a multidimensional indexing mechanism is developed that combines the extracted high-level semantic information with low-level video features. The proposed multimedia information system is implemented as a prototype, and its performance is evaluated on news video datasets for content-based and concept-based queries over all three modalities and their fused data. The performance results show that the developed multimedia information system is robust and scalable for large-scale multimedia applications.
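The fusion of per-modality outputs described in the abstract can be illustrated as a weighted late-fusion step. The following is a minimal sketch under assumed conventions: the modality names, weights, and the simple weighted average are illustrative, not the paper's actual fusion method.

```python
# Minimal late-fusion sketch: combine per-modality confidence scores
# for one candidate object. Weights and score values are hypothetical.

def fuse_scores(modal_scores, weights):
    """Weighted average of per-modality scores.

    modal_scores: dict mapping modality name -> score in [0, 1]
    weights:      dict mapping modality name -> non-negative weight
    Only modalities present in both dicts contribute.
    """
    common = [m for m in modal_scores if m in weights]
    total_w = sum(weights[m] for m in common)
    if total_w == 0:
        return 0.0
    return sum(modal_scores[m] * weights[m] for m in common) / total_w

# Example: one object hypothesis scored by three modalities.
scores = {"visual": 0.9, "audio": 0.6, "text": 0.8}
weights = {"visual": 0.5, "audio": 0.2, "text": 0.3}
fused = fuse_scores(scores, weights)  # 0.9*0.5 + 0.6*0.2 + 0.8*0.3 = 0.81
```

A weighted average is only one of many fusion schemes; the point is that combining independent modality evidence can raise confidence above what any single modality provides.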


Acknowledgements

This work is supported by a research grant from TUBITAK under grant number MFAG-114R082. We thank all previous researchers of the Multimedia DB Lab at METU, as well as Ahmet Cosar, for their contributions to this research.

Author information

Corresponding author

Correspondence to Murat Koyuncu.

Appendix: Example Screenshots of Developed System


Fig. 17

Example screen of the semantic content extractor: a given video is divided into shots (upper-left table). For the selected shot (shot 18), four keyframes are detected (second table). The selected keyframe is segmented, and the objects in the segmented regions are recognized (image). One of the segmented regions is selected and marked in red; the semantic content extractor identifies this object as a football player with a score of 1.0 (shown in the table under the image)
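The extraction pipeline shown in Fig. 17 (video, shots, keyframes, segmented regions, recognized objects) can be sketched as follows. The data structures and the stub classifier are hypothetical placeholders; a real system would run trained models on extracted features at each stage.

```python
# Hypothetical sketch of the Fig. 17 pipeline:
# keyframe -> segmented regions -> recognized objects with scores.

from dataclasses import dataclass, field

@dataclass
class Region:
    label: str
    score: float

@dataclass
class Keyframe:
    frame_no: int
    regions: list = field(default_factory=list)

def recognize(segment_features):
    """Stub classifier: returns (label, confidence) for one region.
    A real system would apply a trained visual model here."""
    return ("football_player", 1.0)

def extract(frame_no, keyframe_segments):
    """Annotate each segmented region of a keyframe with an object label."""
    kf = Keyframe(frame_no=frame_no)
    for seg in keyframe_segments:
        label, score = recognize(seg)
        kf.regions.append(Region(label, score))
    return kf

# One segmented region of keyframe 18, as in the figure.
kf = extract(18, [{"dominant_color": "red"}])
```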

Fig. 18

Example screen for a query by example (QBE): an image (shown at the lower part of the screen) is given as the example, and videos containing similar images are retrieved. The video shot shown at the top is one of the answers returned by the system. The query image shows a car accident, and the answer image shows a car crashing into a shop
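A query by example of this kind is typically answered by ranking stored keyframes by distance between low-level feature vectors. The sketch below uses plain Euclidean distance over two-dimensional toy features; the feature vectors and shot identifiers are invented for illustration.

```python
# Minimal QBE sketch: rank stored shots by feature-vector distance
# to the example image. Features and shot IDs are illustrative only.

import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def qbe(query_features, database):
    """database: list of (shot_id, feature_vector).
    Returns shot IDs ordered from most to least similar."""
    ranked = sorted(database, key=lambda item: euclidean(query_features, item[1]))
    return [shot_id for shot_id, _ in ranked]

db = [("shot_3", [0.9, 0.1]),
      ("shot_7", [0.2, 0.8]),
      ("shot_5", [0.85, 0.15])]
result = qbe([0.88, 0.12], db)  # nearest shots come first
```

In practice, a linear scan like this does not scale, which is why the paper's multidimensional index over combined high-level and low-level features matters for large video collections.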

Fig. 19

An example query by concept using both data-level and query-level fusion: "fire" is given as the concept, and the system returns a list of video shots containing fire events (lower-right table). The first video shot is selected and displayed

Fig. 20

An example multimodal query combining the visual, audio, and text modalities: we search for video shots related to tennis, specifying a tennis court and tennis players in the visual modality, applause and crowd events in the audio modality, and the tennis player Federer in the text modality. The matching video shots are displayed in decreasing order of their matching scores; the best-matched result is shown at the top of the screen capture
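The ranking behavior described in this caption can be sketched as scoring each shot against the per-modality conditions and sorting by combined score. The shot records, concept names, and the simple averaging scheme below are assumptions for illustration, not the system's actual scoring function.

```python
# Sketch of answering a multimodal query: score each shot by how many
# of the required concepts it matches per modality, then rank descending.
# Shot annotations and the averaging scheme are hypothetical.

def match_score(shot, query):
    """Average, over queried modalities, the fraction of required
    concepts found in the shot's annotations for that modality."""
    parts = []
    for modality, wanted in query.items():
        have = shot.get(modality, set())
        parts.append(len(wanted & have) / len(wanted))
    return sum(parts) / len(parts)

def answer(shots, query):
    """Return shot IDs in decreasing order of matching score."""
    scored = [(match_score(s, query), s["id"]) for s in shots]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [sid for _, sid in scored]

query = {
    "visual": {"tennis_court", "tennis_player"},
    "audio": {"applause", "crowd"},
    "text": {"federer"},
}
shots = [
    {"id": "s1", "visual": {"tennis_court", "tennis_player"},
     "audio": {"applause", "crowd"}, "text": {"federer"}},
    {"id": "s2", "visual": {"tennis_player"},
     "audio": {"crowd"}, "text": set()},
]
ranking = answer(shots, query)  # fully matching shot ranks first
```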

About this article

Cite this article

Yazici, A., Koyuncu, M., Yilmaz, T. et al. An intelligent multimedia information system for multimodal content extraction and querying. Multimed Tools Appl 77, 2225–2260 (2018). https://doi.org/10.1007/s11042-017-4378-6
