An intelligent multimedia information system for multimodal content extraction and querying

Published in: Multimedia Tools and Applications

Abstract

This paper introduces an intelligent multimedia information system that exploits machine learning and database technologies. The system automatically extracts the semantic contents of videos using the visual, auditory, and textual modalities, and then stores the extracted contents in an appropriate format so that they can be retrieved efficiently by subsequent information requests. The semantic contents are first extracted from the three modalities separately; the outputs are then fused to increase the accuracy of the object extraction process. The semantic contents obtained through this information fusion are stored in an intelligent, fuzzy object-oriented database system. To answer user queries efficiently, a multidimensional indexing mechanism is developed that combines the extracted high-level semantic information with low-level video features. The proposed multimedia information system is implemented as a prototype, and its performance is evaluated on news video datasets for content-based and concept-based queries over all three modalities and their fused data. The performance results show that the developed multimedia information system is robust and scalable for large-scale multimedia applications.
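The fusion of per-modality outputs described in the abstract can be illustrated as a weighted late-fusion step. The following is a minimal sketch under assumed conventions: the modality names, weights, and the simple weighted average are illustrative, not the paper's actual fusion method.

```python
# Minimal late-fusion sketch: combine per-modality confidence scores
# for one candidate object. Weights and score values are hypothetical.

def fuse_scores(modal_scores, weights):
    """Weighted average of per-modality scores.

    modal_scores: dict mapping modality name -> score in [0, 1]
    weights:      dict mapping modality name -> non-negative weight
    Only modalities present in both dicts contribute.
    """
    common = [m for m in modal_scores if m in weights]
    total_w = sum(weights[m] for m in common)
    if total_w == 0:
        return 0.0
    return sum(modal_scores[m] * weights[m] for m in common) / total_w

# Example: one object hypothesis scored by three modalities.
scores = {"visual": 0.9, "audio": 0.6, "text": 0.8}
weights = {"visual": 0.5, "audio": 0.2, "text": 0.3}
fused = fuse_scores(scores, weights)  # 0.9*0.5 + 0.6*0.2 + 0.8*0.3 = 0.81
```

A weighted average is only one of many fusion schemes; the point is that combining independent modality evidence can raise confidence above what any single modality provides.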


Acknowledgements

This work is supported by a research grant from TUBITAK under grant number MFAG-114R082. We thank all previous researchers of the Multimedia DB Lab at METU, as well as Ahmet Cosar, for their contributions to this research.

Author information

Corresponding author

Correspondence to Murat Koyuncu.

Appendix: Example Screenshots of Developed System


Fig. 17

Example screen of the semantic content extractor: a given video is divided into shots (upper-left table). For the selected shot (shot 18), four keyframes are detected (second table). The selected keyframe is segmented, and the objects in the segmented regions are recognized (image). One of the segmented regions is selected and marked in red; the semantic content extractor identifies this object as a football player with a score of 1.0 (shown in the table under the image)
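The extraction pipeline shown in Fig. 17 (video, shots, keyframes, segmented regions, recognized objects) can be sketched as follows. The data structures and the stub classifier are hypothetical placeholders; a real system would run trained models on extracted features at each stage.

```python
# Hypothetical sketch of the Fig. 17 pipeline:
# keyframe -> segmented regions -> recognized objects with scores.

from dataclasses import dataclass, field

@dataclass
class Region:
    label: str
    score: float

@dataclass
class Keyframe:
    frame_no: int
    regions: list = field(default_factory=list)

def recognize(segment_features):
    """Stub classifier: returns (label, confidence) for one region.
    A real system would apply a trained visual model here."""
    return ("football_player", 1.0)

def extract(frame_no, keyframe_segments):
    """Annotate each segmented region of a keyframe with an object label."""
    kf = Keyframe(frame_no=frame_no)
    for seg in keyframe_segments:
        label, score = recognize(seg)
        kf.regions.append(Region(label, score))
    return kf

# One segmented region of keyframe 18, as in the figure.
kf = extract(18, [{"dominant_color": "red"}])
```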

Fig. 18

Example screen for a query by example (QBE): an image (shown at the lower part of the screen) is given as the example, and videos containing similar images are retrieved. The video shot shown at the top is one of the answers returned by the system. The query image shows a car accident, and the answer image shows a car crashing into a shop
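A query by example of this kind is typically answered by ranking stored keyframes by distance between low-level feature vectors. The sketch below uses plain Euclidean distance over two-dimensional toy features; the feature vectors and shot identifiers are invented for illustration.

```python
# Minimal QBE sketch: rank stored shots by feature-vector distance
# to the example image. Features and shot IDs are illustrative only.

import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def qbe(query_features, database):
    """database: list of (shot_id, feature_vector).
    Returns shot IDs ordered from most to least similar."""
    ranked = sorted(database, key=lambda item: euclidean(query_features, item[1]))
    return [shot_id for shot_id, _ in ranked]

db = [("shot_3", [0.9, 0.1]),
      ("shot_7", [0.2, 0.8]),
      ("shot_5", [0.85, 0.15])]
result = qbe([0.88, 0.12], db)  # nearest shots come first
```

In practice, a linear scan like this does not scale, which is why the paper's multidimensional index over combined high-level and low-level features matters for large video collections.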

Fig. 19

An example query by concept using both data-level and query-level fusion: "fire" is given as the concept, and the system returns a list of video shots containing fire events (lower-right table). The first video shot is selected and displayed

Fig. 20

An example multimodal query combining the visual, audio, and text modalities: we search for video shots related to tennis, specifying a tennis court and tennis players in the visual modality, applause and crowd events in the audio modality, and the tennis player Federer in the text modality. The matching video shots are displayed in decreasing order of their matching scores; the best-matched result is shown at the top of the screen capture
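The ranking behavior described in this caption can be sketched as scoring each shot against the per-modality conditions and sorting by combined score. The shot records, concept names, and the simple averaging scheme below are assumptions for illustration, not the system's actual scoring function.

```python
# Sketch of answering a multimodal query: score each shot by how many
# of the required concepts it matches per modality, then rank descending.
# Shot annotations and the averaging scheme are hypothetical.

def match_score(shot, query):
    """Average, over queried modalities, the fraction of required
    concepts found in the shot's annotations for that modality."""
    parts = []
    for modality, wanted in query.items():
        have = shot.get(modality, set())
        parts.append(len(wanted & have) / len(wanted))
    return sum(parts) / len(parts)

def answer(shots, query):
    """Return shot IDs in decreasing order of matching score."""
    scored = [(match_score(s, query), s["id"]) for s in shots]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [sid for _, sid in scored]

query = {
    "visual": {"tennis_court", "tennis_player"},
    "audio": {"applause", "crowd"},
    "text": {"federer"},
}
shots = [
    {"id": "s1", "visual": {"tennis_court", "tennis_player"},
     "audio": {"applause", "crowd"}, "text": {"federer"}},
    {"id": "s2", "visual": {"tennis_player"},
     "audio": {"crowd"}, "text": set()},
]
ranking = answer(shots, query)  # fully matching shot ranks first
```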

About this article

Cite this article

Yazici, A., Koyuncu, M., Yilmaz, T. et al. An intelligent multimedia information system for multimodal content extraction and querying. Multimed Tools Appl 77, 2225–2260 (2018). https://doi.org/10.1007/s11042-017-4378-6
