Multimedia data mining: state of the art and challenges

Multimedia Tools and Applications

Abstract

Advances in multimedia data acquisition and storage technology have led to the growth of very large multimedia databases. Analyzing this volume of data to discover useful knowledge is a challenging problem, and it has opened the opportunity for research in Multimedia Data Mining (MDM). MDM can be defined as the process of finding interesting patterns in media data such as audio, video, image and text that are not ordinarily accessible through basic queries and their associated results. The motivation for MDM is to use the discovered patterns to improve decision making. MDM has therefore attracted significant research effort in developing methods and tools to organize, manage, search and perform domain-specific tasks on data from domains such as surveillance, meetings, broadcast news, sports, archives, movies and medical data, as well as personal and online media collections. This paper surveys the problems and solutions in MDM from three angles: feature extraction, transformation and representation techniques; data mining techniques; and current multimedia data mining systems in various application domains. We discuss the main aspects of feature extraction, transformation and representation: the level of feature extraction, feature fusion, feature synchronization, feature correlation discovery and accurate representation of multimedia data. We compare MDM techniques with state-of-the-art video, audio and image processing techniques, and likewise with state-of-the-art data mining techniques involving clustering, classification, sequence pattern mining, association rule mining and visualization. We review current multimedia data mining systems in detail, grouping them according to problem formulations and approaches. The review includes supervised and unsupervised discovery of events and actions from one or more continuous sequences. We also analyze in detail what has been achieved and which gaps remain where future research efforts could be focused. We conclude the survey with a look at open research directions.

Author information

Corresponding author

Correspondence to Chidansh Amitkumar Bhatt.

About this article

Cite this article

Bhatt, C.A., Kankanhalli, M.S. Multimedia data mining: state of the art and challenges. Multimed Tools Appl 51, 35–76 (2011). https://doi.org/10.1007/s11042-010-0645-5

  • DOI: https://doi.org/10.1007/s11042-010-0645-5
