Multimedia data mining: state of the art and challenges

Bhatt, Chidansh Amitkumar; Kankanhalli, Mohan S.

doi:10.1007/s11042-010-0645-5

Published: 16 November 2010

Volume 51, pages 35–76 (2011)

Multimedia Tools and Applications

Abstract

Advances in multimedia data acquisition and storage technology have led to the growth of very large multimedia databases. Analyzing this huge volume of multimedia data to discover useful knowledge is a challenging problem, and this challenge has opened up research opportunities in Multimedia Data Mining (MDM). Multimedia data mining can be defined as the process of finding interesting patterns in media data such as audio, video, image and text that are not ordinarily accessible through basic queries and their associated results. The motivation for MDM is to use the discovered patterns to improve decision making. MDM has therefore attracted significant research effort in developing methods and tools to organize, manage, search and perform domain-specific tasks on data from domains such as surveillance, meetings, broadcast news, sports, archives, movies and medical data, as well as personal and online media collections. This paper presents a survey of the problems and solutions in multimedia data mining, approached from the following angles: feature extraction, transformation and representation techniques; data mining techniques; and current multimedia data mining systems in various application domains. We discuss the main aspects of feature extraction, transformation and representation techniques: the level of feature extraction, feature fusion, feature synchronization, feature correlation discovery and accurate representation of multimedia data. We compare MDM techniques with state-of-the-art video, audio and image processing techniques, and likewise with state-of-the-art data mining techniques involving clustering, classification, sequence pattern mining, association rule mining and visualization. We review current multimedia data mining systems in detail, grouping them according to problem formulations and approaches. The review includes supervised and unsupervised discovery of events and actions from one or more continuous sequences. We also analyze in detail what has been achieved and which gaps remain where future research efforts could be focused. We conclude the survey with a look at open research directions.

Author information

Authors and Affiliations

School of Computing, National University of Singapore, Singapore, 117417, Singapore
Chidansh Amitkumar Bhatt & Mohan S. Kankanhalli

Corresponding author

Correspondence to Chidansh Amitkumar Bhatt.

About this article

Cite this article

Bhatt, C.A., Kankanhalli, M.S. Multimedia data mining: state of the art and challenges. Multimed Tools Appl 51, 35–76 (2011). https://doi.org/10.1007/s11042-010-0645-5

Issue Date: January 2011