Abstract
The existence of multiple modalities poses a challenge to the design of multimedia data clustering systems, as the unsupervised nature of the problem makes it very difficult to determine a priori whether a single modality should dominate the clustering process or whether modalities should be combined, and if so, how. To counter these indeterminacies, which add to those concerning the selection of the optimal clustering algorithm and data representation for the problem at hand, this work introduces robust multimedia clustering, a one-shot methodology for domain-independent multimedia data clustering based on hybrid multimodal fusion. Through experimentation, we first justify the motivation of the proposed methodology by demonstrating the relevance of multimedia clustering indeterminacies. Subsequently, as a proof of concept, a specific multimedia clustering system based on the requirements of the methodology is implemented and evaluated on three multimedia clustering applications (music genre, photographic topic and audio-visual object classification), analyzing both the quality of the obtained partitions and the time complexity of the proposal. The experimental results reveal that the implemented system, which includes a self-refining consensus clustering procedure for attaining high levels of robustness, obtains, in a fully unsupervised manner, partitions of better quality than 93% of the clusterers available in our experiments; it even improves on the quality of the best ones and outperforms state-of-the-art alternatives.
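Hybrid multimodal fusion is usually described (e.g. in the survey by Atrey et al. [1]) as combining early, feature-level fusion with late, decision-level fusion. The following is a minimal sketch, under stated assumptions, of how a hybrid cluster ensemble could be assembled: each modality is assumed to be a list of feature vectors for the same objects, every unimodal representation and the concatenated (early-fused) one are clustered, and the resulting partitions form the ensemble that a consensus function would later combine (late fusion). The plain k-means base clusterer, the toy data and the function names (`kmeans`, `concatenate`, `build_hybrid_ensemble`) are hypothetical; this is not the system implemented in the paper.

```python
import random
from typing import List, Sequence

Vector = List[float]


def sqdist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(data: Sequence[Vector], k: int, seed: int = 0, iters: int = 50) -> List[int]:
    """Lloyd's k-means, used here only as a stand-in base clusterer."""
    rng = random.Random(seed)
    # Farthest-first initialisation: a random first centroid, then repeatedly
    # the point farthest from all centroids chosen so far.
    centroids = [list(data[rng.randrange(len(data))])]
    while len(centroids) < k:
        far = max(data, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: each object goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in data]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def concatenate(modalities: Sequence[Sequence[Vector]]) -> List[Vector]:
    """Early fusion: join each object's per-modality vectors end to end."""
    n_objects = len(modalities[0])
    return [[x for m in modalities for x in m[i]] for i in range(n_objects)]


def build_hybrid_ensemble(modalities, k: int, seeds=(0, 1, 2)) -> List[List[int]]:
    """Cluster each unimodal representation plus the early-fused one.

    The returned partitions form a cluster ensemble; combining them with a
    consensus function is the late-fusion half of the scheme.
    """
    representations = list(modalities) + [concatenate(modalities)]
    return [kmeans(rep, k, seed=s) for rep in representations for s in seeds]


# Toy data (hypothetical): 4 objects described by an audio and a video modality.
audio = [[0.0], [0.1], [5.0], [5.1]]
video = [[1.0, 1.0], [1.1, 0.9], [9.0, 9.0], [9.2, 8.8]]
ensemble = build_hybrid_ensemble([audio, video], k=2)
print(len(ensemble))  # 3 representations x 3 seeds = 9 partitions
```

Varying the seed is only one way to diversify the ensemble; in practice, diversity would more likely come from different clustering algorithms and data representations, as the abstract suggests.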
Notes
The IsoLetters data collection is available on request to the first author via e-mail.
All four feature extraction techniques were applied to all modalities, except that NMF was not applied to the datasets that do not satisfy its non-negativity constraint (CAL500 and IsoLetters).
These values correspond to the following: \(\left|\mathrm{ORP}_1\right|=31=1\,(\mathrm{original})+10\,(\mathrm{PCA})+10\,(\mathrm{ICA})+10\,(\mathrm{RP})\), \(\left|\mathrm{ORP}_2\right|=16=1\,(\mathrm{original})+5\,(\mathrm{PCA})+5\,(\mathrm{ICA})+5\,(\mathrm{RP})\) and \(\left|\mathrm{ORP}_{1\div 2}\right|=55=1\,(\mathrm{original})+18\,(\mathrm{PCA})+18\,(\mathrm{ICA})+18\,(\mathrm{RP})\).
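The totals above follow from one original representation plus one reduced representation per feature extraction technique and per target dimensionality. A minimal sketch of that arithmetic, assuming the four techniques are PCA, ICA, RP and NMF, that NMF is only added when every feature value is non-negative, and that each technique is swept over the same number of dimensionalities; the function names and the toy matrix are hypothetical.

```python
from typing import List, Sequence


def applicable_techniques(data: Sequence[Sequence[float]]) -> List[str]:
    """PCA, ICA and RP accept any real-valued data; NMF needs non-negative data."""
    techniques = ["PCA", "ICA", "RP"]
    if all(x >= 0 for row in data for x in row):
        techniques.append("NMF")
    return techniques


def pool_size(data: Sequence[Sequence[float]], dims_per_technique: int) -> int:
    """1 original representation + one per (technique, dimensionality) pair."""
    return 1 + dims_per_technique * len(applicable_techniques(data))


signed = [[0.5, -1.2], [2.0, 0.3]]  # has a negative entry, so NMF is skipped
print(pool_size(signed, 10), pool_size(signed, 5), pool_size(signed, 18))  # 31 16 55
```

For non-negative data the same sweep of 10 dimensionalities would give 1 + 4 x 10 = 41 representations.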
CLUTO is a software package for clustering low- and high-dimensional data sets and is available at http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download (checked in September 2012).
Although we conducted experiments with a total of seven consensus functions based on different approaches (e.g. similarity-as-data [35]), we present only the results of these three consensus functions, both because they constitute a representative and diverse sample and for the sake of paper length. The interested reader may refer to [49] for further experimental results.
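As an illustration of what a consensus function does (not necessarily one of the three whose results are reported), here is a minimal sketch in the spirit of the evidence accumulation approach of Fred and Jain [18, 19]. It builds a co-association matrix recording the fraction of partitions in which each pair of objects shares a cluster, then joins pairs co-clustered in more than half of the partitions and takes connected components as the consensus clusters. The 0.5 threshold is an assumption of this sketch; Fred and Jain also apply hierarchical linkage to the same matrix.

```python
from typing import List, Sequence


def co_association(ensemble: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of partitions in which objects i and j share a cluster."""
    n, m = len(ensemble[0]), len(ensemble)
    return [[sum(p[i] == p[j] for p in ensemble) / m for j in range(n)]
            for i in range(n)]


def evidence_accumulation(ensemble: Sequence[Sequence[int]],
                          threshold: float = 0.5) -> List[int]:
    """Consensus labels = connected components of the thresholded co-association graph."""
    co = co_association(ensemble)
    n = len(co)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first flood fill over strongly co-clustered pairs
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Three partitions of five objects; cluster labels are arbitrary per partition.
ensemble = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
print(evidence_accumulation(ensemble))  # [0, 0, 1, 1, 1]
```

Because the co-association matrix only compares labels within each partition, the method needs no label correspondence across partitions, which is why it suits ensembles built from heterogeneous clusterers.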
Although we conducted experiments with wider sweeps of \(p_i\) values, the ones employed in this work yield the most representative results. The interested reader may refer to [49] for these experiments.
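The \(p_i\) values are not defined in this excerpt. Assuming, hypothetically, that each \(p_i\) is the percentage of ensemble components retained when refining the consensus, the sketch below shows one way a self-refining loop can be organised. Each component is scored by its normalized mutual information (NMI, in the square-root-normalized form used by Strehl and Ghosh [51]) with an initial consensus, the top \(p_i\)% are kept, a consensus is recomputed on that subset, and the candidate with the highest average NMI against the full ensemble is returned. The best-of-ensemble consensus, the default percentages and all function names are placeholders of ours, not the paper's procedure.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Partition = Sequence[int]


def nmi(a: Partition, b: Partition) -> float:
    """Normalized mutual information I(a;b) / sqrt(H(a) H(b)), in nats."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in ca.values())
    h_b = -sum(c / n * math.log(c / n) for c in cb.values())
    if h_a == 0 or h_b == 0:  # at least one single-cluster partition
        return 1.0 if h_a == h_b else 0.0
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    return mi / math.sqrt(h_a * h_b)


def anmi(candidate: Partition, ensemble: Sequence[Partition]) -> float:
    """Average NMI between a candidate partition and every ensemble component."""
    return sum(nmi(candidate, p) for p in ensemble) / len(ensemble)


def best_of_ensemble(ensemble: Sequence[Partition]) -> List[int]:
    """Placeholder consensus: the component that agrees most with the rest."""
    return list(max(ensemble, key=lambda p: anmi(p, ensemble)))


def self_refine(ensemble: Sequence[Partition],
                consensus_fn: Callable[[Sequence[Partition]], List[int]] = best_of_ensemble,
                percentages: Sequence[float] = (10, 20, 50)) -> List[int]:
    initial = consensus_fn(ensemble)
    # Rank components by agreement with the initial consensus.
    ranked = sorted(ensemble, key=lambda p: nmi(initial, p), reverse=True)
    candidates = [initial]
    for pct in percentages:
        keep = max(1, round(len(ensemble) * pct / 100))
        candidates.append(consensus_fn(ranked[:keep]))
    # Keep whichever candidate agrees best with the whole ensemble.
    return max(candidates, key=lambda c: anmi(c, ensemble))


ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]]
refined = self_refine(ensemble)
print(refined)
```

Any consensus function with the same signature, for instance a co-association-based one, could be passed as `consensus_fn`.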
In our case, we employed the Matlab NMF implementation included in NMFPack [25].
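NMFPack is a Matlab package and its code is not reproduced here. For readers without Matlab, the following is a minimal pure-Python sketch of basic NMF, assuming the Euclidean-cost multiplicative updates that Lee and Seung published after introducing NMF in [36], and without the sparseness constraints that NMFPack [25] supports. It factorizes a non-negative matrix \(V \approx WH\) with \(W, H \ge 0\). The helper names, the random initialisation and the small `eps` guard against division by zero are assumptions of this sketch.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v: Matrix, rank: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Multiplicative updates; W and H stay non-negative by construction."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 1.0, 0.0]]
W, H = nmf(V, rank=2)
print(round(frobenius_error(V, W, H), 4))
```

The multiplicative form keeps every entry non-negative without explicit projection, which is also why the input must be non-negative in the first place.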
References
Atrey P, Hossein M, El Saddik A, Kankanhalli M (2010) Multimodal fusion for multimedia analysis: a survey. Multimedia Systems 16:345–379
Atrey P, Kankanhalli M, Jain R (2006) Information assimilation framework for event detection in multimedia surveillance systems. Multimedia Systems 12(3):239–253
Ayache S, Quénot G, Gensel J (2007) Classifier fusion for SVM-based multimedia semantic indexing. In: Proc. ECIR, pp 494–504
Barnard K, Forsyth D (2001) Learning the semantics of words and pictures. In: Proc. IEEE-ICCV, vol II, pp 408–415
Bassiou N, Moschou V, Kotropoulos C (2010) Speaker diarization exploiting the eigengap criterion and cluster ensembles. IEEE Trans Audio Speech Lang Process 18(8):2134–2144
Bekkerman R, Jeon J (2007) Multi-modal clustering for multimedia collections. In: Proc. IEEE-CVPR, pp 1–8
Bendjebbour A, Delignon Y, Fouque L, Samson V, Pieczynski W (2001) Multisensor image segmentation using Dempster-Shafer fusion in Markov fields context. IEEE Trans Geosci Remote Sens 39(8):1789–1798
Benitez A, Chang S (2002) Perceptual knowledge construction from annotated image collections. In: Columbia University ADVENT, pp 26–29
Cai D, He X, Li Z, Ma W, Wen J (2004) Hierarchical clustering of WWW image search results using visual, textual and link information. In: Proc. ACM Multimedia, pp 952–959
Chaudhuri K, Kakade S, Livescu K, Sridharan K (2009) Multiview clustering via canonical correlation analysis. In: Proc. ICML, pp 129–136
Cooper M (2011) Clustering geo-tagged photo collections using dynamic programming. In: Proc. ACM MM, pp 1025–1028
Duygulu P, Barnard K, de Freitas N, Forsyth D (2002) Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Proc. ECCV, vol 4. Springer Verlag, pp 97–112
Dy J, Brodley C (2004) Feature selection for unsupervised learning. J Mach Learn Res 5:845–889
Fern X, Brodley C (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proc. ICML, pp 281–288
Fern X, Lin W (2008) Cluster ensemble selection. In: Proc. SDM
Foster I (1986) Designing and building parallel programs: concepts and tools for parallel software engineering. Addison-Wesley
Frank A, Asuncion A (2010) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. [Online] Available: http://archive.ics.uci.edu/ml. Accessed Aug 2013
Fred A, Jain AK (2002) Data clustering using evidence accumulation. In: Proc. ICPR, pp 276–280
Fred A, Jain A (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850
Friedland G, Hung H, Yeo C (2009) Multi-modal speaker diarization of real-world meetings using compressed-domain video features. In: Proc. IEEE-ICASSP, pp 4069–4072
Gao B, Liu T, Qin T, Zheng X, Cheng Q, Ma W (2005) Web image clustering by consistent utilization of visual features and surrounding texts. In: Proc. ACM Multimedia, pp 112–121
Ghosh J, Acharya A (2011) Cluster ensembles. WIREs Data Mining Knowl Discov 1(4):305–315
Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):1–30
Goder A, Filkov V (2008) Consensus clustering algorithms: comparison and refinement. In: Proc. ALENEX, pp 109–117
Hoyer PO (2004) Non-Negative Matrix Factorization with sparseness constraints. J Mach Learn Res 5:1457–1469
Hyvärinen A, Karhunen J, Oja E (2001) Independent component analysis. John Wiley and Sons
Iam-on N, Boongoen T, Garrett S (2010) LCE: a link-based cluster ensemble method for improved gene expression data analysis. Bioinformatics 26(12):1513–1519
Jain A, Murty M, Flynn P (1999) Data clustering: a survey. ACM Comput Surv 31(3):264–323
Jolliffe I (1986) Principal component analysis. Springer
Kaski S (1998) Dimensionality reduction by random mapping: fast similarity computation for clustering. In: Proc. IJCNN, pp 413–418
Khalidov V, Forbes F, Hansard M, Arnaud E, Horaud R (2008) Audio-visual clustering for multiple speaker localization. In: Proc. MLMI, pp 86–97
Kleinberg J (2002) An impossibility theorem for clustering. Proc NIPS 15:463–470
Klosgen W, Zytkow J (2002) Handbook of data mining and knowledge discovery. Oxford University Press, USA
Kohavi R, John G (1998) The wrapper approach. In: Liu H, Motoda H (eds) Feature extraction, construction and selection: a data mining perspective. Springer-Verlag, pp 33–50
Kuncheva LI, Hadjitodorov ST, Todorova LP (2006) Experimental comparison of cluster ensemble methods. In: Proc. FUSION, pp 24–28
Lee D, Seung H (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791
Lew MS, Sebe N, Djeraba C, Jain R (2006) Content-based multimedia information retrieval: state of the art and challenges. ACM Trans Multimed Comput Commun Appl 2(1):1–19
Li T, Ding C, Jordan M (2007) Solving consensus and semi-supervised clustering problems using non-negative matrix factorization. In: Proc. IEEE-ICDM, pp 577–582
Loeff N, Ovesdotter-Alm C, Forsyth D (2006) Discriminating image senses by clustering with multimodal features. In: Proc. COLING/ACL, pp 547–554
Lu W, Li L, Li T, Zhang H, Guo J (2011) Web multimedia object clustering via information fusion. In: Proc. ICDAR, pp 319–323
Messina A, Montagnuolo M (2009) A generalised cross-modal clustering method applied to multimedia news semantic indexing and retrieval. In: Proc. WWW, pp 321–330
Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2):91–118
Ni J, Ma X, Xu L, Wang J (2004) An image recognition method based on multiple BP neural networks fusion. In: Proc. IEEE int’l conf. on information acquisition, pp 323–326
Pinto F, Carriço J, Ramirez M, Almeida J (2007) Ranked adjusted rand: integrating distance and partition information in a measure of clustering agreement. BMC Bioinformatics 8(44):1–13
van Rijsbergen C (1979) Information retrieval. Butterworth-Heinemann
Sevillano X, Alías F, Socoró J (2012) Positional and confidence voting-based consensus functions for fuzzy cluster ensembles. Fuzzy Sets Syst 193:1–32
Sevillano X, Cobo G, Alías F, Socoró J (2006) Feature diversity in cluster ensembles for robust document clustering. In: Proc. SIGIR, pp 697–698
Sevillano X, Cobo G, Alías F, Socoró J (2007) Text clustering on latent thematic spaces: variants, strengths and weaknesses. In: Proc. ICA, pp 794–801
Sevillano X (2009) Hierarchical consensus architectures and soft consensus functions for robust multimedia clustering. Ph.D. thesis, La Salle-Universitat Ramon Llull
Sevillano X, Valero X, Alías F (2012) Audio and video cues for geo-tagging online videos in the absence of metadata. In: Proc. CBMI
Strehl A, Ghosh J (2002) Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Topchy A, Jain A, Punch W (2004) A mixture model for clustering ensembles. In: Proc. SIAM-SDM, pp 379–390
Torkkola K (2003) Discriminative features for text document classification. Pattern Anal Appl 6(4):301–308
Turnbull D, Barrington L, Torres D, Lanckriet G (2007) Towards musical query-by-semantic-description using the CAL500 dataset. In: Proc. ACM SIGIR, pp 439–446
Vajaria H, Islam T, Sarkar S, Sankar R, Kasturi R (2006) Audio segmentation and speaker localization in meeting videos. In: Proc. IAPR-ICPR, vol 2, pp 1150–1153
Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann Publishers
Wu Z, Cai L, Meng H (2006) Multi-level fusion of audio and visual features for speaker identification. In: Proc. int’l conf. on adv. in biometrics, pp 493–499
Xu H, Chua T (2006) Fusion of AV features and external information sources for event detection in team sports video. ACM Trans Multimed Comput Commun Appl 2(1):44–67
Xu R, Wunsch II D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(2):645–678
Ye Y, Li T, Chen Y, Jiang Q (2010) Automatic malware categorization using cluster ensemble. In: Proc. SIGKDD, pp 95–104
Yu Z, Wang X, Wong H (2008) Ensemble based 3D human motion classification. In: Proc. IJCNN, pp 506–510
Yu Z, Wong H (2008) Knowledge based cluster ensemble for 3D head model classification. In: Proc. ICPR, pp 1–4
Yu Z, Wong H (2009) Class discovery from gene expression data based on perturbation and cluster ensemble. IEEE Trans NanoBiosci 8(2):147–160
Zhang X, Jiao L, Liu F, Bo L, Gong M (2008) Spectral clustering ensemble applied to SAR image segmentation. IEEE Trans Geosci Remote Sens 46(7):2126–2136
Cite this article
Sevillano, X., Alías, F. A one-shot domain-independent robust multimedia clustering methodology based on hybrid multimodal fusion. Multimed Tools Appl 73, 1507–1543 (2014). https://doi.org/10.1007/s11042-013-1655-x
DOI: https://doi.org/10.1007/s11042-013-1655-x