A one-shot domain-independent robust multimedia clustering methodology based on hybrid multimodal fusion

Sevillano, Xavier; Alías, Francesc

doi:10.1007/s11042-013-1655-x


Published: 16 August 2013

Volume 73, pages 1507–1543, (2014)

Multimedia Tools and Applications


Abstract

The existence of multiple modalities poses a challenge to the design of multimedia data clustering systems: because the problem is unsupervised, it is very difficult to determine a priori whether a single modality should dominate the clustering process or whether modalities should be combined in some way. To counter these indeterminacies, which add to those concerning the selection of the optimal clustering algorithm and data representation for the problem at hand, this work introduces robust multimedia clustering, a one-shot methodology for domain-independent multimedia data clustering based on hybrid multimodal fusion. We first justify the motivation for the proposed methodology by demonstrating experimentally the relevance of multimedia clustering indeterminacies. Subsequently, a specific multimedia clustering system based on the requirements of the methodology is implemented and evaluated, as a proof of concept, on three multimedia clustering applications (music genres, photographic topics and audio-visual objects classification), analyzing the quality of the obtained partitions and the time complexity of the proposal. The experimental results reveal that the implemented system, which includes a self-refining consensus clustering procedure for attaining high levels of robustness, obtains, in a fully unsupervised manner, partitions of better quality than 93 % of the clusterers available in our experiments; it is even able to improve on the quality of the best ones, and it outperforms state-of-the-art alternatives.
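To make the idea of consensus clustering concrete, the following is a minimal sketch of a cluster ensemble combined through a co-association matrix, in the spirit of evidence accumulation. It is not the self-refining consensus procedure of the paper, whose details are not given in the abstract. The function names, the toy partitions and the 0.5 majority threshold are all assumptions made for illustration.

```python
import numpy as np


def coassociation(partitions: np.ndarray) -> np.ndarray:
    """Fraction of partitions in which each pair of objects shares a cluster.

    partitions: integer labels of shape (n_partitions, n_objects).
    """
    p = np.asarray(partitions)
    same_cluster = p[:, :, None] == p[:, None, :]  # (m, n, n) booleans
    return same_cluster.mean(axis=0)


def consensus_labels(coassoc: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Link objects co-clustered in more than `threshold` of the partitions
    and return the connected components as the consensus partition."""
    n = coassoc.shape[0]
    labels = -np.ones(n, dtype=int)  # -1 marks "not yet assigned"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # Unassigned objects strongly linked to object i.
            neighbors = np.flatnonzero((coassoc[i] > threshold) & (labels == -1))
            labels[neighbors] = current
            stack.extend(neighbors.tolist())
        current += 1
    return labels


# Hypothetical ensemble: three partitions of six objects, e.g. obtained from
# different modalities, representations or clustering algorithms.
partitions = np.array([
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping, permuted label names
    [0, 0, 1, 1, 1, 1],  # one object assigned differently
])
C = coassociation(partitions)
labels = consensus_labels(C)
print(labels)  # [0 0 0 1 1 1]
```

Because the co-association matrix only records whether two objects share a cluster, it is insensitive to how each individual clusterer names its clusters, which is why the second partition above supports the first one despite its permuted labels.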


Notes

The IsoLetters data collection is available on request from the first author via e-mail.
All four feature extraction techniques were employed for all modalities except for those datasets that do not satisfy the non-negativity constraint necessary for applying NMF (CAL500 and IsoLetters).
These values correspond to the following: $\left|\mathrm{ORP}_1\right| = 31 = 1\,(\mathrm{original}) + 10\,(\mathrm{PCA}) + 10\,(\mathrm{ICA}) + 10\,(\mathrm{RP})$, $\left|\mathrm{ORP}_2\right| = 16 = 1\,(\mathrm{original}) + 5\,(\mathrm{PCA}) + 5\,(\mathrm{ICA}) + 5\,(\mathrm{RP})$ and $\left|\mathrm{ORP}_{1\div 2}\right| = 55 = 1\,(\mathrm{original}) + 18\,(\mathrm{PCA}) + 18\,(\mathrm{ICA}) + 18\,(\mathrm{RP})$.
CLUTO is a software package for clustering low- and high-dimensional data sets, and it is available at http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download (checked in September 2012).
Although we conducted experiments using a total of seven consensus functions based on different approaches (e.g. similarity-as-data [35]), we present the results of only these three, both because they constitute a representative and diverse sample and for the sake of paper length. The interested reader should refer to [49] for more experimental results.
Although we conducted experiments using wider sweeps of $p_i$ values, the ones employed in this work yield the most representative results. The interested reader should refer to [49] for these experiments.
In our case, we employed the NMF Matlab implementation from NMFPack [25].
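The bookkeeping behind the notes on feature extraction can be sketched as follows. The snippet assumes that the four techniques are PCA, ICA, RP and NMF (the notes name NMF in one place and PCA, ICA and RP in another), and that each count is one original representation plus the number of representations derived with each technique. The function and variable names are hypothetical.

```python
import numpy as np

# Assumption: techniques without a non-negativity requirement.
ALWAYS_APPLICABLE = ("PCA", "ICA", "RP")


def applicable_techniques(features: np.ndarray) -> tuple[str, ...]:
    """Return the feature extraction techniques usable on this data.

    NMF is included only when every entry of the data is non-negative.
    """
    if np.all(np.asarray(features) >= 0):
        return ALWAYS_APPLICABLE + ("NMF",)
    return ALWAYS_APPLICABLE


def count_representations(reps_per_technique: dict[str, int]) -> int:
    """Count object representations: the original one plus those derived
    with each feature extraction technique."""
    return 1 + sum(reps_per_technique.values())


# Counts quoted in the notes for a collection where NMF is not applied.
orp_1 = count_representations({"PCA": 10, "ICA": 10, "RP": 10})
orp_2 = count_representations({"PCA": 5, "ICA": 5, "RP": 5})
orp_1_2 = count_representations({"PCA": 18, "ICA": 18, "RP": 18})
print(orp_1, orp_2, orp_1_2)  # 31 16 55
```

A data set with any negative entry, such as mean-centred features, would be routed to the three-technique branch, which matches the exclusion of NMF described in the notes.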

References

Atrey P, Hossein M, El Saddik A, Kankanhalli M (2010) Multimodal fusion for multimedia analysis: a survey. Multimedia Systems 16:345–379
Atrey P, Kankanhalli M, Jain R (2006) Information assimilation framework for event detection in multimedia surveillance systems. Multimedia Systems 12(3):239–253
Ayache S, Quénot G, Gensel J (2007) Classifier fusion for svm-based multimedia semantic indexing. In: Proc. ECIR, pp 494–504
Barnard K, Forsyth D (2001) Learning the semantics of words and pictures. In: Proc. IEEE-ICCV, vol II, pp 408–415
Bassiou N, Moschou V, Kotropoulos C (2010) Speaker diarization exploiting the eigengap criterion and cluster ensembles. IEEE Trans Audio Speech Lang Process 18(8):2134–2144
Bekkerman R, Jeon J (2007) Multi-modal clustering for multimedia collections. In: Proc. IEEE-CVPR, pp 1–8
Bendjebbour A, Delignon Y, Fouque L, Samson V, Pieczynski W (2001) Multisensor image segmentation using Dempster-Shafer fusion in Markov fields context. IEEE Trans Geosci Remote Sens 39(8):1789–1798
Benitez A, Chang S (2002) Perceptual knowledge construction from annotated image collections. In: Columbia University ADVENT, pp 26–29
Cai D, He X, Li Z, Ma W, Wen J (2004) Hierarchical clustering of WWW image search results using visual, textual and link information. In: Proc. ACM Multimedia, pp 952–959
Chaudhuri K, Kakade S, Livescu K, Sridharan K (2009) Multiview clustering via canonical correlation analysis. In: Proc. ICML, pp 129–136
Cooper M (2011) Clustering geo-tagged photo collections using dynamic programming. In: Proc. ACM MM, pp 1025–1028
Duygulu P, Barnard K, de Freitas N, Forsyth D (2002) Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Proc. ECCV, vol 4. Springer Verlag, pp 97–112
Dy J, Brodley C (2004) Feature selection for unsupervised learning. J Mach Learn Res 5:845–889
Fern X, Brodley C (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proc. ICML, pp 281–288
Fern X, Lin W (2008) Cluster ensemble selection. In: Proc. SDM
Foster I (1986) Designing and building parallel programs: concepts and tools for parallel software engineering. Addison-Wesley
Frank A, Asuncion A (2010) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. [Online] Available: http://archive.ics.uci.edu/ml. Accessed Aug 2013
Fred A, Jain AK (2002) Data clustering using evidence accumulation. In: Proc. ICPR, pp 276–280
Fred A, Jain A (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850
Friedland G, Hung H, Yeo C (2009) Multi-modal speaker diarization of real-world meetings using compressed-domain video features. In: Proc. IEEE-ICASSP, pp 4069–4072
Gao B, Liu T, Qin T, Zheng X, Cheng Q, Ma W (2005) Web image clustering by consistent utilization of visual features and surrounding texts. In: Proc. ACM Multimedia, pp 112–121
Ghosh J, Acharya A (2011) Cluster ensembles. WIREs Data Mining Knowl Discov 1(4):305–315
Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):1–30
Goder A, Filkov V (2008) Consensus clustering algorithms: comparison and refinement. In: Proc. ALENEX, pp 109–117
Hoyer PO (2004) Non-Negative Matrix Factorization with sparseness constraints. J Mach Learn Res 5:1457–1469
Hyvärinen A, Karhunen J, Oja E (2001) Independent component analysis. John Wiley and Sons
Iam-on N, Boongoen T, Garrett S (2010) LCE: a link-based cluster ensemble method for improved gene expression data analysis. Bioinformatics 26(12):1513–1519
Jain A, Murty M, Flynn P (1999) Data clustering: a survey. ACM Comput Surv 31(3):264–323
Jolliffe I (1986) Principal component analysis. Springer
Kaski S (1998) Dimensionality reduction by random mapping: fast similarity computation for clustering. In: Proc. IJCNN, pp 413–418
Khalidov V, Forbes F, Hansard M, Arnaud E, Horaud R (2008) Audio-visual clustering for multiple speaker localization. In: Proc. MLMI, pp 86–97
Kleinberg J (2002) An impossibility theorem for clustering. Proc NIPS 15:463–470
Klosgen W, Zytkow J, Zyt J (2002) Handbook of data mining and knowledge discovery. Oxford University Press, USA
Kohavi R, John G (1998) The wrapper approach. In: Liu H, Motoda H (eds) Feature extraction, construction and selection: a data mining perspective. Springer-Verlag, pp 33–50
Kuncheva LI, Hadjitodorov ST, Todorova LP (2006) Experimental comparison of cluster ensemble methods. In: Proc. FUSION, pp 24–28
Lee D, Seung H (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791
Lew MS, Sebe N, Djeraba C, Jain R (2006) Content-based multimedia information retrieval: state of the art and challenges. ACM Trans Multimed Comput Commun Appl 2(1):1–19
Li T, Ding C, Jordan M (2007) Solving consensus and semi-supervised clustering problems using non-negative matrix factorization. In: Proc. IEEE-ICDM, pp 577–582
Loeff N, Ovesdotter-Alm C, Forsyth D (2006) Discriminating image senses by clustering with multimodal features. In: Proc. COLING/ACL, pp 547–554
Lu W, Li L, Li T, Zhang H, Guo J (2011) Web multimedia object clustering via information fusion. In: Proc. ICDAR, pp 319–323
Messina A, Montagnuolo M (2009) A generalised cross-modal clustering method applied to multimedia news semantic indexing and retrieval. In: Proc. WWW, pp 321–330
Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2):91–118
Ni J, Ma X, Xu L, Wang J (2004) An image recognition method based on multiple BP neural networks fusion. In: Proc. IEEE int’l conf. on information acquisition, pp 323–326
Pinto F, Carriço J, Ramirez M, Almeida J (2007) Ranked adjusted rand: integrating distance and partition information in a measure of clustering agreement. BMC Bioinformatics 8(44):1–13
van Rijsbergen C (1979) Information retrieval. Butterworth-Heinemann
Sevillano X, Alías F, Socoró J (2012) Positional and confidence voting-based consensus functions for fuzzy cluster ensembles. Fuzzy Sets Syst 193:1–32
Sevillano X, Cobo G, Alías F, Socoró J (2006) Feature diversity in cluster ensembles for robust document clustering. In: Proc. SIGIR, pp 697–698
Sevillano X, Cobo G, Alías F, Socoró J (2007) Text clustering on latent thematic spaces: variants, strengths and weaknesses. In: Proc. ICA, pp 794–801
Sevillano X (2009) Hierarchical consensus architectures and soft consensus functions for robust multimedia clustering. Ph.D. thesis, La Salle-Universitat Ramon Llull
Sevillano X, Valero X, Alías F (2012) Audio and video cues for geo-tagging online videos in the absence of metadata. In: Proc. CBMI
Strehl A, Ghosh J (2002) Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Topchy A, Jain A, Punch W (2004) A mixture model for clustering ensembles. In: Proc. SIAM-SDM, pp 379–390
Torkkola K (2003) Discriminative features for text document classification. Pattern Anal Appl 6(4):301–308
Turnbull D, Barrington L, Torres D, Lanckriet G (2007) Towards musical query-by-semantic-description using the CAL500 dataset. In: Proc. ACM SIGIR, pp 439–446
Vajaria H, Islam T, Sarkar S, Sankar R, Kasturi R (2006) Audio segmentation and speaker localization in meeting videos. In: Proc. IAPR-ICPR, vol 2, pp 1150–1153
Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann Publishers
Wu Z, Cai L, Meng H (2006) Multi-level fusion of audio and visual features for speaker identification. In: Proc. int’l conf. on adv. in biometrics, pp 493–499
Xu H, Chua T (2006) Fusion of AV features and external information sources for event detection in team sports video. ACM Trans Multimed Comput Commun Appl 2(1):44–67
Xu R, Wunsch II D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(2):645–678
Ye Y, Li T, Chen Y, Jiang Q (2010) Automatic malware categorization using cluster ensemble. In: Proc. SIGKDD, pp 95–104
Yu Z, Wang X, Wong H (2008) Ensemble based 3D human motion classification. In: Proc. IJCNN, pp 506–510
Yu Z, Wong H (2008) Knowledge based cluster ensemble for 3D head model classification. In: Proc. ICPR, pp 1–4
Yu Z, Wong H (2009) Class discovery from gene expression data based on perturbation and cluster ensemble. IEEE Trans. NanoBioSci. 8(2):147–160
Zhang X, Jiao L, Liu F, Bo L, Gong M (2008) Spectral clustering ensemble applied to SAR image segmentation. IEEE Trans Geosci Remote Sens 46(7):2126–2136


Author information

Authors and Affiliations

Grup de Recerca en Tecnologies Mèdia, La Salle - Universitat Ramon Llull, Quatre Camins, 30, 08022, Barcelona, Spain
Xavier Sevillano & Francesc Alías


Corresponding author

Correspondence to Xavier Sevillano.


About this article

Cite this article

Sevillano, X., Alías, F. A one-shot domain-independent robust multimedia clustering methodology based on hybrid multimodal fusion. Multimed Tools Appl 73, 1507–1543 (2014). https://doi.org/10.1007/s11042-013-1655-x


Published: 16 August 2013
Issue Date: December 2014
DOI: https://doi.org/10.1007/s11042-013-1655-x
