A one-shot domain-independent robust multimedia clustering methodology based on hybrid multimodal fusion

Multimedia Tools and Applications

Abstract

The existence of multiple modalities poses a challenge to the design of multimedia data clustering systems: because the problem is unsupervised, it is very difficult to determine a priori whether a single modality should dominate the clustering process or whether modalities should be combined in some way. To address these indeterminacies, which come on top of those concerning the selection of the optimal clustering algorithm and data representation for the problem at hand, this work introduces robust multimedia clustering, a one-shot methodology for domain-independent multimedia data clustering based on hybrid multimodal fusion. We first motivate the proposed methodology by demonstrating experimentally the relevance of multimedia clustering indeterminacies. Subsequently, as a proof of concept, a specific multimedia clustering system based on the requirements of the methodology is implemented and evaluated on three multimedia clustering applications (music genres, photographic topics and audio-visual objects classification), analyzing the quality of the obtained partitions and the time complexity of the proposal. The experimental results reveal that the implemented system, which includes a self-refining consensus clustering procedure for attaining high levels of robustness, obtains, in a fully unsupervised manner, partitions of better quality than 93% of the clusterers available in our experiments; it is even able to improve on the quality of the best ones, and it outperforms state-of-the-art alternatives.
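To make the notion of consensus clustering concrete, the following minimal Python sketch combines several partitions of the same objects into a single one. It is not the system evaluated in the article: it is a generic co-association (evidence-accumulation-style) consensus function, and the function names `co_association` and `consensus`, the `threshold` parameter, and the toy partitions are assumptions introduced purely for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                co[i][j] += 1.0 / m
                co[j][i] += 1.0 / m
    for i in range(n):
        co[i][i] = 1.0  # every object always co-occurs with itself
    return co


def consensus(partitions, threshold=0.5):
    """Link objects co-clustered in more than `threshold` of the partitions
    and return the connected components as consensus labels (hypothetical rule)."""
    n = len(partitions[0])
    co = co_association(partitions)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy ensemble: three partitions of six objects (illustrative data only).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
labels = consensus(partitions)
```

In this toy ensemble the third partition disagrees with the other two about objects 2 and 3, yet the majority structure is recovered; this is the kind of robustness to individual clusterer choices that a consensus approach aims for.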

Notes

  1. The IsoLetters data collection is available from the first author on request via e-mail.

  2. All four feature extraction techniques were employed for all modalities except for those datasets that do not satisfy the non-negativity constraint necessary for applying NMF (CAL500 and IsoLetters).

  3. These values correspond to the following: \(\left|\mathrm{ORP}_1\right| = 31 = 1\,(\mathrm{original}) + 10\,(\mathrm{PCA}) + 10\,(\mathrm{ICA}) + 10\,(\mathrm{RP})\), \(\left|\mathrm{ORP}_2\right| = 16 = 1\,(\mathrm{original}) + 5\,(\mathrm{PCA}) + 5\,(\mathrm{ICA}) + 5\,(\mathrm{RP})\) and \(\left|\mathrm{ORP}_{1\div 2}\right| = 55 = 1\,(\mathrm{original}) + 18\,(\mathrm{PCA}) + 18\,(\mathrm{ICA}) + 18\,(\mathrm{RP})\).

  4. CLUTO is a software package for clustering low- and high-dimensional data sets, and it is available at http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download (checked in September 2012).

  5. Although we conducted experiments using a total of seven consensus functions based on different approaches (e.g. similarity-as-data [35]), we present only the results of these three consensus functions, both because they constitute a representative and diverse sample and for the sake of paper length. The interested reader should refer to [49] for more experimental results.

  6. Although we conducted experiments using wider sweeps of \(p_i\) values, the ones employed in this work yield the most representative results. The interested reader should refer to [49] for these experiments.

  7. In our case, we have employed the NMF Matlab implementation of NMFPack [25].
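The counts in note 3 follow a single pattern: one original representation plus the same number of derived representations for each of PCA, ICA and RP. A minimal Python sketch of that arithmetic is given below; the helper name `representation_count` and the dictionary keys are hypothetical, and the per-technique numbers (10, 5 and 18) are taken directly from the note.

```python
def representation_count(per_technique, techniques=("PCA", "ICA", "RP")):
    """One original representation plus `per_technique` derived ones per technique."""
    return 1 + per_technique * len(techniques)


# Per-technique counts as stated in note 3 (keys are illustrative names).
per_technique = {"ORP_1": 10, "ORP_2": 5, "ORP_1div2": 18}
counts = {name: representation_count(k) for name, k in per_technique.items()}
```

Here `counts` reproduces the three totals quoted in the note (31, 16 and 55).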

References

  1. Atrey P, Hossain M, El Saddik A, Kankanhalli M (2010) Multimodal fusion for multimedia analysis: a survey. Multimedia Systems 16:345–379

  2. Atrey P, Kankanhalli M, Jain R (2006) Information assimilation framework for event detection in multimedia surveillance systems. Multimedia Systems 12(3):239–253

  3. Ayache S, Quénot G, Gensel J (2007) Classifier fusion for SVM-based multimedia semantic indexing. In: Proc. ECIR, pp 494–504

  4. Barnard K, Forsyth D (2001) Learning the semantics of words and pictures. In: Proc. IEEE-ICCV, vol II, pp 408–415

  5. Bassiou N, Moschou V, Kotropoulos C (2010) Speaker diarization exploiting the eigengap criterion and cluster ensembles. IEEE Trans Audio Speech Lang Process 18(8):2134–2144

  6. Bekkerman R, Jeon J (2007) Multi-modal clustering for multimedia collections. In: Proc. IEEE-CVPR, pp 1–8

  7. Bendjebbour A, Delignon Y, Fouque L, Samson V, Pieczynski W (2001) Multisensor image segmentation using Dempster-Shafer fusion in Markov fields context. IEEE Trans Geosci Remote Sens 39(8):1789–1798

  8. Benitez A, Chang S (2002) Perceptual knowledge construction from annotated image collections. In: Columbia University ADVENT, pp 26–29

  9. Cai D, He X, Li Z, Ma W, Wen J (2004) Hierarchical clustering of WWW image search results using visual, textual and link information. In: Proc. ACM Multimedia, pp 952–959

  10. Chaudhuri K, Kakade S, Livescu K, Sridharan K (2009) Multiview clustering via canonical correlation analysis. In: Proc. ICML, pp 129–136

  11. Cooper M (2011) Clustering geo-tagged photo collections using dynamic programming. In: Proc. ACM MM, pp 1025–1028

  12. Duygulu P, Barnard K, de Freitas N, Forsyth D (2002) Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Proc. ECCV, vol 4. Springer Verlag, pp 97–112

  13. Dy J, Brodley C (2004) Feature selection for unsupervised learning. J Mach Learn Res 5:845–889

  14. Fern X, Brodley C (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proc. ICML, pp 281–288

  15. Fern X, Lin W (2008) Cluster ensemble selection. In: Proc. SDM

  16. Foster I (1986) Designing and building parallel programs: concepts and tools for parallel software engineering. Addison-Wesley

  17. Frank A, Asuncion A (2010) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. [Online] Available: http://archive.ics.uci.edu/ml. Accessed Aug 2013

  18. Fred A, Jain AK (2002) Data clustering using evidence accumulation. In: Proc. ICPR, pp 276–280

  19. Fred A, Jain A (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850

  20. Friedland G, Hung H, Yeo C (2009) Multi-modal speaker diarization of real-world meetings using compressed-domain video features. In: Proc. IEEE-ICASSP, pp 4069–4072

  21. Gao B, Liu T, Qin T, Zheng X, Cheng Q, Ma W (2005) Web image clustering by consistent utilization of visual features and surrounding texts. In: Proc. ACM Multimedia, pp 112–121

  22. Ghosh J, Acharya A (2011) Cluster ensembles. WIREs Data Mining Knowl Discov 1(4):305–315

  23. Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):1–30

  24. Goder A, Filkov V (2008) Consensus clustering algorithms: comparison and refinement. In: Proc. ALENEX, pp 109–117

  25. Hoyer PO (2004) Non-Negative Matrix Factorization with sparseness constraints. J Mach Learn Res 5:1457–1469

  26. Hyvärinen A, Karhunen J, Oja E (2001) Independent component analysis. John Wiley and Sons

  27. Iam-on N, Boongoen T, Garrett S (2010) LCE: a link-based cluster ensemble method for improved gene expression data analysis. Bioinformatics 26(12):1513–1519

  28. Jain A, Murty M, Flynn P (1999) Data clustering: a survey. ACM Comput Surv 31(3):264–323

  29. Jolliffe I (1986) Principal component analysis. Springer

  30. Kaski S (1998) Dimensionality reduction by random mapping: fast similarity computation for clustering. In: Proc. IJCNN, pp 413–418

  31. Khalidov V, Forbes F, Hansard M, Arnaud E, Horaud R (2008) Audio-visual clustering for multiple speaker localization. In: Proc. MLMI, pp 86–97

  32. Kleinberg J (2002) An impossibility theorem for clustering. Proc NIPS 15:463–470

  33. Klosgen W, Zytkow J, Zyt J (2002) Handbook of data mining and knowledge discovery. Oxford University Press, USA

  34. Kohavi R, John G (1998) The wrapper approach. In: Liu H, Motoda H (eds) Feature extraction, construction and selection: a data mining perspective. Springer-Verlag, pp 33–50

  35. Kuncheva LI, Hadjitodorov ST, Todorova LP (2006) Experimental comparison of cluster ensemble methods. In: Proc. FUSION, pp 24–28

  36. Lee D, Seung H (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791

  37. Lew MS, Sebe N, Djeraba C, Jain R (2006) Content-based multimedia information retrieval: state of the art and challenges. ACM Trans Multimed Comput Commun Appl 2(1):1–19

  38. Li T, Ding C, Jordan M (2007) Solving consensus and semi-supervised clustering problems using non-negative matrix factorization. In: Proc. IEEE-ICDM, pp 577–582

  39. Loeff N, Ovesdotter-Alm C, Forsyth D (2006) Discriminating image senses by clustering with multimodal features. In: Proc. COLING/ACL, pp 547–554

  40. Lu W, Li L, Li T, Zhang H, Guo J (2011) Web multimedia object clustering via information fusion. In: Proc. ICDAR, pp 319–323

  41. Messina A, Montagnuolo M (2009) A generalised cross-modal clustering method applied to multimedia news semantic indexing and retrieval. In: Proc. WWW, pp 321–330

  42. Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2):91–118

  43. Ni J, Ma X, Xu L, Wang J (2004) An image recognition method based on multiple BP neural networks fusion. In: Proc. IEEE int’l conf. on information acquisition, pp 323–326

  44. Pinto F, Carriço J, Ramirez M, Almeida J (2007) Ranked adjusted rand: integrating distance and partition information in a measure of clustering agreement. BMC Bioinformatics 8(44):1–13

  45. van Rijsbergen C (1979) Information retrieval. Butterworth-Heinemann

  46. Sevillano X, Alías F, Socoró J (2012) Positional and confidence voting-based consensus functions for fuzzy cluster ensembles. Fuzzy Sets Syst 193:1–32

  47. Sevillano X, Cobo G, Alías F, Socoró J (2006) Feature diversity in cluster ensembles for robust document clustering. In: Proc. SIGIR, pp 697–698

  48. Sevillano X, Cobo G, Alías F, Socoró J (2007) Text clustering on latent thematic spaces: variants, strengths and weaknesses. In: Proc. ICA, pp 794–801

  49. Sevillano X (2009) Hierarchical consensus architectures and soft consensus functions for robust multimedia clustering. Ph.D. thesis, La Salle-Universitat Ramon Llull

  50. Sevillano X, Valero X, Alías F (2012) Audio and video cues for geo-tagging online videos in the absence of metadata. In: Proc. CBMI

  51. Strehl A, Ghosh J (2002) Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  52. Topchy A, Jain A, Punch W (2004) A mixture model for clustering ensembles. In: Proc. SIAM-SDM, pp 379–390

  53. Torkkola K (2003) Discriminative features for text document classification. Pattern Anal Appl 6(4):301–308

  54. Turnbull D, Barrington L, Torres D, Lanckriet G (2007) Towards musical query-by-semantic-description using the CAL500 dataset. In: Proc. ACM SIGIR, pp 439–446

  55. Vajaria H, Islam T, Sarkar S, Sankar R, Kasturi R (2006) Audio segmentation and speaker localization in meeting videos. In: Proc. IAPR-ICPR, vol 2, pp 1150–1153

  56. Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann Publishers

  57. Wu Z, Cai L, Meng H (2006) Multi-level fusion of audio and visual features for speaker identification. In: Proc. int’l conf. on adv. in biometrics, pp 493–499

  58. Xu H, Chua T (2006) Fusion of AV features and external information sources for event detection in team sports video. ACM Trans Multimed Comput Commun Appl 2(1):44–67

  59. Xu R, Wunsch II D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(2):645–678

  60. Ye Y, Li T, Chen Y, Jiang Q (2010) Automatic malware categorization using cluster ensemble. In: Proc. SIGKDD, pp 95–104

  61. Yu Z, Wang X, Wong H (2008) Ensemble based 3D human motion classification. In: Proc. IJCNN, pp 506–510

  62. Yu Z, Wong H (2008) Knowledge based cluster ensemble for 3D head model classification. In: Proc. ICPR, pp 1–4

  63. Yu Z, Wong H (2009) Class discovery from gene expression data based on perturbation and cluster ensemble. IEEE Trans Nanobiosci 8(2):147–160

  64. Zhang X, Jiao L, Liu F, Bo L, Gong M (2008) Spectral clustering ensemble applied to SAR image segmentation. IEEE Trans Geosci Remote Sens 46(7):2126–2136

Author information

Correspondence to Xavier Sevillano.

Cite this article

Sevillano, X., Alías, F. A one-shot domain-independent robust multimedia clustering methodology based on hybrid multimodal fusion. Multimed Tools Appl 73, 1507–1543 (2014). https://doi.org/10.1007/s11042-013-1655-x
