Performance evaluation of early and late fusion methods for generic semantics indexing

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

This paper compares two fusion methods: early fusion and late fusion. Early fusion is carried out at the kernel level, an approach also known as multiple kernel learning; late fusion combines the modalities at the classifier-score level through logistic regression. Two kinds of multilayer fusion structures, differing in the number of feature/kernel groups in the lower fusion layer, are constructed for the early and late fusion systems, respectively. The goal of these fusion methods is to make each of the various features effective and to mine the redundant information in their combination, and thereby to develop a generic and robust semantic indexing system that bridges the semantic gap between human concepts and low-level visual features. Performance evaluated on both the TRECVID2009 and TRECVID2010 datasets demonstrates that the systems using the proposed multilayer fusion methods at the kernel level reach this goal more stably than classification-score-level fusion; the most effective and robust system, with the highest MAP score, is built by early fusion with two-layer equally weighted composite kernel learning.
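The two fusion levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the toy features, the grouping of kernels, and the fusion weights are all hypothetical choices made for illustration. Early fusion is shown as a two-layer equally weighted composite kernel (averaging within each kernel group, then across groups), and late fusion as a logistic function applied to per-modality classifier scores, with weights that would in practice be learned by logistic regression.

```python
import math

def rbf_kernel(X, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||x - y||^2) (assumed kernel type)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in X] for x in X]

def average_kernels(kernels):
    """Equally weighted composite kernel: element-wise mean of Gram matrices."""
    n, m = len(kernels[0]), len(kernels)
    return [[sum(K[i][j] for K in kernels) / m for j in range(n)]
            for i in range(n)]

def two_layer_kernel(groups):
    """Average within each group (lower layer), then across groups (upper layer)."""
    return average_kernels([average_kernels(group) for group in groups])

def late_fusion(scores, weights, bias):
    """Fuse per-modality classifier scores with a logistic model."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical low-level features for three video shots.
color = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
texture = [[1.0], [0.9], [0.0]]
edges = [[0.5, 0.5], [0.4, 0.6], [0.0, 1.0]]

# Early fusion: the kernel grouping below is an assumption for illustration.
groups = [
    [rbf_kernel(color, 1.0)],
    [rbf_kernel(texture, 1.0), rbf_kernel(edges, 1.0)],
]
K = two_layer_kernel(groups)

# Late fusion: hypothetical scores from two per-modality classifiers,
# combined with hypothetical (not learned) weights and bias.
p = late_fusion([1.2, -0.3], weights=[0.8, 0.5], bias=-0.1)
```

The composite matrix `K` could be passed to any kernel classifier that accepts a precomputed Gram matrix, whereas `p` is already a fused concept probability. Changing how many kernels sit in each lower-layer group is one way to read the "number of feature/kernel groups" that distinguishes the two multilayer structures.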

Acknowledgments

This work is sponsored by the collaborative Research Project SEV01100474 between Beijing University of Posts and Telecommunications and France Telecom R&D Beijing, and by the National Natural Science Foundation of China (grant 90920001).

Author information

Corresponding author

Correspondence to Shan Gao.

About this article

Cite this article

Dong, Y., Gao, S., Tao, K. et al. Performance evaluation of early and late fusion methods for generic semantics indexing. Pattern Anal Applic 17, 37–50 (2014). https://doi.org/10.1007/s10044-013-0336-8
