Performance evaluation of early and late fusion methods for generic semantics indexing

Dong, Yuan; Gao, Shan; Tao, Kun; Liu, Jiqing; Wang, Haila

doi:10.1007/s10044-013-0336-8

Theoretical Advances
Published: 17 April 2013

Volume 17, pages 37–50 (2014)
Pattern Analysis and Applications

Yuan Dong¹, Shan Gao¹, Kun Tao², Jiqing Liu¹ & Haila Wang²

Abstract

This paper compares two fusion methods: early fusion and late fusion. Early fusion is carried out at the kernel level, also known as multiple kernel learning; late fusion combines the modalities at the classifier score level through logistic regression. Two kinds of multilayer fusion structures, differing in the number of feature/kernel groups in the lower fusion layer, are constructed for the early and late fusion systems, respectively. The goal of these fusion methods is to make effective use of each of the various features and to mine the redundant information in their combination, and thereby to develop a generic and robust semantic indexing system that bridges the semantic gap between human concepts and low-level visual features. Evaluation on both the TRECVID 2009 and TRECVID 2010 datasets shows that the systems with the proposed multilayer kernel-level fusion methods reach this goal more stably than classification-score-level fusion; the most effective and robust system, with the highest MAP score, is built by early fusion with two-layer equally weighted composite kernel learning.
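
The two fusion levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the gamma values, the toy feature matrices, the toy classifier scores, and all function names are assumptions made for illustration. The sketch covers only a single-layer equally weighted composite kernel (early fusion) and a logistic regression over classifier scores (late fusion); it leaves out the multilayer structures, learned kernel weights, and the classifier that would be trained on the fused kernel.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def equal_weight_composite_kernel(kernels):
    """Early fusion: average the per-feature kernel matrices with equal weights."""
    return np.mean(np.stack(kernels, axis=0), axis=0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_fusion(scores, labels, lr=0.5, steps=2000):
    """Late fusion: fit one weight per classifier plus a bias by batch
    gradient descent on the mean logistic loss."""
    n_samples, n_classifiers = scores.shape
    w = np.zeros(n_classifiers)
    b = 0.0
    for _ in range(steps):
        residual = sigmoid(scores @ w + b) - labels
        w -= lr * (scores.T @ residual) / n_samples
        b -= lr * residual.mean()
    return w, b

def fuse_scores(scores, w, b):
    """Map per-classifier scores to one fused probability per sample."""
    return sigmoid(scores @ w + b)

# Early fusion on toy data: two hypothetical feature types for six shots.
rng = np.random.default_rng(0)
color_feats = rng.normal(size=(6, 4))
texture_feats = rng.normal(size=(6, 8))
K_fused = equal_weight_composite_kernel([
    rbf_kernel(color_feats, color_feats, gamma=0.5),
    rbf_kernel(texture_feats, texture_feats, gamma=0.1),
])

# Late fusion on toy data: scores from two hypothetical per-feature classifiers.
scores = np.array([
    [-1.0, 0.2], [-0.8, -0.1], [-1.2, 0.1],
    [0.9, 0.0], [1.1, -0.2], [0.7, 0.3],
])
labels = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
w, b = fit_logistic_fusion(scores, labels)
fused = fuse_scores(scores, w, b)
print(K_fused.shape, np.round(fused, 3))
```

In this sketch the fused kernel matrix could be passed to any kernel classifier that accepts a precomputed kernel, whereas the late fusion step only ever sees one score per classifier and sample.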

Acknowledgments

This work is sponsored by the collaborative research project SEV01100474 between Beijing University of Posts and Telecommunications and France Telecom R&D Beijing, and by the National Natural Science Foundation of China (90920001).

Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, People’s Republic of China
Yuan Dong, Shan Gao & Jiqing Liu
France Telecom R&D Beijing Co., Ltd., Beijing, People’s Republic of China
Kun Tao & Haila Wang

Corresponding author

Correspondence to Shan Gao.

About this article

Cite this article

Dong, Y., Gao, S., Tao, K. et al. Performance evaluation of early and late fusion methods for generic semantics indexing. Pattern Anal Applic 17, 37–50 (2014). https://doi.org/10.1007/s10044-013-0336-8

Received: 02 March 2011
Accepted: 08 April 2013
Published: 17 April 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s10044-013-0336-8
