Blind late fusion in multimedia event retrieval

de Boer, Maaike H. T.; Schutte, Klamer; Zhang, Hao; Lu, Yi-Jie; Ngo, Chong-Wah; Kraaij, Wessel

doi:10.1007/s13735-016-0112-9

Regular Paper
Published: 08 November 2016

Volume 5, pages 203–217 (2016)

International Journal of Multimedia Information Retrieval

Maaike H. T. de Boer¹,², Klamer Schutte¹, Hao Zhang³, Yi-Jie Lu³, Chong-Wah Ngo³ & Wessel Kraaij⁴,⁵

Abstract

One of the challenges in Multimedia Event Retrieval is the integration of data from multiple modalities. A modality is defined as a single channel of sensory input, such as visual or audio; we also refer to a modality as a data source. Previous research has shown that integrating different data sources can improve performance compared to using only one source, but clear insight into the success factors of alternative fusion methods is still lacking. We introduce several new blind late fusion methods based on inversions and ratios of the state-of-the-art blind fusion methods, and compare their performance both in simulations and on TRECVID MED, an international benchmark data set for multimedia event retrieval. The results show that five of the proposed methods outperform the state-of-the-art methods when sufficient training examples are available (100 examples). The novel fusion method named JRER is not only the best method with dependent data sources, but is also robust in all simulations with sufficient training examples.
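To make the term concrete, the following is a minimal sketch of generic blind late fusion, assuming that "blind" means the per-source scores are combined without weights learned from training data. The average, product, max and min rules shown are common score-level combination rules, not the methods proposed in this paper (JRER and the other new methods are not defined in the abstract), and the names `fuse` and `rank_videos` are hypothetical.

```python
import math


def fuse(scores, rule="average"):
    """Combine the confidence scores of one video (one score per data source).

    Assumes every score lies in [0, 1]; no weights are learned (blind fusion).
    """
    if not scores:
        raise ValueError("need at least one score")
    if rule == "average":
        return sum(scores) / len(scores)
    if rule == "product":
        return math.prod(scores)
    if rule == "max":
        return max(scores)
    if rule == "min":
        return min(scores)
    raise ValueError(f"unknown rule: {rule}")


def rank_videos(per_source_scores, rule="average"):
    """Return video ids ordered by fused score, best first.

    per_source_scores maps a video id to its list of per-source scores.
    """
    fused = {vid: fuse(s, rule) for vid, s in per_source_scores.items()}
    return sorted(fused, key=fused.get, reverse=True)


# Illustrative scores only (visual, audio); not taken from the paper.
scores = {"a": [0.9, 0.2], "b": [0.6, 0.6], "c": [0.1, 0.3]}
print(rank_videos(scores, "average"))
print(rank_videos(scores, "max"))
```

The two rules can rank the same videos differently: averaging rewards agreement between sources, while the max rule rewards a single confident source. That sensitivity to how the sources relate is the kind of behaviour the comparison in the abstract is concerned with.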

Acknowledgements

We would like to thank the TNO Early Research Program Making Sense of Big Data (MSoBD) for financial support. The work described in this paper was supported in part by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (CityU 120213).

Author information

Authors and Affiliations

TNO, Oude Waalsdorperweg 63, 2597 AK The Hague, The Netherlands
Maaike H. T. de Boer & Klamer Schutte
Radboud University, Toernooiveld 200, 6525 EC Nijmegen, The Netherlands
Maaike H. T. de Boer
City University, 83 Tat Chee Avenue, Kowloon Tong, Hong Kong
Hao Zhang, Yi-Jie Lu & Chong-Wah Ngo
TNO, Anna van Buerenplein 1, 2595 DA, The Hague, The Netherlands
Wessel Kraaij
Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
Wessel Kraaij

Corresponding author

Correspondence to Maaike H. T. de Boer.

Appendix

See Tables 4, 5, 6, and 7.

Table 4 Performance of the late fusion methods for different simulated distributions on a 100Ex case

Table 5 Performance of the late fusion methods for simulated distributions on a 10Ex case

Table 6 %MAP integrating visual and motion features in MED14Test 100Ex

Table 7 %MAP integrating visual and motion features in MED14Test 10Ex

About this article

Cite this article

de Boer, M.H.T., Schutte, K., Zhang, H. et al. Blind late fusion in multimedia event retrieval. Int J Multimed Info Retr 5, 203–217 (2016). https://doi.org/10.1007/s13735-016-0112-9

Received: 19 August 2016
Revised: 17 September 2016
Accepted: 27 September 2016
Published: 08 November 2016
Issue Date: November 2016
DOI: https://doi.org/10.1007/s13735-016-0112-9
