Adaptive Learning for Multimodal Fusion in Video Search

Lee, Wen-Yu; Wu, Po-Tun; Hsu, Winston

doi:10.1007/978-3-642-10467-1_58

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5879)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

Multimodal fusion has been shown to play a prominent role in video search, given the sheer volume of video data. State-of-the-art methods address the problem with query-dependent fusion, where modality weights vary across query classes (e.g., object, sports, scenes, and people). However, given the training queries, most prior methods rely on manually pre-defined query classes, ad hoc query classification, and heuristically determined fusion weights, which suffer from accuracy issues and do not scale to large-scale data. Unlike prior methods, we propose an adaptive query learning framework for multimodal fusion. For each new query, we adopt ListNet to adaptively learn the fusion weights from its semantically related training queries, which are dynamically selected by the K-nearest neighbor method. ListNet is efficient at optimizing performance in search ranking rather than in classification. The proposed method has the following advantages: 1) No pre-defined query classes are needed. 2) The multimodal query weights are learned automatically and adaptively, without ad hoc hand-tuning. 3) The training queries are selected according to query semantics and require no noisy query classification. In experiments on large-scale video benchmarks (i.e., TRECVID), we show that the proposed method is scalable and competitive with prior query-dependent methods.
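
The pipeline described in the abstract (select the K nearest training queries, then learn fusion weights with a listwise ranking loss) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each query is represented by a fixed-length semantic feature vector compared by cosine similarity, that fusion is a linear combination of per-modality scores, and that the weights are fit by plain gradient descent on the top-one cross-entropy loss used in ListNet. The function names, the toy data, and the hyperparameters (`lr`, `epochs`, `k`) are all hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def nearest_queries(new_vec, train_vecs, k):
    """Return indices of the k training queries most similar to new_vec.

    Assumption: cosine similarity over semantic feature vectors; the
    abstract does not specify the representation or the distance.
    """
    eps = 1e-12
    a = new_vec / (np.linalg.norm(new_vec) + eps)
    b = train_vecs / (np.linalg.norm(train_vecs, axis=1, keepdims=True) + eps)
    return np.argsort(-(b @ a))[:k]


def listnet_fusion_weights(score_lists, relevance_lists, lr=0.1, epochs=200):
    """Fit linear fusion weights with the ListNet top-one loss.

    score_lists[i]     : (n_items, n_modalities) per-modality scores
    relevance_lists[i] : (n_items,) relevance labels for the same items
    """
    n_modalities = score_lists[0].shape[1]
    w = np.zeros(n_modalities)
    for _ in range(epochs):
        grad = np.zeros(n_modalities)
        for X, y in zip(score_lists, relevance_lists):
            p_true = softmax(y.astype(float))   # target top-one distribution
            p_model = softmax(X @ w)            # distribution from fused scores
            grad += X.T @ (p_model - p_true)    # gradient of the cross entropy
        w -= lr * grad / len(score_lists)
    return w


# Hypothetical toy data: 3 training queries, 4 shots each, 2 modalities.
train_vecs = np.array([[1.0, 0.1], [0.0, 1.0], [0.9, 0.0]])
scores = [
    np.array([[0.9, 0.2], [0.1, 0.8], [0.2, 0.3], [0.8, 0.6]]),
    np.array([[0.3, 0.9], [0.7, 0.1], [0.4, 0.8], [0.5, 0.2]]),
    np.array([[0.8, 0.5], [0.2, 0.4], [0.9, 0.1], [0.1, 0.7]]),
]
labels = [
    np.array([1, 0, 0, 1]),
    np.array([1, 0, 1, 0]),
    np.array([1, 0, 1, 0]),
]

new_query_vec = np.array([1.0, 0.0])
selected = nearest_queries(new_query_vec, train_vecs, k=2)
weights = listnet_fusion_weights(
    [scores[i] for i in selected], [labels[i] for i in selected]
)
print("selected training queries:", selected)
print("learned fusion weights:", weights)
```

The weights are refit for every incoming query from its own neighbors, which is what removes the need for pre-defined query classes in this sketch. A real system would replace the toy arrays with actual per-modality retrieval scores and relevance judgments.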

Author information

Authors and Affiliations

National Taiwan University, Taiwan
Wen-Yu Lee, Po-Tun Wu & Winston Hsu

Editor information

Editors and Affiliations

Naresuan University, 65000, Phitsanulok, Thailand
Paisarn Muneesawang
Microsoft Research Asia, 100109, Beijing, China
Feng Wu
Tokyo Institute of Technology, 226-8503, Yokohama, Japan
Itsuo Kumazawa
Mahanakorn University of Technology, 10530, Bangkok, Thailand
Athikom Roeksabutr
Institute of Information Science, Academia Sinica, Taipei, Taiwan
Mark Liao
Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Xiaoou Tang

About this paper

Cite this paper

Lee, WY., Wu, PT., Hsu, W. (2009). Adaptive Learning for Multimodal Fusion in Video Search. In: Muneesawang, P., Wu, F., Kumazawa, I., Roeksabutr, A., Liao, M., Tang, X. (eds) Advances in Multimedia Information Processing - PCM 2009. PCM 2009. Lecture Notes in Computer Science, vol 5879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10467-1_58

DOI: https://doi.org/10.1007/978-3-642-10467-1_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10466-4
Online ISBN: 978-3-642-10467-1
eBook Packages: Computer Science, Computer Science (R0)
