Abstract
Multimodal fusion has proven important in video search, given the sheer volume of video data. State-of-the-art methods address the problem with query-dependent fusion, where modality weights vary across query classes (e.g., objects, sports, scenes, people). However, given a set of training queries, most prior methods rely on manually pre-defined query classes, ad-hoc query classification, and heuristically determined fusion weights; these suffer from accuracy issues and do not scale to large data sets. Unlike prior methods, we propose an adaptive query learning framework for multimodal fusion. For each new query, we adopt ListNet to adaptively learn the fusion weights from semantically related training queries, which are dynamically selected by the K-nearest neighbor method. ListNet is well suited to optimizing search ranking performance rather than classification performance. The proposed method has the following advantages: 1) No pre-defined query classes are needed. 2) The multimodal query weights are learned automatically and adaptively, without ad-hoc hand-tuning. 3) The training queries are selected according to query semantics, so no noisy query classification is required. In experiments on large-scale video benchmarks (i.e., TRECVID), we show that the proposed method is scalable and competitive with prior query-dependent methods.
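The pipeline described in the abstract (select the K nearest training queries, then learn fusion weights with ListNet) can be illustrated with a minimal sketch. This is not the authors' implementation: the query representation (`vec`), the cosine similarity measure, the toy data, the learning rate, and all function names are assumptions made for illustration. The ListNet part uses the standard top-one probability cross-entropy loss with a linear scoring function, trained by plain gradient descent.

```python
import math

def softmax(xs):
    """Top-one probabilities used by ListNet."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    """Cosine similarity between two query vectors (assumed semantic measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_queries(new_vec, train, k):
    """Return the k training queries most similar to the new query."""
    return sorted(train, key=lambda q: cosine(new_vec, q["vec"]), reverse=True)[:k]

def fuse(w, doc):
    """Linear fusion: weighted sum of per-modality scores for one document."""
    return sum(wi * xi for wi, xi in zip(w, doc))

def listnet_loss(w, queries):
    """Cross entropy between label and model top-one distributions."""
    loss = 0.0
    for q in queries:
        p_true = softmax(q["rel"])
        p_model = softmax([fuse(w, d) for d in q["docs"]])
        loss -= sum(pt * math.log(pm) for pt, pm in zip(p_true, p_model))
    return loss

def train_listnet(queries, n_modalities, lr=0.1, epochs=200):
    """Learn fusion weights by gradient descent on the ListNet loss."""
    w = [0.0] * n_modalities
    for _ in range(epochs):
        grad = [0.0] * n_modalities
        for q in queries:
            p_true = softmax(q["rel"])
            p_model = softmax([fuse(w, d) for d in q["docs"]])
            # d(loss)/d(score_i) = p_model_i - p_true_i for softmax cross entropy
            for doc, pt, pm in zip(q["docs"], p_true, p_model):
                for j, x in enumerate(doc):
                    grad[j] += (pm - pt) * x
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

# Hypothetical toy data: each document has two modality scores [text, visual].
train = [
    {"name": "q1", "vec": [1.0, 0.0],
     "docs": [[0.9, 0.2], [0.1, 0.8], [0.5, 0.5]], "rel": [1.0, 0.0, 0.0]},
    {"name": "q2", "vec": [0.9, 0.1],
     "docs": [[0.8, 0.3], [0.2, 0.9]], "rel": [1.0, 0.0]},
    {"name": "q3", "vec": [0.0, 1.0],
     "docs": [[0.9, 0.1], [0.2, 0.9]], "rel": [0.0, 1.0]},
]

neighbors = nearest_queries([1.0, 0.05], train, k=2)   # query-specific training set
weights = train_listnet(neighbors, n_modalities=2)     # query-specific fusion weights
print([q["name"] for q in neighbors], weights)
```

In this toy setup the new query is closest to training queries whose relevant documents score high on the text modality, so the learned weights favor that modality. No query classes are defined anywhere; the neighbor set plays that role for each incoming query.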
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, WY., Wu, PT., Hsu, W. (2009). Adaptive Learning for Multimodal Fusion in Video Search. In: Muneesawang, P., Wu, F., Kumazawa, I., Roeksabutr, A., Liao, M., Tang, X. (eds) Advances in Multimedia Information Processing - PCM 2009. PCM 2009. Lecture Notes in Computer Science, vol 5879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10467-1_58
DOI: https://doi.org/10.1007/978-3-642-10467-1_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10466-4
Online ISBN: 978-3-642-10467-1
eBook Packages: Computer Science (R0)