Adaptive Learning for Multimodal Fusion in Video Search

  • Conference paper
Advances in Multimedia Information Processing - PCM 2009 (PCM 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5879)

Abstract

Multimodal fusion has been shown to be important in video search, given the sheer volume of video data. State-of-the-art methods address the problem with query-dependent fusion, in which modality weights vary across query classes (e.g., objects, sports, scenes, people). However, given the training queries, most prior methods rely on manually pre-defined query classes, ad hoc query classification, and heuristically determined fusion weights, which suffer from accuracy issues and do not scale to large data sets. Unlike prior methods, we propose an adaptive query learning framework for multimodal fusion. For each new query, we adopt ListNet to adaptively learn the fusion weights from its semantically related training queries, which are dynamically selected by the K-nearest neighbor method. ListNet efficiently optimizes performance in search ranking rather than in classification. In general, the proposed method has the following advantages: 1) No pre-defined query classes are needed. 2) The multimodal query weights are learned automatically and adaptively, without ad hoc hand-tuning. 3) The query training examples are selected according to the query semantics and require no noisy query classification. Experiments on large-scale video benchmarks (i.e., TRECVID) show that the proposed method is scalable and competitive with prior query-dependent methods.
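The pipeline described in the abstract (select the K nearest training queries, then learn fusion weights with ListNet) can be illustrated with a minimal sketch. This is not the authors' implementation: the query vectors, the Euclidean distance, the linear scoring function, the learning rate, and every function name below are assumptions made for illustration, and the loss is the standard top-one ListNet cross-entropy.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(v - np.max(v))
    return e / e.sum()

def nearest_queries(new_query_vec, train_query_vecs, k):
    """Indices of the k training queries closest to the new query.
    Assumption: queries are already embedded as numeric vectors and
    Euclidean distance stands in for semantic relatedness."""
    dists = np.linalg.norm(train_query_vecs - new_query_vec, axis=1)
    return np.argsort(dists)[:k]

def listnet_loss(train_lists, w):
    """Mean top-one ListNet cross-entropy over the training lists."""
    total = 0.0
    for X, y in train_lists:
        total += -(softmax(y) @ np.log(softmax(X @ w)))
    return total / len(train_lists)

def listnet_fit(train_lists, n_modalities, lr=0.1, epochs=200):
    """Learn linear fusion weights by gradient descent on the ListNet loss.
    train_lists: list of (X, y) pairs, one per training query, where
    X has shape (n_shots, n_modalities) with per-modality scores and
    y holds the relevance labels of the same shots."""
    w = np.zeros(n_modalities)
    for _ in range(epochs):
        grad = np.zeros(n_modalities)
        for X, y in train_lists:
            # Gradient of the top-one cross-entropy for a linear scorer.
            grad += X.T @ (softmax(X @ w) - softmax(y))
        w -= lr * grad / len(train_lists)
    return w

def fuse(X, w):
    """Fused score per shot: weighted sum of its modality scores."""
    return X @ w

# Toy data (hypothetical): 3 training queries, 2 modalities (e.g. text, visual).
train_query_vecs = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
train_lists = [
    (np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([1.0, 0.0])),
    (np.array([[0.9, 0.2], [0.1, 0.8]]), np.array([1.0, 0.0])),
    (np.array([[0.2, 0.9], [0.8, 0.1]]), np.array([1.0, 0.0])),
]

# For a new query, train only on its nearest neighbors, then fuse.
neighbors = nearest_queries(np.array([0.9, 0.9]), train_query_vecs, k=2)
weights = listnet_fit([train_lists[i] for i in neighbors], n_modalities=2)
fused_scores = fuse(np.array([[0.7, 0.1], [0.2, 0.6]]), weights)
```

Because the weights are refit for every incoming query from its own neighborhood, no query classes need to be defined in advance; the cost is one small optimization per query, which the sketch keeps cheap by using a linear scorer.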

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, WY., Wu, PT., Hsu, W. (2009). Adaptive Learning for Multimodal Fusion in Video Search. In: Muneesawang, P., Wu, F., Kumazawa, I., Roeksabutr, A., Liao, M., Tang, X. (eds) Advances in Multimedia Information Processing - PCM 2009. PCM 2009. Lecture Notes in Computer Science, vol 5879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10467-1_58

  • DOI: https://doi.org/10.1007/978-3-642-10467-1_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10466-4

  • Online ISBN: 978-3-642-10467-1

  • eBook Packages: Computer Science, Computer Science (R0)