short-paper

Efficient multi-modal retrieval in conceptual space

Published: 28 November 2011

Abstract

In this paper, we propose a new, efficient retrieval system for large-scale multi-modal data that includes video tracks. With large-scale multi-modal data, the sheer data volume and the diversity of content degrade both the efficiency and the precision of retrieval. Recent research on image annotation and retrieval shows that image features based on the Bag-of-Visual-Words approach with local descriptors such as SIFT perform surprisingly well on large-scale image datasets. These powerful descriptors tend to be high-dimensional, which makes approximate nearest neighbor search in the raw feature space computationally expensive. Our video retrieval method focuses on the correlation among image, sound, and location information recorded simultaneously, and learns a conceptual space that describes the contents of the data to enable efficient search. Experiments show that our retrieval system performs well with low memory usage and low time complexity.
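The Bag-of-Visual-Words representation mentioned in the abstract quantizes each local descriptor (for SIFT, a 128-dimensional vector) to its nearest entry in a learned vocabulary and counts the assignments per image or frame. The sketch below is illustrative only, not the paper's implementation: it assumes the vocabulary (typically k-means centroids) is already given, uses toy 2-D descriptors, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, vocabulary: Sequence[Vector]) -> int:
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: squared_distance(descriptor, vocabulary[k]))


def bovw_histogram(descriptors: Sequence[Vector],
                   vocabulary: Sequence[Vector]) -> List[float]:
    """Hard-assignment Bag-of-Visual-Words histogram, L1-normalised."""
    counts = [0.0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1.0
    total = sum(counts)
    # An image with no descriptors keeps an all-zero histogram.
    return [c / total for c in counts] if total > 0 else counts


# Toy example: two visual words, three local descriptors (hypothetical data).
vocabulary = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
print(bovw_histogram(descriptors, vocabulary))  # → [0.6666666666666666, 0.3333333333333333]
```

Hard assignment is the simplest encoding; the Fisher vector and VLAD representations cited in the references replace the raw count with richer per-word statistics, which is one reason those descriptors become high-dimensional.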

References

[1] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In ECCV SLCV Workshop, 2004.
[2] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In ACM Symposium on Theory of Computing, 1998.
[3] T. S. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In NIPS, 1998.
[4] H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE PAMI, 33(1):117-128, 2011.
[5] H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. In CVPR, 2010.
[6] F. Perronnin, Y. Liu, J. Sánchez, and H. Poirier. Large-scale image retrieval with compressed Fisher vectors. In CVPR, 2010.
[7] N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. R. Lanckriet, R. Levy, and N. Vasconcelos. A new approach to cross-modal multimedia retrieval. In ACM Multimedia, 2010.
[8] J. Vía, I. Santamaría, and J. Pérez. Canonical correlation analysis (CCA) algorithms for multiple data sets: Application to blind SIMO equalization. In EUSIPCO, 2005.
[9] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, 2008.



    Published In

    MM '11: Proceedings of the 19th ACM international conference on Multimedia
    November 2011
    944 pages
    ISBN:9781450306164
    DOI:10.1145/2072298


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. generalized canonical correlation analysis
    2. product quantization
    3. video retrieval
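Product quantization, one of the author tags (described by Jégou, Douze, and Schmid), compresses a D-dimensional vector by splitting it into m sub-vectors and replacing each with the index of its nearest centroid in a per-subspace codebook; distances from a raw query are then summed from small lookup tables (asymmetric distance computation). The pure-Python sketch below is illustrative only and is not taken from the paper: the class and method names are hypothetical, and k-means uses a deterministic farthest-point initialisation so the toy run is reproducible.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int) -> List[List[float]]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        buckets: List[List[List[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ProductQuantizer:
    """Hypothetical minimal PQ: m sub-quantizers with k centroids each."""

    def __init__(self, m: int, k: int, iters: int = 10) -> None:
        self.m, self.k, self.iters = m, k, iters
        self.codebooks: List[List[List[float]]] = []

    def _split(self, v: Vector) -> List[List[float]]:
        step = len(v) // self.m
        return [list(v[j * step:(j + 1) * step]) for j in range(self.m)]

    def fit(self, data: Sequence[Vector]) -> "ProductQuantizer":
        if len(data[0]) % self.m:
            raise ValueError("vector dimension must be divisible by m")
        subs = [self._split(v) for v in data]
        # One independent k-means codebook per subspace.
        self.codebooks = [kmeans([s[j] for s in subs], self.k, self.iters)
                          for j in range(self.m)]
        return self

    def encode(self, v: Vector) -> Tuple[int, ...]:
        """Replace each sub-vector by the index of its nearest sub-centroid."""
        return tuple(
            min(range(self.k), key=lambda c: sq_dist(sub, self.codebooks[j][c]))
            for j, sub in enumerate(self._split(v)))

    def adc_search(self, query: Vector, codes: Sequence[Tuple[int, ...]],
                   top: int = 1) -> List[int]:
        """Asymmetric distance computation: raw query against PQ codes."""
        # table[j][c] = squared distance from query sub-vector j to centroid c.
        table = [[sq_dist(sub, cen) for cen in self.codebooks[j]]
                 for j, sub in enumerate(self._split(query))]
        scores = [sum(table[j][code[j]] for j in range(self.m)) for code in codes]
        return sorted(range(len(codes)), key=scores.__getitem__)[:top]


# Toy database of 4-D vectors (hypothetical data), m=2 subspaces, k=2 centroids.
data = [[0, 0, 0, 0], [0.5, 0, 0, 0.5], [10, 10, 10, 10],
        [9.5, 10, 10, 9.5], [0, 0, 10, 10], [10, 10, 0, 0]]
pq = ProductQuantizer(m=2, k=2).fit(data)
codes = [pq.encode(v) for v in data]
print(pq.adc_search([10, 10, 0, 0], codes))  # → [5]
```

With m sub-vectors and k centroids per subspace, each stored code needs m·log2(k) bits; for example, 8 sub-vectors with 256 centroids each give 64 bits per database vector, which is the source of the memory savings over storing raw high-dimensional descriptors.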

    Qualifiers

    • Short-paper

    Conference

    MM '11
    MM '11: ACM Multimedia Conference
    November 28 - December 1, 2011
    Scottsdale, Arizona, USA

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 2
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 20 Feb 2025

    Cited By

    • (2024) Video Classification Using Smooth Approximation of Hard-assignment Encoding. Journal of Information Processing, 32, 641-651. DOI: 10.2197/ipsjjip.32.641. Online publication date: 2024.
    • (2020) Spatio-Temporal VLAD Encoding of Visual Events Using Temporal Ordering of the Mid-Level Deep Semantics. IEEE Transactions on Multimedia, 22(7), 1769-1784. DOI: 10.1109/TMM.2019.2959426. Online publication date: Jul 2020.
    • (2016) Bridging Music and Image via Cross-Modal Ranking Analysis. IEEE Transactions on Multimedia, 18(7), 1305-1318. DOI: 10.1109/TMM.2016.2557722. Online publication date: 1 Jul 2016.
    • (2015) Improving cross-modal and multi-modal retrieval combining content and semantics similarities with probabilistic model. Multimedia Tools and Applications, 74(6), 2009-2032. DOI: 10.1007/s11042-013-1737-9. Online publication date: 1 Mar 2015.
    • (2013) MLRank. Pattern Recognition, 46(10), 2700-2710. DOI: 10.1016/j.patcog.2013.03.016. Online publication date: 1 Oct 2013.
