short-paper

Efficient multi-modal retrieval in conceptual space

Authors:

Jun Imura,

Teppei Fujisawa,

Tatsuya Harada,

Yasuo Kuniyoshi

MM '11: Proceedings of the 19th ACM international conference on Multimedia

Pages 1085 - 1088

https://doi.org/10.1145/2072298.2071944

Published: 28 November 2011


Abstract

In this paper, we propose a new, efficient retrieval system for large-scale multi-modal data, including video tracks. For large-scale multi-modal data, the sheer data volume and the diversity of content degrade both the efficiency of retrieval and the precision of its results. Recent research on image annotation and retrieval shows that image features based on the Bag-of-Visual-Words approach with local descriptors such as SIFT perform surprisingly well on large-scale image datasets. However, these powerful descriptors tend to be high-dimensional, which makes approximate nearest-neighbor search in the raw feature space computationally expensive. Our video retrieval method exploits the correlation among image, sound, and location information recorded simultaneously, and learns a conceptual space that describes the contents of the data, enabling efficient search. Experiments show that our retrieval system performs well with low memory usage and low time complexity.
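The abstract describes learning a shared conceptual space from the correlation between modalities recorded at the same time, and then searching in that compact space instead of the high-dimensional raw feature space. The abstract does not name the technique, so the sketch below assumes two-view canonical correlation analysis (CCA) as one standard way to learn such a space; handling all three modalities (image, sound, location) would need a multi-set CCA variant, which is not shown. The helper names `fit_cca`, `project`, and `retrieve`, the regularization parameter `reg`, and the synthetic data are all hypothetical, and brute-force Euclidean search stands in for whatever index the authors actually use.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(X, Y, k, reg=1e-3):
    """Two-view CCA (hypothetical helper, not the paper's code).

    Row i of X (n, dx) and row i of Y (n, dy) describe the same clip in two
    modalities, e.g. image and sound. Returns projections Wx (dx, k) and
    Wy (dy, k), the per-view means, and the top-k canonical correlations.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    # Covariance blocks; `reg` keeps the within-view blocks invertible.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views; the SVD of the whitened cross-covariance gives the
    # canonical directions (singular vectors) and correlations (singular values).
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return Wx, Wy, mx, my, s[:k]


def project(Z, W, mean):
    """Map raw features of one modality into the shared k-dimensional space."""
    return (Z - mean) @ W


def retrieve(query, database, top=10):
    """Indices of the `top` database rows closest to `query` (Euclidean)."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)[:top]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, latent, dx, dy = 500, 3, 50, 20
    # Synthetic paired data: both modalities are noisy linear views of a
    # shared latent "concept" z (an assumption made purely for illustration).
    z = rng.standard_normal((n, latent))
    X = z @ rng.standard_normal((latent, dx)) + 0.1 * rng.standard_normal((n, dx))
    Y = z @ rng.standard_normal((latent, dy)) + 0.1 * rng.standard_normal((n, dy))

    Wx, Wy, mx, my, corr = fit_cca(X, Y, k=latent)
    db = project(X, Wx, mx)  # index the "image" side in 3-D instead of 50-D
    hits = sum(i in retrieve(project(Y[i], Wy, my), db) for i in range(100))
    print("canonical correlations:", np.round(corr, 3))
    print("recall@10 for 100 cross-modal queries:", hits / 100)
```

Once both views are projected, a query from one modality (here the "sound" side `Y`) is matched against items indexed from another (the "image" side `X`) in a 3-dimensional space rather than the 50-dimensional raw space; that reduction is where memory and search-time savings would come from. For very large collections, the brute-force `retrieve` would typically be replaced by an approximate index such as hashing or product quantization.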

Index Terms

Information systems → Information retrieval → Retrieval models and ranking

Information

Published In

MM '11: Proceedings of the 19th ACM international conference on Multimedia

November 2011

944 pages

ISBN:9781450306164

DOI:10.1145/2072298

General Chairs: K. Selçuk Candan (Arizona State University, USA), Sethuraman Panchanathan (Arizona State University, USA), Balakrishnan Prabhakaran (University of Texas at Dallas, USA)

Program Chairs: Hari Sundaram (Arizona State University, USA), Wu-Chi Feng (Portland State University, USA), Nicu Sebe (University of Trento, Italy)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 November 2011

Qualifiers

Short-paper

Conference

MM '11

Sponsor:

SIGMM

MM '11: ACM Multimedia Conference

November 28 - December 1, 2011

Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Article Metrics

Total citations: 4
Total downloads: 191
Downloads (last 12 months): 2
Downloads (last 6 weeks): 1

Reflects downloads up to 20 Feb 2025

Cited By

Soltanian M, Borna K (2024). Video Classification Using Smooth Approximation of Hard-assignment Encoding. Journal of Information Processing 32, 641-651. Online publication date: 2024. https://doi.org/10.2197/ipsjjip.32.641

Soltanian M, Amini S, Ghaemmaghami S (2020). Spatio-Temporal VLAD Encoding of Visual Events Using Temporal Ordering of the Mid-Level Deep Semantics. IEEE Transactions on Multimedia 22(7), 1769-1784. Online publication date: Jul-2020. https://doi.org/10.1109/TMM.2019.2959426

Wu X, Qiao Y, Wang X, Tang X (2016). Bridging Music and Image via Cross-Modal Ranking Analysis. IEEE Transactions on Multimedia 18(7), 1305-1318. Online publication date: 1-Jul-2016. https://dl.acm.org/doi/10.1109/TMM.2016.2557722

Wang S, Pan P, Lu Y, Xie L (2015). Improving cross-modal and multi-modal retrieval combining content and semantics similarities with probabilistic model. Multimedia Tools and Applications 74(6), 2009-2032. Online publication date: 1-Mar-2015. https://dl.acm.org/doi/10.1007/s11042-013-1737-9

Li Z, Liu J, Xu C, Lu H (2013). MLRank. Pattern Recognition 46(10), 2700-2710. Online publication date: 1-Oct-2013. https://dl.acm.org/doi/10.1016/j.patcog.2013.03.016
