DOI: 10.1145/3077136.3080678

Joint Latent Subspace Learning and Regression for Cross-Modal Retrieval

Published: 07 August 2017

Abstract

Cross-modal retrieval has received much attention in recent years. A common strategy is to project multi-modality data into a shared subspace and then perform retrieval there. However, nearly all existing methods directly adopt the space defined by binary class-label information, without any learning, as the shared subspace for regression. In this paper, we first apply the spectral regression method, under orthogonality constraints, to learn the optimal latent space shared by data of all modalities. We then construct a graph model to project the multi-modality data into this latent space. Finally, we combine the two processes to jointly learn the latent space and perform the regression. Extensive experiments on multiple benchmark datasets show that the proposed method outperforms state-of-the-art approaches.
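The abstract outlines an alternating scheme: spectral regression with orthogonality constraints yields latent targets from class labels, and each modality is regressed onto those targets via a graph-based projection. The NumPy sketch below is a minimal, hypothetical illustration of that style of joint optimization; the function name joint_latent_regression, the simple same-class affinity graph, the ridge regularizer lam, and the QR-based re-orthogonalization step are assumptions for illustration, not the authors' exact formulation.

import numpy as np

def joint_latent_regression(X_img, X_txt, labels, dim=10, lam=1.0, n_iter=20):
    """Hypothetical sketch: jointly learn a shared latent space and
    per-modality regressors, in the spirit of the abstract (assumed
    form, not the paper's exact objective)."""
    # Same-class affinity graph and its Laplacian (assumption: binary
    # class labels define the graph, as the abstract suggests).
    W = (labels[:, None] == labels[None, :]).astype(float)
    L = np.diag(W.sum(axis=1)) - W
    # Initialize orthogonal latent targets Y from the bottom non-trivial
    # Laplacian eigenvectors (spectral-embedding-style initialization).
    _, evecs = np.linalg.eigh(L)
    Y = evecs[:, 1:dim + 1]                      # columns are orthonormal
    P = {}
    for _ in range(n_iter):
        # Closed-form ridge regression of each modality onto Y.
        for name, X in (("img", X_img), ("txt", X_txt)):
            d = X.shape[1]
            P[name] = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
        # Refit Y to the averaged projections, then restore orthogonality
        # with a QR factorization (stand-in for the orthogonality constraint).
        M = 0.5 * (X_img @ P["img"] + X_txt @ P["txt"])
        Q, _ = np.linalg.qr(M)
        Y = Q[:, :dim]
    return P, Y

# Toy usage: 100 paired samples, 5 classes, 64-d image / 32-d text features.
rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=100)
X_img = rng.standard_normal((100, 64)) + labels[:, None]
X_txt = rng.standard_normal((100, 32)) + labels[:, None]
P, Y = joint_latent_regression(X_img, X_txt, labels, dim=4)

In a retrieval setting, one would rank X_txt @ P["txt"] against a query's projection X_img @ P["img"] (or vice versa) by cosine similarity in the learned latent space.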


Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN: 9781450350228
DOI: 10.1145/3077136

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cross-modal matching
  2. latent subspace learning
  3. regression

Qualifiers

  • Short-paper

Conference

SIGIR '17

Acceptance Rates

SIGIR '17 paper acceptance rate: 78 of 362 submissions (22%).
Overall acceptance rate: 792 of 3,983 submissions (20%).

Article Metrics

  • Downloads (last 12 months): 15
  • Downloads (last 6 weeks): 4
Reflects downloads up to 28 Feb 2025

Cited By

  • (2025) A Multimodal Embedding Transfer Approach for Consistent and Selective Learning Processes in Cross-Modal Retrieval. Information Sciences, article 121974. DOI: 10.1016/j.ins.2025.121974. Online publication date: Feb 2025.
  • (2024) Semi-supervised Prototype Semantic Association Learning for Robust Cross-modal Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 872-881. DOI: 10.1145/3626772.3657756. Online publication date: 10 Jul 2024.
  • (2024) Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions. Proceedings of the IEEE, 112(11), 1716-1754. DOI: 10.1109/JPROC.2024.3525147. Online publication date: Nov 2024.
  • (2024) Improving answer quality using image-text coherence on social Q&A sites. Decision Support Systems, 180(C). DOI: 10.1016/j.dss.2024.114191. Online publication date: 9 Jul 2024.
  • (2024) Pretrained models for cross-modal retrieval: experiments and improvements. Signal, Image and Video Processing, 18(5), 4915-4923. DOI: 10.1007/s11760-024-03126-z. Online publication date: 6 Apr 2024.
  • (2023) Prototype-guided Cross-modal Completion and Alignment for Incomplete Text-based Person Re-identification. Proceedings of the 31st ACM International Conference on Multimedia, 5253-5261. DOI: 10.1145/3581783.3613802. Online publication date: 26 Oct 2023.
  • (2023) Variational Autoencoder with CCA for Audio–Visual Cross-modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications, 19(3s), 1-21. DOI: 10.1145/3575658. Online publication date: 24 Feb 2023.
  • (2023) LCM: A Surprisingly Effective Framework for Supervised Cross-modal Retrieval. Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD), 37-46. DOI: 10.1145/3570991.3571048. Online publication date: 4 Jan 2023.
  • (2022) C3CMR: Cross-Modality Cross-Instance Contrastive Learning for Cross-Media Retrieval. Proceedings of the 30th ACM International Conference on Multimedia, 4300-4308. DOI: 10.1145/3503161.3548263. Online publication date: 10 Oct 2022.
  • (2021) A Cross-Media Retrieval Method Based on Semisupervised Learning and Alternate Optimization. Mobile Information Systems, 2021, article 9947644. DOI: 10.1155/2021/9947644. Online publication date: 27 Sep 2021.
