DOI: 10.1145/3077136.3080678

Joint Latent Subspace Learning and Regression for Cross-Modal Retrieval

Published: 07 August 2017

Abstract

Cross-modal retrieval has received much attention in recent years. A common strategy is to project multi-modality data into a shared subspace and then perform retrieval there. However, nearly all existing methods directly adopt the space defined by binary class-label information, without any learning, as the shared subspace for regression. In this paper, we first apply the spectral regression method, under orthogonality constraints, to learn the optimal latent space shared by data of all modalities. We then construct a graph model to project the multi-modality data into this latent space. Finally, we combine the two processes to jointly learn the latent space and perform the regression. Extensive experiments on multiple benchmark datasets show that the proposed method outperforms state-of-the-art approaches.
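The abstract outlines an alternating scheme: spectral regression with orthogonality constraints yields latent targets from class labels, and each modality is regressed onto those targets via a graph-based projection. The NumPy sketch below is a minimal, hypothetical illustration of that style of joint optimization; the function name joint_latent_regression, the simple same-class affinity graph, the ridge regularizer lam, and the QR-based re-orthogonalization step are assumptions for illustration, not the authors' exact formulation.

import numpy as np

def joint_latent_regression(X_img, X_txt, labels, dim=10, lam=1.0, n_iter=20):
    """Hypothetical sketch: jointly learn a shared latent space and
    per-modality regressors, in the spirit of the abstract (assumed
    form, not the paper's exact objective)."""
    # Same-class affinity graph and its Laplacian (assumption: binary
    # class labels define the graph, as the abstract suggests).
    W = (labels[:, None] == labels[None, :]).astype(float)
    L = np.diag(W.sum(axis=1)) - W
    # Initialize orthogonal latent targets Y from the bottom non-trivial
    # Laplacian eigenvectors (spectral-embedding-style initialization).
    _, evecs = np.linalg.eigh(L)
    Y = evecs[:, 1:dim + 1]                      # columns are orthonormal
    P = {}
    for _ in range(n_iter):
        # Closed-form ridge regression of each modality onto Y.
        for name, X in (("img", X_img), ("txt", X_txt)):
            d = X.shape[1]
            P[name] = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
        # Refit Y to the averaged projections, then restore orthogonality
        # with a QR factorization (stand-in for the orthogonality constraint).
        M = 0.5 * (X_img @ P["img"] + X_txt @ P["txt"])
        Q, _ = np.linalg.qr(M)
        Y = Q[:, :dim]
    return P, Y

# Toy usage: 100 paired samples, 5 classes, 64-d image / 32-d text features.
rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=100)
X_img = rng.standard_normal((100, 64)) + labels[:, None]
X_txt = rng.standard_normal((100, 32)) + labels[:, None]
P, Y = joint_latent_regression(X_img, X_txt, labels, dim=4)

In a retrieval setting, one would rank X_txt @ P["txt"] against a query's projection X_img @ P["img"] (or vice versa) by cosine similarity in the learned latent space.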


Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN: 9781450350228
DOI: 10.1145/3077136

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cross-modal matching
  2. latent subspace learning
  3. regression

Qualifiers

  • Short-paper

Conference

SIGIR '17

Acceptance Rates

SIGIR '17 paper acceptance rate: 78 of 362 submissions (22%).
Overall acceptance rate: 792 of 3,983 submissions (20%).

Article Metrics

  • Downloads (last 12 months): 15
  • Downloads (last 6 weeks): 4
Reflects downloads up to 28 Feb 2025

Cited By

  • (2025) A Multimodal Embedding Transfer Approach for Consistent and Selective Learning Processes in Cross-Modal Retrieval. Information Sciences, article 121974. DOI: 10.1016/j.ins.2025.121974. Online publication date: Feb 2025.
  • (2024) Semi-supervised Prototype Semantic Association Learning for Robust Cross-modal Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 872-881. DOI: 10.1145/3626772.3657756. Online publication date: 10 Jul 2024.
  • (2024) Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions. Proceedings of the IEEE, 112(11), 1716-1754. DOI: 10.1109/JPROC.2024.3525147. Online publication date: Nov 2024.
  • (2024) Improving answer quality using image-text coherence on social Q&A sites. Decision Support Systems, 180(C). DOI: 10.1016/j.dss.2024.114191. Online publication date: 9 Jul 2024.
  • (2024) Pretrained models for cross-modal retrieval: experiments and improvements. Signal, Image and Video Processing, 18(5), 4915-4923. DOI: 10.1007/s11760-024-03126-z. Online publication date: 6 Apr 2024.
  • (2023) Prototype-guided Cross-modal Completion and Alignment for Incomplete Text-based Person Re-identification. Proceedings of the 31st ACM International Conference on Multimedia, 5253-5261. DOI: 10.1145/3581783.3613802. Online publication date: 26 Oct 2023.
  • (2023) Variational Autoencoder with CCA for Audio–Visual Cross-modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications, 19(3s), 1-21. DOI: 10.1145/3575658. Online publication date: 24 Feb 2023.
  • (2023) LCM: A Surprisingly Effective Framework for Supervised Cross-modal Retrieval. Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD), 37-46. DOI: 10.1145/3570991.3571048. Online publication date: 4 Jan 2023.
  • (2022) C3CMR: Cross-Modality Cross-Instance Contrastive Learning for Cross-Media Retrieval. Proceedings of the 30th ACM International Conference on Multimedia, 4300-4308. DOI: 10.1145/3503161.3548263. Online publication date: 10 Oct 2022.
  • (2021) A Cross-Media Retrieval Method Based on Semisupervised Learning and Alternate Optimization. Mobile Information Systems, 2021, article 9947644. DOI: 10.1155/2021/9947644. Online publication date: 27 Sep 2021.
