DOI: 10.1145/2733373.2806346
Semi-supervised Coupled Dictionary Learning for Cross-modal Retrieval in Internet Images and Texts

Published: 13 October 2015

Abstract

Massive amounts of images and texts are emerging on the Internet, raising the demand for effective cross-modal retrieval such as text-to-image and image-to-text search. To eliminate the heterogeneity between the image and text modalities, existing subspace learning methods try to learn a common latent subspace in which cross-modal matching can be performed. However, these methods usually require fully paired samples (images with corresponding texts) and ignore the class label information that accompanies the paired samples. This may prevent them from learning an effective subspace, since the correlations between the two modalities are only implicitly incorporated. Indeed, class label information can reduce the semantic gap between modalities and explicitly guide the subspace learning procedure. In addition, the large quantities of unpaired samples (images or texts alone) may provide useful side information to enrich the representations learned in the subspace. In this paper we therefore propose a novel model for the cross-modal retrieval problem. It consists of 1) a semi-supervised coupled dictionary learning step that generates homogeneous sparse representations for the different modalities from both paired and unpaired samples; and 2) a coupled feature mapping step that projects these sparse representations into a common subspace defined by the class label information, in which cross-modal matching is performed. Experiments on the large-scale web image dataset MIRFlickr-1M, under both fully paired and unpaired settings, demonstrate the effectiveness of the proposed model on the cross-modal retrieval task.
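The two-step pipeline described in the abstract can be sketched as follows. This is a minimal, simplified approximation on synthetic data, not the paper's method: the paper learns the two dictionaries jointly (coupled) and exploits unpaired samples, whereas this sketch factorizes each modality independently with scikit-learn's `DictionaryLearning` and approximates the label-defined common subspace with ridge regressions onto one-hot labels. All feature dimensions, parameter values, and data are illustrative.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import Ridge
from sklearn.preprocessing import normalize

rng = np.random.RandomState(0)

# Toy paired data: 40 image/text pairs drawn from 2 latent classes
# (a hypothetical stand-in for real MIRFlickr image/text features).
n, d_img, d_txt, n_classes = 40, 16, 10, 2
labels = rng.randint(0, n_classes, n)
latent = rng.randn(n_classes, 5)[labels] + 0.1 * rng.randn(n, 5)
X_img = latent @ rng.randn(5, d_img)          # image features
X_txt = latent @ rng.randn(5, d_txt)          # text features
Y = np.eye(n_classes)[labels]                 # one-hot class labels

# Step 1: per-modality dictionary learning -> sparse representations.
# (The paper couples the two dictionaries and also uses unpaired
# samples; here each modality is factorized independently for brevity.)
dl_img = DictionaryLearning(n_components=8, alpha=0.5, random_state=0).fit(X_img)
dl_txt = DictionaryLearning(n_components=8, alpha=0.5, random_state=0).fit(X_txt)
S_img = dl_img.transform(X_img)
S_txt = dl_txt.transform(X_txt)

# Step 2: feature mapping -- project each modality's sparse codes
# into the common subspace defined by the class labels.
P_img = Ridge(alpha=1.0).fit(S_img, Y)
P_txt = Ridge(alpha=1.0).fit(S_txt, Y)
Z_img = normalize(P_img.predict(S_img))
Z_txt = normalize(P_txt.predict(S_txt))

# Cross-modal matching: cosine similarity in the common subspace.
sim = Z_img @ Z_txt.T                         # image-to-text scores
top1 = sim.argmax(axis=1)                     # best text for each image
accuracy = (labels[top1] == labels).mean()
print(f"image-to-text class match rate: {accuracy:.2f}")
```

In this simplified form, retrieval quality rests entirely on how well the label-supervised projections align the two modalities; the coupling constraint in the paper additionally ties the sparse codes of paired samples together during dictionary learning.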




    Published In

    MM '15: Proceedings of the 23rd ACM international conference on Multimedia
    October 2015
    1402 pages
    ISBN:9781450334594
    DOI:10.1145/2733373
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. coupled dictionary learning
    2. cross-modal retrieval
    3. semi-supervised learning

    Qualifiers

    • Short-paper

    Funding Sources

    • Grant-in-Aid for Scientific Research (B)

    Conference

    MM '15: ACM Multimedia Conference
    October 26 - 30, 2015
    Brisbane, Australia

    Acceptance Rates

    MM '15 Paper Acceptance Rate 56 of 252 submissions, 22%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

