
Coupled feature selection based semi-supervised modality-dependent cross-modal retrieval

Published in: Multimedia Tools and Applications

Abstract

With the explosive growth of multimedia data, information is increasingly represented in multiple modalities. Cross-modal applications have therefore attracted growing attention in recent years, and cross-modal retrieval is among the most popular of them. In this paper, we propose a semi-supervised modality-dependent cross-modal retrieval method based on coupled feature selection (Semi-CoFe). It differs from most previous cross-modal retrieval methods, which typically use only labeled data to learn the projection matrices under an l2-norm constraint. Specifically, we propagate the labels of cluster centers to unlabeled data via a devised weight matrix and construct pseudo corresponding heterogeneous data. We then jointly consider semantic regression and pair-wise correlation analysis when learning the mapping matrices, so as to preserve both semantic consistency and the closeness of paired data. Meanwhile, an l2,1-norm constraint is imposed to select informative, discriminative features and to reduce noise. In addition, we learn different mapping matrices for different sub-tasks (i.e., using an image to search for text (I2T) and using text to search for an image (T2I)) to distinguish the semantic information of the query data, and the optimal mapping matrices are obtained via an iterative optimization method. Experimental results on three public datasets verify that the proposed method outperforms state-of-the-art methods.
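The abstract's combination of a semantic-regression term with an l2,1-norm regularizer can be sketched numerically. This is a minimal illustration of the general technique, not the paper's actual formulation: the function names, matrix shapes, and trade-off weight `lam` are all hypothetical, and the pair-wise correlation and label-propagation terms are omitted for brevity.

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: the sum of the l2 norms of the rows of W.
    Penalizing it drives entire rows toward zero, so features
    (rows) with near-zero norm are effectively discarded --
    this is why the penalty performs feature selection."""
    return np.sum(np.sqrt(np.sum(W ** 2, axis=1)))

def objective(W, X, Y, lam=0.1):
    """Toy single-modality objective (hypothetical):
    X is an n x d feature matrix, Y an n x c label matrix,
    W a d x c projection. The first term is the semantic
    regression loss; the second is the sparsity penalty."""
    semantic = np.linalg.norm(X @ W - Y, 'fro') ** 2
    return semantic + lam * l21_norm(W)
```

Because the l2,1 norm is non-smooth, objectives of this form are usually minimized with an iteratively reweighted scheme (alternating between a closed-form update of W and a diagonal reweighting matrix), which matches the abstract's mention of an iterative optimization method.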



Acknowledgements

This work is supported by the Natural Science Foundation for Distinguished Young Scholars of Shandong Province (JQ201718), the Key Research and Development Foundation of Shandong Province (2016GGX101009), the National Natural Science Foundation of China (U1736122, 61603225, 61601268), and the Shandong Provincial Key Research and Development Plan (2017CXGC1504). We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research.

Author information


Correspondence to Jiande Sun or Wenbo Wan.


Cite this article

Yu, E., Sun, J., Wang, L. et al. Coupled feature selection based semi-supervised modality-dependent cross-modal retrieval. Multimed Tools Appl 78, 28931–28951 (2019). https://doi.org/10.1007/s11042-018-5958-9
