Efficient cross-modal retrieval via flexible supervised collective matrix factorization hashing

Multimedia Tools and Applications

Abstract

Cross-modal retrieval has recently drawn much attention in multimedia analysis, yet it remains a challenging topic, mainly owing to its heterogeneous nature. In this paper, we propose flexible supervised collective matrix factorization hashing (FS-CMFH) for efficient cross-modal retrieval. First, we exploit a flexible collective matrix factorization framework to jointly learn an individual latent space with similar semantics for each modality. Meanwhile, the label consistency across different modalities is exploited to preserve both intra-modal and inter-modal semantics within these similar latent semantic spaces. Accordingly, these two ingredients are formulated as a joint graph regularization term in an overall objective function, through which similar hash codes for the different modalities of an instance can be discriminatively obtained to flexibly characterize that instance. As a result, the derived hash codes, which carry higher discriminative power, are able to improve cross-modal search accuracy significantly. Extensive experiments on three popular benchmark datasets show that the proposed approach performs favorably compared with state-of-the-art competing approaches.

Acknowledgments

This work was supported by the National Science Foundation of China under Grants 61673185, 61572205 and 61673186, the National Science Foundation of Fujian Province (No. 2017J01112), and the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research (No. ZQN-PY309).

Author information

Corresponding author

Correspondence to Xin Liu.

Appendix

In this appendix, we derive the equivalent forms of (12), (13) and (14). Let \(\mathbf{X}=\{\mathbf{x}_{i}\}_{i=1}^{n}\in\mathbb{R}^{d\times n}\) and \(\mathbf{Y}=\{\mathbf{y}_{j}\}_{j=1}^{m}\in\mathbb{R}^{d\times m}\), and let \(\mathbf{w}_{ij}\) denote the similarity between \(\mathbf{x}_{i}\) and \(\mathbf{y}_{j}\). We first construct a function \(\boldsymbol{\Phi}\) of the following form:

$$\begin{array}{@{}rcl@{}} \boldsymbol{\Phi} &=& \frac{1}{2}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{m}\mathbf{w}_{ij}\|\mathbf{x}_{i}-\mathbf{y}_{j}\|^{2}\\ &=& \frac{1}{2}\left(\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{m}\mathbf{w}_{ij}\left(\mathbf{x}_{i}^{\mathrm{T}}\mathbf{x}_{i} - 2\mathbf{x}_{i}^{\mathrm{T}}\mathbf{y}_{j} + \mathbf{y}_{j}^{\mathrm{T}}\mathbf{y}_{j}\right)\right)\\ &=& \frac{1}{2}\left(\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{m}\mathbf{w}_{ij}\mathbf{x}_{i}^{\mathrm{T}}\mathbf{x}_{i} - \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{m}2\mathbf{w}_{ij}\mathbf{x}_{i}^{\mathrm{T}}\mathbf{y}_{j} + \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{m}\mathbf{w}_{ij}\mathbf{y}_{j}^{\mathrm{T}}\mathbf{y}_{j}\right) \end{array} $$
(25)

Accordingly, we can also generalize function \(\boldsymbol {{\Phi }}\) as:

$$\begin{array}{@{}rcl@{}} \boldsymbol{{\Phi}}&{=}& \left[ {\begin{array}{*{20}{c}} \mathbf{X}& \mathbf{Y} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} \mathbf{A} & \mathbf{B}\\ \mathbf{E} & \mathbf{F} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{\mathbf{X}^{\mathrm{T}}}}\\ {{\mathbf{Y}^{\mathrm{T}}}} \end{array}} \right] \\ &=& \left[ {\begin{array}{*{20}{c}} {\mathbf{XA}{+}\mathbf{YE}}& ~{\mathbf{XB}{+}\mathbf{YF}} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{\mathbf{X}^{\mathrm{T}}}}\\ {{\mathbf{Y}^{\mathrm{T}}}} \end{array}} \right] \\&=& \mathbf{XAX}^{\mathrm{T}} + \mathbf{YEX}^{\mathrm{T}} + \mathbf{XBY}^{\mathrm{T}} + \mathbf{YFY}^{\mathrm{T}} \end{array} $$
(26)
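
The block expansion in (26) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the dimensions `d`, `n`, `m`, the random seed, and the randomly drawn blocks are illustrative assumptions, and `Y` is given `m` columns to match the upper limit of the inner sum in (25).

```python
import numpy as np

# Illustrative sizes (assumptions): feature dimension d, n columns in X, m in Y.
d, n, m = 3, 5, 4
rng = np.random.default_rng(1)

X = rng.standard_normal((d, n))
Y = rng.standard_normal((d, m))

# Arbitrary blocks with shapes that make the block matrix conformable.
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
E = rng.standard_normal((m, n))
F = rng.standard_normal((m, m))

# Left-hand side of (26): [X Y] [[A, B], [E, F]] [X^T; Y^T].
U = np.hstack([X, Y])                 # shape (d, n + m)
M = np.block([[A, B], [E, F]])        # shape (n + m, n + m)
lhs = U @ M @ U.T

# Right-hand side of (26): the four expanded terms.
rhs = X @ A @ X.T + Y @ E @ X.T + X @ B @ Y.T + Y @ F @ Y.T

print("max abs difference:", np.max(np.abs(lhs - rhs)))
```

Both sides are \(d\times d\) matrices, which is why a trace is taken before comparing with the scalar sum in (25).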

Taking the trace gives \(\boldsymbol{\Phi}=tr(\mathbf{XAX}^{\mathrm{T}} + \mathbf{YEX}^{\mathrm{T}} + \mathbf{XBY}^{\mathrm{T}} + \mathbf{YFY}^{\mathrm{T}})\). Comparing (25) and (26) term by term, up to the constant factor \(\frac{1}{2}\), we obtain \(\mathbf{A}_{ii}=\sum\nolimits_{j=1}^{m}\mathbf{w}_{ij}=\mathbf{D}_{ii}\), \(\mathbf{B}=-\mathbf{W}\), \(\mathbf{F}_{jj}=\sum\nolimits_{i=1}^{n}\mathbf{w}_{ij}=\mathbf{D}_{jj}\) and \(\mathbf{E}=-\mathbf{W}^{\mathrm{T}}\), where \(\mathbf{A}\) and \(\mathbf{F}\) are diagonal. Letting \(\mathbf{U}=[\mathbf{X}~~\mathbf{Y}]\), we obtain:

$$\begin{array}{@{}rcl@{}} \sum\limits_{i = 1}^{n}\sum\limits_{j = 1}^{m}\mathbf{w}_{ij}\|\mathbf{x}_{i} - \mathbf{y}_{j}\|^{2} = tr(\mathbf{ULU}^{\mathrm{T}}) \end{array} $$
(27)

where \(\mathbf{L} = \left[\begin{array}{cc} \mathbf{D} & -\mathbf{W}\\ -\mathbf{W}^{\mathrm{T}} & \mathbf{D} \end{array}\right]\). Following this general formulation, the three terms in (11) can be rewritten as:

$$ \frac{\lambda_{1}}{2}\sum\limits_{i,j = 1}^{n} \mathbf{r}_{ij}^{(1)}\|\mathbf{s}_{i}^{(1)}-\mathbf{s}_{j}^{(1)}\|^{2} \Leftrightarrow tr(\mathbf{S}^{(1)}(\mathbf{D}_{1}- \lambda_{1} \mathbf{R}^{(1)})(\mathbf{S}^{(1)})^{\mathrm{T}}) $$
(28)
$$ \frac{\lambda_{2}}{2}\sum\limits_{i,j = 1}^{n} \mathbf{r}_{ij}^{(2)}\|\mathbf{s}_{i}^{(2)}-\mathbf{s}_{j}^{(2)}\|^{2} \Leftrightarrow tr(\mathbf{S}^{(2)}(\mathbf{D}_{2}- \lambda_{2} \mathbf{R}^{(2)})(\mathbf{S}^{(2)})^{\mathrm{T}}) $$
(29)
$$ \begin{array}{ll} \sum\limits_{i,j = 1}^{n} \mathbf{c}_{ij}\|\mathbf{s}_{i}^{(1)}-\mathbf{s}_{j}^{(2)}\|^{2} &\Leftrightarrow tr\left( \left[ \begin{array}{cc} \mathbf{S}^{(1)} & \mathbf{S}^{(2)} \end{array} \right]\left( \begin{array}{cc} \mathbf{D}_{3} & -\mathbf{C}\\ -\mathbf{C}^{\mathrm{T}} & \mathbf{D}_{3} \end{array} \right)\left[ \begin{array}{c} (\mathbf{S}^{(1)})^{\mathrm{T}}\\ (\mathbf{S}^{(2)})^{\mathrm{T}} \end{array} \right] \right)\\ &= tr\left( \mathbf{S}^{(1)}\mathbf{D}_{3}(\mathbf{S}^{(1)})^{\mathrm{T}} + \mathbf{S}^{(2)}\mathbf{D}_{3}(\mathbf{S}^{(2)})^{\mathrm{T}} - 2\mathbf{S}^{(1)}\mathbf{C}(\mathbf{S}^{(2)})^{\mathrm{T}} \right) \end{array} $$
(30)

where \(\mathbf{D}_{1}\), \(\mathbf{D}_{2}\), \(\mathbf{D}_{3}\in\mathbb{R}^{n\times n}\) are diagonal matrices whose diagonal entries are the column sums of \(\lambda_{1}\mathbf{R}^{(1)}\), \(\lambda_{2}\mathbf{R}^{(2)}\) and \(\mathbf{C}\), respectively.
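
The trace forms of the intra-modal term (28) and the inter-modal term (30) can be verified on random data. This is a hedged sketch rather than the authors' implementation. The code length `k`, the sample count `n`, the value of `lam1`, and the random matrices are all illustrative assumptions. The check also assumes that \(\mathbf{R}^{(1)}\) and \(\mathbf{C}\) are symmetric, so that row sums and column sums coincide and a single \(\mathbf{D}_{3}\) can fill both diagonal blocks.

```python
import numpy as np

# Illustrative sizes and weight (assumptions, not values from the paper).
k, n = 4, 6
lam1 = 0.5
rng = np.random.default_rng(0)

S1 = rng.standard_normal((k, n))  # columns play the role of s_i^(1)
S2 = rng.standard_normal((k, n))  # columns play the role of s_j^(2)


def random_symmetric(size, generator):
    """Random nonnegative symmetric matrix (assumed form of R^(1) and C)."""
    M = generator.random((size, size))
    return (M + M.T) / 2


def pairwise_sq_dists(P, Q):
    """Entry (i, j) is ||P[:, i] - Q[:, j]||^2."""
    diff = P[:, :, None] - Q[:, None, :]
    return np.sum(diff ** 2, axis=0)


R1 = random_symmetric(n, rng)
C = random_symmetric(n, rng)

# Intra-modal term, as in (28): weighted pairwise sum vs. trace form.
lhs_intra = 0.5 * lam1 * np.sum(R1 * pairwise_sq_dists(S1, S1))
D1 = np.diag((lam1 * R1).sum(axis=0))      # column sums of lam1 * R^(1)
rhs_intra = np.trace(S1 @ (D1 - lam1 * R1) @ S1.T)

# Inter-modal term, as in (30): weighted pairwise sum vs. block trace form.
lhs_cross = np.sum(C * pairwise_sq_dists(S1, S2))
D3 = np.diag(C.sum(axis=0))                # column sums of C
U = np.hstack([S1, S2])
L = np.block([[D3, -C], [-C.T, D3]])
rhs_cross_block = np.trace(U @ L @ U.T)
rhs_cross_expanded = np.trace(
    S1 @ D3 @ S1.T + S2 @ D3 @ S2.T - 2 * S1 @ C @ S2.T
)

print("intra-modal gap:", abs(lhs_intra - rhs_intra))
print("inter-modal gap:", abs(lhs_cross - rhs_cross_block))
```

If \(\mathbf{C}\) were not symmetric, the two diagonal blocks would need separate degree matrices built from its row sums and its column sums.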

Cite this article

Liu, X., Li, A., Du, JX. et al. Efficient cross-modal retrieval via flexible supervised collective matrix factorization hashing. Multimed Tools Appl 77, 28665–28683 (2018). https://doi.org/10.1007/s11042-018-6006-5

  • DOI: https://doi.org/10.1007/s11042-018-6006-5
