Progressive Cross-Media Correlation Learning

Huang, Xin; Peng, Yuxin

doi:10.1007/978-981-13-1702-6_20


Part of the book series: Communications in Computer and Information Science (CCIS, volume 875)

Included in the following conference series:

Chinese Conference on Image and Graphics Technologies


Abstract

Cross-media retrieval aims to retrieve results across different media types, such as image and text, and its key problem is learning cross-media correlation from known training data. Existing methods indiscriminately use all data for model training, ignoring that hard samples can carry misleading or even noisy information, which harms the model especially in the early period of training. Because cross-media training data is difficult to collect, training sets are commonly small, which makes this problem even more severe and limits the robustness and accuracy of cross-media retrieval. To address this problem, this paper proposes the Progressive Cross-media Correlation Learning (PCCL) approach, which uses a large-scale cross-media dataset with general knowledge (reference data) to guide correlation learning on another, small-scale dataset (target data) through a progressive sample selection mechanism. Specifically, we first pre-train a hierarchical correlation learning network on the reference data as the reference model, which assigns samples in the target data different learning difficulties via an intra-media and inter-media relevance significance metric. Then, training samples in the target data are selected in order of gradually increasing learning difficulty, so that the correlation learning process can progressively reduce the “heterogeneity gap”, enhancing model robustness and improving retrieval accuracy. Using our self-constructed large-scale XMediaNet dataset as the reference data, cross-media retrieval experiments on 2 widely used datasets show that PCCL outperforms 9 state-of-the-art methods.
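The abstract describes two steps: scoring each target sample's learning difficulty with a reference model, then training on samples in order of increasing difficulty. The sketch below illustrates that curriculum-style idea only under loudly stated assumptions. It assumes the pre-trained reference model has already mapped every image and text into a shared embedding space, and it substitutes a hypothetical relevance metric: intra-media relevance is cosine similarity to the sample's class centroid within the same media type, inter-media relevance is cosine similarity between the paired image and text, and the two are mixed by a hypothetical weight `alpha`. This is not the paper's actual relevance significance metric or network; `difficulty_scores`, `progressive_subsets`, `alpha`, and `num_stages` are illustrative names.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroids(embs: List[Vector], labels: List[int]) -> Dict[int, List[float]]:
    """Per-class mean embedding for one media type."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for e, y in zip(embs, labels):
        acc = sums.setdefault(y, [0.0] * len(e))
        for i, v in enumerate(e):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def difficulty_scores(img: List[Vector], txt: List[Vector], labels: List[int],
                      alpha: float = 0.5) -> List[float]:
    """Hypothetical difficulty = 1 - relevance, where relevance mixes an
    intra-media term (closeness to own class centroid, averaged over both
    media) and an inter-media term (agreement of the image-text pair)."""
    img_c, txt_c = centroids(img, labels), centroids(txt, labels)
    scores = []
    for i, t, y in zip(img, txt, labels):
        intra = 0.5 * (cosine(i, img_c[y]) + cosine(t, txt_c[y]))
        inter = cosine(i, t)
        relevance = alpha * intra + (1 - alpha) * inter
        scores.append(1.0 - relevance)
    return scores


def progressive_subsets(scores: List[float], num_stages: int) -> List[List[int]]:
    """Stage k (1..num_stages) trains on the easiest ceil(k * n / num_stages) samples."""
    order = sorted(range(len(scores)), key=lambda idx: scores[idx])
    n = len(order)
    return [order[:math.ceil(k * n / num_stages)] for k in range(1, num_stages + 1)]


# Toy reference-model embeddings: sample 1 has a mismatched image-text pair.
img = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
txt = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.3, 0.7]]
labels = [0, 0, 1, 1]
scores = difficulty_scores(img, txt, labels)
stages = progressive_subsets(scores, num_stages=2)
print(stages)
```

With these toy inputs the mismatched pair (index 1) receives the highest difficulty and only enters training in the final stage. Linear pacing, where each stage admits an equal share of harder samples, is just one simple schedule; other pacing functions would fit the same structure.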



Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 61771025 and 61532005.

Author information

Authors and Affiliations

Institute of Computer Science and Technology, Peking University, Beijing, 100871, China
Xin Huang & Yuxin Peng


Corresponding author

Correspondence to Yuxin Peng.

Editor information

Editors and Affiliations

Beijing Institute of Technology, Beijing, China
Yongtian Wang
Beihang University, Beijing, China
Zhiguo Jiang
Peking University, Beijing, China
Yuxin Peng


About this paper

Cite this paper

Huang, X., Peng, Y. (2018). Progressive Cross-Media Correlation Learning. In: Wang, Y., Jiang, Z., Peng, Y. (eds) Image and Graphics Technologies and Applications. IGTA 2018. Communications in Computer and Information Science, vol 875. Springer, Singapore. https://doi.org/10.1007/978-981-13-1702-6_20


DOI: https://doi.org/10.1007/978-981-13-1702-6_20
Published: 12 August 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1701-9
Online ISBN: 978-981-13-1702-6
eBook Packages: Computer Science, Computer Science (R0)
