Deep Semantic Mapping for Heterogeneous Multimedia Transfer Learning Using Co-Occurrence Data

Published: 24 January 2019

Abstract

Transfer learning, which focuses on finding a favorable representation for instances of different domains based on auxiliary data, can mitigate the divergence between domains through knowledge transfer. Recently, increasing efforts on transfer learning have employed deep neural networks (DNNs) to learn more robust and higher-level feature representations that better tackle cross-media disparities. However, only a few articles consider the correction and semantic matching between multi-layer heterogeneous domain networks. In this article, we propose a deep semantic mapping model for heterogeneous multimedia transfer learning (DHTL) using co-occurrence data. More specifically, we integrate DNNs with canonical correlation analysis (CCA) to derive a deep correlation subspace as the joint semantic representation for associating data across different domains. In the proposed DHTL, a multi-layer correlation matching network across domains is constructed, in which CCA bridges each pair of domain-specific hidden layers. To train the network, a joint objective function is defined and the optimization processes are presented. Once the deep semantic representation is learned, the shared features of the source domain are transferred for task learning in the target domain. Extensive experiments on three multimedia recognition applications demonstrate that the proposed DHTL can effectively find deep semantic representations for heterogeneous domains and that it outperforms several existing state-of-the-art methods for deep transfer learning.
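To make the role of CCA in the abstract more concrete, the following is a minimal sketch of plain linear CCA on paired (co-occurrence) samples from two feature spaces. It is not the authors' DHTL model: the function name `linear_cca`, the regularizer `reg`, and the synthetic data are all assumptions made for illustration, and the deep, multi-layer coupling and joint objective described above are not reproduced here.

```python
import numpy as np


def linear_cca(X, Y, k, reg=1e-6):
    """Plain linear CCA (illustrative; not the paper's DHTL).

    X: (n, dx) and Y: (n, dy) hold paired samples, row i of X
    co-occurring with row i of Y. Returns projection matrices
    A (dx, k), B (dy, k) and the top-k canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Regularized covariance and cross-covariance estimates.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations; singular vectors give the directions.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    A = Sxx_is @ U[:, :k]
    B = Syy_is @ Vt.T[:, :k]
    return A, B, s[:k]


# Hypothetical co-occurrence data: two views generated from a shared
# 2-dimensional latent signal Z plus small independent noise.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))
X = Z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
Y = Z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))

A, B, corr = linear_cca(X, Y, k=2)

# Project both domains into the shared correlation subspace.
Xs = (X - X.mean(axis=0)) @ A
Ys = (Y - Y.mean(axis=0)) @ B
print("canonical correlations:", corr)
```

In this simplified picture, `Xs` and `Ys` play the role of a shared representation: a classifier trained on the projected source features could in principle be applied to projected target features. The abstract's model instead applies this kind of correlation matching between pairs of hidden layers in two deep networks.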

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 1s
Special Section on Deep Learning for Intelligent Multimedia Analytics and Special Section on Multi-Modal Understanding of Social, Affective and Subjective Attributes of Data
January 2019
265 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3309769
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 January 2019
Accepted: 01 July 2018
Revised: 01 June 2018
Received: 01 October 2017
Published in TOMM Volume 15, Issue 1s

Author Tags

  1. deep semantic mapping
  2. canonical correlation analysis
  3. deep neural networks
  4. heterogeneous multimedia
  5. transfer learning

Qualifiers

  • Research-article
  • Research
  • Refereed
