Abstract
In many real-world applications, a growing number of objects are captured from varying viewpoints or by different sensors, which creates an urgent demand for recognizing objects across heterogeneous views. Although significant progress has been achieved recently, heterogeneous recognition (cross-view recognition) in multi-view learning remains challenging because of the complex correlations among views. Multi-view subspace learning is an effective solution that attempts to obtain a common representation for downstream computations. Most previous methods first extract features and then maximize correlation to relate the different views; this two-step procedure leads to performance deterioration. To overcome this drawback, we propose a deep cross-view autoencoder network (DCVAE) that extracts the features of different views and establishes the correlation between views in one step, jointly handling view-specific information, view correlation, and consistency. Specifically, DCVAE contains self-reconstruction, newly designed cross-view reconstruction, and consistency constraint modules. Self-reconstruction preserves view-specific information, cross-view reconstruction transfers information from one view to another, and the consistency constraint makes the representations of different views more consistent. The proposed model is able to discover the complex correlations embedded in multi-view data and to integrate heterogeneous views into a latent common representation subspace. Furthermore, 2D embeddings of the learned common representation subspace demonstrate that the consistency constraint is effective, and cross-view classification experiments verify the superior performance of DCVAE in the two-view scenario.
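The three modules named in the abstract can be read as three terms of a single training objective. The sketch below is only an illustration of that reading, not the authors' implementation: the abstract does not give the loss formula, so the linear encoders and decoders, the mean-squared-error terms, the weights `lam_cross` and `lam_cons`, and every function name are assumptions made for this example.

```python
import random

def linear(weights, x):
    """Apply a linear map (given as a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def init(rows, cols, rng):
    """Small random weight matrix; a stand-in for a trained deep network."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def dcvae_style_loss(x1, x2, enc1, enc2, dec1, dec2, lam_cross=1.0, lam_cons=1.0):
    """Hypothetical joint objective for one paired sample (x1, x2)."""
    # Encode each view into the shared latent space.
    z1, z2 = linear(enc1, x1), linear(enc2, x2)
    # Self-reconstruction: each view is rebuilt from its own code.
    self_rec = mse(linear(dec1, z1), x1) + mse(linear(dec2, z2), x2)
    # Cross-view reconstruction: each view is rebuilt from the other view's code.
    cross_rec = mse(linear(dec2, z1), x2) + mse(linear(dec1, z2), x1)
    # Consistency: codes of the same object should agree across views.
    consistency = mse(z1, z2)
    total = self_rec + lam_cross * cross_rec + lam_cons * consistency
    return total, {"self": self_rec, "cross": cross_rec, "consistency": consistency}

rng = random.Random(0)
d1, d2, k = 6, 4, 3  # assumed view dimensions and latent size
enc1, enc2 = init(k, d1, rng), init(k, d2, rng)
dec1, dec2 = init(d1, k, rng), init(d2, k, rng)
x1 = [rng.random() for _ in range(d1)]
x2 = [rng.random() for _ in range(d2)]
total, parts = dcvae_style_loss(x1, x2, enc1, enc2, dec1, dec2)
```

Because all three terms are summed into one scalar, a single optimizer step would update feature extraction and cross-view correlation together, which is the one-step behavior the abstract contrasts with two-step pipelines.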
Acknowledgements
This work was sponsored by the Scientific and Technological Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202100638) and the Natural Science Foundation of Chongqing (Grant No. cstc2018jcyjAX0532).
About this article
Cite this article
Mi, JX., Fu, CQ., Chen, T. et al. Deep cross-view autoencoder network for multi-view learning. Multimed Tools Appl 81, 24645–24664 (2022). https://doi.org/10.1007/s11042-022-12636-2
DOI: https://doi.org/10.1007/s11042-022-12636-2