Abstract
Multi-view learning (MVL) explores the data extracted from multiple resources. It assumes that the complementary information between different views can be revealed to further improve the learning performance. There are two challenges. First, it is difficult to effectively combine the data from different views while still fully preserving the view-specific information. Second, multi-view datasets are usually small, which means the model can easily overfit. To address these challenges, we propose a novel View-Correlation Adaptation (VCA) framework in a semi-supervised fashion. A semi-supervised data augmentation method is designed to generate extra features and labels based on both labeled and unlabeled samples. In addition, a cross-view adversarial training strategy is proposed to explore the structural information from one view and help the representation learning of the other view. Moreover, an effective and simple fusion network is proposed for the late fusion stage. In our model, all networks are jointly trained in an end-to-end fashion. Extensive experiments demonstrate that our approach is effective and stable compared with other state-of-the-art methods (code is available at https://github.com/wenwen0319/GVCA).
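The abstract does not spell out how the extra features and labels are generated, so the following is only a minimal sketch of one plausible reading: a mixup-style interpolation over a pool of labeled samples and pseudo-labeled unlabeled samples. The function name `mixup_semi_supervised`, the parameter `alpha`, and the pseudo-label array `p_unl` are assumptions made for illustration and are not taken from the paper or its code.

```python
import numpy as np

def mixup_semi_supervised(x_lab, y_lab, x_unl, p_unl, alpha=0.75, rng=None):
    """Hypothetical mixup-style augmentation (an assumption, not the paper's method).

    x_lab: (n_l, d) labeled features, y_lab: (n_l, c) one-hot labels
    x_unl: (n_u, d) unlabeled features, p_unl: (n_u, c) soft pseudo-labels
    """
    rng = np.random.default_rng() if rng is None else rng
    # Pool labeled samples with pseudo-labeled unlabeled samples.
    x_all = np.concatenate([x_lab, x_unl], axis=0)
    y_all = np.concatenate([y_lab, p_unl], axis=0)
    # Pair every sample with a randomly chosen partner from the pool.
    perm = rng.permutation(len(x_all))
    # One mixing coefficient per pair, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=(len(x_all), 1))
    # Interpolate features and labels with the same coefficient.
    x_mix = lam * x_all + (1.0 - lam) * x_all[perm]
    y_mix = lam * y_all + (1.0 - lam) * y_all[perm]
    return x_mix, y_mix

# Toy usage: 4 labeled and 6 unlabeled samples, 8-dim features, 3 classes.
rng = np.random.default_rng(0)
x_lab = rng.normal(size=(4, 8))
y_lab = np.eye(3)[[0, 1, 2, 0]]
x_unl = rng.normal(size=(6, 8))
p_unl = np.full((6, 3), 1.0 / 3.0)  # uniform pseudo-labels as a stand-in
x_mix, y_mix = mixup_semi_supervised(x_lab, y_lab, x_unl, p_unl, rng=rng)
```

Because each mixed label is a convex combination of two probability vectors, every row of `y_mix` remains a valid distribution over the classes. In practice the pseudo-labels would come from the model's own predictions rather than the uniform stand-in used here.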
Acknowledgement
This research is supported by the U.S. Army Research Office Award W911NF-17-1-0367.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y., Wang, L., Bai, Y., Qin, C., Ding, Z., Fu, Y. (2020). Generative View-Correlation Adaptation for Semi-supervised Multi-view Learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12359. Springer, Cham. https://doi.org/10.1007/978-3-030-58568-6_19
DOI: https://doi.org/10.1007/978-3-030-58568-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58567-9
Online ISBN: 978-3-030-58568-6
eBook Packages: Computer Science (R0)