Generative View-Correlation Adaptation for Semi-supervised Multi-view Learning

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12359)

Abstract

Multi-view learning (MVL) explores data extracted from multiple sources. It assumes that the complementary information between different views can be exploited to further improve learning performance. There are two challenges. First, it is difficult to effectively combine data from different views while still fully preserving view-specific information. Second, multi-view datasets are usually small, so models overfit easily. To address these challenges, we propose a novel View-Correlation Adaptation (VCA) framework in a semi-supervised fashion. A semi-supervised data augmentation method is designed to generate extra features and labels based on both labeled and unlabeled samples. In addition, a cross-view adversarial training strategy is proposed to explore the structural information from one view and help the representation learning of the other view. Moreover, a simple and effective fusion network is proposed for the late fusion stage. In our model, all networks are jointly trained in an end-to-end fashion. Extensive experiments demonstrate that our approach is effective and stable compared with other state-of-the-art methods. Code is available at https://github.com/wenwen0319/GVCA.
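The abstract does not spell out how the augmentation generates extra features and labels. One common way to do this from labeled and unlabeled samples is mixup-style interpolation with pseudo-labels, and the sketch below illustrates that idea only. It is an assumption, not the authors' method: the function names `mixup_pair` and `augment`, the `predict_proba` callable, and the `alpha` value are all hypothetical.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Convexly combine two feature vectors and their (soft) label vectors."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def augment(labeled, unlabeled, predict_proba, alpha=0.75, rng=None):
    """Mix every labeled sample with one randomly chosen unlabeled sample.

    labeled:       list of (features, one_hot_label) pairs
    unlabeled:     list of feature vectors
    predict_proba: hypothetical callable that returns class probabilities,
                   used here as the pseudo-label of an unlabeled sample
    alpha:         Beta distribution parameter (assumed value)
    """
    rng = rng or random.Random(0)
    augmented = []
    for x_l, y_l in labeled:
        x_u = rng.choice(unlabeled)
        y_u = predict_proba(x_u)  # pseudo-label for the unlabeled sample
        lam = rng.betavariate(alpha, alpha)
        lam = max(lam, 1.0 - lam)  # keep the labeled sample dominant
        augmented.append(mixup_pair(x_l, y_l, x_u, y_u, lam))
    return augmented


if __name__ == "__main__":
    labeled = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
    unlabeled = [[0.5, 0.5], [0.2, 0.8]]
    # Stand-in classifier for the demo: always predicts a uniform distribution.
    for features, label in augment(labeled, unlabeled, lambda x: [0.5, 0.5]):
        print(features, label)
```

Because each mixed label is a convex combination of two probability vectors, it remains a valid probability vector, so the generated pairs can be fed to the same loss as the original labeled data.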



Acknowledgement

This research is supported by the U.S. Army Research Office Award W911NF-17-1-0367.

Author information

Corresponding author

Correspondence to Yunyu Liu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Y., Wang, L., Bai, Y., Qin, C., Ding, Z., Fu, Y. (2020). Generative View-Correlation Adaptation for Semi-supervised Multi-view Learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12359. Springer, Cham. https://doi.org/10.1007/978-3-030-58568-6_19

  • DOI: https://doi.org/10.1007/978-3-030-58568-6_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58567-9

  • Online ISBN: 978-3-030-58568-6

  • eBook Packages: Computer Science, Computer Science (R0)
