
An adversarial training-based mutual information constraint method

Published in Applied Intelligence

Abstract

As an auxiliary loss function, the mutual information constraint is widely used in deep learning tasks such as deep reinforcement learning and representation learning. However, many tasks pay no attention to the degree to which mutual information is constrained. This paper discusses a situation in which mutual information must be constrained within an appropriate range. In this situation, mutual information is difficult to calculate accurately, so an error remains between the obtained and ideal features when a mutual information constraint is applied. How to reduce this error is therefore a problem worth studying. This paper proposes a new method that uses mutual information constraints to constrain features more precisely. The method, called the Adversarial Training-based Mutual Information Constraint (ATMIC), can be built on different mutual information estimation methods and applied to a wide range of tasks. ATMIC extracts noise from the obtained feature and uses it to attack the original mutual information constraint; this noise forces the encoder to reduce the error between the obtained and ideal features. Experiments show that ATMIC outperforms other mutual information constraint methods on permutation-invariant MNIST, fair learning, multimodal sentiment analysis, and GLUE tasks.
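The abstract describes ATMIC only at a high level: an auxiliary mutual information term is attacked with noise extracted from the feature, so the encoder is also penalized at the perturbed feature. As a rough illustration of that idea only, the sketch below pairs a VIB-style mutual information constraint (the analytic KL of a diagonal-Gaussian encoder output to a standard normal) with a first-order adversarial perturbation of the feature mean. All function names, the sign-step attack, and the `max` combination are our assumptions for illustration, not the paper's actual formulation.

```python
import math

def kl_to_standard_normal(mu, log_var):
    # Analytic KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions.
    # This KL term is the variational upper bound on I(X; Z) used in
    # VIB-style mutual information constraints.
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv
        for m, lv in zip(mu, log_var)
    )

def atmic_style_loss(mu, log_var, task_loss, beta=1e-3, eps=0.05):
    # Hypothetical ATMIC-style objective (our sketch, not the paper's):
    # 1) the usual MI constraint via the KL term;
    # 2) an eps-bounded perturbation of the feature mean that increases
    #    the KL term. The gradient of the KL w.r.t. mu is simply mu, so
    #    the worst-case first-order step is eps * sign(mu).
    kl = kl_to_standard_normal(mu, log_var)
    mu_adv = [m + eps * (1.0 if m >= 0 else -1.0) for m in mu]
    kl_adv = kl_to_standard_normal(mu_adv, log_var)
    # Penalize the worse of the clean and perturbed constraints, so the
    # encoder must keep the MI bound small even under the attack.
    return task_loss + beta * max(kl, kl_adv)

loss = atmic_style_loss(mu=[0.5, -0.2], log_var=[0.0, 0.0], task_loss=1.0)
```

In a real training loop the perturbation would be computed by backpropagating through the MI estimator rather than analytically, but the structure is the same: the constraint is evaluated at an adversarially shifted feature so that small estimation errors cannot hide under the clean-feature bound.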





Funding

This work was supported by the Natural Science Foundation of China under Grants No. 61966038 and No. 62266051.

Author information


Corresponding author

Correspondence to Xiaobing Zhou.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, R., Zhang, X., Wang, J. et al. An adversarial training-based mutual information constraint method. Appl Intell 53, 24377–24392 (2023). https://doi.org/10.1007/s10489-023-04803-1

