Abstract
As an auxiliary loss function, the mutual information constraint is widely used in deep learning tasks such as deep reinforcement learning and representation learning. However, the strength of the constraint receives little attention in many of these tasks. This paper examines a setting in which mutual information must be constrained within an appropriate range. In this setting it is challenging to compute mutual information accurately, so an error remains between the obtained features and the ideal features even when a mutual information constraint is applied; reducing this error is a problem worth studying. This paper proposes a new method that uses mutual information constraints to constrain features more precisely. The method, called the Adversarial Training-based Mutual Information Constraint (ATMIC), can be built on different mutual information estimators and applied to a wide range of tasks. ATMIC extracts noise from the obtained feature and uses it to attack the original mutual information constraint; this noise forces the encoder to reduce the error between the obtained and ideal features. Experiments show that ATMIC outperforms other mutual information constraint methods on permutation-invariant MNIST, fair learning, multimodal sentiment analysis, and GLUE tasks.
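To make the idea in the abstract concrete, the sketch below illustrates one way such an adversarial attack on a mutual information constraint could look. It is a minimal, hypothetical rendering, assuming a MINE-style Donsker-Varadhan estimator and an FGSM-style sign-gradient perturbation of the feature; the names (MINECritic, dv_mi_estimate, atmic_constraint) and the choice of estimator are ours for illustration, not the paper's API, and the paper's method also admits other estimators.

```python
# Minimal sketch of an adversarial-training-based MI constraint.
# Assumptions (not from the paper): a MINE-style Donsker-Varadhan
# estimator and an FGSM-style perturbation of the feature z.
import math
import torch
import torch.nn as nn

class MINECritic(nn.Module):
    """Statistics network T(x, z) for the Donsker-Varadhan lower bound."""
    def __init__(self, x_dim: int, z_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, z], dim=-1))

def dv_mi_estimate(critic: MINECritic, x: torch.Tensor,
                   z: torch.Tensor) -> torch.Tensor:
    """I(X;Z) >= E[T(x,z)] - log E[exp(T(x,z'))], z' from a shuffled batch."""
    joint = critic(x, z).mean()
    z_marginal = z[torch.randperm(z.size(0), device=z.device)]
    marginal = torch.logsumexp(critic(x, z_marginal), dim=0).squeeze() \
        - math.log(z.size(0))
    return joint - marginal

def atmic_constraint(critic: MINECritic, x: torch.Tensor, z: torch.Tensor,
                     epsilon: float = 0.1) -> torch.Tensor:
    """MI constraint evaluated on an adversarially perturbed feature.

    The noise is extracted from the feature itself: it is the sign of the
    gradient of the MI estimate with respect to z, scaled by epsilon.
    Minimizing the perturbed estimate pushes the encoder to satisfy the
    constraint with a margin rather than only at the clean feature.
    """
    z_probe = z.detach().requires_grad_(True)
    grad = torch.autograd.grad(dv_mi_estimate(critic, x, z_probe), z_probe)[0]
    noise = epsilon * grad.sign()
    return dv_mi_estimate(critic, x, z + noise.detach())
```

In a training loop, a term like this would be added to the task loss: the critic is updated to tighten the bound, while the encoder minimizes (or, when the constraint targets a range, regulates) the perturbed estimate.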
Funding
This work was supported by the Natural Science Foundation of China under Grant Nos. 61966038 and 62266051.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Liu, R., Zhang, X., Wang, J. et al. An adversarial training-based mutual information constraint method. Appl Intell 53, 24377–24392 (2023). https://doi.org/10.1007/s10489-023-04803-1