Minimum interpretation by autoencoder-based serial and enhanced mutual information production

Applied Intelligence

Abstract

This paper proposes an information-theoretic method for interpreting the inference mechanism of neural networks. The method interprets the inference mechanism minimally by disentangling complex information into simpler and more easily interpretable information. This disentanglement can be realized by maximizing the mutual information between input patterns and the corresponding neurons. Because mutual information is difficult to compute directly, however, we use the well-known autoencoder to increase it, re-interpreting the sparsity constraint as a device for increasing mutual information. The computational procedures for increasing mutual information are decomposed into a serial operation of two steps: the equal use of neurons and specific responses to input patterns. The specific responses are realized by enhancing the results of the equal use of neurons. The method was applied to three data sets: the glass, office equipment, and pulsar data sets. With all three data sets, we observed that mutual information could be increased when the number of neurons was forced to increase. The collective weights, that is, the averaged and collectively treated weights, then showed that the method could extract simple and linear relations between inputs and targets, making it possible to interpret the inference mechanism minimally.
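
To make the quantity being maximized concrete, one standard way of writing the mutual information between input patterns and hidden neurons is sketched below; the notation (activation $v_j^s$ of hidden neuron $j$ for pattern $s$, with $S$ patterns, $M$ neurons, and $p(s) = 1/S$) is illustrative and is not taken from the paper itself.

$$p(j \mid s) = \frac{v_j^s}{\sum_{m=1}^{M} v_m^s}, \qquad p(j) = \frac{1}{S} \sum_{s=1}^{S} p(j \mid s),$$

$$I = \sum_{s=1}^{S} \sum_{j=1}^{M} p(s)\, p(j \mid s) \log \frac{p(j \mid s)}{p(j)} = H(j) - H(j \mid s).$$

On this reading, raising the entropy $H(j)$ of the average firing probabilities corresponds to the equal use of neurons, while lowering the conditional entropy $H(j \mid s)$ corresponds to specific (sparse) responses to individual patterns; this is one way in which the sparsity constraint of an autoencoder can be re-interpreted as a device for increasing mutual information.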


Notes

  1. [x, y] = glass_dataset; is the MATLAB command that loads the glass data set (see the sketch below).
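
As a minimal, hypothetical sketch of how this command fits into a sparse-autoencoder workflow (MATLAB Deep Learning Toolbox); the hidden size and sparsity settings below are illustrative assumptions, not the settings used in the paper.

    % Illustrative sketch only; hiddenSize and the sparsity values are assumptions.
    [x, t] = glass_dataset;                 % glass inputs (9 x 214) and targets (2 x 214)
    hiddenSize = 10;                        % number of hidden neurons (assumed)
    autoenc = trainAutoencoder(x, hiddenSize, ...
        'SparsityProportion', 0.05, ...     % sparsity constraint on hidden activations
        'SparsityRegularization', 4);       % weight of the sparsity penalty
    h = encode(autoenc, x);                 % hidden activations used for further analysis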


Author information


Corresponding author

Correspondence to Ryotaro Kamimura.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Kamimura, R. Minimum interpretation by autoencoder-based serial and enhanced mutual information production. Appl Intell 50, 2423–2448 (2020). https://doi.org/10.1007/s10489-019-01619-w
