Minimum interpretation by autoencoder-based serial and enhanced mutual information production

Applied Intelligence

Abstract

This paper proposes an information-theoretic method for interpreting the inference mechanism of neural networks. The method interprets the inference mechanism minimally by disentangling complex information into simpler and more easily interpretable information. This disentanglement can be realized by maximizing the mutual information between input patterns and the corresponding neurons. Because mutual information is difficult to compute directly, however, we use the well-known autoencoder to increase it, re-interpreting the sparsity constraint as a device for increasing mutual information. The computational procedures for increasing mutual information are decomposed into a serial operation of two steps: the equal use of neurons and specific responses to input patterns. The specific responses are realized by enhancing the results of the equal use of neurons. The method was applied to three data sets: the glass, office equipment, and pulsar data sets. With all three data sets, we observed that mutual information could be increased when the number of neurons was forced to increase. The collective weights, that is, the averaged and collectively treated weights, then showed that the method could extract simple and linear relations between inputs and targets, making it possible to interpret the inference mechanism minimally.
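
To make the quantity being maximized concrete, one standard way of writing the mutual information between input patterns and hidden neurons is sketched below; the notation (activation $v_j^s$ of hidden neuron $j$ for pattern $s$, with $S$ patterns, $M$ neurons, and $p(s) = 1/S$) is illustrative and is not taken from the paper itself.

$$p(j \mid s) = \frac{v_j^s}{\sum_{m=1}^{M} v_m^s}, \qquad p(j) = \frac{1}{S} \sum_{s=1}^{S} p(j \mid s),$$

$$I = \sum_{s=1}^{S} \sum_{j=1}^{M} p(s)\, p(j \mid s) \log \frac{p(j \mid s)}{p(j)} = H(j) - H(j \mid s).$$

On this reading, raising the entropy $H(j)$ of the average firing probabilities corresponds to the equal use of neurons, while lowering the conditional entropy $H(j \mid s)$ corresponds to specific (sparse) responses to individual patterns; this is one way in which the sparsity constraint of an autoencoder can be re-interpreted as a device for increasing mutual information.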


Notes

  1. [x, y] = glass_dataset; is the MATLAB command that loads the glass data set (see the sketch below).
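
As a minimal, hypothetical sketch of how this command fits into a sparse-autoencoder workflow (MATLAB Deep Learning Toolbox); the hidden size and sparsity settings below are illustrative assumptions, not the settings used in the paper.

    % Illustrative sketch only; hiddenSize and the sparsity values are assumptions.
    [x, t] = glass_dataset;                 % glass inputs (9 x 214) and targets (2 x 214)
    hiddenSize = 10;                        % number of hidden neurons (assumed)
    autoenc = trainAutoencoder(x, hiddenSize, ...
        'SparsityProportion', 0.05, ...     % sparsity constraint on hidden activations
        'SparsityRegularization', 4);       % weight of the sparsity penalty
    h = encode(autoenc, x);                 % hidden activations used for further analysis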


Author information


Corresponding author

Correspondence to Ryotaro Kamimura.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Kamimura, R. Minimum interpretation by autoencoder-based serial and enhanced mutual information production. Appl Intell 50, 2423–2448 (2020). https://doi.org/10.1007/s10489-019-01619-w
