Learning from Examples with Information Theoretic Criteria

Principe, Jose C.; Xu, Dongxin; Zhao, Qun; Fisher, John W.

doi:10.1023/A:1008143417156

Abstract

This paper discusses a framework for learning based on information theoretic criteria. A novel algorithm based on Renyi's quadratic entropy is used to train linear or nonlinear mappers directly from a data set, for either entropy maximization or entropy minimization. We provide an intriguing analogy between the computation and an information potential that measures the interactions among the data samples. We also propose two approximations to the Kullback-Leibler divergence based on quadratic distances (the Cauchy-Schwarz inequality and the Euclidean distance). These distances can likewise be computed using the information potential. We test the newly proposed distances in blind source separation (unsupervised learning) and in feature extraction for classification (supervised learning). In blind source separation, our algorithm is capable of separating instantaneously mixed sources; in classification, the performance of our classifier is comparable to that of support vector machines (SVMs).
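
The central computational idea in the abstract can be sketched in a few lines. This is a minimal sketch, assuming isotropic Gaussian Parzen windows with one hand-picked width sigma; the kernel choice, the width, and all function names are our own illustrative assumptions, not code from the paper. With Gaussian windows, the integral of the squared density estimate needed by Renyi's quadratic entropy, H2(X) = -log of the integral of p(x)^2, collapses to an average of pairwise kernel evaluations between samples (the "information potential" V). The same pairwise sums, taken across two sample sets, give the Euclidean distance and one common form of the Cauchy-Schwarz distance between two densities.

```python
import numpy as np


def _as_samples(a):
    """Return samples as an (N, d) float array; a 1-D input is N scalar samples."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def _gaussian(diff, var):
    """Isotropic Gaussian density N(0, var * I) evaluated along the last axis of diff."""
    d = diff.shape[-1]
    sq_norm = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_norm / (2.0 * var)) / (2.0 * np.pi * var) ** (d / 2.0)


def cross_information_potential(x, y, sigma):
    """Integral of f_hat * g_hat for Gaussian Parzen estimates of width sigma.

    Convolving two kernels of variance sigma**2 yields one kernel of variance
    2 * sigma**2, so the integral is exactly a double sum over sample pairs.
    """
    x, y = _as_samples(x), _as_samples(y)
    diff = x[:, None, :] - y[None, :, :]  # (N, M, d) pairwise differences
    return np.mean(_gaussian(diff, 2.0 * sigma ** 2))


def information_potential(x, sigma):
    """V(X): mean pairwise 'interaction' between samples, the estimate of int f^2."""
    return cross_information_potential(x, x, sigma)


def renyi_quadratic_entropy(x, sigma):
    """H2(X) = -log int f^2, estimated as -log V(X)."""
    return -np.log(information_potential(x, sigma))


def ed_divergence(x, y, sigma):
    """Euclidean distance int (f - g)^2 = V(X) + V(Y) - 2 V(X, Y)."""
    return (information_potential(x, sigma) + information_potential(y, sigma)
            - 2.0 * cross_information_potential(x, y, sigma))


def cs_divergence(x, y, sigma):
    """Cauchy-Schwarz distance log(int f^2 * int g^2 / (int f g)^2); zero if f_hat == g_hat."""
    vxy = cross_information_potential(x, y, sigma)
    return np.log(information_potential(x, sigma) * information_potential(y, sigma) / vxy ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=(200, 2))
    b = rng.normal(1.5, 1.0, size=(200, 2))
    print("H2(a):", renyi_quadratic_entropy(a, sigma=0.5))
    print("D_CS(a, b):", cs_divergence(a, b, sigma=0.5))
    print("D_ED(a, b):", ed_divergence(a, b, sigma=0.5))
```

Both distances are non-negative here because they are evaluated exactly on the two Parzen estimates, and spreading samples apart lowers V and therefore raises H2. The pairwise form costs O(NM) kernel evaluations, and sigma trades smoothness against resolution; both are illustration choices in this sketch.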

Similar content being viewed by others

A review of unsupervised feature selection methods

Article 29 January 2019

Tutorial on PCA and approximate PCA and approximate kernel PCA

Article Open access 31 October 2022

Introduction to Bayesian Inference for Psychology

Article 04 April 2017


Author information

Authors and Affiliations

Computational NeuroEngineering Laboratory, University of Florida, Gainesville, FL, 32611, USA
Jose C. Principe, Dongxin Xu, Qun Zhao & John W. Fisher III

About this article

Cite this article

Principe, J.C., Xu, D., Zhao, Q. et al. Learning from Examples with Information Theoretic Criteria. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 26, 61–77 (2000). https://doi.org/10.1023/A:1008143417156

Published: 01 August 2000
Issue Date: August 2000
DOI: https://doi.org/10.1023/A:1008143417156