Abstract
This paper discusses a framework for learning based on information-theoretic criteria. A novel algorithm based on Rényi's quadratic entropy is used to train linear or nonlinear mappers directly from a data set, for either entropy maximization or entropy minimization. We provide an intriguing analogy between the computation of this entropy and an information potential that measures the interactions among the data samples. We also propose two approximations to the Kullback-Leibler divergence based on quadratic distances (one derived from the Cauchy-Schwarz inequality, the other from the Euclidean distance). These distances can likewise be computed using the information potential. We test the newly proposed distances in blind source separation (unsupervised learning) and in feature extraction for classification (supervised learning). In blind source separation, our algorithm is capable of separating instantaneously mixed sources, and in classification the performance of our classifier is comparable to that of support vector machines (SVMs).
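To make the information-potential idea concrete, here is a minimal NumPy sketch, assuming Parzen density estimates with isotropic Gaussian kernels of a user-chosen width `sigma`. With that choice the integral of the squared density reduces exactly to a double sum of pairwise Gaussian interactions between samples (the information potential V), Rényi's quadratic entropy is H_2 = -log V, and the two quadratic divergences reduce to combinations of V and a cross potential between two sample sets. The gradient `information_forces` shows how each sample is "pulled" by all others, which is what a gradient-based trainer would backpropagate. All function names here are hypothetical illustrations, the divergences are written in their common textbook form (the paper's exact normalization may differ), and this is not presented as the authors' implementation.

```python
import numpy as np


def _as_samples(a):
    """Return samples as an (N, d) float array; a 1-D input is treated as N scalar samples."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def gaussian_kernel(diff, var):
    """Isotropic d-dimensional Gaussian pdf with variance `var`, evaluated at each diff[..., :]."""
    d = diff.shape[-1]
    sq = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq / (2.0 * var)) / (2.0 * np.pi * var) ** (d / 2.0)


def cross_information_potential(x, y, sigma):
    """V(x, y) = 1/(N*M) * sum_i sum_j G(x_i - y_j, 2 sigma^2).

    For Parzen estimates f (from x) and g (from y) with Gaussian kernels of
    variance sigma^2, this equals the integral of f*g exactly, because two
    convolved Gaussians of variance sigma^2 give one of variance 2 sigma^2.
    """
    x, y = _as_samples(x), _as_samples(y)
    diff = x[:, None, :] - y[None, :, :]
    return float(np.mean(gaussian_kernel(diff, 2.0 * sigma ** 2)))


def information_potential(x, sigma):
    """V(x) = integral of f^2: pairwise interactions among all samples."""
    return cross_information_potential(x, x, sigma)


def renyi_quadratic_entropy(x, sigma):
    """H_2 = -log V(x); concentrated samples give a high potential and low entropy."""
    return -np.log(information_potential(x, sigma))


def information_forces(x, sigma):
    """Gradient dV/dx_k for every sample k, shape (N, d).

    dV/dx_k = -(1 / (N^2 sigma^2)) * sum_j G(x_k - x_j, 2 sigma^2) (x_k - x_j).
    Since dH_2/dx_k = -(dV/dx_k) / V, ascending H_2 pushes samples apart.
    """
    x = _as_samples(x)
    n = x.shape[0]
    var2 = 2.0 * sigma ** 2
    diff = x[:, None, :] - x[None, :, :]  # diff[k, j] = x_k - x_j
    g = gaussian_kernel(diff, var2)
    return -(2.0 / (n ** 2 * var2)) * np.einsum("kj,kjd->kd", g, diff)


def euclidean_divergence(x, y, sigma):
    """D_ED = integral of (f - g)^2 = V(x) + V(y) - 2 V(x, y)."""
    return (information_potential(x, sigma) + information_potential(y, sigma)
            - 2.0 * cross_information_potential(x, y, sigma))


def cauchy_schwarz_divergence(x, y, sigma):
    """D_CS = log( V(x) V(y) / V(x, y)^2 ), nonnegative by the Cauchy-Schwarz inequality."""
    vxy = cross_information_potential(x, y, sigma)
    return float(np.log(information_potential(x, sigma)
                        * information_potential(y, sigma) / vxy ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=(200, 2))
    b = rng.normal(2.0, 1.0, size=(200, 2))
    print("H2(a)       =", renyi_quadratic_entropy(a, sigma=0.5))
    print("D_CS(a, b)  =", cauchy_schwarz_divergence(a, b, sigma=0.5))
    print("D_ED(a, b)  =", euclidean_divergence(a, b, sigma=0.5))
```

Every quantity costs O(N^2) kernel evaluations and needs no explicit density estimate or numerical integration, which is the practical appeal of the quadratic forms; the kernel width `sigma` is a free parameter in this sketch and would have to be chosen for the data at hand.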
Cite this article
Principe, J.C., Xu, D., Zhao, Q. et al. Learning from Examples with Information Theoretic Criteria. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 26, 61–77 (2000). https://doi.org/10.1023/A:1008143417156