Abstract

This paper discusses a framework for learning based on information theoretic criteria. A novel algorithm based on Renyi's quadratic entropy is used to train, directly from a data set, linear or nonlinear mappers for entropy maximization or minimization. We draw an intriguing analogy between this computation and an information potential that measures the interactions among the data samples. We also propose two approximations to the Kullback-Leibler divergence based on quadratic distances (the Cauchy-Schwarz inequality and the Euclidean distance). These distances can still be computed using the information potential. We test the newly proposed distances in blind source separation (unsupervised learning) and in feature extraction for classification (supervised learning). In blind source separation our algorithm separates instantaneously mixed sources, and for classification the performance of our classifier is comparable to that of support vector machines (SVMs).
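To make the quantities named above concrete: Renyi's quadratic entropy of a density p is H2 = -log of the integral of p(x)^2, and with a Gaussian Parzen window that integral reduces to a double sum over pairwise sample differences, which is the information potential; a quadratic (Cauchy-Schwarz) divergence between two densities can be built from the same pairwise sums. The NumPy sketch below is illustrative only and is not the authors' implementation; the kernel width sigma and the function names are assumptions chosen for the example.

import numpy as np

def gaussian(d, var):
    # 1-D Gaussian density with variance `var`, evaluated at differences d
    return np.exp(-d ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def information_potential(x, y=None, sigma=1.0):
    # Pairwise-sum estimator V = (1/NM) sum_i sum_j G(x_i - y_j; 2*sigma^2).
    # With y=None this is the (self) information potential of x; the kernel
    # variance doubles because the Parzen kernels convolve inside the integral.
    y = x if y is None else y
    d = x[:, None] - y[None, :]
    return gaussian(d, 2.0 * sigma ** 2).mean()

def renyi_quadratic_entropy(x, sigma=1.0):
    # Sample estimate of H2 = -log integral p(x)^2 dx
    return -np.log(information_potential(x, sigma=sigma))

def cauchy_schwarz_divergence(x, y, sigma=1.0):
    # Quadratic divergence between the Parzen estimates of p and q:
    # log(V_xx * V_yy) - 2 log(V_xy), zero when the estimates coincide
    vxx = information_potential(x, sigma=sigma)
    vyy = information_potential(y, sigma=sigma)
    vxy = information_potential(x, y, sigma=sigma)
    return np.log(vxx * vyy) - 2.0 * np.log(vxy)

# Example (hypothetical data): entropy of one sample, divergence between two shifted samples
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 200)
b = rng.normal(2.0, 1.0, 200)
print(renyi_quadratic_entropy(a, sigma=0.5))
print(cauchy_schwarz_divergence(a, b, sigma=0.5))

Because the estimator is a sum of pairwise interactions among samples, its gradient with respect to each sample behaves like a force exerted by the others, which is the information-potential analogy the paper exploits when training mappers directly from data.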

About this article

Cite this article

Principe, J.C., Xu, D., Zhao, Q. et al. Learning from Examples with Information Theoretic Criteria. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 26, 61–77 (2000). https://doi.org/10.1023/A:1008143417156
