Self-Organizing ITL Principles for Unsupervised Learning

Information Theoretic Learning

Part of the book series: Information Science and Statistics ((ISS))

Abstract

Chapter 1 presented a synopsis of information theory to explain its foundations and how it shaped the field of communication systems. In a nutshell, mutual information characterizes both the maximum rate of error-free information transmission (the channel capacity theorem) and the minimum amount of information that must be sent to achieve a given distortion (the rate-distortion theorem). In essence, given the statistical knowledge of the data and these theorems, the optimal communication system emerges, or self-organizes, from the data.
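
For reference, the two results mentioned above can be written in their standard textbook form (the usual notation, not reproduced from this chapter): the channel capacity is the mutual information maximized over the input distribution, and the rate-distortion function is the mutual information minimized over all reproduction channels that meet the distortion constraint,

C = \max_{p(x)} I(X;Y), \qquad R(D) = \min_{p(\hat{x}\mid x)\,:\,\mathbb{E}[d(X,\hat{X})]\le D} I(X;\hat{X}).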

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Rao, S., Erdogmus, D., Xu, D., Hild, K. (2010). Self-Organizing ITL Principles for Unsupervised Learning. In: Information Theoretic Learning. Information Science and Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1570-2_8

  • DOI: https://doi.org/10.1007/978-1-4419-1570-2_8

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-1569-6

  • Online ISBN: 978-1-4419-1570-2
