Abstract
Learning and adaptation deal with the quantification and exploitation of the "structure" in the input source, a point made perhaps for the first time by Watanabe [330]. Although structure is a vague concept that is difficult to quantify, it fills the input space with identifiable patterns that may be distinguishable macroscopically by the shape of the probability density function (PDF). Entropy and dissimilarity measures therefore form natural foundations for unsupervised learning, because both are descriptors of PDFs.
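To make the "entropy as a PDF descriptor" idea concrete, here is a minimal sketch of how Rényi's quadratic entropy can be estimated directly from samples with a Parzen window, via the information-potential estimator used throughout ITL. The function name, kernel width, and toy data are illustrative, not from the chapter.

```python
import math

def renyi_quadratic_entropy(samples, sigma=1.0):
    """Parzen-window estimate of Renyi's quadratic entropy H2 = -log V,
    where V (the 'information potential') is the mean Gaussian kernel
    evaluation over all sample pairs (1-D case)."""
    n = len(samples)
    # Convolving two Parzen kernels of width sigma gives an effective
    # pairwise kernel of variance 2*sigma^2.
    s2 = 2.0 * sigma * sigma
    norm = 1.0 / math.sqrt(2.0 * math.pi * s2)
    v = sum(
        norm * math.exp(-(xi - xj) ** 2 / (2.0 * s2))
        for xi in samples
        for xj in samples
    ) / (n * n)
    return -math.log(v)

# Tightly clustered samples concentrate the estimated PDF, giving a
# larger information potential and hence lower entropy than spread-out
# samples -- the property that entropy-based clustering exploits.
tight = [0.0, 0.1, 0.2, 0.1]
spread = [-3.0, 1.0, 4.0, 8.0]
print(renyi_quadratic_entropy(tight) < renyi_quadratic_entropy(spread))  # True
```

The choice of quadratic (order-2) Rényi entropy is what makes the estimator a simple double sum over kernel evaluations, with no numerical integration.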
References
Aczél J., Daróczy Z., On measures of information and their characterizations, Mathematics in Science and Engineering, vol. 115, Academic Press, New York, 1975.
Bach F., Jordan M., Finding clusters in independent component analysis, in Int. Symposium on Independent Component Analysis and Blind Signal Separation, Nara, Japan, pp. 891–896, 2003.
Ben-Hur A., Horn D., Siegelmann H., Vapnik V., Support vector clustering, J. Mach. Learn. Res., 2:125–137, 2001.
Carreira-Perpinan M., Mode-finding for mixtures of Gaussian distributions, IEEE Trans. Pattern Anal. Mach. Intell., 22(11):1318–1323, November 2000.
Carreira-Perpinan M., Gaussian mean shift is an EM algorithm, IEEE Trans. Pattern Anal. Mach. Intell., 29(5):767–776, 2007.
Cheng Y., Mean shift, mode seeking and clustering, IEEE Trans. Pattern Anal. Mach. Intell., 17(8):790–799, August 1995.
Comaniciu D., Ramesh V., Meer P., Real-time tracking of nonrigid objects using mean shift, in Proc. IEEE Conf. Comput. Vision and Pattern Recogn., 2:142–149, June 2000.
Comaniciu D., Meer P., Mean shift: A robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell., 24(5):603–619, May 2002.
Ding C., He X., Zha H., Gu M., Simon H., A min-max cut algorithm for graph partitioning and data clustering, in Proc. IEEE Int. Conf. Data Mining, pp. 107–114, San Jose, CA, November 29–December 2, 2001.
Duda R., Hart P., Stork D., Pattern Classification, 2nd edition, John Wiley &amp; Sons, New York, 2001.
Fine S., Scheinberg K., Cristianini N., Shawe-Taylor J., Williamson B., Efficient SVM training using low-rank kernel representations, J. Mach. Learn. Res., 2:243–264, 2001.
Friedman J., Tukey J., A Projection Pursuit Algorithm for Exploratory Data Analysis, IEEE Trans. Comput., Ser. C, 23:881–889, 1974.
Fukunaga K., An Introduction to Statistical Pattern Recognition, Academic Press, New York, 1972.
Fukunaga K., Hostetler L., The estimation of the gradient of a density function with applications in pattern recognition, IEEE Trans. Inf. Theor., 21(1):32–40, January 1975.
Gdalyahu Y., Weinshall D., Werman M., Self-organization in vision: Stochastic clustering for image segmentation, perceptual grouping, and image database organization, IEEE Trans. Pattern Anal. Mach. Intell., 23(10):1053–1074, 2001.
Gokcay E., Principe J., Information theoretic clustering, IEEE Trans. Pattern Anal. Mach. Intell., 24(2):158–171, 2002.
Grossberg S., Competitive learning: From interactive activation to adaptive resonance, in Connectionist Models and Their Implications: Readings from Cognitive Science (Waltz, D. and Feldman, J. A., Eds.), Ablex, Norwood, NJ, pp. 243–283, 1988.
Hartigan J., Clustering Algorithms. John Wiley & Sons, New York, 1975.
Hofmann T. and Buhmann J., Pairwise Data Clustering by Deterministic Annealing, IEEE Trans. Pattern Anal. Mach. Intell., 19(1):1–14, 1997.
Jain K. and Dubes R., Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs, NJ, 1988.
Jenssen R., Principe J., Erdogmus D., Eltoft T., The Cauchy–Schwartz divergence and Parzen windowing: Connections to graph theory and Mercer kernels, J. Franklin Inst., 343:614–629, 2006.
Jenssen R., Erdogmus D., Hild II K., Principe J., Eltoft T., Optimizing the Cauchy–Schwarz PDF divergence for information theoretic, non-parametric clustering, in Proc. Int’l. Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR2005), pp. 34–45, St. Augustine, FL, November 2005.
Jenssen R., Erdogmus D., Hild II K., Principe J., Eltoft T., Information cut for clustering using a gradient descent approach, Pattern Recogn., 40:796–806, 2006.
King S., Step-wise clustering procedures. J. Amer. Statist. Assoc., pp. 86–101, 1967.
Kohonen T., Self-Organizing Maps, 2nd edition, Springer Verlag, New York, 1997.
Koontz W., Narendra P., Fukunaga K., A graph theoretic approach to non-parametric cluster analysis, IEEE Trans. Comput., 25:936–944, 1975.
MacQueen J., Some Methods for Classification and Analysis of Multivariate Observations, in Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281–297.
Murphy P., Aha D., UCI repository of machine learning databases, Tech. Rep., Department of Information and Computer Science, University of California, Irvine, CA, 1994.
Ng A., Jordan M., Weiss Y., On spectral clustering: Analysis and an algorithm, in Advances in Neural Information Processing Systems, vol. 14, pp. 849–856, 2001.
Pavlidis T., Structural Pattern Recognition. Springer-Verlag, New York, 1977.
Principe J., Euliano N., Lefebvre C., Neural Systems: Fundamentals through Simulations, CD-ROM textbook, John Wiley, New York, 2000.
Ramanan D., Forsyth D., Finding and tracking people from the bottom up, in Proc. IEEE Conf. Computer Vision Pattern Recognition, June 2003, pp. 467–474.
Rao S., Martins A., Principe J., Mean shift: An information theoretic perspective, Pattern Recogn. Lett., 30(3):222–230, 2009.
Roberts S., Everson R., Rezek I., Maximum certainty data partitioning, Pattern Recogn., 33:833–839, 2000.
Rose K., Gurewitz E., Fox G., Vector quantization by deterministic annealing, IEEE Trans. Inf. Theor., 38(4):1249–1257, 1992.
Sands N., Cioffi J., An improved detector for channels with nonlinear intersymbol interference, Proc. Intl. Conf. on Communications, vol 2, pp 1226–1230, 1994.
Scanlon J., Deo N., Graph-theoretic algorithms for image segmentation. In IEEE International Symposium on Circuits and Systems, pp. VI141–144, Orlando, Florida, 1999.
Sheather S., Jones M., A reliable data-based bandwidth selection method for kernel density estimation, J. Roy. Statist. Soc., Ser. B, 53:683–690, 1991.
Shi J., Malik J., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 22(8):888–905, 2000.
Sneath P., Sokal R., Numerical Taxonomy, Freeman, London, 1973.
Theodoridis S., Koutroumbas K., Pattern Recognition, Academic Press, 1999.
Tishby N., Slonim N., Data clustering by Markovian relaxation and the information bottleneck method, in Advances in Neural Information Processing Systems, 13, Denver, pp. 640–646, 2000.
Urquhart R., Graph theoretical clustering based on limited neighbor sets, Pattern Recogn., 173–187, 1982.
Watanabe S., Pattern Recognition: Human and Mechanical. Wiley, New York, 1985.
Wu Z. and Leahy R., An optimal graph theoretic approach to data clustering: Theory and its applications to image segmentation. IEEE Trans. Pattern Anal. and Mach. Intell., 15(11):1101–1113, 1993.
Zahn T., Graph theoretic methods for detecting and describing gestalt clusters. IEEE Trans. Comput., 20:68–86, 1971.
Copyright information
© 2010 Springer Science+Business Media, LLC
Cite this chapter
Jenssen, R., Rao, S. (2010). Clustering with ITL Principles. In: Information Theoretic Learning. Information Science and Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1570-2_7
Print ISBN: 978-1-4419-1569-6
Online ISBN: 978-1-4419-1570-2