DOI: 10.1145/1015330.1015431

An information theoretic analysis of maximum likelihood mixture estimation for exponential families

Published: 04 July 2004

Abstract

An important task in unsupervised learning is maximum likelihood mixture estimation (MLME) for exponential families. In this paper, we prove a mathematical equivalence between the MLME problem and the rate distortion problem for Bregman divergences. We also present new theoretical results in rate distortion theory for Bregman divergences. Finally, we analyze both problems as a trade-off between compression and preservation of information, which yields the information bottleneck method as an interesting special case.
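
The link between exponential families and Bregman divergences that the abstract relies on can be checked numerically. The sketch below is not taken from the paper. It assumes the standard correspondence in which the log-density of a regular exponential family member with mean parameter mu equals minus a Bregman divergence d_phi(x, mu) plus a term that depends only on x. All function names are hypothetical and chosen for illustration.

```python
import math


def bregman(phi, grad_phi, x, mu):
    """Bregman divergence d_phi(x, mu) = phi(x) - phi(mu) - phi'(mu) * (x - mu)."""
    return phi(x) - phi(mu) - grad_phi(mu) * (x - mu)


# Unit-variance Gaussian: phi(t) = t^2 / 2 gives d_phi(x, mu) = (x - mu)^2 / 2.
def half_square(t):
    return 0.5 * t * t


def half_square_grad(t):
    return t


def gaussian_logpdf(x, mu):
    """Log-density of N(mu, 1) at x."""
    return -0.5 * (x - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)


# Poisson: phi(t) = t log t - t gives d_phi(x, mu) = x log(x / mu) - x + mu.
def poisson_phi(t):
    return t * math.log(t) - t


def poisson_phi_grad(t):
    return math.log(t)


def poisson_logpmf(x, mu):
    """Log-probability of count x > 0 under a Poisson with mean mu."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def base_measure_term(logpdf, phi, grad_phi, x, mu):
    """log p(x | mu) + d_phi(x, mu); under the assumed correspondence
    this quantity should not depend on mu."""
    return logpdf(x, mu) + bregman(phi, grad_phi, x, mu)


if __name__ == "__main__":
    for mu in (0.5, 2.0, 5.0):
        g = base_measure_term(gaussian_logpdf, half_square, half_square_grad, 3.0, mu)
        p = base_measure_term(poisson_logpmf, poisson_phi, poisson_phi_grad, 3, mu)
        print(mu, g, p)
```

Because the remainder does not depend on mu, maximizing the likelihood over mu is the same as minimizing the Bregman divergence to the data. That is the sense in which a likelihood objective can be read as a distortion objective.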

Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning
July 2004
934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
