Fast Online Estimation of the Joint Probability Distribution

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5012)

Abstract

In this paper we propose an algorithm for the on-line maintenance of the joint probability distribution of a data stream. The joint probability distribution is modeled by a mixture of low-dependence Bayesian networks and maintained by an on-line EM algorithm. Modeling the joint probability function by a mixture of low-dependence Bayesian networks is motivated by two key observations. First, the probability distribution can be maintained in constant time per data point, and hence in total time linear in the number of data points, whereas other methods such as Bayesian networks have polynomial time complexity. Second, there is empirical indication in the literature [1] that mixtures of Naive Bayes structures can model the data as accurately as Bayesian networks. In this paper we relax the constraints of the mixture model of Naive Bayes structures to mixture models of arbitrary low-dependence structures. Furthermore, we propose an on-line algorithm for the maintenance of a mixture model of arbitrary Bayesian networks. We show empirically that a speed-up is achieved with no decrease in performance.
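To make the idea of maintaining a mixture with an on-line EM algorithm concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes binary attributes, plain Naive Bayes components (the simplest low-dependence structure), and a constant step size in the spirit of stepwise EM [5]. The class name, parameter names, and the smoothing constant are hypothetical choices made for illustration only.

```python
import math
import random


class OnlineNaiveBayesMixture:
    """Mixture of Naive Bayes components over binary attributes,
    updated one data point at a time with a stepwise EM rule.
    Illustrative sketch only; all names here are assumptions."""

    EPS = 1e-3  # small smoothing constant (assumed) to keep probabilities off 0 and 1

    def __init__(self, n_components, n_features, step=0.05, seed=0):
        rng = random.Random(seed)
        self.k = n_components
        self.d = n_features
        self.step = step  # constant step size: old data is forgotten geometrically
        # Running sufficient statistics: expected mass of each component and
        # expected count of x_j = 1 inside each component.
        self.mass = [1.0 / n_components] * n_components
        self.ones = [
            [self.mass[c] * rng.uniform(0.3, 0.7) for _ in range(n_features)]
            for c in range(n_components)
        ]

    def _log_joint(self, x):
        """Return log p(c, x) for every component c."""
        total = sum(self.mass) + self.k * self.EPS
        out = []
        for c in range(self.k):
            lp = math.log((self.mass[c] + self.EPS) / total)
            for j, xj in enumerate(x):
                p1 = (self.ones[c][j] + self.EPS) / (self.mass[c] + 2 * self.EPS)
                lp += math.log(p1 if xj else 1.0 - p1)
            out.append(lp)
        return out

    def log_prob(self, x):
        """Return log p(x) under the current mixture (log-sum-exp over components)."""
        lj = self._log_joint(x)
        m = max(lj)
        return m + math.log(sum(math.exp(v - m) for v in lj))

    def update(self, x):
        """Process one data point; the cost depends only on k and d."""
        # E-step for this single point: posterior responsibilities.
        lj = self._log_joint(x)
        m = max(lj)
        w = [math.exp(v - m) for v in lj]
        z = sum(w)
        resp = [v / z for v in w]
        # Stochastic M-step: blend the new expected statistics into the running ones.
        a = self.step
        for c in range(self.k):
            self.mass[c] = (1 - a) * self.mass[c] + a * resp[c]
            for j, xj in enumerate(x):
                self.ones[c][j] = (1 - a) * self.ones[c][j] + a * resp[c] * xj
```

Each call to `update` touches every component and attribute once, so the work per data point is independent of how many points have already been seen. Replacing the Naive Bayes components by other low-dependence structures would change the sufficient statistics but not this overall update pattern.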

References

  1. Lowd, D., Domingos, P.: Naive Bayes models for probability estimation. In: Twenty-Second International Conference on Machine Learning, pp. 529–536 (2005)

  2. Aggarwal, C.: Data Streams: Models and Algorithms. Springer, Heidelberg (2007)

  3. Chow, C.K., Liu, C.N.: Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 462–467 (1968)

  4. Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1–38 (1977)

  5. Sato, M.A., Ishii, S.: On-line EM algorithm for the normalized Gaussian network. Neural Computation 12(2), 407–432 (1999)

  6. Bradley, P., Fayyad, U., Reina, C.: Scaling EM (expectation maximization) clustering to large databases. Technical Report MSR-TR-98-35, Microsoft Research (1998)

  7. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Machine Learning, 103–130 (1997)

  8. Zhou, A., Cai, Z., Wei, L., Qian, W.: M-kernel merging: Towards density estimation over data streams. In: DASFAA 2003: Proceedings of the Eighth International Conference on Database Systems for Advanced Applications, pp. 285–292. IEEE Computer Society, Washington (2003)

  9. Heinz, C., Seeger, B.: Wavelet density estimators over data streams. In: The 20th Annual ACM Symposium on Applied Computing (2005)

  10. Thiesson, B., Meek, C., Heckerman, D.: Accelerating EM for large databases. Machine Learning 45(3), 279–299 (2001)

  11. Cooper, G., Herskovits, E.: A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 309–347 (1992)

  12. Murphy (2004), http://www.cs.ubc.ca/~murphyk/software/bnt/bnt.html

Editor information

Takashi Washio, Einoshin Suzuki, Kai Ming Ting, Akihiro Inokuchi

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Patist, J.P. (2008). Fast Online Estimation of the Joint Probability Distribution. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_65

  • DOI: https://doi.org/10.1007/978-3-540-68125-0_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68124-3

  • Online ISBN: 978-3-540-68125-0

  • eBook Packages: Computer Science, Computer Science (R0)
