
Dependency trees in sub-linear time and bounded memory

  • Regular Paper
  • The VLDB Journal

Abstract

We focus on the problem of efficiently learning dependency trees. Once grown, such a tree can serve as a special case of a Bayesian network, as a probability density function (PDF) approximator, and in many other roles. Given the data, a well-known algorithm fits an optimal tree in time quadratic in the number of attributes and linear in the number of records. We show how to modify it to exploit partial knowledge about edge weights. Experimental results show running time that is near-constant in the number of records, with no significant loss in the accuracy of the generated trees.
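
For concreteness, below is a minimal Python sketch of the classical Chow-Liu baseline the abstract refers to: estimate the mutual information of every pair of attributes from the records (quadratic in the number of attributes, linear in the number of records), then return a maximum-weight spanning tree over those pairwise weights. The data layout, function names, and the Kruskal-style tree construction are illustrative assumptions, not taken from the paper, and the paper's contribution (pruning this computation using partial knowledge of the edge weights) is not implemented here.

# Minimal sketch of the classical Chow-Liu baseline: O(m^2) attribute pairs,
# each scanned over all records. Assumes discrete (categorical) attributes
# stored as integer columns of a NumPy array; all names are illustrative.
import numpy as np
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def chow_liu_tree(data):
    """Maximum-weight spanning tree over attributes, weighted by pairwise
    mutual information (Kruskal's algorithm with union-find)."""
    m = data.shape[1]
    edges = [(mutual_information(data[:, i], data[:, j]), i, j)
             for i, j in combinations(range(m), 2)]
    edges.sort(reverse=True)            # heaviest edges first

    parent = list(range(m))             # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                    # adding (i, j) keeps the graph acyclic
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(1000, 5))   # synthetic discrete data
    print(chow_liu_tree(X))

In this baseline, each of the O(m^2) mutual-information estimates scans every record; the modification described in the abstract targets exactly that per-record cost.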

Author information

Corresponding author

Correspondence to Dan Pelleg.

Additional information

Work done at Carnegie-Mellon University. This research was sponsored by the National Science Foundation (NSF) under grant nos. ACI-0121671 and DMS-9873442.

About this article

Cite this article

Pelleg, D., Moore, A. Dependency trees in sub-linear time and bounded memory. The VLDB Journal 15, 250–262 (2006). https://doi.org/10.1007/s00778-005-0170-8
