
Dependency trees in sub-linear time and bounded memory

  • Regular Paper
  • The VLDB Journal

Abstract

We focus on the problem of efficiently learning dependency trees. Once grown, such a tree can serve as a special case of a Bayesian network, as a probability density function (PDF) approximator, and in many other roles. Given the data, a well-known algorithm fits an optimal tree in time quadratic in the number of attributes and linear in the number of records. We show how to modify it to exploit partial knowledge about edge weights. Experimental results show running time that is near-constant in the number of records, with no significant loss in the accuracy of the generated trees.
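
For concreteness, below is a minimal Python sketch of the classical Chow-Liu baseline the abstract refers to: estimate the mutual information of every pair of attributes from the records (quadratic in the number of attributes, linear in the number of records), then return a maximum-weight spanning tree over those pairwise weights. The data layout, function names, and the Kruskal-style tree construction are illustrative assumptions, not taken from the paper, and the paper's contribution (pruning this computation using partial knowledge of the edge weights) is not implemented here.

# Minimal sketch of the classical Chow-Liu baseline: O(m^2) attribute pairs,
# each scanned over all records. Assumes discrete (categorical) attributes
# stored as integer columns of a NumPy array; all names are illustrative.
import numpy as np
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def chow_liu_tree(data):
    """Maximum-weight spanning tree over attributes, weighted by pairwise
    mutual information (Kruskal's algorithm with union-find)."""
    m = data.shape[1]
    edges = [(mutual_information(data[:, i], data[:, j]), i, j)
             for i, j in combinations(range(m), 2)]
    edges.sort(reverse=True)            # heaviest edges first

    parent = list(range(m))             # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                    # adding (i, j) keeps the graph acyclic
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(1000, 5))   # synthetic discrete data
    print(chow_liu_tree(X))

In this baseline, each of the O(m^2) mutual-information estimates scans every record; the modification described in the abstract targets exactly that per-record cost.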

Author information

Corresponding author

Correspondence to Dan Pelleg.

Additional information

Work done at Carnegie-Mellon University. This research was sponsored by the National Science Foundation (NSF) under grant nos. ACI-0121671 and DMS-9873442.

About this article

Cite this article

Pelleg, D., Moore, A. Dependency trees in sub-linear time and bounded memory. The VLDB Journal 15, 250–262 (2006). https://doi.org/10.1007/s00778-005-0170-8
