
Computing contingency tables from sparse ADtrees

Abstract

In data-mining algorithms, contingency tables are frequently built from ADtrees, since ADtrees have been shown to be an efficient data structure for caching sufficient statistics. This paper introduces three modifications to the construction of contingency tables from sparse ADtrees: the first two represent the contingency table as a one-dimensional array and as a hash map, respectively, and the third builds the table non-recursively. We implement algorithms in Python that construct contingency tables as a two-dimensional array, a tree, a one-dimensional array, and a hash map, using both recursive and non-recursive approaches. We evaluate these algorithms empirically in five respects on a large number of randomly generated datasets, and we also apply the modified algorithms to Bayesian network learning, measuring the performance improvements on three real-life datasets. The experiments demonstrate that all three modifications improve performance, and the improvements are more significant for datasets with more attributes and higher arities.
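The paper's implementations are not reproduced on this page, so the sketch below is only an illustration, under stated assumptions, of two of the ideas the abstract combines: representing a contingency table as a hash map and filling it with a non-recursive, explicit-stack traversal of an ADtree. All names (ADNode, VaryNode, contingency_table, flat_index) are hypothetical, query attributes are assumed to be given in the tree's attribute order, and the tree is assumed to be fully materialized; a sparse ADtree additionally omits most-common-value subtrees, whose counts must be reconstructed by subtraction, a step elided here.

    from dataclasses import dataclass, field

    @dataclass
    class VaryNode:
        # One child ADNode per value of the attribute observed in the data.
        children: dict = field(default_factory=dict)   # value -> ADNode

    @dataclass
    class ADNode:
        # Record count for the partial assignment this node represents,
        # plus one Vary branch per attribute that can still be specialized.
        count: int = 0
        vary: dict = field(default_factory=dict)       # attribute -> VaryNode

    def contingency_table(root, query_attrs):
        # Build the contingency table over query_attrs as a hash map
        # {(v1, ..., vk): count}, walking the tree with an explicit stack
        # instead of recursion.
        table = {}
        stack = [(root, 0, ())]   # (node, next query attribute, values fixed so far)
        while stack:
            node, i, values = stack.pop()
            if i == len(query_attrs):          # every query attribute is fixed
                table[values] = node.count
                continue
            vary = node.vary.get(query_attrs[i])
            if vary is None:                   # no record matches this assignment
                continue
            for value, child in vary.children.items():
                stack.append((child, i + 1, values + (value,)))
        return table

    def flat_index(values, arities):
        # Mixed-radix encoding: maps a value tuple to an offset in a
        # one-dimensional array of length prod(arities), the array-based
        # representation the abstract compares against the hash map.
        index = 0
        for v, a in zip(values, arities):
            index = index * a + v
        return index

    # Toy usage: a two-record dataset over one binary attribute "a".
    leaf0, leaf1 = ADNode(count=1), ADNode(count=1)
    root = ADNode(count=2, vary={"a": VaryNode(children={0: leaf0, 1: leaf1})})
    print(contingency_table(root, ["a"]))   # each value of "a" has count 1

The flat_index helper hints at the one-dimensional-array alternative: the tuple key is replaced by an integer offset, trading the hash map's economy on sparse tables for constant-time indexing on dense ones.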



Author information

Corresponding author

Correspondence to Yi Zhuang.


About this article


Cite this article

Ding, F., Zhuang, Y. Computing contingency tables from sparse ADtrees. Appl Intell 42, 777–789 (2015). https://doi.org/10.1007/s10489-014-0624-z

