
Computing contingency tables from sparse ADtrees

Abstract

In data-mining algorithms, contingency tables are frequently built from ADtrees, since ADtrees have been shown to be an efficient data structure for caching sufficient statistics. This paper introduces three modifications to the construction of contingency tables from sparse ADtrees: the first two represent the contingency table as a one-dimensional array and as a hash map, respectively, and the third builds the table non-recursively. We implement algorithms in Python that construct contingency tables as a two-dimensional array, a tree, a one-dimensional array, and a hash map, using both recursive and non-recursive approaches. We evaluate these algorithms empirically in five respects on a large number of randomly generated datasets, and we also apply the modified algorithms to Bayesian network learning, measuring the performance improvements on three real-life datasets. The experiments demonstrate that all three modifications improve performance, and the improvements are more significant for datasets with more attributes and higher arities.
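The paper's implementations are not reproduced on this page, so the sketch below is only an illustration, under stated assumptions, of two of the ideas the abstract combines: representing a contingency table as a hash map and filling it with a non-recursive, explicit-stack traversal of an ADtree. All names (ADNode, VaryNode, contingency_table, flat_index) are hypothetical, query attributes are assumed to be given in the tree's attribute order, and the tree is assumed to be fully materialized; a sparse ADtree additionally omits most-common-value subtrees, whose counts must be reconstructed by subtraction, a step elided here.

    from dataclasses import dataclass, field

    @dataclass
    class VaryNode:
        # One child ADNode per value of the attribute observed in the data.
        children: dict = field(default_factory=dict)   # value -> ADNode

    @dataclass
    class ADNode:
        # Record count for the partial assignment this node represents,
        # plus one Vary branch per attribute that can still be specialized.
        count: int = 0
        vary: dict = field(default_factory=dict)       # attribute -> VaryNode

    def contingency_table(root, query_attrs):
        # Build the contingency table over query_attrs as a hash map
        # {(v1, ..., vk): count}, walking the tree with an explicit stack
        # instead of recursion.
        table = {}
        stack = [(root, 0, ())]   # (node, next query attribute, values fixed so far)
        while stack:
            node, i, values = stack.pop()
            if i == len(query_attrs):          # every query attribute is fixed
                table[values] = node.count
                continue
            vary = node.vary.get(query_attrs[i])
            if vary is None:                   # no record matches this assignment
                continue
            for value, child in vary.children.items():
                stack.append((child, i + 1, values + (value,)))
        return table

    def flat_index(values, arities):
        # Mixed-radix encoding: maps a value tuple to an offset in a
        # one-dimensional array of length prod(arities), the array-based
        # representation the abstract compares against the hash map.
        index = 0
        for v, a in zip(values, arities):
            index = index * a + v
        return index

    # Toy usage: a two-record dataset over one binary attribute "a".
    leaf0, leaf1 = ADNode(count=1), ADNode(count=1)
    root = ADNode(count=2, vary={"a": VaryNode(children={0: leaf0, 1: leaf1})})
    print(contingency_table(root, ["a"]))   # each value of "a" has count 1

The flat_index helper hints at the one-dimensional-array alternative: the tuple key is replaced by an integer offset, trading the hash map's economy on sparse tables for constant-time indexing on dense ones.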



Author information

Corresponding author

Correspondence to Yi Zhuang.


About this article


Cite this article

Ding, F., Zhuang, Y. Computing contingency tables from sparse ADtrees. Appl Intell 42, 777–789 (2015). https://doi.org/10.1007/s10489-014-0624-z

