A fast calculation of metric scores for learning Bayesian network

International Journal of Automation and Computing

Abstract

Frequent counting is an operation required very often in machine learning algorithms. A typical machine learning task, learning the structure of a Bayesian network (BN) based on metric scoring, is introduced as an example that relies heavily on frequent counting. A fast calculation method for frequent counting, enhanced with two cache layers, is then presented for learning BNs. The main contribution of our approach is to eliminate comparison operations from frequent counting by introducing a calculation based on a multi-radix number system. Both a mathematical analysis and an empirical comparison between our method and a state-of-the-art solution are conducted. The results show that our method clearly outperforms the state-of-the-art solution on the problem of learning BNs.
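The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of comparison-free counting with a multi-radix (mixed-radix) index. It assumes every variable takes integer values in `0..r-1`, where `r` is its cardinality; the function names `mixed_radix_weights` and `count_configurations` are hypothetical, and the two cache layers mentioned above are not modeled.

```python
from typing import List, Sequence


def mixed_radix_weights(cards: Sequence[int]) -> List[int]:
    """Place value of each digit: weights[i] = cards[0] * ... * cards[i-1]."""
    weights = []
    w = 1
    for r in cards:
        weights.append(w)
        w *= r
    return weights


def count_configurations(
    records: Sequence[Sequence[int]],
    columns: Sequence[int],
    cardinalities: Sequence[int],
) -> List[int]:
    """Count how often each joint value of `columns` occurs in `records`.

    Assumption (not from the paper): values are already coded as
    integers in range(cardinalities[c]) for every column c.
    """
    cards = [cardinalities[c] for c in columns]
    weights = mixed_radix_weights(cards)

    # One counter per possible joint configuration.
    size = 1
    for r in cards:
        size *= r
    counts = [0] * size

    for row in records:
        # Treat the selected values as digits of a mixed-radix number.
        # The resulting integer addresses the counter directly, so no
        # value-by-value comparison against a pattern is needed.
        index = 0
        for c, w in zip(columns, weights):
            index += row[c] * w
        counts[index] += 1
    return counts


if __name__ == "__main__":
    data = [(0, 1, 2), (1, 1, 0), (0, 1, 2), (1, 0, 1)]
    print(count_configurations(data, columns=[0, 2], cardinalities=[2, 2, 3]))
```

In a metric such as the one used for scoring BN structures, the counts for a variable together with its candidate parents could be read from one such table; whether the authors lay out their tables this way is not stated in the abstract.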


Author information

Corresponding author

Correspondence to Qiang Lv.

Additional information

This work was supported by the National Natural Science Foundation of China (No. 60970055).

Qiang Lv graduated from Soochow University, PRC in 1988. He received the M.S. degree from China Eastern Institute of Technology in 1991 and the Ph.D. degree from Soochow University in 2006. He is currently a professor at the School of Computer Science and Technology, Soochow University.

His research interests include bioinformatics, metaheuristic search, and parallel and distributed computing.

Xiao-Yan Xia received the B.Sc. degree in computer science from Soochow University, PRC in 2003. She is currently a research fellow of the Provincial Key Laboratory for Computer Information Processing Technology, Soochow University.

Her research interests include database system design and its applications.

Pei-De Qian received the B.Sc. degree in computer science from Nanjing University, PRC in 1982. He is currently a professor at the School of Computer Science and Technology, Soochow University.

His research interests include Chinese information processing, distributed computing, and operating systems.

About this article

Cite this article

Lv, Q., Xia, XY. & Qian, PD. A fast calculation of metric scores for learning Bayesian network. Int. J. Autom. Comput. 9, 37–44 (2012). https://doi.org/10.1007/s11633-012-0614-8


  • DOI: https://doi.org/10.1007/s11633-012-0614-8
