Abstract
Frequent counting is an operation that machine learning algorithms require very often. A typical machine learning task, learning the structure of a Bayesian network (BN) by metric scoring, is introduced as an example that relies heavily on frequent counting. A fast calculation method for frequent counting, enhanced with two cache layers, is then presented for learning BNs. The main contribution of our approach is to eliminate comparison operations from frequent counting by introducing a multi-radix number system calculation. Both a mathematical analysis and an empirical comparison between our method and the state-of-the-art solution are conducted. The results show that our method clearly outperforms the state-of-the-art solution on the problem of learning BNs.
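The idea of counting through a multi-radix (mixed-radix) index can be illustrated with a minimal sketch. It is not the authors' implementation and it omits the two cache layers. It assumes that every variable is discrete and already coded as an integer in 0..arity-1; the function name `mixed_radix_counts` and the toy data are hypothetical.

```python
from typing import List, Sequence


def mixed_radix_counts(
    records: Sequence[Sequence[int]],
    arities: Sequence[int],
    variables: Sequence[int],
) -> List[int]:
    """Count joint configurations of `variables` by direct indexing.

    Assumption: each value row[v] is an integer in 0..arities[v]-1.
    """
    # Place value (weight) of each selected variable in the mixed-radix number.
    weights = []
    size = 1
    for v in variables:
        weights.append(size)
        size *= arities[v]

    counts = [0] * size
    for row in records:
        # Encode the configuration as one integer; no value is compared
        # against a target configuration, only multiplied and added.
        index = 0
        for v, w in zip(variables, weights):
            index += row[v] * w
        counts[index] += 1
    return counts


# Hypothetical toy data set: three variables with arities 2, 2 and 3.
records = [
    [0, 1, 2],
    [1, 1, 0],
    [0, 1, 2],
    [1, 0, 1],
]
arities = [2, 2, 3]

# Joint counts over variables 0 and 2 (index = x0 + 2 * x2).
counts = mixed_radix_counts(records, arities, variables=[0, 2])
print(counts)
```

Each configuration of the selected variables maps to exactly one slot of the count array, so a single pass over the data yields every count that a scoring metric would need for that variable set.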
Author information
Additional information
This work was supported by National Natural Science Foundation of China (No. 60970055).
Qiang Lv graduated from Soochow University, PRC in 1988. He received the M. S. degree from China Eastern Institute of Technology in 1991 and the Ph.D. degree from Soochow University in 2006. He is currently a professor at the School of Computer Science and Technology, Soochow University.
His research interests include bioinformatics, metaheuristic search, and parallel and distributed computing.
Xiao-Yan Xia received the B. Sc. degree in computer science from Soochow University, PRC in 2003. She is currently a research fellow of the Provincial Key Laboratory for Computer Information Processing Technology, Soochow University.
Her research interests include database system design and its applications.
Pei-De Qian received the B. Sc. degree in computer science from Nanjing University, PRC in 1982. He is currently a professor at the School of Computer Science and Technology, Soochow University.
His research interests include Chinese information processing, distributed computing, and operating systems.
Cite this article
Lv, Q., Xia, XY. & Qian, PD. A fast calculation of metric scores for learning Bayesian network. Int. J. Autom. Comput. 9, 37–44 (2012). https://doi.org/10.1007/s11633-012-0614-8