Hybrid Parrallel Bayesian Network Structure Learning from Massive Data Using MapReduce

Li, Shun; Wang, Biao

doi:10.1007/s11265-017-1275-1

Published: 29 August 2017

Volume 90, pages 1115–1121 (2018)

Journal of Signal Processing Systems

Abstract

Bayesian Network (BN) is a popular and important data-mining model for representing uncertain knowledge. Much work has been done on migrating BN structure learning algorithms, such as constraint-based (CB) and score-and-search-based (SSB) ones, to the MapReduce framework. This approach, however, is not suitable for hybrid algorithms: under MapReduce they must run the Map and Reduce operations over all the data to obtain the scores, rather than over only the data in the pruned structures as in the traditional centralized version of a hybrid algorithm. As a result, the most time-consuming part of the algorithm, the Map operations, is run twice, once in the CB stage and once in the SSB stage. So in the MapReduce framework, when facing massive data, a simple migration of the traditional hybrid algorithm is almost equivalent to executing CB and SSB sequentially, with little advantage. In this paper, we introduce a distributed hybrid BN structure learning algorithm. By using constraints and search methods that require the same data basis, the algorithm needs to conduct the Map operation only once, in the CB stage, to prepare the data for the calculation of both constraints and scores. It then reuses the intermediate results of the constraint calculation in the SSB stage without Mapping the whole data again, which greatly reduces the computing work. Experimental results show that the efficiency of the algorithm is more than double that of SSB, and the accuracy is about 36% higher than that of CB.
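
The single-Map idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which constraint or score the paper uses, so the choice of mutual information as the CB constraint, an MDL-style gain as the SSB score, the pairwise-count data layout, and all function names (`map_counts`, `pair_table`, `mutual_information`, `mdl_gain`, `learn_structure`) are assumptions made purely for illustration. The plain Python `map` and `reduce` calls stand in for a real MapReduce job, and the simple filter on the score stands in for an actual search procedure.

```python
from collections import Counter
from functools import reduce
from itertools import combinations
from math import log


def map_counts(partition):
    """Map step (hypothetical): joint value counts for every variable pair in one partition."""
    counts = Counter()
    for row in partition:
        for i, j in combinations(range(len(row)), 2):
            counts[(i, j, row[i], row[j])] += 1
    return counts


def pair_table(counts, i, j):
    """Extract the joint and marginal counts of pair (i, j) from the cached counts."""
    joint = {(a, b): c for (p, q, a, b), c in counts.items() if (p, q) == (i, j)}
    left, right = Counter(), Counter()
    for (a, b), c in joint.items():
        left[a] += c
        right[b] += c
    return joint, left, right


def mutual_information(counts, i, j):
    """CB-style constraint (assumed): empirical mutual information I(X_i; X_j)."""
    joint, left, right = pair_table(counts, i, j)
    n = sum(joint.values())
    return sum(c / n * log(c * n / (left[a] * right[b])) for (a, b), c in joint.items())


def mdl_gain(counts, i, j):
    """SSB-style score (assumed): log-likelihood gain of one edge minus an MDL penalty."""
    joint, left, right = pair_table(counts, i, j)
    n = sum(joint.values())
    extra_params = (len(left) - 1) * (len(right) - 1)
    return n * mutual_information(counts, i, j) - 0.5 * log(n) * extra_params


def learn_structure(partitions, n_vars, threshold=0.05):
    # The only pass over the raw data: Map each partition, then Reduce by merging counts.
    counts = reduce(lambda a, b: a + b, map(map_counts, partitions), Counter())
    pairs = list(combinations(range(n_vars), 2))
    # CB stage: prune pairs whose dependence falls below the threshold.
    candidates = [(i, j) for i, j in pairs if mutual_information(counts, i, j) > threshold]
    # SSB stage: score the surviving pairs from the same cached counts, with no second Map.
    edges = [(i, j) for i, j in candidates if mdl_gain(counts, i, j) > 0]
    return candidates, edges
```

In this sketch both stages read only the merged `counts` table, so the cost of scanning the raw data is paid once regardless of how many constraints or scores are evaluated afterwards. For example, on data where the first two variables always agree and the third is independent of both, `learn_structure` keeps only the pair `(0, 1)` in both stages.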

Acknowledgments

This work is supported by the Ministry Research Fund (Z16051).

Author information

Authors and Affiliations

School of Information Science and Technology, University of International Relations, Beijing, 100091, China
Shun Li & Biao Wang

Corresponding author

Correspondence to Shun Li.

About this article

Cite this article

Li, S., Wang, B. Hybrid Parrallel Bayesian Network Structure Learning from Massive Data Using MapReduce. J Sign Process Syst 90, 1115–1121 (2018). https://doi.org/10.1007/s11265-017-1275-1

Received: 05 April 2017
Revised: 11 July 2017
Accepted: 09 August 2017
Published: 29 August 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11265-017-1275-1
