A MapReduce-Based Method for Learning Bayesian Network from Massive Data

Fang, Qiyu; Yue, Kun; Fu, Xiaodong; Wu, Hong; Liu, Weiyi

doi:10.1007/978-3-642-37401-2_68


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7808)

Included in the following conference series:

Asia-Pacific Web Conference


Abstract

The Bayesian network (BN) is a popular and important probabilistic graphical model for representing and inferring uncertain knowledge. Learning a BN from massive data is the basis for uncertain-knowledge-centered inference, prediction, and decision making. The nature of massive data requires BN learning to be adapted to large data volumes and executed in parallel. In this paper, we propose a MapReduce-based approach for learning a BN from massive data by extending the traditional scoring-and-search algorithm. First, in the scoring process, we develop map and reduce algorithms that obtain the required parameters in parallel. Second, in the search process, we develop map and reduce algorithms that, for each node, score all candidate local structures in parallel and select the locally optimal structure with the highest score. The locally optimal structures of all nodes are then merged into the globally optimal structure. Experimental results indicate that the proposed method is effective and efficient.
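The two phases described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the abstract does not name the scoring function, so the sketch assumes a K2-style (Cooper-Herskovits) log score, a fixed node ordering in which `candidates` are the permitted parents of a node, and a bound `max_parents` on parent-set size. All function names (`map_counts`, `reduce_counts`, `k2_log_score`, `best_parents`) are hypothetical, and the map and reduce steps are simulated with plain Python rather than run on a cluster.

```python
from collections import defaultdict
from itertools import combinations
from math import lgamma


def map_counts(record, node, parents):
    """Map step: emit ((parent configuration, node value), 1) for one record."""
    key = (tuple(record[p] for p in parents), record[node])
    yield key, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted ones per key to obtain the counts N_ijk."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def k2_log_score(counts, arity):
    """Assumed K2-style log score of one local structure (node plus parents)."""
    by_config = defaultdict(list)
    for (config, _value), n in counts.items():
        by_config[config].append(n)
    score = 0.0
    for ns in by_config.values():
        # log[(r - 1)! / (N_ij + r - 1)!] + sum_k log(N_ijk!)
        score += lgamma(arity) - lgamma(sum(ns) + arity)
        score += sum(lgamma(n + 1) for n in ns)
    return score


def best_parents(data, node, candidates, arity, max_parents=2):
    """Search step: score every candidate parent set and keep the best one."""
    best = None
    for size in range(max_parents + 1):
        for parents in combinations(candidates, size):
            # In a real MapReduce job the records would be split across mappers.
            pairs = [kv for rec in data for kv in map_counts(rec, node, parents)]
            score = k2_log_score(reduce_counts(pairs), arity)
            if best is None or score > best[1]:
                best = (parents, score)
    return best
```

Here `data` is a list of dictionaries mapping variable names to discrete values, and `arity` is the number of values the scored node can take. Because each node's best parent set is chosen independently, the per-node results can be collected into one graph, which mirrors the merge step the abstract describes.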

Author information

Authors and Affiliations

Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, 650091, Kunming, China
Qiyu Fang, Kun Yue, Hong Wu & Weiyi Liu
Key Laboratory of Software Engineering of Yunnan Province, 650091, Kunming, China
Kun Yue
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, 650500, Kunming, China
Xiaodong Fu

Editor information

Editors and Affiliations

Department of Information Engineering, Nagoya University, 464-8601, Nagoya, Japan
Yoshiharu Ishikawa
Department of Computer Science and Technology, Harbin Institute of Technology, 150006, Harbin, China
Jianzhong Li
School of Computer Science and Engineering, University of New South Wales, 2031, Sydney, NSW, Australia
Wei Wang & Wenjie Zhang
Department of Computing and Information Systems, University of Melbourne, 3052, Melbourne, VIC, Australia
Rui Zhang


About this paper

Cite this paper

Fang, Q., Yue, K., Fu, X., Wu, H., Liu, W. (2013). A MapReduce-Based Method for Learning Bayesian Network from Massive Data. In: Ishikawa, Y., Li, J., Wang, W., Zhang, R., Zhang, W. (eds) Web Technologies and Applications. APWeb 2013. Lecture Notes in Computer Science, vol 7808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37401-2_68

DOI: https://doi.org/10.1007/978-3-642-37401-2_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37400-5
Online ISBN: 978-3-642-37401-2
eBook Packages: Computer Science, Computer Science (R0)
