Abstract
The Bayesian network (BN) is a popular and important probabilistic graphical model for representing and reasoning about uncertain knowledge. Learning a BN from massive data is the basis for inference, prediction, and decision-making centered on uncertain knowledge. The scale of massive data requires BN learning to be adapted to large data volumes and executed in parallel. In this paper, we propose a MapReduce-based approach for learning a BN from massive data by extending the traditional scoring-and-search algorithm. First, in the scoring process, we develop map and reduce algorithms that obtain the required parameters in parallel. Second, in the search process, for each node we develop map and reduce algorithms that score all candidate local structures in parallel and select the locally optimal structure with the highest score. The locally optimal structures of all nodes are then merged into the globally optimal structure. Experimental results indicate that the proposed method is effective and efficient.
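The two phases described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the abstract does not name the scoring metric, so the K2 (Cooper-Herskovits) score is assumed here, the map and reduce steps are simulated with plain Python functions rather than run on Hadoop, and all function names (`map_counts`, `reduce_counts`, `k2_log_score`, `best_parents`) are hypothetical. Candidate parents are assumed to precede the node in a fixed ordering, which keeps the merged structure acyclic.

```python
from collections import Counter
from itertools import combinations
from math import lgamma


def map_counts(record, node, parents):
    """Map step: emit ((parent configuration, node value), 1) for one record."""
    key = (tuple(record[p] for p in parents), record[node])
    yield key, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted 1s per key, giving the counts N_ijk."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def k2_log_score(counts, arity):
    """Log K2 score of one local structure (assumed metric, not from the paper)."""
    by_config = {}
    for (config, _value), n in counts.items():
        by_config.setdefault(config, []).append(n)
    score = 0.0
    for ns in by_config.values():
        # log[(r-1)! / (N_ij + r - 1)!] + sum_k log(N_ijk!)
        score += lgamma(arity) - lgamma(sum(ns) + arity)
        score += sum(lgamma(n + 1) for n in ns)
    return score


def best_parents(data, node, candidates, arity, max_parents=2):
    """Search step: score every candidate parent set, keep the highest score."""
    best_score, best_set = float("-inf"), ()
    for size in range(max_parents + 1):
        for parents in combinations(candidates, size):
            pairs = (kv for rec in data for kv in map_counts(rec, node, parents))
            score = k2_log_score(reduce_counts(pairs), arity)
            if score > best_score:
                best_score, best_set = score, parents
    return best_score, best_set
```

In a real MapReduce job, `map_counts` would run on data splits and `reduce_counts` on the shuffled keys; calling `best_parents` once per node and taking the union of the selected parent sets corresponds to merging the local structures into one network.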
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fang, Q., Yue, K., Fu, X., Wu, H., Liu, W. (2013). A MapReduce-Based Method for Learning Bayesian Network from Massive Data. In: Ishikawa, Y., Li, J., Wang, W., Zhang, R., Zhang, W. (eds) Web Technologies and Applications. APWeb 2013. Lecture Notes in Computer Science, vol 7808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37401-2_68
DOI: https://doi.org/10.1007/978-3-642-37401-2_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37400-5
Online ISBN: 978-3-642-37401-2
eBook Packages: Computer Science, Computer Science (R0)