Abstract
In recent years, the worldwide open data movement has made more and more open datasets available. However, the quality of these datasets poses problems for learning models. This study focuses on learning Bayesian network structure from datasets that contain noise. A novel approach called GBNL (Generalized Bayesian Structure Learning) is proposed. GBNL first uses a greedy algorithm to obtain an appropriate sliding window size for any dataset; it then leverages a difference array-based method to quickly improve data quality by locating the noisy data sections and removing them. GBNL can thus both evaluate the quality of a dataset and effectively reduce the noise in it. We conduct experiments to evaluate GBNL on five large datasets; the experimental results validate the accuracy and generalizability of the approach.
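The abstract does not spell out how the difference array is used, so the following is only a minimal sketch of the general difference-array technique, not the authors' GBNL implementation. It assumes that some earlier step has already flagged a set of sliding windows as noisy; the function name `rows_to_keep`, the half-open `(start, end)` window format, and the `min_hits` threshold are all hypothetical choices made for illustration.

```python
from itertools import accumulate


def rows_to_keep(n_rows, noisy_windows, min_hits=1):
    """Return indices of rows covered by fewer than `min_hits` flagged windows.

    `noisy_windows` is an iterable of half-open (start, end) row ranges.
    How a window gets flagged as noisy is outside this sketch (assumption).
    """
    # Difference array: +1 where a flagged window starts, -1 where it ends.
    diff = [0] * (n_rows + 1)
    for start, end in noisy_windows:
        start = max(0, start)
        end = min(n_rows, end)
        if start >= end:
            continue  # empty or out-of-range window
        diff[start] += 1
        diff[end] -= 1

    # The prefix sum turns the markers into a per-row coverage count.
    coverage = list(accumulate(diff[:n_rows]))

    # Keep only rows that are not covered often enough to count as noisy.
    return [i for i, hits in enumerate(coverage) if hits < min_hits]


if __name__ == "__main__":
    # Ten rows, two overlapping flagged windows covering rows 2..6.
    print(rows_to_keep(10, [(2, 5), (4, 7)]))
```

Each flagged window costs constant time to record, and a single pass over the rows recovers the coverage counts, which is why this pattern is quick when many overlapping windows are involved.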
Acknowledgments
This work was supported by the Key Technologies Research and Development Program of China (2017YFC0405805-04).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tang, Y., Chen, Y., Ge, G. (2019). Generalized Bayesian Structure Learning from Noisy Datasets. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_11
DOI: https://doi.org/10.1007/978-3-030-18590-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18589-3
Online ISBN: 978-3-030-18590-9
eBook Packages: Computer Science, Computer Science (R0)