Generalized Bayesian Structure Learning from Noisy Datasets

Tang, Yan; Chen, Yu; Ge, Gaolong

doi:10.1007/978-3-030-18590-9_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11448)

Included in the following conference series: International Conference on Database Systems for Advanced Applications

Abstract

In recent years, the worldwide open data movement has made more and more open datasets available. However, the quality of these datasets poses problems for learning models. This study focuses on learning Bayesian network structure from datasets that contain noise. We propose a novel approach called GBNL (Generalized Bayesian Structure Learning). GBNL first uses a greedy algorithm to obtain an appropriate sliding window size for a given dataset; it then uses a difference array-based method to quickly improve data quality by locating the noisy data sections and removing them. GBNL can thus both evaluate the quality of a dataset and effectively reduce the noise in the data. We evaluate GBNL experimentally on five large datasets, and the results validate the accuracy and generalizability of the approach.
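
The abstract does not spell out how the difference array is applied, so the following is only a generic sketch of the difference-array technique and not the authors' implementation. It assumes that an earlier step (not shown here) has already flagged a list of possibly overlapping sliding windows as noisy, each given as a half-open row range `[start, end)`. The function names `rows_to_drop` and `remove_noisy_sections` are hypothetical.

```python
from typing import List, Sequence, Tuple


def rows_to_drop(n_rows: int, noisy_windows: Sequence[Tuple[int, int]]) -> List[bool]:
    """Mark every row covered by at least one noisy window.

    Assumption: each window is a half-open row range [start, end).
    A difference array records +1 at `start` and -1 at `end`; a running
    prefix sum then yields how many windows cover each row, in
    O(n_rows + len(noisy_windows)) time regardless of window overlap.
    """
    diff = [0] * (n_rows + 1)
    for start, end in noisy_windows:
        # Clamp the window to the valid row range.
        start = max(0, start)
        end = min(n_rows, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    drop = []
    covered = 0
    for i in range(n_rows):
        covered += diff[i]  # number of noisy windows covering row i
        drop.append(covered > 0)
    return drop


def remove_noisy_sections(rows: Sequence, noisy_windows: Sequence[Tuple[int, int]]) -> list:
    """Return the rows that are not covered by any noisy window."""
    drop = rows_to_drop(len(rows), noisy_windows)
    return [row for row, is_noisy in zip(rows, drop) if not is_noisy]


if __name__ == "__main__":
    data = list(range(10))
    # Two overlapping noisy windows covering rows 2..6.
    print(remove_noisy_sections(data, [(2, 5), (4, 7)]))  # [0, 1, 7, 8, 9]
```

The point of the difference array in this sketch is that overlapping windows cost one update each instead of one update per covered row, which matters when many windows are flagged on a large dataset.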

Acknowledgments

This work was supported by the Key Technologies Research and Development Program of China (2017YFC0405805-04).

Author information

Authors and Affiliations

College of Computer and Information, Hohai University, Nanjing, 210098, China
Yan Tang, Yu Chen & Gaolong Ge

Corresponding author

Correspondence to Yan Tang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Guoliang Li
Duke University, Durham, NC, USA
Jun Yang
University of Porto, Porto, Portugal
Joao Gama
Chiang Mai University, Chiang Mai, Thailand
Juggapong Natwichai
Beihang University, Beijing, China
Yongxin Tong

About this paper

Cite this paper

Tang, Y., Chen, Y., Ge, G. (2019). Generalized Bayesian Structure Learning from Noisy Datasets. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_11

DOI: https://doi.org/10.1007/978-3-030-18590-9_11
Published: 24 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18589-3
Online ISBN: 978-3-030-18590-9
eBook Packages: Computer Science (R0)
