Online Markov Blanket Learning for High-Dimensional Data

Ling, Zhaolong; Li, Bo; Zhang, Yiwen; Li, Ying; Ling, Haifeng

doi:10.1007/s10489-022-03841-5


Published: 05 July 2022

Volume 53, pages 5977–5997 (2023)
Applied Intelligence

Zhaolong Ling¹, Bo Li¹, Yiwen Zhang¹, Ying Li¹ & Haifeng Ling²

Abstract

Since the Markov blanket (MB) of a class variable captures the causal relationships between the class variable and the selected features, using the MB of a class variable for feature selection improves the interpretability and robustness of the predictive model. Online MB learning aims to identify the MB from streaming features. However, the only existing online MB learning algorithm must enumerate subsets of the selected PC (i.e., parents and children) and spouses, and it may include false positives in the learned MB, which hurts its efficiency and accuracy on high-dimensional data. To address this issue, in this paper we propose two online MB learning algorithms: the Online SimulTaneous MB learning (O-ST) algorithm and the Online Divide-and-Conquer MB learning (O-DC) algorithm. When a new feature arrives, O-ST learns the PC and spouses (i.e., the MB) simultaneously, conditioned on the currently selected MB, whereas O-DC learns the PC and spouses separately by sequentially comparing mutual information within the currently selected PC. Comprehensive experimental results show that the proposed algorithms achieve higher efficiency and better accuracy than state-of-the-art online MB learning algorithms.
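The abstract describes the online setting only at a high level. The following is a hypothetical sketch, not the authors' O-ST or O-DC: a generic online grow-and-shrink loop that illustrates the shared idea of testing each arriving feature against the class variable conditioned on the currently selected MB. It assumes discrete features, a plug-in conditional mutual information estimate, and a fixed cut-off (`threshold`) in place of a statistical conditional independence test; the names `conditional_mutual_information`, `OnlineMBSketch`, and `add_feature` are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def conditional_mutual_information(x: Sequence[Hashable], y: Sequence[Hashable],
                                   z_cols: List[Sequence[Hashable]]) -> float:
    """Plug-in estimate of I(X; Y | Z) in nats for discrete samples.

    z_cols holds the conditioning variables as columns; an empty list
    reduces the estimate to the ordinary mutual information I(X; Y).
    """
    n = len(x)
    z = [tuple(col[i] for col in z_cols) for i in range(n)]
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), n_xyz in c_xyz.items():
        # p(x,y,z) * log[p(z) p(x,y,z) / (p(x,z) p(y,z))], written with raw counts
        cmi += (n_xyz / n) * math.log(
            n_xyz * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)]))
    return cmi


class OnlineMBSketch:
    """Hypothetical online grow-and-shrink MB learner (not O-ST or O-DC)."""

    def __init__(self, target: Sequence[Hashable], threshold: float = 0.01):
        self.target = list(target)
        self.threshold = threshold  # assumed fixed CMI cut-off instead of a CI test
        self.mb: Dict[str, List[Hashable]] = {}  # currently selected MB, by feature name

    def _dependent(self, col: Sequence[Hashable], cond_names: List[str]) -> bool:
        cond = [self.mb[name] for name in cond_names]
        return conditional_mutual_information(col, self.target, cond) > self.threshold

    def add_feature(self, name: str, col: Sequence[Hashable]) -> bool:
        """Process one streaming feature; return True if it enters the MB."""
        col = list(col)
        # Grow: admit the new feature only if it is dependent on the target
        # given the whole currently selected MB.
        if not self._dependent(col, list(self.mb)):
            return False
        self.mb[name] = col
        # Shrink: drop earlier members that become independent of the target
        # given the remaining members (e.g. a noisy proxy of a better feature).
        for other in list(self.mb):
            if other == name:
                continue
            rest = [m for m in self.mb if m != other]
            if not self._dependent(self.mb[other], rest):
                del self.mb[other]
        return True


if __name__ == "__main__":
    t = [0, 1, 0, 1, 0, 1, 0, 1]  # class variable
    p = [0, 1, 0, 1, 0, 1, 1, 0]  # noisy proxy of t (two flipped samples)
    a = list(t)                   # feature that determines t exactly
    b = [0, 0, 1, 1, 0, 0, 1, 1]  # feature independent of t
    learner = OnlineMBSketch(t)
    for name, col in [("p", p), ("a", a), ("b", b)]:
        learner.add_feature(name, col)
    print(list(learner.mb))  # → ['a']
```

In this toy stream the proxy `p` is admitted first, then removed once `a` arrives, because `p` carries no extra information about `t` given `a`; the independent feature `b` is rejected. Because the estimate conditions on the entire selected MB, the number of conditioning states grows quickly and the plug-in value is biased upward on small samples, so practical implementations typically use a G² or χ² conditional independence test instead of a fixed cut-off. The naive grow step also rejects a spouse that arrives before its common child and never revisits it, which is one reason online MB learners need explicit handling of spouses.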

Notes

These datasets are publicly available at http://pages.mtu.edu/~lebrown/supplements/mmhc_paper/mmhc_index.html.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 2019YFB1704101), the National Natural Science Foundation of China (No. U1936220, 61872002, and 62006003), and the Natural Science Foundation of Anhui Province of China (No. 2108085QF270 and 2008085QF307).

Author information

Authors and Affiliations

School of Computer Science and Technology, Anhui University, Hefei, Anhui Province, 230601, People’s Republic of China
Zhaolong Ling, Bo Li, Yiwen Zhang & Ying Li
School of Management, Hefei University of Technology, Hefei, Anhui Province, 230009, People’s Republic of China
Haifeng Ling

Corresponding author

Correspondence to Yiwen Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Zhaolong Ling and Haifeng Ling contributed equally to this work.

About this article

Cite this article

Ling, Z., Li, B., Zhang, Y. et al. Online Markov Blanket Learning for High-Dimensional Data. Appl Intell 53, 5977–5997 (2023). https://doi.org/10.1007/s10489-022-03841-5

Accepted: 19 May 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s10489-022-03841-5
