Mutual-information-inspired heuristics for constraint-based causal structure learning
Introduction
A Bayesian network is a kind of probabilistic graphical model. Owing to its compact representation and its powerful reasoning ability under uncertainty, it has attracted increasing attention from researchers and has been widely used in neuroscience [1], computer science [2], industrial applications [3], and dependability and risk analysis [4].
A Bayesian network consists of two main parts: a directed acyclic graph, whose nodes represent random variables and whose edges represent direct dependencies between pairs of variables, and a set of probability distribution tables that describe the dependence of each variable on its parent nodes. The Bayesian network structure learning techniques in the literature can be roughly grouped into three categories: approaches based on structural equation models, score-based approaches, and constraint-based approaches.
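To make the two-part definition concrete, the joint distribution encoded by a Bayesian network can be written as a product of local conditional distributions. The symbols $X_1, \dots, X_n$ for the variables and $\mathrm{Pa}(X_i)$ for the parents of $X_i$ in the graph are notation assumed here for illustration only.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

For a three-node chain $A \to B \to C$, for instance, this reads $P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid B)$, so each probability table only needs to cover one variable and its parents.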
Learning causal structures with an appropriately defined structural equation model is essentially a “data-driven” approach, in which the observed data distribution is used to discover the underlying causal relations [5]. Representative models include the classic linear non-Gaussian acyclic model, the nonlinear additive noise model, and the post-nonlinear causal model [6], where independence testing is used to identify the causal direction between variables. Other work formalizes causal structure learning as a continuous optimization problem; examples include DAG-GNN, which is based on graph neural networks [7], and RL-BIC2, which is based on reinforcement learning [8].
In score-based approaches, a scoring function evaluates how well each candidate structure fits the observed data, and the final output is one or more directed acyclic graphs with the best score. Because the space of candidate structures grows exponentially with the number of variables, researchers have investigated various search strategies, such as searching the space of structures, the space of equivalence classes, and the space of variable orderings. Exact learning approaches attempt to find the globally optimal solution and include integer linear programming, dynamic programming, and branch-and-bound methods [9]. Approximate approaches include evolutionary methods [10], heuristic algorithms [11], local-global learning methods [12], [13], surrogate models [14], and bounded tree-width structure optimization [15].
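As a rough illustration of what a scoring function does, the sketch below computes a BIC-style score for a discrete network. BIC is used here only as an assumed example of a common score; the text above does not name a specific one. The function name `bic_score`, the row format (one dict per sample), and the `parents` mapping are hypothetical choices made for this example.

```python
import math
from collections import Counter

def bic_score(rows, parents):
    """BIC-style score of a discrete network (illustrative assumption).

    rows: list of dicts mapping variable name -> observed value.
    parents: dict mapping every variable name -> list of its parents.
    """
    n = len(rows)
    # Observed value sets, used to count free parameters per variable.
    levels = {v: {r[v] for r in rows} for v in parents}
    score = 0.0
    for node, pa in parents.items():
        # Counts of (parent configuration, node value) and of parent configuration.
        joint = Counter((tuple(r[p] for p in pa), r[node]) for r in rows)
        marg = Counter(tuple(r[p] for p in pa) for r in rows)
        loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        n_params = (len(levels[node]) - 1) * math.prod(len(levels[p]) for p in pa)
        score += loglik - 0.5 * math.log(n) * n_params
    return score

# Toy data in which B always copies A, so the edge A -> B should be rewarded.
rows = [{"A": a, "B": a} for a in (0, 1)] * 50
with_edge = bic_score(rows, {"A": [], "B": ["A"]})
without_edge = bic_score(rows, {"A": [], "B": []})
print(with_edge, without_edge)
```

On this toy data the structure containing the edge scores higher, because the gain in log-likelihood outweighs the penalty for the extra parameter. A score-based learner would embed such a function in one of the search strategies listed above.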
The constraint-based approaches use variable independence testing and edge orientation rules to learn the Bayesian network structure that can best explain the observed dependencies. As shown in Fig. 1, a constraint-based approach typically has three steps.
- Step 1: Skeleton learning (adjacency search). A conditional independence test, such as test, mutual information, or Fisher’s Z-test, is used to decide whether an edge exists between each pair of variables.
- Step 2: V-structure recognition. The conditional separation sets generated in Step 1 are used to identify potential v-structures.
- Step 3: Edge orientation. The v-structures identified in Step 2 and Meek’s orientation rules [16], together with the observational data, are used to transform the undirected graph produced in Step 1 into a directed acyclic graph.
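Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal, order-dependent PC-style sketch under stated assumptions, not the MIIPC algorithm proposed in this paper: it assumes continuous, roughly Gaussian data, uses Fisher's Z-test as the conditional independence test, and omits Step 3. All function names and the column-list data format are hypothetical.

```python
import itertools
import math
import random

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given cond, via the standard recursion."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

def fisher_z_pvalue(corr, n, i, j, cond):
    """Two-sided p-value of Fisher's Z-test for i independent of j given cond."""
    r = max(-0.999999, min(0.999999, partial_corr(corr, i, j, cond)))
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def learn_skeleton(columns, alpha=0.01):
    """Step 1: start from the complete graph and remove edges judged independent."""
    n, p = len(columns[0]), len(columns)
    corr = correlation_matrix(columns)
    adj = {v: set(range(p)) - {v} for v in range(p)}
    sepset = {}
    size = 0  # size of the conditioning sets tried in this round
    while any(len(adj[v]) - 1 >= size for v in adj):
        for x, y in itertools.permutations(range(p), 2):
            if y not in adj[x]:
                continue
            for cond in itertools.combinations(sorted(adj[x] - {y}), size):
                if fisher_z_pvalue(corr, n, x, y, cond) > alpha:
                    adj[x].discard(y)
                    adj[y].discard(x)
                    sepset[frozenset((x, y))] = set(cond)
                    break
        size += 1
    return adj, sepset

def find_v_structures(adj, sepset):
    """Step 2: x -> z <- y for unshielded triples with z outside sepset(x, y)."""
    found = set()
    for z in adj:
        for x, y in itertools.combinations(sorted(adj[z]), 2):
            if y not in adj[x] and z not in sepset[frozenset((x, y))]:
                found.add((x, z, y))
    return found

# Toy collider: columns 0 and 1 are independent causes of column 2.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [rng.gauss(0, 1) for _ in range(2000)]
z = [a + b + 0.5 * rng.gauss(0, 1) for a, b in zip(x, y)]
adj, sepset = learn_skeleton([x, y, z])
print(adj, find_v_structures(adj, sepset))
```

Because edges are removed as soon as a separating set is found, the result of this sketch can depend on the order in which variable pairs are visited, which is the order-dependence issue discussed in the following paragraph.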
The classic constraint-based learning approach is the PC algorithm introduced by Spirtes et al. [17]. Researchers have recently extended constraint-based algorithms to learn from time series data [18] and nonstationary data [19]. Because the PC algorithm is order-dependent, order-independent variants such as the PC-stable algorithm [20] and the PC-MI algorithm [21] have been proposed.
Most constraint-based approaches rely on the orientation-faithfulness assumption, which is hard to satisfy in real-world situations. The CPC algorithm [22], the MPC algorithm, the CPC-stable algorithm, and the MPC-stable algorithm [20] have been proposed for learning Bayesian network structures when this assumption is violated. Although these methods can improve the quality of structure learning, they often suffer from high computational cost and tend to generate graphs with many unoriented edges.
In this study, we focus mainly on Steps 1 and 2, investigating effective strategies for learning high-quality Bayesian network structures when the orientation-faithfulness assumption is partially violated. Our contribution is threefold:
- We propose an algorithm named MIIPC for learning Bayesian network structures. The adjacency search step of MIIPC extends our PC-MI algorithm [21]: we integrate the WEF heuristic strategy with the notion of Markov-chain consistency, with the aim of reducing the number of conditional independence tests while keeping the number of false-positive edges under control.
- We propose the Smaller Adjacency-Set (SAS) heuristic for v-structure recognition. We also prove that the smaller adjacency set is surprisingly powerful, in the sense that it captures sufficient information for determining whether an unshielded triple forms a v-structure.
- We show experimentally that MIIPC, empowered by the WEF and SAS strategies, outperforms state-of-the-art approaches in both the quality of causal structure learning and execution time.
Preliminaries and related work
This section introduces terminology and relevant constraint-based approaches.
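Two random variables $X$ and $Y$ are independent conditional on a set of variables $\mathbf{Z}$, denoted by $X \perp Y \mid \mathbf{Z}$, iff $P(x \mid y, \mathbf{z}) = P(x \mid \mathbf{z})$ for all values $x$ of $X$, $y$ of $Y$, and $\mathbf{z}$ of $\mathbf{Z}$ such that $P(y, \mathbf{z}) > 0$.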
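For discrete data, one way to quantify how far a sample is from conditional independence is the empirical conditional mutual information, which is zero exactly when the empirical distribution satisfies the definition above. The sketch below is an illustrative plug-in estimator with a single conditioning variable; the function name and the list-based data format are assumptions for this example, and it is not the specific test used by MIIPC.

```python
import math
from collections import Counter

def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from three aligned value lists."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    # Sum of p(x,y,z) * log( p(x,y,z) p(z) / (p(x,z) p(y,z)) ) over observed cells.
    return sum(
        (c / n) * math.log(c * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
        for (x, y, z), c in c_xyz.items()
    )

# Y is a copy of X and Z is constant, so conditioning on Z cannot separate them.
dependent = conditional_mutual_information([0, 1] * 50, [0, 1] * 50, [0] * 100)
print(dependent)
```

In the dependent example the estimate equals the entropy of a fair binary variable, log 2 (about 0.693 nats). In practice a threshold or a statistical test on this quantity is needed, since finite samples rarely give exactly zero.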
Let $G = (V, E)$ be a graph, where $V$ is a set of random variables (vertices) in the problem domain under concern, and $E$ is a set of edges between vertices. A directed edge from $X$ to $Y$ is represented by $X \to Y$, and an undirected edge
M-order-based causal structure learning
We introduce Markov-chain consistency in Section 3.1 and discuss our algorithm, MIIPC, in Sections 3.2 (M-order-based skeleton construction) and 3.3 (M-order-based v-structure determination).
Empirical evaluation
We evaluate the proposed MIIPC algorithm in two respects: computing time and the quality of the learned network structures. For the latter, we use edge-related measures [26], [27].
- Extra edges (false positive): the number of edges that are found in the learned structure but are not present in the original “gold-standard” structure.
- Missing edges (false negative): the number of edges that are present in the original structure but are missing in the learned structure.
- Reverse edges (orientation
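The edge-related measures can be computed with simple set operations, as in the sketch below. The definition of reverse edges is truncated above, so the sketch assumes it means edges present in both structures but oriented in opposite directions. The function name and the representation of edges as `(parent, child)` tuples are hypothetical.

```python
def edge_errors(true_edges, learned_edges):
    """Return (extra, missing, reverse) counts for two sets of directed edges.

    Each edge is a (parent, child) tuple. Assumption: an undirected learned
    edge is stored in both directions and is not counted as reversed.
    """
    true_skeleton = {frozenset(e) for e in true_edges}
    learned_skeleton = {frozenset(e) for e in learned_edges}
    extra = len(learned_skeleton - true_skeleton)      # false positives
    missing = len(true_skeleton - learned_skeleton)    # false negatives
    reverse = sum(
        1 for (a, b) in true_edges
        if (b, a) in learned_edges and (a, b) not in learned_edges
    )
    return extra, missing, reverse

gold = {("A", "B"), ("B", "C"), ("C", "D")}
learned = {("A", "B"), ("C", "B"), ("A", "D")}
print(edge_errors(gold, learned))
```

In this toy comparison the learned structure adds the edge between A and D, misses the edge between C and D, and reverses the edge between B and C.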
Conclusion and future work
State-of-the-art constraint-based approaches tend to yield unstable results, which can be strongly affected by the order in which variable pairs are chosen and by decisions about v-structures. In addition, the number of conditional independence tests in PC-like algorithms increases exponentially with the number of variables.
Inspired by the strong connection between the degree of mutual information shared by two variables and their conditional independence, we have introduced a causal structure
CRediT authorship contribution statement
Xiaolong Qi: Conceptualization, Methodology, Software, Writing - original draft. Xiaocong Fan: Methodology, Writing - review & editing. Huiling Wang: Writing - review & editing. Ling Lin: Writing - review & editing. Yang Gao: Supervision, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Key R&D Program of China (2017YFB0702600, 2017YFB0702601); the National Natural Science Foundation of China (61432008, 61663045, 61503178); SRPH in Xinjiang (XJEDU2020Y036); the Yili Normal University Ph.D. Startup Fund (2020YSB007); and a scientific research project of Yili Normal University (2020YSZD004).
References (30)
- et al. A novel fault propagation path identification inference algorithm using parent nodes filter. J. Data Sci. (2019)
- et al. Overview on Bayesian networks applications for dependability, risk analysis and maintenance areas. Eng. Appl. Artif. Intell. (2012)
- et al. BNC-PSO: structure learning of Bayesian networks by particle swarm optimization. Inf. Sci. (2016)
- et al. Efficient score-based Markov blanket discovery. Int. J. Approx. Reason. (2017)
- et al. Learning Bayesian network structures using weakest mutual-information-first strategy. Int. J. Approx. Reason. (2019)
- et al. Bayesian networks in neuroscience: a survey. Front. Comput. Neurosci. (2014)
- et al. Software defect prediction using Bayesian networks. Empir. Software Eng. (2014)
- et al. Survey of causality discovery based on non-time series observation data. Chin. J. Comput. (2017)
- P. Spirtes, K. Zhang, Causal discovery and inference: concepts and recent methodological advances, in: Applied...
- Y. Yu, J. Chen, T. Gao, M. Yu, DAG-GNN: DAG structure learning with graph neural networks, in: International Conference...
- A survey on Bayesian network structure learning from data. Prog. Artif. Intell.
- Bayesian network structure learning using quantum annealing. Eur. Phys. J. Special Top.