Separation and recovery Markov boundary discovery and its application in EEG-based emotion recognition
Introduction
Recent years have seen the Markov boundary (MB) applied widely in machine learning [1], [2] and data mining [3], [4]. An MB explicitly captures the local causal relationships around a target variable [5] and can therefore be used to improve the interpretability and robustness of learning models [6]. In a faithful [5] Bayesian network (BN), the MB of a target consists of its direct causes (parents), its direct effects (children), and the other direct causes of its direct effects (spouses) [5]. These variables provide a complete picture of the local causal structure around the target [7] and can shed light on the mechanism underlying the target. Consequently, MB discovery algorithms are applied to many real-world tasks. For example, MB discovery is the first step in causal learning and BN structure learning, where the undirected skeleton of the BN is constructed by learning MBs (or subsets of MBs) [8]. Another important application is feature selection [9], since all other features are independent of the class attribute conditioned on its MB [9]. Some studies [10], [11] have proved that the MB is the theoretically optimal feature subset for learning and inference tasks. Many algorithms have been proposed to search for the MB; a recent review [6] divides them into simultaneous MB learning algorithms and divide-and-conquer MB learning algorithms.
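To make the definition of the MB concrete, the following minimal Python sketch reads the parents, children, and spouses of a target off a known directed acyclic graph. The toy graph and the function name `markov_boundary` are assumptions made for illustration and do not come from the article; in practice the graph is unknown and the MB has to be learned from data.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return parents, children, and spouses of `target` in a DAG.

    `parents` maps every node to the set of its direct causes.
    """
    pa = set(parents[target])                               # direct causes
    ch = {v for v, ps in parents.items() if target in ps}   # direct effects
    sp: Set[str] = set()
    for child in ch:                                        # other causes of the effects
        sp |= parents[child]
    sp.discard(target)
    return pa | ch | sp


# Hypothetical toy DAG: A -> T, T -> C, S -> C, C -> D
toy = {"A": set(), "S": set(), "T": {"A"}, "C": {"T", "S"}, "D": {"C"}}
mb_t = markov_boundary(toy, "T")
print(sorted(mb_t))  # A is a parent, C a child, S a spouse; D is excluded
```

In this toy graph, D is a descendant of the target but not a member of its MB, because D is reached only through the child C.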
Several recent studies [12], [13], [14], [15] aim to improve the accuracy of MB discovery. However, some true positives are still missed, especially when training samples are scarce. The main reason is that these studies focus on theoretical guarantees and apply a strict criterion to filter false positives from a discovered MB set. As a result, some true positives are discarded by mistake, and the gain in accuracy is negligible. The conditional independence (CI) test is another primary cause of this problem, since its reliability is limited by the size of its conditioning set. More precisely, when the conditioning set is large, the CI test may fail, and the tested variables are then simply judged to be independent. Therefore, MB discovery algorithms based on the CI test discard some true positives when the target has a relatively large MB. Nevertheless, these algorithms guarantee the absence of false positives in a discovered MB under the faithfulness assumption, which is the basis of the idea in this article.
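The dependence of CI-test reliability on the conditioning set can be illustrated with a short sketch. It assumes discrete data and a G-squared (or chi-squared) test, whose degrees of freedom grow exponentially with the number of conditioning variables. The threshold of five samples per degree of freedom is a rule of thumb assumed here for illustration; it is not taken from the article, and the function names are hypothetical.

```python
from math import prod
from typing import Sequence


def g2_degrees_of_freedom(card_x: int, card_y: int, card_z: Sequence[int]) -> int:
    """Degrees of freedom of a G^2 test of X and Y given Z on discrete data."""
    return (card_x - 1) * (card_y - 1) * prod(card_z)


def is_test_reliable(n_samples: int, card_x: int, card_y: int,
                     card_z: Sequence[int], samples_per_dof: int = 5) -> bool:
    """Heuristic (assumed): require `samples_per_dof` samples per degree of freedom."""
    dof = g2_degrees_of_freedom(card_x, card_y, card_z)
    return n_samples >= samples_per_dof * dof


# 500 samples, all variables ternary: which conditioning-set sizes remain testable?
reliable_sizes = [k for k in range(8) if is_test_reliable(500, 3, 3, [3] * k)]
print(reliable_sizes)
```

Under these assumptions, only conditioning sets of up to two variables pass the check with 500 samples. A true MB member that can be confirmed only with a larger conditioning set would therefore be judged independent and discarded.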
The above analyses inspire us to design a two-phase-discovery algorithm that targets the true positives that may be missed during MB discovery for the reasons given above. Specifically, the first phase, called the separation phase, uses a time-efficient MB discovery algorithm to obtain an initial MB, which is incomplete but contains very few false positives, especially when the faithfulness assumption holds. Starting from the initial MB, the second phase, called the recovery phase, retrieves discarded parent–child (PC) and spouse (SP) variables through a divide-and-conquer search. Because the two phases complement each other, the proposed algorithm discovers a more accurate MB.
However, the two-phase-discovery strategy raises another problem: how can PC variables be distinguished from SP variables in the MB discovered in the first phase? To solve this problem, we design a score function to rank the variables in an MB. We analyze the differences in statistical properties between PC and SP variables and assign scores so that the two types of variables receive distinct scores. Based on the score function, we propose an algorithm called SeparateMB that identifies the spouses in a ranked list of MB variables and thereby separates the PC and SP sets effectively and efficiently. To maintain high accuracy in real-world applications that violate the faithfulness assumption, a symmetry test is added in the recovery phase; by testing the symmetry property between parent and child variables, it helps avoid many errors.
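The idea behind a symmetry test can be sketched as follows. The parent–child relation is symmetric: if X is a parent or child of T, then T must be a parent or child of X. A candidate that fails this check is treated as a false positive. The exact form of the test used in SRMB is not given in this excerpt, so the code below shows only the generic check, with a hypothetical `find_pc` callable standing in for any PC learner.

```python
from typing import Callable, Dict, Hashable, Set


def symmetric_pc(target: Hashable,
                 find_pc: Callable[[Hashable], Set[Hashable]]) -> Set[Hashable]:
    """Keep a candidate X only if `target` also appears in the PC set learned for X."""
    candidates = find_pc(target)
    return {x for x in candidates if target in find_pc(x)}


# Hypothetical, asymmetric output of a PC learner on noisy data:
# B was returned for T, but T was not returned for B.
raw_pc: Dict[str, Set[str]] = {
    "T": {"A", "B"},
    "A": {"T"},
    "B": {"C"},
    "C": {"B"},
}
checked = symmetric_pc("T", lambda v: raw_pc.get(v, set()))
print(checked)
```

The check costs one extra PC search per candidate, which is the usual price of this kind of correction.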
Motivated by these ideas, we propose a two-phase-discovery algorithm called the separation and recovery MB discovery (SRMB) algorithm. The main contributions of this article are summarized as follows:
1. Based on the proposed two-phase-discovery strategy, SRMB can detect more true positives. SRMB is therefore more accurate and more data-efficient than existing MB discovery algorithms. Furthermore, SRMB is robust in real-world applications because the symmetry test in its recovery process avoids most of the errors introduced by unfaithfulness.
2. For the first time, the statistical properties of PC and SP variables are analyzed in order to differentiate between PC and SP variables in a discovered MB. This analysis not only underpins the design of the novel SRMB algorithm but could also facilitate other related research in this domain.
3. We theoretically prove the correctness of the proposed SRMB and conduct extensive experiments on BN data sets and real-world data sets to validate its superiority over other algorithms in MB discovery, BN structure learning, and causal feature selection. Moreover, we employ SRMB to select signal features in an EEG-based emotion recognition data set, which demonstrates the effectiveness of SRMB for EEG-based emotion recognition and further shows that the critical features belong to the Gamma and Beta frequency bands. We also conclude that these critical features are located in the lateral temporal area.
Brief review of MB discovery algorithms
The concept of the MB was first proposed and discussed by Judea Pearl [5]. An MB provides a complete picture of the local causal relationships around its corresponding target. It naturally bridges causation and predictivity and thus has broad application prospects in many real-world tasks, such as causal discovery, causal inference, and feature selection. Hence, the MB has attracted much attention, and numerous MB discovery algorithms have been proposed recently. A recent
The proposed algorithm
In this section, the novel MB discovery algorithm is presented. We first give an overview of the proposed SRMB in Section 3.1. Afterwards, we explain the MB separation process in Section 3.2, and the recovery process in Section 3.3. Finally, we present theoretical analyses of the proposed SRMB in Section 3.4.
Experimental studies
In this section, we compare the proposed SRMB with four state-of-the-art MB discovery algorithms on different tasks. The compared algorithms include one simultaneous MB learning algorithm (IAMB [18]) and three divide-and-conquer MB learning algorithms (PCMB [12], MBOR [13], and STMB [15]). In Section 4.1, we first use standard BN data sets to demonstrate the superiority of SRMB for MB discovery. Furthermore, we apply SRMB to BN structure learning and feature selection tasks with 5 BN data sets
Applied to EEG-based emotion recognition
In this section, we apply SRMB to an emotion recognition task based on electroencephalography (EEG) data. EEG records the electrical activity of the brain through several electrodes placed on the scalp. EEG data collected over multiple sessions are known to be unstable [41], [42], because a distribution shift may exist between the recordings from different sessions. This shift is induced by changes in parameters such as [43], [44]: the
Conclusion: Discussion, limitation, and future work
This research study focuses on the problem that existing MB discovery algorithms discard many true positives, and designs a new strategy to address this problem, called the two-phase-discovery strategy. Based on this strategy, we propose a novel MB discovery algorithm, SRMB, which is a more accurate and more data-efficient method.
Compared with state-of-the-art algorithms [21], [13], [15], [22], SRMB combines the advantage of simultaneous MB learning algorithms and divide-and-conquer MB learning
CRediT authorship contribution statement
Xingyu Wu: Investigation, Methodology, Software, Writing - original draft. Bingbing Jiang: Investigation, Writing - original draft, Writing - review & editing. Kui Yu: Resources, Writing - review & editing. Huanhuan Chen: Resources, Writing - review & editing, Supervision.
Declaration of Competing Interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Organizations: University of Science and Technology of China, Hefei, 230027, China. Hangzhou Normal University, Hangzhou, 311121, China. Hefei University of Technology, Hefei, 230601, China.
Acknowledgments
We thank the native speaker Muhammad Usman for helping to improve the language. This research is supported in part by the National Natural Science Foundation of China under Grant Nos. 91746209, 62006065, and 61876206, the Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province under Grant No. CICIP2020003, the Scientific Research Foundation of HZNU under Grant No. 4115C50220204003, and the Fundamental Research Funds for the Central Universities.
References (48)
- et al., Towards efficient and effective discovery of Markov blankets for feature selection, Information Sciences (2020)
- et al., Hybridising harmony search with a Markov blanket for gene selection problems, Information Sciences (2014)
- et al., Towards scalable and data efficient learning of Markov boundaries, International Journal of Approximate Reasoning (2007)
- Miguel García-Torres, Francisco Gómez-Vela, Belén Melián-Batista, J. Marcos Moreno-Vega, High-dimensional feature...
- et al., Exploiting relational tag expansion for dynamic user profile in a tag-aware ranking recommender system, Information Sciences (2020)
- Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (1998)
- et al., Causality-based feature selection: Methods and evaluations, ACM Computing Surveys (2020)
- Peter Spirtes, Clark N. Glymour, Richard Scheines, David Heckerman, Christopher Meek, Gregory Cooper, Thomas...
- et al., Local causal discovery of direct causes and effects
- et al., Causal feature selection
- Using Markov blankets for causal structure learning, Journal of Machine Learning Research
- A novel scalable and data efficient feature subset selection algorithm
- HITON: a novel Markov blanket algorithm for optimal variable selection
- Efficient Markov blanket discovery and its application, IEEE Transactions on Cybernetics
- Toward optimal feature selection
- Bayesian network induction via local neighborhoods
- Algorithms for large scale Markov blanket discovery
- Speculative Markov blanket discovery for optimal feature selection
- Time and sample efficient discovery of Markov blankets and direct causal relations
- Fast Markov blanket discovery algorithm via local learning within single pass
- Accurate Markov boundary discovery for causal feature selection, IEEE Transactions on Cybernetics
- Multi-label causal feature selection
- Parallel simulated annealing with a greedy algorithm for Bayesian network structure learning, IEEE Transactions on Knowledge and Data Engineering