research-article

Causal Feature Selection in the Presence of Sample Selection Bias

Authors: Xiaoling Huang, Tingting Jiang, Lichuan Gu

ACM Transactions on Intelligent Systems and Technology, Volume 14, Issue 5

Article No.: 78, Pages 1 - 18

https://doi.org/10.1145/3604809

Published: 11 August 2023

Abstract

Almost all existing causal feature selection methods are proposed without considering sample selection bias. In practice, however, the data-gathering process cannot be fully controlled, so sample selection bias often occurs. It induces spurious correlations between features and the class variable, which seriously degrade the performance of existing methods. In this article, we study causal feature selection under sample selection bias and propose a novel Progressive Causal Feature Selection (PCFS) algorithm with three phases. First, PCFS learns sample weights that balance the treated-group and control-group distributions corresponding to each feature, thereby removing spurious correlations. Second, using these sample weights, PCFS applies a weighted cross-entropy model to estimate the causal effect of each feature and removes some irrelevant features from the confounder set. Third, PCFS progressively repeats the first two phases to remove more irrelevant features and finally obtains a causal feature set. Experiments on synthetic and real-world datasets validate the effectiveness of PCFS in comparison with several state-of-the-art classical and causal feature selection methods.
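The three phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the balancing objective, the optimizer, or the pruning rule, so everything concrete below is an assumption made for illustration. In particular, the sketch assumes binary features, a squared-gap balancing loss with weights parameterized as `theta ** 2`, the absolute coefficient of a weighted logistic (cross-entropy) model as a stand-in for each feature's causal effect, and a hypothetical `drop_frac` parameter that controls how many features are pruned per round. All function names are hypothetical.

```python
import numpy as np

def balance_loss_and_grad(X, w):
    """Phase 1 loss (assumed form): for each feature j taken as the treatment,
    the squared gap between weighted means of the other features in the
    treated (X[:, j] == 1) and control groups. Returns loss and d(loss)/dw."""
    n, p = X.shape
    loss, grad = 0.0, np.zeros(n)
    for j in range(p):
        t = X[:, j]
        Z = np.delete(X, j, axis=1)           # remaining features (confounders)
        s1, s0 = w @ t, w @ (1 - t)
        if s1 < 1e-8 or s0 < 1e-8:            # one group is (almost) empty
            continue
        m1, m0 = Z.T @ (w * t) / s1, Z.T @ (w * (1 - t)) / s0
        gap = m1 - m0
        loss += gap @ gap
        grad += 2 * (t * ((Z - m1) @ gap) / s1 - (1 - t) * ((Z - m0) @ gap) / s0)
    return loss, grad

def learn_weights(X, steps=200, lr=1.0, lam=1e-3):
    """Gradient descent on theta with w = theta**2 >= 0; a step is accepted
    only if it lowers the objective, otherwise the step size is halved."""
    n = X.shape[0]
    theta = np.ones(n)

    def objective(th):
        w = th ** 2
        loss, g = balance_loss_and_grad(X, w)
        loss += lam * np.mean((w - 1) ** 2)    # keep weights close to uniform
        g += lam * 2 * (w - 1) / n
        return loss, g * 2 * th                # chain rule through w = th**2

    cur, g = objective(theta)
    for _ in range(steps):
        cand = theta - lr * g
        new, g_new = objective(cand)
        if new < cur:
            theta, cur, g = cand, new, g_new
        else:
            lr *= 0.5
    return theta ** 2

def weighted_logistic(X, y, w, steps=500):
    """Phase 2: minimize the sample-weighted cross-entropy by gradient descent
    and return the feature coefficients (the intercept is not returned)."""
    n, p = X.shape
    beta, b = np.zeros(p), 0.0
    w = w / w.sum()
    lr = 4.0 / (p + 1)                         # safe step size for 0/1 features
    for _ in range(steps):
        prob = 1 / (1 + np.exp(-(X @ beta + b)))
        err = w * (prob - y)
        beta -= lr * (X.T @ err)
        b -= lr * err.sum()
    return beta

def progressive_selection(X, y, k, drop_frac=0.3):
    """Phase 3: repeat phases 1 and 2, dropping the weakest features each
    round, until k features remain. Returns the sorted kept column indices."""
    keep = np.arange(X.shape[1])
    while len(keep) > k:
        Xk = X[:, keep]
        w = learn_weights(Xk)
        effect = np.abs(weighted_logistic(Xk, y, w))
        n_drop = min(len(keep) - k, max(1, int(drop_frac * len(keep))))
        keep = keep[np.argsort(effect)[n_drop:]]
    return np.sort(keep)

# Toy usage: only columns 0 and 1 influence the label.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 6)).astype(float)
logits = 4 * X[:, 0] + 4 * X[:, 1] - 4
y = (rng.random(400) < 1 / (1 + np.exp(-logits))).astype(float)
print(progressive_selection(X, y, k=2))
```

Refitting the weights after each pruning round is what makes the sketch progressive: once a feature is dropped, it no longer counts as a confounder that the weights must balance, so later rounds work on a smaller balancing problem.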

Index Terms

Causal Feature Selection in the Presence of Sample Selection Bias
1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Feature selection

Information & Contributors

Information

Published In

ACM Transactions on Intelligent Systems and Technology Volume 14, Issue 5

October 2023

472 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/3615589

Editor:
Huan Liu
Arizona State University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 August 2023

Online AM: 17 June 2023

Accepted: 05 June 2023

Revised: 05 May 2023

Received: 18 November 2022

Published in TIST Volume 14, Issue 5

Funding Sources

National Key Research and Development Program of China
National Natural Science Foundation of China
University Synergy Innovation Program of Anhui Province
