ABSTRACT
Machine learning methods for fraud detection achieve impressive predictive performance, but often sacrifice the interpretability that many applications critically require. In this work, we propose to learn interpretable models for fraud detection in the form of a simple rule set. More specifically, we design a novel neural rule learning method that builds a condition graph intended to capture high-order feature interactions; each path in this condition graph can be regarded as a single rule. Inspired by the key idea of meta learning, we combine the neural rules with rules extracted from tree-based models in order to provide generalizable rule candidates. Finally, we propose a flexible rule set learning framework with a greedy optimization method that maximizes the number of recalled fraud samples subject to a predefined cost criterion. We conduct comprehensive experiments on large-scale industrial datasets. Interestingly, we find that the neural rules and the rules extracted from tree-based models are complementary to each other and together improve prediction performance.
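The greedy optimization step described above can be illustrated with a minimal sketch: from a pool of candidate rules (neural or tree-derived), repeatedly select the rule that recalls the most not-yet-covered fraud samples, while keeping the total cost of the selected set within a predefined budget. This is an assumption-laden simplification, not the authors' implementation; the rule names, per-rule costs, and the budget below are illustrative.

```python
def greedy_rule_selection(candidate_rules, budget):
    """Greedy budgeted rule-set selection.

    candidate_rules: list of (rule_name, covered_fraud_ids, cost) triples
                     (hypothetical format, for illustration only).
    budget: maximum total cost of the selected rule set.
    Returns the selected rule names and the set of recalled fraud samples.
    """
    selected = []
    covered = set()
    remaining = list(candidate_rules)
    spent = 0.0
    while remaining:
        # Pick the affordable rule with the largest marginal recall gain.
        best = max(
            (r for r in remaining if spent + r[2] <= budget),
            key=lambda r: len(set(r[1]) - covered),
            default=None,
        )
        if best is None or not (set(best[1]) - covered):
            break  # no affordable rule recalls any new fraud samples
        selected.append(best[0])
        covered |= set(best[1])
        spent += best[2]
        remaining.remove(best)
    return selected, covered


# Hypothetical candidate pool mixing neural and tree-derived rules.
rules = [
    ("neural_rule_1", {1, 2, 3}, 1.0),
    ("tree_rule_1",   {3, 4},    0.5),
    ("tree_rule_2",   {5},       2.0),
]
chosen, recalled = greedy_rule_selection(rules, budget=2.0)
# The expensive tree_rule_2 no longer fits the budget after the first two picks.
```

In this toy run the greedy step first takes `neural_rule_1` (largest gain), then `tree_rule_1` (adds sample 4), and stops because `tree_rule_2` would exceed the budget; this mirrors how neural and tree-derived rules can complement each other in the selected set.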
MetaRule: A Meta-path Guided Ensemble Rule Set Learning for Explainable Fraud Detection