Geometrically interpretable Variance Hyper Rectangle learning for pattern classification
Introduction
Machine learning refers to the process of using algorithms to learn an appropriate model from known training data, and then using that model to make judgments about new situations. Machine learning technology has been applied in many fields, especially in the processing of video, image, voice, text, sensor, and Internet-behavior data. In many cases, the performance of a machine learning model is the most important factor. In application domains that demand a high level of fairness and safety, such as transportation (Mirnig et al., 2018), health care (Vellido, 2020), law (Rudin and Ustun, 2018), finance (Gogas and Papadimitriou, 2021), and the military (Xue and Tong, 2019, Xue and Tong, 2020), the demands on machine learning models are even higher: both the computation process and the results of the model must be interpretable and trustworthy.
To meet these requirements, the machine learning community provides many highly interpretable learning models. The problem is that although the inherently interpretable models in current use are mature in theory and practice, their performance falls increasingly short of the requirements of technological development in the key areas that emphasize interpretability. Specifically: first, the algorithm results are often not good enough, or not stable enough; second, they cannot cope with big data: when the amount of data is large enough, the algorithm programs may crash.
To alleviate these problems, we propose a new strongly interpretable geometry-based learning model in this paper. The main contributions can be summarized as follows:
- A new idea of wrapping data regions with geometry is introduced. Different categories of data are distributed in different regions, as confirmed in the experiments. This makes the wrapping idea feasible and motivates more future work along this direction.
- A new interpretable model for pattern classification, the Variance Hyper Rectangle (VHR) model, is proposed that wraps data regions with hyper rectangles. The VHR model has strong geometric interpretability, making it much easier to understand and more likely to gain the trust of algorithm users.
- The VHR model provides a clear range of values for a category of data in each dimension, which can serve as heuristics for further processing. Moreover, it makes it possible to find the quantitative characteristic differences between categories, and thus to understand them better.
- The VHR model naturally supports incremental learning. It can construct hyper rectangles for a new category of data in an existing feature space that already contains the hyper rectangles built for other categories.
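The wrapping idea behind the contributions above can be illustrated with a minimal sketch: each category is represented by one or more axis-aligned hyper rectangles, and a point is assigned to a category when it falls inside that category's box in every dimension. The box coordinates below are hypothetical, chosen only for illustration; the paper's actual construction is described in Sections 3 and 4.

```python
import numpy as np

def inside(x, lower, upper):
    """A point belongs to a hyper rectangle iff it lies within the
    [lower, upper] interval in every dimension."""
    return bool(np.all((x >= lower) & (x <= upper)))

# Two hypothetical categories, each wrapped by one axis-aligned box.
boxes = {
    "class_a": (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    "class_b": (np.array([2.0, 2.0]), np.array([3.0, 3.0])),
}

def classify(x):
    """Return the first category whose box contains x, else None."""
    for label, (lo, hi) in boxes.items():
        if inside(x, lo, hi):
            return label
    return None

print(classify(np.array([0.5, 0.5])))  # class_a
print(classify(np.array([2.5, 2.9])))  # class_b
```

Because a box is just a per-dimension interval, the same structure directly yields the per-dimension value ranges mentioned above, and adding a new category amounts to adding a new box to the dictionary, which is why incremental learning comes naturally.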
Organization This paper is arranged as follows. In Section 2, we review the related work. Section 3 introduces the principle of the VHR model proposed in this paper. Section 4 presents the VHR learning and classification algorithms. Section 5 discusses the characteristics and advantages of VHR. In Section 6, a series of experiments is employed to test the performance of the different measures. Section 7 concludes the paper and provides some suggestions for future work.
Related work
Interpretable machine learning techniques can generally be grouped into two categories: intrinsic interpretability and post-hoc interpretability, depending on the time when the interpretability is obtained.
Principles
In this section, we first introduce some definitions utilized in the paper. Then we present the VHR approach. Table 1 summarizes the notations frequently used throughout the paper.
Method
We now turn to how hyper rectangles are learned for a data set in the VHR model. The overall framework of VHR is shown in Fig. 2. The first step is to divide the data into several sub-data sets by a clustering algorithm so that each subset is convex. General clustering algorithms can be used in this step, so it is not discussed further in this paper. The second step is to construct a wrapping hyper rectangle for each sub-data set. After that, the model is established and ready for some
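The second step above, wrapping one sub-data set, can be sketched as follows. The variance-based bound mean ± k·std is an illustrative assumption suggested by the model's name (the paper's exact construction is not shown in this snippet), and clipping to the observed minimum and maximum keeps the box no larger than the data region.

```python
import numpy as np

def fit_vhr(X, k=2.0):
    """Wrap one (assumed convex) sub-data set with a hyper rectangle.

    The bound mean +/- k*std is an illustrative, variance-based choice;
    clipping to the observed min/max keeps the box no larger than the
    data region it wraps.
    """
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    lower = np.maximum(mu - k * sigma, X.min(axis=0))
    upper = np.minimum(mu + k * sigma, X.max(axis=0))
    return lower, upper

# One category's (hypothetical) cluster in a 3-dimensional feature space.
rng = np.random.default_rng(0)
X_a = rng.normal(loc=0.0, scale=0.5, size=(200, 3))
lo, hi = fit_vhr(X_a)
print(lo, hi)  # per-dimension value range for this category
```

Applying `fit_vhr` once per cluster produced by the first step yields the full set of hyper rectangles; the returned `(lo, hi)` pair is exactly the per-dimension value range that the VHR model exposes for interpretation.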
Characteristics and advantages
– Geometric interpretability.
At present, there is no accepted definition for quantitatively assessing the interpretability of machine learning models, and interpretability analysis from a geometric perspective has not appeared in previous literature. In view of this, this paper attempts to qualitatively define the following three levels of geometric interpretability, ordered by strength:
- (1) Level C: Clear geometric features are used in the calculation principle of
Experiment design and datasets
To prove the effectiveness of the VHR model, we designed a series of experiments. The experiments are mainly carried out from four aspects: (1) the validation of the VHR model; (2) the performance of the VHR-based classification algorithm; (3) the VHR parameter setting; and (4) the VHR performance on a real application.
The data used in the experiments are taken from data sets publicly available on the Internet. Their basic information is listed in Table 3. Except for ORL (Cai, 2021) and
Conclusions and discussions
The experimental results justify the effectiveness of the VHR model: the hyper rectangles of the VHR model wrap the data region properly, neither smaller nor larger than the data region itself. This is the guarantee of good performance.
The biggest advantage of the VHR model is that it has both strong interpretability and good performance. Good performance ensures that VHR produces correct results, while strong interpretability makes the results reliable and trustworthy.
CRediT authorship contribution statement
Jie Sun: Data curation, Investigation, Writing – original draft. Huamao Gu: Conceptualization, Methodology, Writing – review & editing. Haoyu Peng: Software, Validation. Yili Fang: Software. Xun Wang: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the Natural Science Foundation of Zhejiang Province of China (grant numbers LY20F030002, LTY21F020001), the National Natural Science Foundation of China (grant numbers 92046002, 61976188, 61972353, 61976187), the Zhejiang Provincial Basic Public Welfare Research Project, China (grant number LGG20F020006), and the Science and Technology Program of Zhejiang Province, China (Key Research and Development Plan, grant number 2021C01120).
References (46)
- Integration of deep feature extraction and ensemble learning for outlier detection. Pattern Recognit. (2019).
- Random forest explainability using counterfactual sets. Inf. Fusion (2020).
- Fisher's linear discriminant embedded metric learning. Neurocomputing (2014).
- Large margin principle in hyperrectangle learning. Neurocomputing (2014).
- Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit. (2017).
- The naive Bayes classifier for functional data. Stat. Probab. Lett. (2019).
- Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access (2018).
- On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE (2015).
- UCI machine learning repository (2021).
- Prototype selection for interpretable classification. Ann. Appl. Stat. (2011).
- Four face databases in matlab format.
- Learning explainable decision rules via maximum satisfiability. IEEE Access.
- Interpretable rule discovery through bilevel optimization of split-rules of nonlinear decision trees for classification problems. IEEE Trans. Cybern.
- Semi-supervised SVM with extended hidden features. IEEE Trans. Cybern.
- Model class reliance: Variable importance measures for any machine learning model class, from the Rashomon perspective.
- Predictive learning via rule ensembles. Ann. Appl. Stat.
- Machine learning in economics and finance. Comput. Econ.
- Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. J. Comput. Graph. Stat.
- pdp: An R package for constructing partial dependence plots. R J.
- Distilling the knowledge in a neural network.
- Tuning-free ridge estimators for high-dimensional generalized linear models. Comput. Stat. Data Anal.
- Lending club loan data.
- Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Ann. Appl. Stat.
Cited by (1)
A Multiclustering Evolutionary Hyperrectangle-Based Algorithm
2023, International Journal of Computational Intelligence Systems