Abstract
A Subgroup Discovery algorithm is usually considered better than another if the average of some predefined quality measures over its mined subgroups is higher. This process has some drawbacks: it ignores the redundancy in the mined patterns, and it might hide important differences among algorithms that return subgroup sets with the same averaged value. In this paper, we propose a new method to evaluate and compare subgroup discovery algorithms. This method starts by removing redundancy using a novel procedure based on the examples covered by the patterns and the statistical redundancy between them. Then, new similarity and quality procedures are used to compare the algorithms based on their ability to detect patterns and on the quality of the mined patterns, respectively. The experimental results show some interesting findings that would go unnoticed under the traditional approach.
1 Introduction
Subgroup Discovery (SD) [1, 9, 16] is a pattern recognition task that identifies descriptions of subsets of a dataset that show a different behavior with respect to certain interestingness criteria. SD searches for local patterns, generally in the form of rules, where the body contains constraints applied to the data and the head represents the best supported class. Different approaches have been presented to discover subgroup sets [3, 8].
Most of these papers evaluate their proposals with an experimental study that only uses the estimation of some quality measures through a 10-fold cross-validation. To summarize the results, the average over all subgroups mined in each fold is computed, and then the average of the results obtained over all partitions is calculated. Notice that the average is highly sensitive to extreme values, and it might hide important differences between miners that return the same averaged value. Common SD quality measures are: unusualness or weighted relative accuracy, sensitivity or recall, and confidence or precision [5].
In the typical comparison methodology, subgroup redundancy is ignored, even though it introduces errors in the computation of the average metrics. Subgroups are represented as patterns, and redundant ones may present similar quality when they cover overlapping sets of instances in a dataset. Moreover, this methodology considers neither the individual quality of the mined subgroups nor the similarity between the sets obtained by the different algorithms.
This paper proposes a new method to evaluate and compare SD algorithms taking into account the redundancy, quality, and similarity of the subgroup sets they obtain. To do so, we first apply a novel algorithm to remove the redundant subgroups, based on the examples covered by the patterns and the statistical redundancy between them. Then, the proposed similarity and quality procedures can be applied over the subgroups obtained in the previous step. The quality evaluation procedure considers the quality distribution of the subgroup set, while the similarity approach allows the user to select the algorithm that provides the most diverse information about the dataset. Finally, dedicated graphics for each method are presented to improve the comprehensibility of the results.
2 Subgroup Discovery: Redundancy and Comparison
The main objective of the SD task is to identify interesting groups of individuals, where interestingness is defined as distributional unusualness with respect to a certain property of interest. Most SD algorithms provide a set of the best qualified subgroups, where quality is defined as the mean value of the measures obtained by all the subgroups mined [6]. Thus, the typical comparison methodology summarizes the results using the average, and the subgroup set with the highest values is selected as the best one [1, 7, 8].
This comparison approach has some drawbacks related to the nature of the average, which is known to be affected by outliers and by data that do not follow a central distribution. For instance, consider two subgroup sets A and B, where almost all subgroups in A have high quality except for a few poor outliers, while most of the subgroups in B present lower quality than the best ones in A. The outliers drag the average of A down, so the average quality obtained by B can be higher than the one obtained by A. Under the traditional methodology, B would then be considered better than A, even though A contains more subgroups of better quality than B. The distribution of the subgroups over the quality measure domain is therefore also important when comparing subgroup sets.
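This effect can be reproduced with a small numeric sketch. The quality values below are hypothetical, chosen only to illustrate how a few outliers make the average misleading:

```python
# Hypothetical unusualness values for two mined subgroup sets.
# Set A: mostly high-quality subgroups plus two poor outliers.
# Set B: uniformly mediocre subgroups.
quality_a = [0.42, 0.40, 0.39, 0.38, 0.02, 0.01]
quality_b = [0.30, 0.29, 0.28, 0.27, 0.26, 0.25]

avg_a = sum(quality_a) / len(quality_a)  # 0.27
avg_b = sum(quality_b) / len(quality_b)  # 0.275

# B wins on the average even though A contains more subgroups of
# clearly higher quality than anything in B.
print(avg_a < avg_b)  # True
```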
Another major problem that affects not only the computation of the average measures but also the comprehensibility of the results is redundancy. Dependencies between the non-target attributes lead to large numbers of variations of a particular subgroup. Since many descriptions can have a similar coverage of the given data, this may lead to many redundant evaluations of the quality function and to a subgroup set that contains multiple descriptions of the same subpopulation. Moreover, the average of the measures of a subgroup set with a high redundancy level can be distorted, as shown in Fig. 1. Hence, redundancy is an important factor to take into account when comparing subgroup sets [2, 10].
Redundant subgroups are those that cover a subset, or a similar set, of the data records covered by some other subgroup [10]. Several approaches have been presented to detect and remove redundancy. Li et al. [12] propose an interesting approach to detect and prune redundant subgroups using a heuristic search and the error bounds of the OddsRatio measure [11]. In [2], a closure system is used to represent a subgroup by its coverage of a dataset. Van Leeuwen et al. [10] propose some selection strategies to eliminate redundancy in heuristic search algorithms. In general, these proposals employ one of two search spaces for redundancy detection: the description space or the coverage space. The first is more efficient but less precise than the second.
3 A New Method to Evaluate Subgroup Discovery Algorithms
In this section, we present a new method to evaluate and compare SD algorithms by analyzing the redundancy, quality, and similarity of the subgroups obtained by different approaches. First, the method removes the redundant subgroups using a novel procedure based on the examples covered by the patterns and the statistical redundancy between them. Then, the quality and similarity procedures can be applied over the subgroups obtained in the first step. All their characteristics are presented in detail in the following.
Evaluating and Removing Redundancy
We propose a new procedure to identify whether two patterns are redundant using the following properties: the ratio of examples covered by the two patterns and the statistical redundancy between them. We use the covered example ratio presented in [13], which represents the maximum percentage of examples covered by both patterns with respect to the examples covered by each pattern. If this ratio is higher than a threshold value \(CovRat_{min}\), then these patterns would appear to provide similar information about the search space. However, they could still be describing different class distributions that may be interesting to the user. Because of this, the statistical redundancy proposed in [12] is also calculated, which is based on the confidence intervals of the OddsRatio: if the confidence intervals of the OddsRatio of the two patterns overlap, then they are redundant.
The ratio of examples covered takes values in the range [0,1], where values close to 0 indicate that the rules cover few common examples and values close to 1 that the rules cover almost the same examples. Notice that the \(CovRat_{min}\) threshold allows the user to determine the overlap degree of the compared subgroups. The ratio is defined as \(CovRat(P_1,P_2) = MAX \left[ \tfrac{cov(P_1\,\wedge \,P_2)}{cov(P_1)} ,\tfrac{cov(P_1\,\wedge \,P_2)}{cov(P_2)}\right] \), where \(cov(P_1\,\wedge \,P_2)\) represents the number of common examples covered by both subgroups \(P_1\) and \(P_2\), and \(cov(P_1)\) and \(cov(P_2)\) represent the number of examples covered by \(P_1\) and \(P_2\), respectively.
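A minimal sketch of this ratio, assuming each pattern's coverage is available as a set of example indices (the function name and representation are ours, for illustration only):

```python
def cov_rat(cov1, cov2):
    """Covered-example ratio between two patterns, where cov1 and cov2
    are the sets of example indices each pattern covers."""
    if not cov1 or not cov2:
        return 0.0
    common = len(cov1 & cov2)
    # Maximum overlap relative to each pattern's own coverage.
    return max(common / len(cov1), common / len(cov2))

# Two patterns sharing 6 of the 8 examples each one covers:
p1 = {0, 1, 2, 3, 4, 5, 6, 7}
p2 = {2, 3, 4, 5, 6, 7, 8, 9}
print(cov_rat(p1, p2))  # 0.75 -> reaches a threshold CovRat_min = 0.75
```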
The OddsRatio of a subgroup P is defined as \(OR(P) = \tfrac{TP\,*\,TN}{FP\,*\,FN}\), where TP, FP, FN, and TN are the terms of the contingency table. The confidence interval of the OddsRatio is calculated as \(\left[ OR(P)e^{-w},OR(P)e^w\right] \), where \(w = z_{\alpha /2}*\sqrt{\tfrac{1}{TP}+\tfrac{1}{FP}+\tfrac{1}{FN}+\tfrac{1}{TN}}\). The critical value for a 95% confidence level is \(z_{\alpha /2}=1.96\).
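The interval computation and the overlap test can be sketched as follows. The helper names are ours, and the sketch assumes all contingency-table cells are non-zero, since the formulas divide by each of them:

```python
import math

def odds_ratio_ci(tp, fp, fn, tn, z=1.96):
    """Confidence interval of the OddsRatio of one pattern, computed
    from its contingency-table cells (all assumed non-zero)."""
    odds_ratio = (tp * tn) / (fp * fn)
    w = z * math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return odds_ratio * math.exp(-w), odds_ratio * math.exp(w)

def statistically_redundant(table1, table2):
    """Two patterns are flagged as statistically redundant when their
    OddsRatio confidence intervals overlap."""
    lo1, hi1 = odds_ratio_ci(*table1)
    lo2, hi2 = odds_ratio_ci(*table2)
    return lo1 <= hi2 and lo2 <= hi1

# Two patterns with close odds ratios -> overlapping intervals:
print(statistically_redundant((40, 10, 20, 30), (38, 12, 22, 28)))  # True
```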
Notice that algorithms with a large percentage of redundant subgroups are less efficient, and the mined subgroup set carries information that can decrease the user's ability to understand the results. To analyze the redundancy detected by this method, we propose a bar chart that shows the percentage of redundant subgroups obtained by each algorithm on each dataset, as can be seen in Fig. 2(a).
Similarity Between Mined Subgroup Sets
The similarity of subgroup sets can be defined by the number of common or similar subgroups between the sets obtained by the algorithms analyzed, where two patterns are considered common when they are redundant. To do so, all patterns mined by both algorithms are added to a pool. Then, the similar subgroups are identified using the method presented previously. In this way, we obtain the set of common patterns and the subgroup sets obtained only by each of the algorithms analyzed. Notice that only subgroup sets obtained on the same dataset partition are compared in this method.
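The pooling step can be sketched as below, assuming a pairwise redundancy predicate such as the CovRat and OddsRatio test from the previous subsection is available; the function name and matching strategy are ours, for illustration:

```python
def split_common(patterns_a, patterns_b, redundant):
    """Partition two mined pattern lists into the common subgroups and
    the ones found only by each algorithm. `redundant` is any pairwise
    predicate deciding whether two patterns are redundant."""
    common, only_a = [], []
    matched_b = set()
    for pa in patterns_a:
        # Find the first not-yet-matched pattern of B redundant with pa.
        hit = next((i for i, pb in enumerate(patterns_b)
                    if i not in matched_b and redundant(pa, pb)), None)
        if hit is None:
            only_a.append(pa)
        else:
            matched_b.add(hit)
            common.append(pa)
    only_b = [pb for i, pb in enumerate(patterns_b) if i not in matched_b]
    return common, only_a, only_b

# Toy example with string identity as the redundancy predicate:
print(split_common(['p1', 'p2', 'p3'], ['p2', 'p4'], lambda a, b: a == b))
# (['p2'], ['p1', 'p3'], ['p4'])
```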
To better analyze these results, we propose a stacked bar chart for each pair of algorithms over all the datasets, as can be seen in Fig. 2(b). This figure shows a similarity comparison between alg1 and alg2; each bar represents the total number (100%) of patterns found by both algorithms on a dataset. The gray color represents the proportion of common subgroups found, while the black and white colors represent the ones obtained only by alg1 and alg2, respectively. We can see how alg1 and alg2 obtain very similar results on the db2 dataset, since the percentage of common patterns is large. Moreover, alg1 extracts more information from the db2 dataset than alg2, since it obtains all the common subgroups plus more distinct subgroups than the ones obtained by alg2. Notice that this similarity analysis allows the user to select the algorithm that provides the most diverse information about the dataset.
Comparing the Quality of the Mined Subgroups
The quality of a subgroup is defined by the values of different quality measures proposed in the literature, such as confidence, sensitivity, and unusualness. For all of these measures, higher values are better. We can then divide the range of the obtained values into a number of intervals \(N_{Interv}\) to identify the quality of a subgroup depending on the interval it belongs to. \(N_{Interv}\) and the interval limits are determined by the user. In this work, we empirically set \(N_{Interv} = 3\) to identify the lowest, middle, and highest quality intervals, dividing the range of values into three equal parts. The quality of a subgroup set can then be analyzed by considering the percentage of patterns in each of these quality intervals, which allows us to consider the quality distribution of the subgroup set. Finally, the lower and upper bounds of the range of values are defined by the minimum and maximum values found over all the patterns obtained by the algorithms analyzed.
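A sketch of the interval assignment for equal-width intervals follows; the helper is illustrative, not taken from the paper's implementation, and assumes the bounds are taken from the values passed in:

```python
def quality_distribution(values, n_interv=3):
    """Percentage of subgroups falling in each of n_interv equal-width
    quality intervals, with bounds set by the min and max of `values`."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_interv
    counts = [0] * n_interv
    for v in values:
        # The maximum value is clamped into the last interval.
        idx = min(int((v - lo) / width), n_interv - 1) if width else 0
        counts[idx] += 1
    return [100 * c / len(values) for c in counts]

# Five subgroups split over low/middle/high quality intervals:
print(quality_distribution([0.1, 0.2, 0.5, 0.8, 0.9]))  # [40.0, 20.0, 40.0]
```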
To represent the results of this method, we employ a composite graphic like the one in Fig. 2(c). This comparison is pairwise, so the representation employs two bar charts that have to be interpreted as a whole, since the interval boundaries are determined jointly by the results of both algorithms. Each subgraphic represents the results obtained by one algorithm, where each bar shows the percentage of patterns that belong to each quality interval for a dataset. The black, gray, and white colors represent the lowest, middle, and highest intervals of the measure domain, respectively. The higher the percentage of patterns in the upper interval, the better the subgroup set. Figure 2(c) shows how alg2 has more subgroups with high quality than alg1.
4 Experimental Validation
To validate the new evaluation method, we compare 3 well-known SD algorithms: SD-map [1], Apriori-SD [8], and NMEEF-SD [3]. We have considered the following 20 datasets from the UCI Repository of machine learning databases [4]: Appendicitis, Australian, Balance, Breast Cancer, Bridges, Bupa, Cleveland, Diabetes, Echo, German, Glass, Haberman, Heart, Hepatitis, Ionosphere, Iris, Led, Primary Tumor, Vehicle, and Wine. The parameters of the analyzed algorithms are presented in Table 1 and were selected following the recommendations of their authors. The Apriori-SD and SD-map implementations do not support continuous variables, so an ID3 [15] discretization was applied. For the experiments, the parameters of our proposal are set to \(CovRat_{min} = 0.75\) and \(N_{Interv} = 3\). We consider the average results of a 10-fold cross-validation. In addition, as NMEEF-SD is stochastic, three runs are performed.
In these experiments, we first apply the traditional methodology to evaluate the analyzed algorithms. Then, we use the new evaluation methods to show how they can improve the quality of the comparison by providing more information about it. In this study, we show a pairwise comparison between the algorithms considered. To apply the traditional methodology, we statistically compare the average of the quality measures obtained by the algorithms over all datasets. We analyzed these results considering all the subgroups discovered as well as the ones remaining after the redundant subgroups are removed. We have used a Wilcoxon test [14] with a significance level of 0.05; Table 2 shows the results obtained.
The redundancy analysis is also shown in Fig. 3, where the percentage of redundant subgroups found is reported for each dataset. The similarity between the studied algorithms is presented in Fig. 4. To consider the quality distribution of the mined subgroup sets using the quality intervals, we show in Fig. 5 the percentage of patterns of each algorithm that belong to the low, middle, and high quality intervals.
We can draw the following conclusions from the analysis of the pairwise comparisons between the algorithms considered:
-
SD-map vs Apriori-SD: We can see from Fig. 3 that Apriori-SD obtains fewer redundant subgroups than SD-map. The similarity analysis presented in Fig. 4(a) shows that Apriori-SD discovers more knowledge than SD-map, since it obtains more than 50% of the total number of subgroups mined by both methods in most of the datasets. The statistical analysis shows that there is no significant difference between the average results of the confidence and unusualness measures. However, Fig. 5 shows that more of the subgroups mined by Apriori-SD obtain better values for these measures than the ones obtained by SD-map. Thus, Apriori-SD can be considered better than SD-map because it provides more diverse knowledge with better quality in most of the measures considered.
-
SD-map vs NMEEF-SD: The redundancy analysis shows that NMEEF-SD mines more redundant subgroups than SD-map. Table 2 shows how the statistical results between these two algorithms change when the redundant subgroups are removed. Moreover, analyzing only the test results after the redundant subgroups are removed, we can see that the differences for the unusualness and sensitivity measures are not significant. However, most of the subgroups obtained by NMEEF-SD get higher values for these measures than the ones mined by SD-map, as can be seen in Fig. 5. The confidence values obtained by SD-map are better than those obtained by NMEEF-SD, as the statistical results and Fig. 5 show. The similarity analysis shows that the obtained subgroups are very different, having few subgroups in common. In summary, both algorithms provide different knowledge to the user, and the subgroups mined by NMEEF-SD present better values for unusualness and sensitivity, which are the more relevant measures in the SD task.
-
Apriori-SD vs NMEEF-SD: The statistical comparison between these algorithms does not change when the redundant subgroups are removed, with NMEEF-SD better than Apriori-SD for the unusualness and sensitivity measures, as shown in Table 2. Figure 5 also shows that most of the subgroups mined by NMEEF-SD belong to the highest quality interval. These algorithms present low similarity between them. NMEEF-SD can be considered better than Apriori-SD because it provides more diverse knowledge with better quality in most of the measures considered.
5 Conclusions
In this paper, we propose a new method to evaluate and compare SD algorithms considering the redundancy, quality, and similarity of the subgroup sets they obtain. First, the method removes the redundant subgroups using a novel procedure based on the examples covered by the patterns and the statistical redundancy between them. Then, unlike previous research, which estimates the quality of the algorithms using the average of the quality measures obtained from a 10-fold cross-validation, we perform a paired comparison between subgroup sets obtained from the same chunk of the dataset to determine the quality distribution of the mined subgroup sets. Moreover, the proposed method can also determine how similar one subgroup set is to another using a new procedure that finds the common subgroups between the sets obtained by the analyzed algorithms, where two patterns are common when they are redundant. Finally, the experimental validation shows how our proposal and its associated graphics provide more useful information to users in order to select the best algorithm for their SD problems.
References
Atzmueller, M., Puppe, F.: SD-map – a fast algorithm for exhaustive subgroup discovery. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 6–17. Springer, Heidelberg (2006). https://doi.org/10.1007/11871637_6
Boley, M., Grosskreutz, H.: Non-redundant subgroup discovery using a closure system. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009. LNCS (LNAI), vol. 5781, pp. 179–194. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04180-8_29
Carmona, C.J., González, P., del Jesus, M.J., Herrera, F.: NMEEF-SD: non-dominated multiobjective evolutionary algorithm for extracting fuzzy rules in subgroup discovery. IEEE Trans. Fuzzy Syst. 18(5), 958–970 (2010)
Dheeru, D., Karra Taniskidou, E.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
García-Borroto, M., Loyola-González, O., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A.: Evaluation of quality measures for contrast patterns by using unseen objects. Expert Syst. Appl. 83, 104–113 (2017)
Grosskreutz, H., Rüping, S.: On subgroup discovery in numerical domains. Data Mining Knowl. Discov. 19(2), 210–226 (2009)
del Jesus, M., Gonzalez, P., Herrera, F.: Multiobjective genetic algorithm for extracting subgroup discovery fuzzy rules. In: IEEE Symposium on Computational Intelligence in Multicriteria Decision Making (2007)
Kavsek, B., Lavrac, N.: APRIORI-SD: adapting association rule learning to subgroup discovery. Appl. Artif. Intell. 20(7), 543–583 (2006)
Klösgen, W.: Explora: a multipattern and multistrategy discovery assistant. In: Advances in Knowledge Discovery and Data Mining, pp. 249–271. American Association for Artificial Intelligence (1996)
van Leeuwen, M., Knobbe, A.: Diverse subgroup set discovery. Data Mining Knowl. Discov. 25(2), 208–242 (2012)
Li, H., Li, J., Wong, L., Feng, M., Tan, Y.P.: Relative risk and odds ratio: a data mining perspective. In: Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 368–377. ACM (2005)
Li, J., Liu, J., Toivonen, H., Satou, K., Sun, Y., Sun, B.: Discovering statistically non-redundant subgroups. Knowl.-Based Syst. 67, 315–327 (2014)
Martín, D., Alcalá-Fdez, J., Rosete, A., Herrera, F.: NICGAR: a niching genetic algorithm to mine a diverse set of interesting quantitative association rules. Inf. Sci. 355, 208–228 (2016)
Quinlan, J.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986)
Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics 1, 80–83 (1945)
Wrobel, S.: An algorithm for multi-relational discovery of subgroups. In: Komorowski, J., Zytkow, J. (eds.) PKDD 1997. LNCS, vol. 1263, pp. 78–87. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63223-9_108
Bravo Ilisástigui, L., Martín Rodríguez, D., García-Borroto, M. (2019). A New Method to Evaluate Subgroup Discovery Algorithms. In: Nyström, I., Hernández Heredia, Y., Milián Núñez, V. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2019. Lecture Notes in Computer Science(), vol 11896. Springer, Cham. https://doi.org/10.1007/978-3-030-33904-3_39