Abstract
Subgroup discovery is a key data mining method that aims at identifying descriptions of subsets of the data that show an interesting distribution with respect to a pre-defined target concept. For practical applications the integration of numerical data is crucial. Therefore, a wide variety of interestingness measures has been proposed in the literature that use a numerical attribute as the target concept. However, efficient mining in this setting is still an open issue. In this paper, we present novel techniques for fast exhaustive subgroup discovery with a numerical target concept. We initially survey previously proposed measures in this setting. Then, we explore options for pruning the search space using optimistic estimate bounds. Specifically, we introduce novel bounds in closed form and ordering-based bounds as a new technique to derive estimates for several types of interestingness measures with no previously known bounds. In addition, we investigate efficient data structures, namely adapted FP-trees and bitset-based data representations, and discuss their interdependencies with interestingness measures and pruning schemes. The presented techniques are incorporated into two novel algorithms. Finally, the benefits of the proposed pruning bounds and algorithms are assessed and compared in an extensive experimental evaluation on 24 publicly available datasets. The novel algorithms reduce runtimes consistently by more than one order of magnitude.
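To illustrate the core idea of optimistic-estimate pruning for a numerical target, the following minimal sketch performs an exhaustive search over conjunctions of selectors and skips patterns whose optimistic estimate cannot beat the current best quality. This is not the paper's SD-Map* or bitset-based implementation; the selector representation, the choice of the mean-based quality with \(a=1\), and all function names are illustrative.

```python
# Minimal sketch of exhaustive subgroup discovery with a numerical target and
# optimistic-estimate pruning. This is NOT the paper's SD-Map* / bitset
# implementation; selector representation and function names are illustrative.
from itertools import combinations

def quality(cover, target, mu0):
    # mean-based quality q(P) = n_P * (mean_P - mean_total), i.e. a = 1
    if not cover:
        return float("-inf")
    vals = [target[i] for i in cover]
    return len(vals) * (sum(vals) / len(vals) - mu0)

def optimistic_estimate(cover, target, mu0):
    # summing only positive deviations bounds the quality of every refinement:
    # a refinement can at best keep exactly the instances above mean_total
    return sum(max(0.0, target[i] - mu0) for i in cover)

def best_subgroup(selectors, target):
    # selectors: dict mapping selector name -> set of covered instance indices
    mu0 = sum(target) / len(target)
    best_q, best_desc = float("-inf"), None
    for k in range(1, len(selectors) + 1):
        for desc in combinations(sorted(selectors), k):
            cover = set.intersection(*(selectors[s] for s in desc))
            if optimistic_estimate(cover, target, mu0) <= best_q:
                continue  # prune: neither this pattern nor a refinement can win
            q = quality(cover, target, mu0)
            if q > best_q:
                best_q, best_desc = q, desc
    return best_desc, best_q
```

For instance, with selectors `{"A": {0, 1, 2}, "B": {1, 2, 3}}` and target values `[1.0, 5.0, 7.0, 2.0]`, the conjunction of `A` and `B` is returned with quality 4.5.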


Notes
Available at www.vikamine.org.
References
Alcala-Fernandez J, Fernandez A, Luengo J, Derrac J, Garcia S, Sanchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Comput 17(2–3):255–287
Atzmueller M (2015) Subgroup discovery—advanced review. WIREs Data Mining Knowl Discov 5(1):35–49
Atzmueller M, Lemmerich F (2009) Fast subgroup discovery for continuous target concepts. In: Proceedings of the 18th international symposium on foundations of intelligent systems (ISMIS), p 35–44
Atzmueller M, Lemmerich F (2012) VIKAMINE—Open-source subgroup discovery, pattern mining, and analytics. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 842–845
Atzmueller M, Lemmerich F (2013) Exploratory pattern mining on social media using geo-references and social tagging information. Int J Web Sci 2(1–2):80–112
Atzmueller M, Lemmerich F, Krause B, Hotho A (2009) Who are the spammers? Understandable local patterns for concept description. In: Proceedings of the 7th conference on computer methods and systems
Atzmueller M, Mueller J, Becker M (2015) Exploratory subgroup analytics on ubiquitous data. In: Atzmueller M, Chin A, Scholz C, Trattner C (eds) Mining, modeling and recommending ’things’ in social media. Springer, p 1–20
Atzmueller M, Puppe F (2006) SD-Map—a fast algorithm for exhaustive subgroup discovery. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD), p 6–17
Atzmueller M, Puppe F (2009) A knowledge-intensive approach for semi-automatic causal subgroup discovery. In: Berendt B et al (eds) Knowledge discovery enhanced with semantic and social information, vol 220. Springer, Berlin, pp 19–36
Aumann Y, Lindell Y (1999) A statistical theory for quantitative association rules. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), p 261–270
Aumann Y, Lindell Y (2003) A statistical theory for quantitative association rules. J Intell Inf Syst 20(3):255–283
Batal I, Hauskrecht M (2010) A concise representation of association rules using minimal predictive rules. In: Proceedings of the 2010 European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 87–102
Bay SD, Pazzani MJ (2001) Detecting group differences: mining contrast sets. Data Min Knowl Discov 5(3):213–246
Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, p 85–93
Bayardo RJ, Agrawal R, Gunopulos D (1999) Constraint-based rule mining in large, dense databases. Data Min Knowl Discov 4(2–3):217–240
Box GEP (1953) Non-normality and tests on variances. Biometrika 40:318–335
Breiman L, Friedman JH, Stone CJ, Olshen RA (1984) Classification and regression trees. Chapman & Hall, Boca Raton
Brin S, Rastogi R, Shim K (2003) Mining optimized gain rules for numeric attributes. IEEE Trans Knowl Data Eng 15(2):324–338
Cheng H, Yan X, Han J, Yu PS (2008) Direct discriminative pattern mining for effective classification. In: Proceedings of the 24th international conference on data engineering (ICDE), p 169–178
Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), p 43–52
Duivesteijn W, Knobbe AJ, Feelders A, van Leeuwen M (2010) Subgroup discovery meets bayesian networks—an exceptional model mining approach. In: Proceedings of the 10th international conference on data mining (ICDM), p 158–167
El-Qawasmeh E (2003) Beating the popcount. Int J Inf Technol 9(1):1–18
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874
Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th international joint conference on artificial intelligence (IJCAI), p 1022–1027
Freidlin B, Gastwirth JL (2000) Should the median test be retired from general use? Am Stat 54(3):161–164
Fukuda T, Morimoto Y, Morishita S, Tokuyama T (1996) Mining optimized association rules for numeric attributes. In: Proceedings of the 15th ACM symposium on principles of database systems (PODS), p 182–191
García S, Luengo J, Saez JA, Lopez V, Herrera F (2013) A survey of discretization techniques: taxonomy and empirical analysis in supervised learning. IEEE Trans Knowl Data Eng 25(4):734–750
Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. ACM Comput Surv 38(3):9
Grosskreutz H (2008) Cascaded subgroups discovery with an application to regression. In: From local patterns to global models, workshop at the ECML/PKDD, p 275–286
Grosskreutz H, Rüping S (2009) On subgroup discovery in numerical domains. Data Min Knowl Discov 19(2):210–226
Grosskreutz H, Rüping S, Wrobel S (2008) Tight optimistic estimates for fast subgroup discovery. In: Proceedings of the 2008 European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 440–456
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. ACM SIGMOD Rec 29(2):1–12
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Hart PE, Nilsson NJ, Raphael B (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Trans Syst Sci Cybernet 4(2):100–107
Jorge AM, Azevedo PJ, Pereira F (2006) Distribution rules with numeric attributes of interest. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD), p 247–258
Kavšek B, Lavrač N (2006) Apriori-SD: adapting association rule learning to subgroup discovery. Appl Artif Intell 20:543–583
Klösgen W (1994) Exploration of simulation experiments by discovery. Technical Report WS-94-03
Klösgen W (1995) Efficient discovery of interesting statements in databases. J Intell Inf Syst 4(1):53–69
Klösgen W (1996) Explora: a multipattern and multistrategy discovery assistant. In: Fayyad U-M, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. MIT Press, Cambridge, pp 249–271
Klösgen W (2002) Data mining tasks and methods: subgroup discovery: deviation analysis. In: Klösgen W, Zytkow JM (eds) Handbook of data mining and knowledge discovery, p 354–361
Klösgen W, May M (2002) Census data mining—an application. In: Proceedings of the 6th European conference on principles and practice of knowledge discovery in databases (PKDD)
Kotsiantis S, Kanellopoulos D (2006) Discretization techniques: a recent survey. GESTS Int Trans Comput Sci Eng 32(1):47–58
Kralj Novak P, Lavrač N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403
Lavrač N, Kavšek B, Flach PA, Todorovski L (2004) Subgroup discovery with CN2-SD. J Mach Learn Res 5:153–188
Leman D, Feelders A, Knobbe AJ (2008) Exceptional model mining. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 1–16
Lemmerich F (2014) Novel techniques for efficient and effective subgroup discovery. PhD thesis, Universität Würzburg
Lemmerich F, Atzmueller M (2012) Describing locations using tags and images: explorative pattern mining in social media. In: Revised selected papers from the workshops on modeling and mining ubiquitous social media, p 77–96
Lemmerich F, Becker M, Atzmueller M (2012) Generic pattern trees for exhaustive exceptional model mining. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 277–292
Lemmerich F, Becker M, Puppe F (2013) Difference-based estimates for generalization-aware subgroup discovery. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 288–303
Lemmerich F, Puppe F (2011) Local models for expectation-driven subgroup discovery. In: Proceedings of the 11th international conference on data mining (ICDM), p 360–369
Lemmerich F, Rohlfs M, Atzmueller M (2010) Fast discovery of relevant subgroup patterns. In: Proceedings of the 23rd Florida artificial intelligence research society conference (FLAIRS), p 428–433
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Lucas JP, Jorge AM, Pereira F, Pernas AM, Machado AA (2007) A tool for interactive subgroup discovery using distribution rules. In: Proceedings of the artificial intelligence 13th Portuguese conference on progress in artificial intelligence (EPIA), p 426–436
Mampaey M, Nijssen S, Feelders A, Knobbe AJ (2012) Efficient algorithms for finding richer subgroup descriptions in numeric and nominal data. In: Proceedings of the 12th international conference on data mining (ICDM), p 499–508
Moreland K, Truemper K (2009) Discretization of target attributes for subgroup discovery. In: Proceedings of the 6th international conference on machine learning and data mining in pattern recognition (MLDM), p 44–52
Morishita S (1998) On classification and regression. In: Proceedings of the first international conference on discovery science, p 40–57
Morishita S, Sese J (2000) Traversing itemset lattices with statistical metric pruning. In: Proceedings of the 19th ACM symposium on principles of database systems (PODS), p 226–236
Pieters BFI (2010) Subgroup discovery on numeric and ordinal targets, with an application to biological data aggregation. Technical report, Universiteit Utrecht
Pieters BFI, Knobbe AJ, Džeroski S (2010) Subgroup discovery in ranked data, with an application to gene set enrichment. In: Preference learning, workshop at the ECML/PKDD, vol. 10, p 1–18
Rastogi R, Shim K (2002) Mining optimized association rules with categorical and numeric attributes. IEEE Trans Knowl Data Eng 14(1):29–50
Webb GI (1995) OPUS: an efficient admissible algorithm for unordered search. J Artif Intell Res 3(1):431–465
Webb GI (2001) Discovering associations with numeric variables. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), p 383–388
Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Proceedings of the 1st European symposium on principles of data mining and knowledge discovery (PKDD), p 78–87
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390
Zimmermann A, De Raedt L (2009) Cluster-grouping: from subgroup discovery to clustering. Mach Learn 77(1):125–159
Acknowledgments
This work has been partially supported by the VENUS research cluster at the interdisciplinary Research Center for Information System Design (ITeG) at Kassel University.
Additional information
Responsible editor: M.J. Zaki.
This paper summarizes and extends contents of the dissertation of the first author (Lemmerich 2014). A small part of this work, i.e., the SD-Map* algorithm for mean-based interestingness measures only, has been described in a previous publication (Atzmueller and Lemmerich 2009).
Appendix
Lemma 1
Using the notations of Theorem 9, the function \(f^a (x) = (n+x)^a \cdot \left( \frac{\sigma + x \cdot \theta }{n+x}-\mu _\emptyset \right) \) has no local maxima inside its domain of definition.
Proof
We distinguish three cases by the parameter a of the applied generic mean interestingness measure:
First, for \(a = 1\), it holds that
\[
f^1(x) = (n+x) \cdot \left( \frac{\sigma + x \theta }{n+x} - \mu _\emptyset \right) = x \left( \theta - \mu _\emptyset \right) + \sigma - n \mu _\emptyset .
\]
As this is a linear function in x, the function \(f^1(x)\) is strictly increasing for \(\theta > \mu _\emptyset \) and strictly decreasing otherwise. Thus, the theorem holds for \(a=1\).
Second, we consider the case \((a \ne 1) \wedge (\sigma = \theta n)\), that is, the first n instances all had the same target value. In this case, the function \(f^a(x)\) is given by \(f^a(x) = (n+x)^a (\theta -\mu _\emptyset )\). This is strictly monotone since \(n > 0, x > 0\). Thus, again \(f^a(x)\) has no local maximum.
Third, the case \((a \ne 1) \wedge (\sigma \ne \theta n)\) is considered in detail: since \(\sigma \) was computed as a sum of n values that are each at least as large as \(\theta \), it follows that \(\theta \cdot n < \sigma \). In the following, the extrema of \(f^a(x)\) are determined by differentiating this function twice.
The first derivative is obtained by applying the product rule, then the chain and quotient rules (substituting \(n+x\); \(\mu _\emptyset \) can be omitted, as it is constant with respect to x), and finally factoring out \((n+x)^{a-2}\):
\[
\frac{d f^a}{d x}(x) = a (n+x)^{a-1} \left( \frac{\sigma + x \theta }{n+x} - \mu _\emptyset \right) + (n+x)^a \, \frac{\theta (n+x) - (\sigma + x \theta )}{(n+x)^2} = (n+x)^{a-2} \left( a \left( \sigma + x \theta - \mu _\emptyset (n+x) \right) + \theta n - \sigma \right) .
\]
Since \(x > 0, n > 0\) by definition, the first factor \((n+x)^{a-2}\) is greater than zero for any valid x. For \(a = 0\) or \(\theta = \mu _\emptyset \), the second factor is independent of x and thus has no root, so \(f^a(x)\) has no maxima except at the boundaries of its domain in this case. Otherwise, the root of the second factor, and therefore the only candidate for a maximum of \(f^a(x)\), is given at the point
\[
x^* = \frac{(1-a)\sigma + a \mu _\emptyset n - \theta n}{a (\theta - \mu _\emptyset )} .
\]
In the following, it is shown that \(x^*\) cannot be a maximum in our setting. For that purpose, the second derivative of \(f^a(x)\) at the point \(x^*\) is computed. Writing the first derivative as \((n+x)^{a-2}\, g(x)\) with \(g(x) = a \left( \sigma + x \theta - \mu _\emptyset (n+x) \right) + \theta n - \sigma \), differentiation yields
\[
\frac{d^2 f^a}{d x^2}(x) = (a-2)(n+x)^{a-3}\, g(x) + (n+x)^{a-2}\, a (\theta - \mu _\emptyset ) .
\]
Since \(g(x^*) = 0\) by construction, the second derivative at \(x^*\) simplifies to
\[
\frac{d^2 f^a}{d x^2}(x^*) = (n+x^*)^{a-2} \cdot a (\theta - \mu _\emptyset ) .
\]
Since \(f^a(x)\) is defined only for positive x, the first factor is always positive. Furthermore, by the premises \(a < 1\) and \(\theta n < \sigma \), the second factor \(a (\theta - \mu _\emptyset )\) is positive whenever \(x^*\) lies inside the domain, so the second derivative at the point \(x^*\) is always positive. Thus, if \(x^*\) is an extreme value of \(f^a(x)\), then it is a local minimum. Since it was shown above that \(f^a(x)\) has no other candidates for extreme values besides \(x^*\), this proves the lemma. \(\square \)
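The claim of Lemma 1 can also be sanity-checked numerically. The following sketch (with illustrative parameter values; it is not part of the proof) evaluates \(f^a\) on a grid and verifies that no interior local maximum appears:

```python
# Numerical sanity check for Lemma 1 (illustrative parameter values, not part
# of the proof): evaluate f^a on a grid and verify that no interior local
# maximum occurs. The parameters satisfy the premises n > 0 and theta * n < sigma.
def f(x, a, n, sigma, theta, mu0):
    """f^a(x) = (n+x)^a * ((sigma + x*theta)/(n+x) - mu0)."""
    return (n + x) ** a * ((sigma + x * theta) / (n + x) - mu0)

def has_interior_local_max(a, n, sigma, theta, mu0, lo=0.01, hi=50.0, steps=5000):
    xs = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    ys = [f(x, a, n, sigma, theta, mu0) for x in xs]
    # a grid point strictly above both neighbors would indicate a local maximum
    return any(ys[i] > ys[i - 1] and ys[i] > ys[i + 1] for i in range(1, steps))

# here x* = 2.5 is an interior critical point, but a minimum, not a maximum
assert not has_interior_local_max(a=0.5, n=5, sigma=30.0, theta=3.0, mu0=1.0)
# for a = 1, f^1 is linear in x and thus has no interior extremum at all
assert not has_interior_local_max(a=1.0, n=5, sigma=30.0, theta=3.0, mu0=4.0)
```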
Lemma 2
The generic mean-based measures \(q_{mean}^a\) are convex for \(a=1\) in the \((\sum T(c), i_P)\) space. They are not convex for arbitrary a.
Proof
For \(a=1\), the interestingness measure is given by \(q_{mean}^1 (P) = i_P \cdot (\mu _P - \mu _\emptyset ) = i_P \cdot (\frac{\sum _{c \in P} T(c)}{i_P} - \mu _\emptyset ) = \sum _{c \in P} T(c) - i_P \mu _\emptyset \). This function is linear in both \(\sum _{c \in P} T(c)\) and \(i_P\). Since linear functions are known to be convex, \(q_{mean}^1\) is convex.
To show that generic mean-based measures are not convex in general, we give an example in which the definition of convexity for a function f(x), that is, \(\forall x,y, \lambda \in (0,1): f((1-\lambda ) x + \lambda y) \le (1-\lambda ) f (x) + \lambda f (y)\), is violated. In our case, x and y are each two-dimensional points in the \((\sum T(c), i_P)\) space. In that regard, we consider a dataset with \(\mu _\emptyset = 0\) and the mean test interestingness measure \(q_{mean}^{0.5}\). Then, the considered interestingness measure is given by \(q_{mean}^{0.5} = i_P^{0.5} \cdot (\mu _P - \mu _\emptyset ) = \frac{\sum _{c \in P} T(c)}{\sqrt{i_P}} =: f(x)\). As two points in the \((\sum T(c), i_P)\) space for which the convexity condition is violated we choose \(x=(-100, 2)\) and \(y=(-100, 10)\). Additionally, we choose \(\lambda =0.5\). Then, the convexity inequality is violated:
\[
f\left( \tfrac{1}{2} x + \tfrac{1}{2} y \right) = f((-100, 6)) = \frac{-100}{\sqrt{6}} \approx -40.8 \; > \; -51.2 \approx \frac{1}{2} \cdot \frac{-100}{\sqrt{2}} + \frac{1}{2} \cdot \frac{-100}{\sqrt{10}} = \frac{1}{2} f(x) + \frac{1}{2} f(y) .
\]
Since the definition of convexity is violated in at least one example, the mean test interestingness measure \(q_{mean}^{0.5}\) is not convex.
The non-convexity of \(q_{mean}^{0.5}\) is also evident by a surface plot of the function for \(\mu _\emptyset =0\), see Fig. 3. \(\square \)
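The counterexample can also be verified numerically. The following small check uses \(f(s, i) = s / \sqrt{i}\) as in the proof (for \(\mu _\emptyset = 0\)); the variable names are illustrative:

```python
# Numerical check of the counterexample: for mu0 = 0, the measure reduces to
# f(s, i) = s / sqrt(i) in the (sum T(c), i_P) space; midpoint convexity is
# violated at x = (-100, 2), y = (-100, 10). Names are illustrative.
import math

def f(s, i):
    return s / math.sqrt(i)

x, y, lam = (-100.0, 2.0), (-100.0, 10.0), 0.5
mid = tuple((1 - lam) * a + lam * b for a, b in zip(x, y))  # (-100.0, 6.0)
lhs = f(*mid)                            # f((1-lam)*x + lam*y), approx. -40.8
rhs = (1 - lam) * f(*x) + lam * f(*y)    # approx. -51.2
assert lhs > rhs  # convexity would require lhs <= rhs
```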
Lemmerich, F., Atzmueller, M. & Puppe, F. Fast exhaustive subgroup discovery with numerical target concepts. Data Min Knowl Disc 30, 711–762 (2016). https://doi.org/10.1007/s10618-015-0436-8