Abstract
As is well known since Fürnkranz and Flach’s 2005 ROC ‘n’ Rule Learning paper [6], rule learning can benefit from result evaluation based on ROC analysis. More specifically, given a (set of) rule(s), the Area Under the ROC Curve (AUC) can be interpreted as the probability that the (best) rule(s) will rank a positive example before a negative example. This interpretation is well-defined (and stimulates the intuition!) when the rule (set) concerns a classification problem. For a regression problem, however, the concepts of “positive example” and “negative example” become ill-defined, hindering both ROC analysis and AUC interpretation. We argue that for a regression problem, an interesting property to gauge is the probability that the (best) rule(s) will rank an example with a high target value before an example with a low target value. Moreover, a good rule (set) should do so consistently for all possible thresholds that separate the target values into high and low. For each such threshold, one can retrieve an old-fashioned binary-target ROC curve for a given rule set. Aggregating all such ROC curves, we introduce SCHEP: the Surface of the Convex-Hull-Enclosing Polygon. This is a geometric quality measure, gauging how consistently a given rule (set) performs the aforementioned separation when the threshold is varied through the target space.
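The per-threshold view described above can be illustrated with a minimal Python sketch. It is not the SCHEP measure itself, whose geometric definition is not given in this abstract. It only shows the two ingredients the abstract names: AUC as a pairwise ranking probability, and a binary AUC obtained for each threshold that splits the real-valued target into high and low. The function names, the toy data, the convention that a higher score means "ranked before", and the half-credit for ties are all assumptions made for illustration.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive example
    gets the higher score; tied pairs count as half (an assumed convention)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def auc_per_threshold(scores, targets):
    """Binarize the real-valued target at every candidate threshold
    (high means target > threshold) and return {threshold: AUC}."""
    # The largest target value is skipped: it would leave no 'high' examples.
    thresholds = sorted(set(targets))[:-1]
    return {t: pairwise_auc(scores, [y > t for y in targets])
            for t in thresholds}


# Hypothetical toy data: scores assigned by some rule set, and real-valued targets.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
targets = [10.0, 7.0, 8.0, 2.0, 1.0]
per_threshold = auc_per_threshold(scores, targets)
```

On this toy data the ranking is perfect for the thresholds 1.0, 2.0 and 8.0, but at threshold 7.0 one low-target example outranks a high-target one, so the AUC drops to 5/6. Variation of this kind across thresholds is what a consistency measure would need to capture.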
References
1. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules. Adv. Knowl. Disc. Data Min. 12, 307–328 (1996)
2. Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30, 1145–1159 (1997)
3. Egan, J.P.: Signal Detection Theory and ROC Analysis. Series in Cognition and Perception. Academic Press, New York (1975)
4. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874 (2006)
5. Flach, P.A., Hernández-Orallo, J., Ferri Ramirez, C.: A coherent interpretation of AUC as a measure of aggregated classification performance. In: Proceedings of the ICML, pp. 657–664 (2011)
6. Fürnkranz, J., Flach, P.A.: ROC ‘n’ rule learning - towards a better understanding of covering algorithms. Mach. Learn. 58(1), 39–77 (2005)
7. Grosskreutz, H., Paurat, D.: Fast and memory-efficient discovery of the top-k relevant subgroups in a reduced candidate space. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011, Part I. LNCS, vol. 6911, pp. 533–548. Springer, Heidelberg (2011)
8. Hand, D., Adams, N., Bolton, R. (eds.): Pattern Detection and Discovery. LNCS, vol. 2447. Springer, Heidelberg (2002)
9. Hand, D.J., Till, R.J.: A simple generalization of the area under the ROC curve to multiple class classification problems. Mach. Learn. 45(2), 171–186 (2001)
10. Hanley, J.A., McNeil, B.J.: The meaning and use of the area under an ROC curve. Radiology 143, 29–36 (1982)
11. Hernández-Orallo, J.: ROC curves for regression. Pattern Recogn. 46(12), 3395–3411 (2013)
12. Herrera, F., Carmona, C.J., González, P., del Jesus, M.J.: An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29(3), 495–525 (2011)
13. Klösgen, W.: Explora: a multipattern and multistrategy discovery assistant. In: Fayyad, U.M., Piatetski-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 249–271. MIT Press, Cambridge (1996)
14. Krzanowski, W.J., Hand, D.J.: ROC Curves for Continuous Data. Chapman and Hall, London (2009)
15. Lane, T.: Extensions of ROC analysis to multi-class domains. In: Proceedings of the ICML 2000 Workshop on Cost-Sensitive Learning (2000)
16. van Leeuwen, M., Knobbe, A.J.: Diverse subgroup set discovery. Data Min. Knowl. Disc. 25(2), 208–242 (2012)
17. Lichman, M.: UCI Machine Learning Repository. School of Information and Computer Science, University of California, Irvine (2013). http://archive.ics.uci.edu/ml
18. Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 241–258 (1997)
19. Morik, K., Boulicaut, J.F., Siebes, A. (eds.): Local Pattern Detection. Springer, New York (2005)
20. Pieters, B.F.I., Knobbe, A., Džeroski, S.: Subgroup discovery in ranked data, with an application to gene set enrichment. In: Proceedings of the Preference Learning workshop (PL 2010) at ECML PKDD (2010)
21. Provost, F., Domingos, P.: Well-trained PETs: improving probability estimation trees. CeDER Working Paper #IS-00-04, Stern School of Business, New York University (2001)
22. Provost, F.J., Fawcett, T.: Robust classification for imprecise environments. Mach. Learn. 42(3), 203–231 (2001)
23. Spackman, K.A.: Signal detection theory: valuable tools for evaluating inductive learning. In: Proceedings of the International Workshop on Machine Learning, pp. 160–163 (1989)
24. Srinivasan, A.: Note on the location of optimal classifiers in n-dimensional ROC space. Technical report PRG-TR-2-99, Oxford University Computing Laboratory, Oxford, England (1999)
25. Swets, J.: Measuring the accuracy of diagnostic systems. Science 240, 1285–1293 (1988)
26. Swets, J.A., Dawes, R.M., Monahan, J.: Better decisions through science. Sci. Am. 283, 82–87 (2000)
27. Wrobel, S.: An algorithm for multi-relational discovery of subgroups. In: Proceedings of the PKDD, pp. 78–87 (1997)
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Duivesteijn, W., Meeng, M. (2016). SCHEP — A Geometric Quality Measure for Regression Rule Sets, Gauging Ranking Consistency Throughout the Real-Valued Target Space. In: Michaelis, S., Piatkowski, N., Stolpe, M. (eds) Solving Large Scale Learning Tasks. Challenges and Algorithms. Lecture Notes in Computer Science, vol 9580. Springer, Cham. https://doi.org/10.1007/978-3-319-41706-6_14
DOI: https://doi.org/10.1007/978-3-319-41706-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41705-9
Online ISBN: 978-3-319-41706-6
eBook Packages: Computer Science, Computer Science (R0)