SCHEP — A Geometric Quality Measure for Regression Rule Sets, Gauging Ranking Consistency Throughout the Real-Valued Target Space

Duivesteijn, Wouter; Meeng, Marvin

doi:10.1007/978-3-319-41706-6_14


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9580)


Abstract

As has been well known since Fürnkranz and Flach’s 2005 ROC ‘n’ Rule Learning paper [6], rule learning can benefit from result evaluation based on ROC analysis. More specifically, given a (set of) rule(s), the Area Under the ROC Curve (AUC) can be interpreted as the probability that the (best) rule(s) will rank a positive example before a negative example. This interpretation is well-defined (and stimulates the intuition!) when the rule (set) concerns a classification problem. For a regression problem, however, the concepts of “positive example” and “negative example” become ill-defined, hindering both ROC analysis and AUC interpretation. We argue that for a regression problem, an interesting property to gauge is the probability that the (best) rule(s) will rank an example with a high target value before an example with a low target value. Moreover, the rule(s) should do so consistently for all possible thresholds separating the target values into the high and the low. For each such threshold, one can retrieve an old-fashioned binary-target ROC curve for a given rule set. Aggregating all such ROC curves, we introduce SCHEP: the Surface of the Convex-Hull-Enclosing Polygon. This is a geometric quality measure, gauging how consistently a given rule (set) performs the aforementioned separation as the threshold is varied through the target space.
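To make the per-threshold view concrete, here is a minimal Python sketch. It assumes that each example receives a real-valued score from the rule set, where a higher score means the example is ranked earlier. The function names `pairwise_auc` and `auc_per_threshold` are hypothetical. The sketch computes the AUC through its probabilistic interpretation (ties count one half) and repeats this for every threshold lying between consecutive distinct target values. It does not reproduce SCHEP itself, because the abstract does not spell out the polygon construction. It only illustrates the family of binary-target problems that SCHEP aggregates.

```python
from typing import Sequence


def pairwise_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a randomly drawn positive example is ranked before a
    randomly drawn negative one (higher score = ranked earlier); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked before negative
            elif p == n:
                wins += 0.5  # tie: counts as half a correct ordering
    return wins / (len(pos) * len(neg))


def auc_per_threshold(
    scores: Sequence[float], targets: Sequence[float]
) -> list[tuple[float, float]]:
    """For every threshold t halfway between consecutive distinct target values,
    label examples with target > t as 'high' (positive) and compute the AUC.
    Assumption: midpoints are used as representative thresholds, since any t
    in the same open interval yields the same high/low split."""
    distinct = sorted(set(targets))
    results = []
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        labels = [y > t for y in targets]
        results.append((t, pairwise_auc(scores, labels)))
    return results


if __name__ == "__main__":
    targets = [10.0, 8.0, 3.0, 1.0]
    # A ranking that agrees with the targets at every threshold.
    print(auc_per_threshold([0.9, 0.7, 0.4, 0.2], targets))
    # A ranking that separates well only at the highest threshold.
    print(auc_per_threshold([0.9, 0.2, 0.7, 0.4], targets))
```

The first ranking yields an AUC of 1.0 at each of the thresholds 2.0, 5.5, and 9.0. The second yields roughly 0.67, 0.5, and 1.0. This second ranking separates the very highest target value perfectly but is much less reliable for lower thresholds. A measure that gauges ranking consistency throughout the target space is meant to capture exactly this kind of variation.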


References

[1] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules. Adv. Knowl. Disc. Data Min. 12, 307–328 (1996)
[2] Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30, 1145–1159 (1997)
[3] Egan, J.P.: Signal Detection Theory and ROC Analysis. Series in Cognition and Perception. Academic Press, New York (1975)
[4] Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874 (2006)
[5] Flach, P.A., Hernández-Orallo, J., Ferri Ramirez, C.: A coherent interpretation of AUC as a measure of aggregated classification performance. In: Proceedings of the ICML, pp. 657–664 (2011)
[6] Fürnkranz, J., Flach, P.A.: ROC ‘n’ rule learning - towards a better understanding of covering algorithms. Mach. Learn. 58(1), 39–77 (2005)
[7] Grosskreutz, H., Paurat, D.: Fast and memory-efficient discovery of the top-k relevant subgroups in a reduced candidate space. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011, Part I. LNCS, vol. 6911, pp. 533–548. Springer, Heidelberg (2011)
[8] Hand, D., Adams, N., Bolton, R. (eds.): Pattern Detection and Discovery. LNCS, vol. 2447. Springer, Heidelberg (2002)
[9] Hand, D.J., Till, R.J.: A simple generalization of the area under the ROC curve to multiple class classification problems. Mach. Learn. 45(2), 171–186 (2001)
[10] Hanley, J.A., McNeil, B.J.: The meaning and use of the area under an ROC curve. Radiology 143, 29–36 (1982)
[11] Hernández-Orallo, J.: ROC curves for regression. Pattern Recogn. 46(12), 3395–3411 (2013)
[12] Herrera, F., Carmona, C.J., González, P., del Jesus, M.J.: An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29(3), 495–525 (2011)
[13] Klösgen, W.: Explora: a multipattern and multistrategy discovery assistant. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 249–271. MIT Press, Cambridge (1996)
[14] Krzanowski, W.J., Hand, D.J.: ROC Curves for Continuous Data. Chapman and Hall, London (2009)
[15] Lane, T.: Extensions of ROC analysis to multi-class domains. In: Proceedings of the ICML 2000 Workshop on Cost-Sensitive Learning (2000)
[16] van Leeuwen, M., Knobbe, A.J.: Diverse subgroup set discovery. Data Min. Knowl. Disc. 25(2), 208–242 (2012)
[17] Lichman, M.: UCI Machine Learning Repository. School of Information and Computer Science, University of California, Irvine (2013). http://archive.ics.uci.edu/ml
[18] Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 241–258 (1997)
[19] Morik, K., Boulicaut, J.F., Siebes, A. (eds.): Local Pattern Detection. Springer, New York (2005)
[20] Pieters, B.F.I., Knobbe, A., Džeroski, S.: Subgroup discovery in ranked data, with an application to gene set enrichment. In: Proceedings of the Preference Learning workshop (PL 2010) at ECML PKDD (2010)
[21] Provost, F., Domingos, P.: Well-trained PETs: improving probability estimation trees. CeDER Working Paper #IS-00-04, Stern School of Business, New York University (2001)
[22] Provost, F.J., Fawcett, T.: Robust classification for imprecise environments. Mach. Learn. 42(3), 203–231 (2001)
[23] Spackman, K.A.: Signal detection theory: valuable tools for evaluating inductive learning. In: Proceedings of the International Workshop on Machine Learning, pp. 160–163 (1989)
[24] Srinivasan, A.: Note on the location of optimal classifiers in n-dimensional ROC space. Technical report PRG-TR-2-99, Oxford University Computing Laboratory, Oxford, England (1999)
[25] Swets, J.: Measuring the accuracy of diagnostic systems. Science 240, 1285–1293 (1988)
[26] Swets, J.A., Dawes, R.M., Monahan, J.: Better decisions through science. Sci. Am. 283, 82–87 (2000)
[27] Wrobel, S.: An algorithm for multi-relational discovery of subgroups. In: Proceedings of the PKDD, pp. 78–87 (1997)


Author information

Authors and Affiliations

Department of Engineering Mathematics, University of Bristol, Bristol, UK
Wouter Duivesteijn
Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
Marvin Meeng


Corresponding author

Correspondence to Wouter Duivesteijn.

Editor information

Editors and Affiliations

TU Dortmund, Dortmund, Germany
Stefan Michaelis
TU Dortmund, Dortmund, Germany
Nico Piatkowski
TU Dortmund, Dortmund, Germany
Marco Stolpe


About this chapter

Cite this chapter

Duivesteijn, W., Meeng, M. (2016). SCHEP — A Geometric Quality Measure for Regression Rule Sets, Gauging Ranking Consistency Throughout the Real-Valued Target Space. In: Michaelis, S., Piatkowski, N., Stolpe, M. (eds) Solving Large Scale Learning Tasks. Challenges and Algorithms. Lecture Notes in Computer Science, vol 9580. Springer, Cham. https://doi.org/10.1007/978-3-319-41706-6_14


DOI: https://doi.org/10.1007/978-3-319-41706-6_14
Published: 03 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41705-9
Online ISBN: 978-3-319-41706-6
eBook Packages: Computer Science (R0)
