SCHEP — A Geometric Quality Measure for Regression Rule Sets, Gauging Ranking Consistency Throughout the Real-Valued Target Space

  • Chapter
Solving Large Scale Learning Tasks. Challenges and Algorithms

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9580)

Abstract

As is well known since Fürnkranz and Flach’s 2005 ROC ‘n’ Rule Learning paper [6], rule learning can benefit from result evaluation based on ROC analysis. More specifically, given a (set of) rule(s), the Area Under the ROC Curve (AUC) can be interpreted as the probability that the (best) rule(s) will rank a positive example before a negative example. This interpretation is well-defined (and stimulates the intuition!) when the rule (set) concerns a classification problem. For a regression problem, however, the concepts of “positive example” and “negative example” become ill-defined, hindering both ROC analysis and AUC interpretation. We argue that for a regression problem, an interesting property to gauge is the probability that the (best) rule(s) will rank an example with a high target value before an example with a low target value. Moreover, the rule (set) should do so consistently for all possible thresholds separating the target values into the high and the low. For each such threshold, one can retrieve an old-fashioned binary-target ROC curve for a given rule set. Aggregating all such ROC curves, we introduce SCHEP: the Surface of the Convex-Hull-Enclosing Polygon. This is a geometric quality measure, gauging how consistently a given rule (set) performs the aforementioned separation when the threshold is varied through the target space.
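The per-threshold idea in the abstract can be illustrated with a minimal Python sketch. It assumes that the rule set assigns each example a numeric ranking score (higher means ranked earlier) and that every distinct target value except the largest is a candidate threshold; the function names `auc_for_threshold` and `auc_per_threshold` are hypothetical. The sketch computes only the binary-target AUC for each threshold. It does not compute SCHEP itself, whose polygon construction is defined in the chapter and is not reproduced here.

```python
from typing import List, Optional, Tuple


def auc_for_threshold(scores: List[float], targets: List[float],
                      threshold: float) -> Optional[float]:
    """Estimate the probability that a 'high' example (target > threshold)
    is ranked before a 'low' example (target <= threshold).

    Assumption: a higher score means the example is ranked earlier.
    Ties in score count as half a win, as in the usual AUC estimate.
    """
    high = [s for s, y in zip(scores, targets) if y > threshold]
    low = [s for s, y in zip(scores, targets) if y <= threshold]
    if not high or not low:
        return None  # the threshold does not split the data into two groups
    wins = 0.0
    for h in high:
        for l in low:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(high) * len(low))


def auc_per_threshold(scores: List[float],
                      targets: List[float]) -> List[Tuple[float, Optional[float]]]:
    """Return one (threshold, AUC) pair per candidate threshold.

    Assumption: candidate thresholds are the distinct target values,
    excluding the largest one (which would leave no 'high' examples).
    """
    thresholds = sorted(set(targets))[:-1]
    return [(t, auc_for_threshold(scores, targets, t)) for t in thresholds]


if __name__ == "__main__":
    # Toy data: four examples with ranking scores and real-valued targets.
    scores = [0.9, 0.8, 0.3, 0.1]
    targets = [10.0, 7.0, 8.0, 1.0]
    for threshold, auc in auc_per_threshold(scores, targets):
        print(threshold, auc)
```

On this toy data the ranking separates high from low perfectly at thresholds 1.0 and 8.0, but only three of the four high/low pairs are ordered correctly at threshold 7.0. A measure of consistency across thresholds is meant to capture exactly this kind of variation.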

References

  1. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules. Adv. Knowl. Disc. Data Min. 12, 307–328 (1996)

  2. Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30, 1145–1159 (1997)

  3. Egan, J.P.: Signal Detection Theory and ROC Analysis. Series in Cognition and Perception. Academic Press, New York (1975)

  4. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874 (2006)

  5. Flach, P.A., Hernández-Orallo, J., Ferri Ramirez, C.: A coherent interpretation of AUC as a measure of aggregated classification performance. In: Proceedings of the ICML, pp. 657–664 (2011)

  6. Fürnkranz, J., Flach, P.A.: ROC ‘n’ rule learning - towards a better understanding of covering algorithms. Mach. Learn. 58(1), 39–77 (2005)

  7. Grosskreutz, H., Paurat, D.: Fast and memory-efficient discovery of the top-k relevant subgroups in a reduced candidate space. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011, Part I. LNCS, vol. 6911, pp. 533–548. Springer, Heidelberg (2011)

  8. Hand, D., Adams, N., Bolton, R. (eds.): Pattern Detection and Discovery. LNCS, vol. 2447. Springer, Heidelberg (2002)

  9. Hand, D.J., Till, R.J.: A simple generalization of the area under the ROC curve to multiple class classification problems. Mach. Learn. 45(2), 171–186 (2001)

  10. Hanley, J.A., McNeil, B.J.: The meaning and use of the area under an ROC curve. Radiology 143, 29–36 (1982)

  11. Hernández-Orallo, J.: ROC curves for regression. Pattern Recogn. 46(12), 3395–3411 (2013)

  12. Herrera, F., Carmona, C.J., González, P., del Jesus, M.J.: An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29(3), 495–525 (2011)

  13. Klösgen, W.: Explora: a multipattern and multistrategy discovery assistant. In: Fayyad, U.M., Piatetski-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 249–271. MIT Press, Cambridge (1996)

  14. Krzanowski, W.J., Hand, D.J.: ROC Curves for Continuous Data. Chapman and Hall, London (2009)

  15. Lane, T.: Extensions of ROC analysis to multi-class domains. In: Proceedings of the ICML 2000 Workshop on Cost-Sensitive Learning (2000)

  16. van Leeuwen, M., Knobbe, A.J.: Diverse subgroup set discovery. Data Min. Knowl. Disc. 25(2), 208–242 (2012)

  17. Lichman, M.: UCI Machine Learning Repository. School of Information and Computer Science, University of California, Irvine (2013). http://archive.ics.uci.edu/ml

  18. Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 241–258 (1997)

  19. Morik, K., Boulicaut, J.F., Siebes, A. (eds.): Local Pattern Detection. Springer, New York (2005)

  20. Pieters, B.F.I., Knobbe, A., Džeroski, S.: Subgroup discovery in ranked data, with an application to gene set enrichment. In: Proceedings of the Preference Learning workshop (PL 2010) at ECML PKDD (2010)

  21. Provost, F., Domingos, P.: Well-trained PETs: improving probability estimation trees. CeDER Working Paper #IS-00-04, Stern School of Business, New York University (2001)

  22. Provost, F.J., Fawcett, T.: Robust classification for imprecise environments. Mach. Learn. 42(3), 203–231 (2001)

  23. Spackman, K.A.: Signal detection theory: valuable tools for evaluating inductive learning. In: Proceedings of the International Workshop on Machine Learning, pp. 160–163 (1989)

  24. Srinivasan, A.: Note on the location of optimal classifiers in n-dimensional ROC space. Technical report PRG-TR-2-99, Oxford University Computing Laboratory, Oxford, England (1999)

  25. Swets, J.: Measuring the accuracy of diagnostic systems. Science 240, 1285–1293 (1988)

  26. Swets, J.A., Dawes, R.M., Monahan, J.: Better decisions through science. Sci. Am. 283, 82–87 (2000)

  27. Wrobel, S.: An algorithm for multi-relational discovery of subgroups. In: Proceedings of the PKDD, pp. 78–87 (1997)

Author information

Corresponding author

Correspondence to Wouter Duivesteijn.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Duivesteijn, W., Meeng, M. (2016). SCHEP — A Geometric Quality Measure for Regression Rule Sets, Gauging Ranking Consistency Throughout the Real-Valued Target Space. In: Michaelis, S., Piatkowski, N., Stolpe, M. (eds) Solving Large Scale Learning Tasks. Challenges and Algorithms. Lecture Notes in Computer Science, vol 9580. Springer, Cham. https://doi.org/10.1007/978-3-319-41706-6_14

  • DOI: https://doi.org/10.1007/978-3-319-41706-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41705-9

  • Online ISBN: 978-3-319-41706-6

  • eBook Packages: Computer Science (R0)
