Relevancy in Constraint-Based Subgroup Discovery

  • Conference paper
Constraint-Based Mining and Inductive Databases

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3848)

Abstract

This chapter investigates subgroup discovery as a task of constraint-based mining of local patterns, aimed at describing groups of individuals with unusual distributional characteristics with respect to the property of interest. The chapter provides a novel interpretation of relevancy constraints and their use for feature filtering, introduces relevancy-based mechanisms for handling unknown values in the examples, and discusses the concept of relevancy as an approach to avoiding overfitting in subgroup discovery. The proposed approach to constraint-based subgroup mining, using the SD algorithm, was successfully applied to gene expression data analysis in functional genomics.
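
The abstract does not spell out the relevancy constraint itself. As a rough illustration only, the sketch below uses one coverage-based reading of feature relevancy, which is an assumption here and not the chapter's definition: a binary feature is treated as irrelevant when some other feature covers at least the same target-class examples and no additional non-target examples. The names `covered` and `filter_irrelevant` are hypothetical, and the sketch is not the SD algorithm.

```python
from typing import Dict, List, Sequence


def covered(values: Sequence[bool], labels: Sequence[bool], target: bool) -> frozenset:
    """Indices of examples with class `target` on which the feature is true."""
    return frozenset(
        i for i, (v, y) in enumerate(zip(values, labels)) if v and y == target
    )


def filter_irrelevant(
    features: Dict[str, Sequence[bool]], labels: Sequence[bool]
) -> List[str]:
    """Return the names of features that survive the assumed relevancy filter.

    Feature f is dropped if another feature g covers every target-class
    example that f covers and covers no non-target example that f does not.
    Features with identical coverage are duplicates; only the first is kept.
    """
    names = list(features)
    tp = {n: covered(features[n], labels, True) for n in names}   # covered positives
    fp = {n: covered(features[n], labels, False) for n in names}  # covered negatives

    kept = []
    for i, f in enumerate(names):
        dominated = False
        for j, g in enumerate(names):
            if i == j:
                continue
            at_least_as_good = tp[f] <= tp[g] and fp[g] <= fp[f]
            identical = tp[f] == tp[g] and fp[f] == fp[g]
            # Drop f if g is strictly better, or if g is an earlier duplicate.
            if at_least_as_good and (not identical or j < i):
                dominated = True
                break
        if not dominated:
            kept.append(f)
    return kept
```

Under this reading, filtering compares features only by the sets of examples they cover, so it can run once before any subgroup search begins and does not depend on the quality measure used later.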

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lavrač, N., Gamberger, D. (2006). Relevancy in Constraint-Based Subgroup Discovery. In: Boulicaut, JF., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_12

  • DOI: https://doi.org/10.1007/11615576_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31331-1

  • Online ISBN: 978-3-540-31351-9

  • eBook Packages: Computer Science, Computer Science (R0)
