Relevancy in Constraint-Based Subgroup Discovery

Lavrač, Nada; Gamberger, Dragan

doi:10.1007/11615576_12

Nada Lavrač & Dragan Gamberger

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3848)

Abstract

This chapter investigates subgroup discovery as a task of constraint-based mining of local patterns, aimed at describing groups of individuals with unusual distributional characteristics with respect to the property of interest. The chapter provides a novel interpretation of relevancy constraints and their use for feature filtering, introduces relevancy-based mechanisms for handling unknown values in the examples, and discusses the concept of relevancy as an approach to avoiding overfitting in subgroup discovery. The proposed approach to constraint-based subgroup mining, using the SD algorithm, was successfully applied to gene expression data analysis in functional genomics.
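
To make the idea of a relevancy constraint for feature filtering more concrete, the following is a minimal sketch and not the chapter's own implementation. It assumes a coverage-based notion of relevancy that the abstract does not spell out: a binary feature is treated as irrelevant if some other feature is true for at least the same positive examples and true for no additional negative examples. The function name `filter_irrelevant_features`, the data layout, and the tie-breaking rule for identical features are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def filter_irrelevant_features(
    features: Dict[str, List[bool]], labels: List[bool]
) -> List[str]:
    """Return the names of features that no other feature covers.

    Assumption (not taken from the abstract): feature g covers feature f
    when TP(f) is a subset of TP(g) and FP(g) is a subset of FP(f).
    """
    positives = [i for i, y in enumerate(labels) if y]
    negatives = [i for i, y in enumerate(labels) if not y]

    # Precompute the sets of covered positive (TP) and negative (FP) examples.
    tp: Dict[str, Set[int]] = {
        name: {i for i in positives if values[i]}
        for name, values in features.items()
    }
    fp: Dict[str, Set[int]] = {
        name: {i for i in negatives if values[i]}
        for name, values in features.items()
    }

    names = list(features)
    kept: List[str] = []
    for pos_f, f in enumerate(names):
        covered = False
        for pos_g, g in enumerate(names):
            if f == g:
                continue
            if tp[f] <= tp[g] and fp[g] <= fp[f]:
                identical = tp[f] == tp[g] and fp[f] == fp[g]
                # Hypothetical tie-break: among identical features,
                # keep the one that appears first.
                if identical and pos_f < pos_g:
                    continue
                covered = True
                break
        if not covered:
            kept.append(f)
    return kept


if __name__ == "__main__":
    # Three positive examples followed by three negative examples.
    labels = [True, True, True, False, False, False]
    features = {
        "a": [True, True, False, False, False, False],
        "b": [True, False, False, True, False, False],
        "c": [False, False, True, False, True, False],
        "d": [True, True, False, False, False, False],
    }
    print(filter_irrelevant_features(features, labels))
```

In this toy data, feature `b` is dropped because `a` covers more positives and fewer negatives, and `d` is dropped as a duplicate of `a`. Feature `c` is kept because it is the only one covering the third positive example.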

Author information

Authors and Affiliations

Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Nada Lavrač
Nova Gorica Polytechnic, Vipavska 13, 5000, Nova Gorica, Slovenia
Nada Lavrač
Rudjer Bošković Institute, Bijenička 54, 10000, Zagreb, Croatia
Dragan Gamberger

Editor information

Editors and Affiliations

INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut
Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Heverlee, Belgium
Luc De Raedt
HIIT, Helsinki University of Technology and, University of Helsinki, Finland
Heikki Mannila

Cite this paper

Lavrač, N., Gamberger, D. (2006). Relevancy in Constraint-Based Subgroup Discovery. In: Boulicaut, JF., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_12

DOI: https://doi.org/10.1007/11615576_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31331-1
Online ISBN: 978-3-540-31351-9
eBook Packages: Computer Science, Computer Science (R0)
