Abstract
This chapter investigates subgroup discovery as a task of constraint-based mining of local patterns, aimed at describing groups of individuals with unusual distributional characteristics with respect to the property of interest. The chapter provides a novel interpretation of relevancy constraints and their use for feature filtering, introduces relevancy-based mechanisms for handling unknown values in the examples, and discusses the concept of relevancy as an approach to avoiding overfitting in subgroup discovery. The proposed approach to constraint-based subgroup mining, using the SD algorithm, was successfully applied to gene expression data analysis in functional genomics.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lavrač, N., Gamberger, D. (2006). Relevancy in Constraint-Based Subgroup Discovery. In: Boulicaut, JF., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_12
DOI: https://doi.org/10.1007/11615576_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31331-1
Online ISBN: 978-3-540-31351-9
eBook Packages: Computer Science, Computer Science (R0)