Abstract
We propose an approach to subgroup discovery in relational databases containing numerical attributes. The approach is based on detecting bumps in histograms constructed from the substitution sets that result from matching a first-order query against the input relational database. The approach is evaluated on seven data sets, on which it discovers interpretable subgroups. The rate at which subgroups survive from the training split to the testing split varies among the experimental data sets, but on at least three of them it is very high.
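As a rough illustration of the idea in the abstract, the following minimal sketch matches a conjunctive query against a toy two-table database, collects the numeric values bound by the resulting substitutions, builds an equal-width histogram, and reports bins that are local maxima. Everything here is an assumption made for illustration: the tables `accounts` and `transactions`, the helpers `match_query`, `histogram` and `find_bumps`, the equal-width binning, and the local-maximum criterion are hypothetical and are not taken from the paper, whose actual bump-detection procedure is not described in the abstract.

```python
from typing import List, Tuple

# Hypothetical toy relational database: two tables stored as lists of tuples.
accounts = [("a1",), ("a2",)]
transactions = [
    ("a1", 1.0), ("a1", 1.5), ("a1", 2.0), ("a1", 2.2), ("a1", 2.4),
    ("a2", 9.0), ("a2", 9.5), ("a2", 9.8), ("a2", 10.0),
]


def match_query() -> List[float]:
    """Match the query account(A), transaction(A, Amount).

    Each successful match is one substitution {A, Amount}; only the
    numeric binding of Amount is kept.
    """
    return [
        amount
        for (acc,) in accounts
        for (t_acc, amount) in transactions
        if t_acc == acc
    ]


def histogram(values: List[float], bins: int) -> Tuple[List[int], float, float]:
    """Equal-width histogram; returns (counts, lower bound, bin width)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clip the maximum into the last bin
        counts[idx] += 1
    return counts, lo, width


def find_bumps(counts: List[int], min_count: int = 1) -> List[int]:
    """Indices of bins that are local maxima (an assumed, simplistic bump criterion)."""
    bumps = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < len(counts) - 1 else 0
        if c >= min_count and c > left and c >= right:
            bumps.append(i)
    return bumps


values = match_query()
counts, lo, width = histogram(values, bins=3)
for i in find_bumps(counts):
    # Each bump corresponds to an interval of the numeric attribute.
    print(f"bump in [{lo + i * width}, {lo + (i + 1) * width}) with {counts[i]} values")
```

Under this reading, each reported interval could serve as a candidate subgroup description, for example "accounts with a transaction amount in the given range". Whether such a subgroup persists on held-out data would have to be checked separately, as the abstract's train/test evaluation suggests.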
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Černoch, R., Železný, F. (2012). Subgroup Discovery Using Bump Hunting on Multi-relational Histograms. In: Muggleton, S.H., Tamaddoni-Nezhad, A., Lisi, F.A. (eds) Inductive Logic Programming. ILP 2011. Lecture Notes in Computer Science, vol 7207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31951-8_11
DOI: https://doi.org/10.1007/978-3-642-31951-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31950-1
Online ISBN: 978-3-642-31951-8
eBook Packages: Computer Science (R0)