Abstract
In multi-relational data mining, data are represented in relational form: the individuals of the target table may be related to several records in secondary tables through one-to-many relationships. In this paper, we introduce an itemset-based framework for constructing variables in secondary tables and evaluating their conditional information for supervised classification. We define a space of itemset-based models in the secondary table and estimate the conditional density of the related constructed variables. A prior distribution on this model space yields a parameter-free criterion for assessing the relevance of the constructed variables. A greedy algorithm is then proposed to explore the space of the considered itemsets. Experiments on multi-relational datasets confirm the advantage of the approach.
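As a rough illustration of what an itemset-based constructed variable might look like, the sketch below treats an itemset as a conjunction of conditions on secondary-table attributes and summarizes each target individual by how many of its related records the itemset covers. The table, the attribute names, the thresholds, and the choice of a (covered, not covered) count pair are all assumptions made for this example; the evaluation criterion and the greedy search mentioned in the abstract are not shown.

```python
from collections import defaultdict

# Hypothetical secondary table: several records per customer (one-to-many).
orders = [
    {"customer_id": 1, "category": "book", "amount": 12.0},
    {"customer_id": 1, "category": "book", "amount": 55.0},
    {"customer_id": 1, "category": "toy", "amount": 8.0},
    {"customer_id": 2, "category": "toy", "amount": 70.0},
    {"customer_id": 3, "category": "book", "amount": 5.0},
]

# Hypothetical itemset: a conjunction of conditions on secondary attributes.
itemset = {
    "category": lambda v: v == "book",
    "amount": lambda v: v <= 20.0,
}


def covers(itemset, record):
    """Return True if the record satisfies every condition of the itemset."""
    return all(test(record[attr]) for attr, test in itemset.items())


def construct_variable(records, itemset, key):
    """Map each individual to (covered, not covered) counts of its records.

    This summary is an assumption for illustration, not the paper's exact
    definition of a constructed variable.
    """
    counts = defaultdict(lambda: [0, 0])
    for record in records:
        slot = 0 if covers(itemset, record) else 1
        counts[record[key]][slot] += 1
    return {individual: tuple(pair) for individual, pair in counts.items()}


features = construct_variable(orders, itemset, "customer_id")
print(features)
```

Each individual ends up with a fixed-length description regardless of how many secondary records it has, which is what lets a standard supervised classifier use the result.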
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lahbib, D., Boullé, M., Laurent, D. (2013). Itemset-Based Variable Construction in Multi-relational Supervised Learning. In: Riguzzi, F., Železný, F. (eds) Inductive Logic Programming. ILP 2012. Lecture Notes in Computer Science, vol 7842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38812-5_10
DOI: https://doi.org/10.1007/978-3-642-38812-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38811-8
Online ISBN: 978-3-642-38812-5
eBook Packages: Computer Science, Computer Science (R0)