Itemset-Based Variable Construction in Multi-relational Supervised Learning

Lahbib, Dhafer; Boullé, Marc; Laurent, Dominique

doi:10.1007/978-3-642-38812-5_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7842)

Included in the following conference series:

International Conference on Inductive Logic Programming

Abstract

In multi-relational data mining, data are represented in a relational form in which the individuals of the target table are potentially related to several records in secondary tables through one-to-many relationships. In this paper, we introduce an itemset-based framework for constructing variables in secondary tables and evaluating their conditional information for the supervised classification task. We introduce a space of itemset-based models in the secondary table and conditional density estimation of the related constructed variables. A prior distribution is defined on this model space, resulting in a parameter-free criterion to assess the relevance of the constructed variables. A greedy algorithm is then proposed to explore the space of the considered itemsets. Experiments on multi-relational datasets confirm the advantage of the approach.
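As a rough illustration of what constructing a variable from an itemset in a secondary table can look like, the following minimal sketch counts, for each individual of the target table, the related secondary records covered by an itemset. It is not the authors' implementation: the count aggregation, the toy tables, and the names `covers`, `construct_variable`, `customers`, and `orders` are all assumptions made for this example, and neither the evaluation criterion nor the greedy search from the abstract is shown.

```python
from collections import Counter


def covers(itemset, record):
    """Return True if the record matches every (attribute, value) item."""
    return all(record.get(attr) == value for attr, value in itemset.items())


def construct_variable(individual_ids, secondary_records, key, itemset):
    """Map each target individual to the number of its secondary records
    covered by the itemset (hypothetical aggregation chosen for this sketch)."""
    counts = Counter(
        rec[key] for rec in secondary_records if covers(itemset, rec)
    )
    # Individuals with no covered record (or no record at all) get 0.
    return {ident: counts.get(ident, 0) for ident in individual_ids}


# Toy target table (customers) and secondary table (orders), one-to-many.
customers = ["c1", "c2", "c3"]
orders = [
    {"customer_id": "c1", "channel": "web", "category": "books"},
    {"customer_id": "c1", "channel": "web", "category": "music"},
    {"customer_id": "c1", "channel": "web", "category": "books"},
    {"customer_id": "c2", "channel": "store", "category": "books"},
    {"customer_id": "c2", "channel": "web", "category": "books"},
]

# Hypothetical itemset: a conjunction of attribute-value conditions.
itemset = {"channel": "web", "category": "books"}
feature = construct_variable(customers, orders, "customer_id", itemset)
print(feature)
```

The resulting dictionary is a single-valued column on the target table, which is the form a standard classifier would need before the relevance of the constructed variable is assessed.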

Author information

Authors and Affiliations

Orange Labs - 2, avenue Pierre Marzin, 23300, Lannion, France
Dhafer Lahbib & Marc Boullé
ETIS-CNRS-Université de Cergy Pontoise-ENSEA, 95000, Cergy Pontoise, France
Dominique Laurent

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, University of Ferrara, Via Saragat 1, 44122, Ferrara, Italy
Fabrizio Riguzzi
Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, Karlovo namesti 13, 12135, Prague 2, Czech Republic
Filip Železný

About this paper

Cite this paper

Lahbib, D., Boullé, M., Laurent, D. (2013). Itemset-Based Variable Construction in Multi-relational Supervised Learning. In: Riguzzi, F., Železný, F. (eds) Inductive Logic Programming. ILP 2012. Lecture Notes in Computer Science, vol 7842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38812-5_10

DOI: https://doi.org/10.1007/978-3-642-38812-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38811-8
Online ISBN: 978-3-642-38812-5
eBook Packages: Computer Science, Computer Science (R0)
