Estimating the Importance of Relational Features by Using Gradient Boosting

Petković, Matej; Ceci, Michelangelo; Kersting, Kristian; Džeroski, Sašo

doi:10.1007/978-3-030-59491-6_34

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12117)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

With data becoming more and more complex, the standard tabular data format often does not suffice to represent datasets. Richer representations, such as relational ones, are needed. However, a relational representation opens a much larger space of possible descriptors (features) of the examples that are to be classified. Consequently, it is important to assess which features are relevant (and to what extent) for predicting the target. In this work, we propose a novel relational feature ranking method that is based on our novel version of gradient-boosted relational trees and extends the Genie3 score towards relational data. By running the algorithm on six well-known benchmark problems, we show that it yields meaningful feature rankings, provided that the underlying classifier can learn the target concept successfully.
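As a rough illustration of the idea behind a Genie3-style score, the sketch below accumulates, for every split in a small gradient-boosted ensemble, the size-weighted variance reduction |E|·Var(E) − |E_l|·Var(E_l) − |E_r|·Var(E_r) and credits it to the feature used in that split. This is not the authors' relational algorithm: it assumes plain tabular (propositional) features, squared-error loss, and exhaustive threshold search, and every function name and parameter here is hypothetical.

```python
from statistics import fmean

def sse(values):
    """Sum of squared deviations from the mean, i.e. |E| * Var(E)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, r, idx):
    """Return (gain, feature, threshold) of the best split of rows idx on targets r."""
    parent = sse([r[i] for i in idx])
    best = (0.0, None, None)
    for f in range(len(X[0])):
        # Every observed value except the largest is a candidate threshold.
        for t in sorted({X[i][f] for i in idx})[:-1]:
            left = [r[i] for i in idx if X[i][f] <= t]
            right = [r[i] for i in idx if X[i][f] > t]
            gain = parent - sse(left) - sse(right)
            if gain > best[0] + 1e-12:
                best = (gain, f, t)
    return best

def grow(X, r, idx, depth, scores):
    """Grow a regression tree and add each split's gain to scores[feature]."""
    leaf = fmean([r[i] for i in idx])
    if depth == 0:
        return leaf
    gain, f, t = best_split(X, r, idx)
    if f is None:  # no split reduces the variance
        return leaf
    scores[f] += gain
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (f, t,
            grow(X, r, left, depth - 1, scores),
            grow(X, r, right, depth - 1, scores))

def predict(tree, x):
    """Follow the splits until a leaf value (a float) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def genie3_boosting_scores(X, y, n_trees=20, depth=2, lr=0.5):
    """Fit trees to squared-loss residuals; return normalised feature scores."""
    n = len(X)
    scores = [0.0] * len(X[0])
    pred = [fmean(y)] * n
    for _ in range(n_trees):
        residuals = [y[i] - pred[i] for i in range(n)]
        tree = grow(X, residuals, list(range(n)), depth, scores)
        pred = [pred[i] + lr * predict(tree, X[i]) for i in range(n)]
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores

# Toy data (made up for this sketch): the target copies feature 0,
# while feature 1 is unrelated to it.
X = [[i % 2, (i * 7) % 5] for i in range(20)]
y = [float(row[0]) for row in X]
scores = genie3_boosting_scores(X, y)
print(scores)
```

On this toy data the informative feature should receive the larger score. In a relational setting the candidate tests would be relational descriptors rather than column thresholds, which is the part the paper itself addresses.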

This work was financially supported by the Slovenian Research Agency (grants P2-0103, N2-0128, and a young researcher grant to MP).

Author information

Authors and Affiliations

Jozef Stefan Institute, Jamova 39, Ljubljana, Slovenia
Matej Petković, Michelangelo Ceci & Sašo Džeroski
Jozef Stefan Postgraduate School, Jamova 39, Ljubljana, Slovenia
Matej Petković & Sašo Džeroski
Università degli Studi di Bari Aldo Moro, via E. Orabona 4, Bari, Italy
Michelangelo Ceci
CS Department, TU Darmstadt, Hochschulstrasse 1, Darmstadt, Germany
Kristian Kersting

Corresponding author

Correspondence to Matej Petković.

Editor information

Editors and Affiliations

Graz University of Technology, Graz, Austria
Denis Helic
University of Klagenfurt, Klagenfurt, Austria
Gerhard Leitner
Graz University of Technology, Graz, Austria
Martin Stettinger
Graz University of Technology, Graz, Austria
Alexander Felfernig
University of North Carolina at Charlotte, Charlotte, NC, USA
Zbigniew W. Raś

About this paper

Cite this paper

Petković, M., Ceci, M., Kersting, K., Džeroski, S. (2020). Estimating the Importance of Relational Features by Using Gradient Boosting. In: Helic, D., Leitner, G., Stettinger, M., Felfernig, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2020. Lecture Notes in Computer Science, vol 12117. Springer, Cham. https://doi.org/10.1007/978-3-030-59491-6_34

DOI: https://doi.org/10.1007/978-3-030-59491-6_34
Published: 17 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59490-9
Online ISBN: 978-3-030-59491-6
eBook Packages: Computer Science, Computer Science (R0)