Estimating the Importance of Relational Features by Using Gradient Boosting

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2020)

Abstract

As data become increasingly complex, the standard tabular format often does not suffice to represent datasets, and richer representations, such as relational ones, are needed. However, a relational representation opens a much larger space of possible descriptors (features) of the examples to be classified. Consequently, it is important to assess which features are relevant for predicting the target, and to what extent. In this work, we propose a relational feature ranking method that is based on our novel version of gradient-boosted relational trees and extends the Genie3 score to relational data. By running the algorithm on six well-known benchmark problems, we show that it yields meaningful feature rankings, provided that the underlying classifier can learn the target concept successfully.
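
To make the idea of a Genie3-style ranking concrete, here is a minimal sketch on plain tabular data; it is not the relational method of the paper. It assumes squared loss, depth-one trees (stumps), and a score that sums, for each feature, the size-weighted variance reduction of every split that uses it. The names `best_split` and `genie3_boosting`, the learning rate, and the toy data are all hypothetical and chosen only for illustration.

```python
from statistics import pvariance


def best_split(X, r):
    """Return (feature, threshold, gain) of the best stump split on residuals r.

    gain = |S| * Var(S) - |S_left| * Var(S_left) - |S_right| * Var(S_right),
    i.e. a size-weighted variance reduction (Genie3-style node score).
    """
    n = len(r)
    total = n * pvariance(r)
    best = (None, None, 0.0)
    for f in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so that both sides of the split are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [r[i] for i in range(n) if X[i][f] <= t]
            right = [r[i] for i in range(n) if X[i][f] > t]
            gain = total - len(left) * pvariance(left) - len(right) * pvariance(right)
            if gain > best[2]:
                best = (f, t, gain)
    return best


def genie3_boosting(X, y, n_trees=20, lr=0.5):
    """Boost stumps on squared-loss residuals and accumulate per-feature gains."""
    n = len(y)
    pred = [sum(y) / n] * n            # start from the mean prediction
    importance = [0.0] * len(X[0])
    for _ in range(n_trees):
        r = [y[i] - pred[i] for i in range(n)]   # negative gradient of squared loss
        f, t, gain = best_split(X, r)
        if f is None:                  # no split reduces the variance any further
            break
        importance[f] += gain
        left = [r[i] for i in range(n) if X[i][f] <= t]
        right = [r[i] for i in range(n) if X[i][f] > t]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        for i in range(n):
            pred[i] += lr * (mean_left if X[i][f] <= t else mean_right)
    return importance


# Toy data (made up): feature 0 determines the target, feature 1 does not.
X = [[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]]
y = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
scores = genie3_boosting(X, y)
ranking = sorted(range(len(scores)), key=lambda f: -scores[f])
```

On this toy data the informative feature is ranked first. In a relational setting the candidate tests would range over relational features rather than table columns, which this sketch does not attempt to model.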

This work was financially supported by the Slovenian Research Agency (grants P2-0103, N2-0128, and a young researcher grant to MP).

Corresponding author

Correspondence to Matej Petković.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Petković, M., Ceci, M., Kersting, K., Džeroski, S. (2020). Estimating the Importance of Relational Features by Using Gradient Boosting. In: Helic, D., Leitner, G., Stettinger, M., Felfernig, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2020. Lecture Notes in Computer Science, vol 12117. Springer, Cham. https://doi.org/10.1007/978-3-030-59491-6_34

  • DOI: https://doi.org/10.1007/978-3-030-59491-6_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59490-9

  • Online ISBN: 978-3-030-59491-6

  • eBook Packages: Computer Science, Computer Science (R0)
