Symbolic Data Analysis to Improve Completeness of Model Combination Methods

Strecht, Pedro; Mendes-Moreira, João; Soares, Carlos

doi:10.1007/978-981-99-8391-9_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14472)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

A growing number of organizations break large data analysis problems down into specific sub-problems and tailor a model to each. However, handling a large number of individual models can make organization-wide phenomena hard to understand. Recent studies address this by aggregating local decision trees into sets of rules to create a consensus model. Despite these efforts, the resulting models may still be incomplete, i.e., unable to cover the entire decision space. This paper explores methodologies for generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of the independent variables. Two approaches are introduced: (1) creating a synthetic dataset and training a decision tree on it, and (2) a specialized algorithm that builds a decision tree directly from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on a number of datasets.
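The abstract gives no implementation details, so the following is only an illustrative sketch of the first approach (synthetic data followed by decision tree training), not the authors' method. It assumes that rules are axis-aligned interval boxes over numeric features, that the "rough estimate" of each independent variable is a uniform distribution over a known (min, max) range, and that the tree learner is a minimal Gini-based CART written from scratch. All names (`covers`, `coverage`, `synthesize`, `build_tree`, `predict`) are hypothetical. A coverage estimate below 1.0 signals an incomplete rule set; the tree trained on the synthetic points partitions the whole space, so it returns a prediction for every point, including the gaps.

```python
import random
from collections import Counter

# ASSUMPTION: a rule is (conditions, label), where conditions maps a feature
# index to a half-open interval [low, high); unlisted features are unconstrained.

def covers(rule, x):
    conds, _ = rule
    return all(lo <= x[f] < hi for f, (lo, hi) in conds.items())

def coverage(rules, domain, n=10_000, seed=0):
    """Monte Carlo estimate of the share of the decision space covered by some rule.
    ASSUMPTION: each feature is uniform over its (min, max) range in `domain`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = [rng.uniform(lo, hi) for lo, hi in domain]
        hits += any(covers(r, x) for r in rules)
    return hits / n

def synthesize(rules, domain, per_rule=100, seed=0):
    """Draw points uniformly inside each rule's box (clipped to the domain)
    and label them with that rule's class."""
    rng = random.Random(seed)
    data = []
    for conds, label in rules:
        box = [conds.get(f, (lo, hi)) for f, (lo, hi) in enumerate(domain)]
        box = [(max(a, lo), min(b, hi)) for (a, b), (lo, hi) in zip(box, domain)]
        for _ in range(per_rule):
            data.append(([rng.uniform(a, b) for a, b in box], label))
    return data

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(data, depth=0, max_depth=4, min_size=5):
    """Minimal CART-style learner: a leaf is a class label, an inner node is
    (feature, threshold, left_subtree, right_subtree)."""
    labels = [y for _, y in data]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(data) < 2 * min_size or gini(labels) == 0.0:
        return majority
    best = None
    for f in range(len(data[0][0])):
        values = sorted({x[f] for x, _ in data})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # candidate threshold between consecutive values
            left = [y for x, y in data if x[f] < t]
            right = [y for x, y in data if x[f] >= t]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    left = [(x, y) for x, y in data if x[f] < t]
    right = [(x, y) for x, y in data if x[f] >= t]
    return (f, t, build_tree(left, depth + 1, max_depth, min_size),
            build_tree(right, depth + 1, max_depth, min_size))

def predict(tree, x):
    # Every path ends in a leaf, so any point in the space gets a label.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] < t else right
    return tree

# Hypothetical example: two features on [0, 10); the region x0 >= 5, x1 >= 5 is uncovered.
domain = [(0.0, 10.0), (0.0, 10.0)]
rules = [({0: (0.0, 5.0)}, "A"),
         ({0: (5.0, 10.0), 1: (0.0, 5.0)}, "B")]
print(f"rule-set coverage ~ {coverage(rules, domain):.2f}")
tree = build_tree(synthesize(rules, domain))
print(predict(tree, [2, 2]), predict(tree, [8, 2]), predict(tree, [8, 8]))  # prints: A B B
```

In this toy case the rule set covers roughly 75% of the space, and the uncovered corner ends up labeled B only because the learned split uses x0 alone. A different distribution estimate or learner could fill the gap differently, which is why the quality of that rough estimate matters.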


References

Andrzejak, A., Langner, F., Zabala, S.: Interpretable models from distributed data via merging of decision trees. In: 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) (2013)
Billard, L., Diday, E.: From the statistics of data to the statistics of knowledge: symbolic data analysis (2003). https://doi.org/10.1198/016214503000242
Billard, L., Diday, E.: Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley, Hoboken (2012)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman and Hall/CRC, Boca Raton (1984). https://doi.org/10.1201/9781315139470
Brito, P.: Symbolic data analysis: another look at the interaction of data mining and statistics. Wiley Interdisc. Rev. Data Min. Knowl. Discov. 4, 281–295 (2014). https://doi.org/10.1002/widm.1133
Diday, E.: Thinking by classes in data science: the symbolic data analysis paradigm (2016). https://doi.org/10.1002/wics.1384
Giabbanelli, P.J., Peters, J.G.: An algebra to merge heterogeneous classifiers (2015). http://arxiv.org/abs/1501.05141
Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques. 3rd edn. Morgan Kaufmann, Burlington (2012)
Jech, T.J.: Set Theory. Springer, Heidelberg (2006). https://doi.org/10.1007/3-540-44761-X
Kaggle: Kaggle (2022). www.kaggle.com/datasets
Kuhn, M., Weston, S., Culp, M., Coulter, N., Quinlan, J.R.: C5.0 decision trees and rule-based models (2022). https://github.com/topepo/C5.0/issues
Lakkaraju, H., Bach, S.H., Leskovec, J.: Interpretable decision sets: a joint framework for description and prediction. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1675–1684. Association for Computing Machinery (2016). https://doi.org/10.1145/2939672.2939874
Ligeza, A.: Logical Foundations for Rule-Based Systems, vol. 11. Springer, Heidelberg (2006). https://doi.org/10.1007/3-540-32446-1
Moro, S., Cortez, P., Rita, P.: A data-driven approach to predict the success of bank telemarketing. Decis. Support Syst. 62, 22–31 (2014). https://doi.org/10.1016/j.dss.2014.03.001
Obregon, J., Kim, A., Jung, J.Y.: RuleCOSI: combination and simplification of production rules from boosted decision trees for imbalanced classification. Expert Syst. Appl. 126, 64–82 (2019). https://doi.org/10.1016/j.eswa.2019.02.012
Périnel, E., Lechevallier, Y.: Symbolic discrimination rules. In: Analysis of Symbolic Data, Exploratory Methods for Extracting Statistical Information from Complex Data, pp. 244–265 (2000)
Strecht, P., Mendes-Moreira, J., Soares, C.: Inmplode: a framework to interpret multiple related rule-based models. Expert Syst. 38, e12702 (2021). https://doi.org/10.1111/exsy.12702


Acknowledgments

This work was partially funded by the following projects: AISym4Med (101095387), supported by Horizon Europe Cluster 1: Health; ConnectedHealth (no. 46858), supported by the Competitiveness and Internationalisation Operational Programme (POCI) and the Lisbon Regional Operational Programme (LISBOA 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF); and NextGenAI - Center for Responsible AI (2022-C05i0102-02), supported by IAPMEI. It was also funded by the FCT plurianual funding for 2020–2023 of LIACC (UIDB/00027/2020 and UIDP/00027/2020).

Author information

Authors and Affiliations

LIAAD-INESC TEC, Faculdade de Engenharia, Universidade do Porto, R. Dr. Roberto Frias, 4200-465, Porto, Portugal
Pedro Strecht, João Mendes-Moreira & Carlos Soares
LIACC, Faculdade de Engenharia, Universidade do Porto, R. Dr. Roberto Frias, 4200-465, Porto, Portugal
Carlos Soares
Fraunhofer Portugal AICOS, R. Alfredo Allen 455, 4200-135, Porto, Portugal
Carlos Soares

Authors

Pedro Strecht
João Mendes-Moreira
Carlos Soares

Corresponding author

Correspondence to Pedro Strecht.

Editor information

Editors and Affiliations

The University of Sydney, Darlington, NSW, Australia
Tongliang Liu
Monash University, Clayton, VIC, Australia
Geoff Webb
The University of Newcastle, Callaghan, NSW, Australia
Lin Yue
CSIRO Data61, Sydney, NSW, Australia
Dadong Wang



About this paper

Cite this paper

Strecht, P., Mendes-Moreira, J., Soares, C. (2024). Symbolic Data Analysis to Improve Completeness of Model Combination Methods. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_9


DOI: https://doi.org/10.1007/978-981-99-8391-9_9
Published: 27 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8390-2
Online ISBN: 978-981-99-8391-9
eBook Packages: Computer Science, Computer Science (R0)
