Abstract
A growing number of organizations break large data analysis problems into specific sub-problems and tailor a model to each. However, a large number of individual models makes it harder to understand organization-wide phenomena. Recent studies use decision trees to create a consensus model by aggregating local decision trees into sets of rules. Despite these efforts, the resulting models may still be incomplete, i.e., unable to cover the entire decision space. This paper explores methodologies that tackle this issue by generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of the independent variables. Two approaches are introduced: synthetic dataset creation followed by decision tree training, and a specialized algorithm that creates a decision tree from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on a number of datasets.
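The first approach named in the abstract (synthetic dataset creation followed by decision tree training) can be illustrated with a minimal sketch. This is not the authors' procedure: the rule set, the variable names `x` and `y`, the uniform distributions, and the small Gini-based tree learner are all assumptions chosen for illustration. The sketch samples points from the assumed distributions, labels the points that some rule covers, and fits a tree that then returns a class for every point, including the region the rules leave uncovered.

```python
import random
from collections import Counter

# Hypothetical incomplete rule set (not from the paper): each rule maps a box
# of half-open intervals to a class. The region x >= 5, y >= 5 is uncovered.
RULES = [
    ({"x": (0.0, 5.0), "y": (0.0, 10.0)}, "A"),
    ({"x": (5.0, 10.0), "y": (0.0, 5.0)}, "B"),
]
# Assumed rough estimate of the independent variables: uniform on a range.
RANGES = {"x": (0.0, 10.0), "y": (0.0, 10.0)}


def classify_with_rules(point):
    """Return the class of the first matching rule, or None if uncovered."""
    for box, label in RULES:
        if all(lo <= point[f] < hi for f, (lo, hi) in box.items()):
            return label
    return None


def make_synthetic_dataset(n, seed=0):
    """Sample n points from RANGES and keep those that some rule can label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        point = {f: rng.uniform(lo, hi) for f, (lo, hi) in RANGES.items()}
        label = classify_with_rules(point)
        if label is not None:
            data.append((point, label))
    return data


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(data, depth=0, max_depth=3):
    """Greedy binary tree: a leaf is a class label, a node is (f, t, left, right)."""
    labels = [y for _, y in data]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for f in sorted(RANGES):
        values = sorted({p[f] for p, _ in data})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # candidate threshold between two observed values
            left = [y for p, y in data if p[f] < t]
            right = [y for p, y in data if p[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    left = [(p, y) for p, y in data if p[f] < t]
    right = [(p, y) for p, y in data if p[f] >= t]
    return (f, t, build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def predict(tree, point):
    """Walk the tree until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if point[f] < t else right
    return tree


tree = build_tree(make_synthetic_dataset(300))
gap = {"x": 8.0, "y": 8.0}  # a point that no rule covers
print(classify_with_rules(gap), predict(tree, gap))
```

In this sketch the tree fills the uncovered region by extending the splits learned from the covered regions. Whether that extrapolation is appropriate depends on the application and on how well the assumed distributions reflect the real data.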
Acknowledgments
This work was partially funded by the following projects: AISym4Med (101095387), supported by Horizon Europe Cluster 1: Health; ConnectedHealth (no. 46858), supported by the Competitiveness and Internationalisation Operational Programme (POCI) and the Lisbon Regional Operational Programme (LISBOA 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF); and NextGenAI - Center for Responsible AI (2022-C05i0102-02), supported by IAPMEI. It was also supported by FCT pluriannual funding for 2020–2023 of LIACC (UIDB/00027/2020_UIDP/00027/2020).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Strecht, P., Mendes-Moreira, J., Soares, C. (2024). Symbolic Data Analysis to Improve Completeness of Model Combination Methods. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_9
DOI: https://doi.org/10.1007/978-981-99-8391-9_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8390-2
Online ISBN: 978-981-99-8391-9
eBook Packages: Computer Science, Computer Science (R0)