
Symbolic Data Analysis to Improve Completeness of Model Combination Methods

  • Conference paper
AI 2023: Advances in Artificial Intelligence (AI 2023)

Abstract

A growing number of organizations are adopting a strategy of breaking down large data analysis problems into specific sub-problems and tailoring a model to each. However, handling a large number of individual models can make it difficult to understand organization-wide phenomena. Recent studies focus on using decision trees to create a consensus model by aggregating local decision trees into sets of rules. Despite these efforts, the resulting models may still be incomplete, i.e., unable to cover the entire decision space. This paper explores methodologies to tackle this issue by generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of the independent variables. Two approaches are introduced: synthetic dataset creation followed by decision tree training, and a specialized algorithm for creating a decision tree from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on a number of datasets.
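The first approach named in the abstract (synthetic dataset creation followed by decision tree training) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-variable rule set, the variable bounds, the uniform distribution standing in for the "rough estimates" of the independent variables, and the small CART-style learner, which is a generic stand-in rather than the authors' algorithm.

```python
import random

# Hypothetical incomplete rule set over two variables x0, x1 in [0, 10).
# Each rule: (one (low, high) interval per variable, predicted class).
# The regions 4 <= x0 < 6 and (x0 >= 6, x1 >= 5) are covered by no rule.
RULES = [
    (((0.0, 4.0), (0.0, 10.0)), "A"),
    (((6.0, 10.0), (0.0, 5.0)), "B"),
]
BOUNDS = [(0.0, 10.0), (0.0, 10.0)]  # assumed range of each variable

def apply_rules(point, rules):
    """Return the class of the first matching rule, or None if uncovered."""
    for intervals, label in rules:
        if all(lo <= v < hi for v, (lo, hi) in zip(point, intervals)):
            return label
    return None

def synthesize(rules, bounds, n, rng):
    """Sample n points uniformly (an assumed distribution) and keep
    only those that the rule set is able to label."""
    data = []
    for _ in range(n):
        p = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        y = apply_rules(p, rules)
        if y is not None:
            data.append((p, y))
    return data

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def majority(labels):
    return max(set(labels), key=labels.count)

def train_tree(data, depth=3):
    """Tiny CART-style learner: a leaf is a class label, an internal
    node is a tuple (feature index, threshold, left, right)."""
    labels = [y for _, y in data]
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in range(len(data[0][0])):
        for t in sorted({p[f] for p, _ in data}):
            left = [y for p, y in data if p[f] < t]
            right = [y for p, y in data if p[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority(labels)
    _, f, t = best
    return (f, t,
            train_tree([d for d in data if d[0][f] < t], depth - 1),
            train_tree([d for d in data if d[0][f] >= t], depth - 1))

def predict(tree, point):
    """Descend from the root until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if point[f] < t else right
    return tree

rng = random.Random(0)
n_samples = 200
data = synthesize(RULES, BOUNDS, n_samples, rng)
coverage = len(data) / n_samples      # share of samples the rules can label
tree = train_tree(data)
uncovered = (5.0, 5.0)                # no rule matches this point
print(apply_rules(uncovered, RULES), predict(tree, uncovered))
```

Because a decision tree partitions the whole input space, the trained tree returns a class for every point, including those that the original rule set leaves uncovered. How good those predictions are depends on how well the assumed sampling distribution matches the real one.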

Acknowledgments

This work was partially funded by the following projects: AISym4Med (101095387), supported by Horizon Europe Cluster 1: Health; ConnectedHealth (no. 46858), supported by the Competitiveness and Internationalisation Operational Programme (POCI) and the Lisbon Regional Operational Programme (LISBOA 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF); and NextGenAI - Center for Responsible AI (2022-C05i0102-02), supported by IAPMEI. It was also supported by FCT pluriannual funding for 2020–2023 of LIACC (UIDB/00027/2020_UIDP/00027/2020).

Corresponding author

Correspondence to Pedro Strecht.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Strecht, P., Mendes-Moreira, J., Soares, C. (2024). Symbolic Data Analysis to Improve Completeness of Model Combination Methods. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_9

  • DOI: https://doi.org/10.1007/978-981-99-8391-9_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8390-2

  • Online ISBN: 978-981-99-8391-9

  • eBook Packages: Computer Science, Computer Science (R0)
