Abstract
Symbolic polygonal data analysis is a new framework for extracting valuable knowledge from a new data structure: regular polygons built from classes of data, big data, and complex data. This paper introduces psda, a toolbox for symbolic polygonal data that implements the main descriptive measures for this type of variable (e.g., mean, variance, and correlation) and a polygonal linear regression model (plr). The toolbox is applied to data from the Brazilian Basic Education Assessment System (SAEB), giving county managers a new perspective for implementing public policy in the Brazilian educational system. In the SAEB application, a hypothesis test showed that the polygonal linear regression model performed best when compared with several symbolic interval regression models.
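The abstract does not say how psda turns a class of observations into a polygon. The sketch below is therefore hypothetical and illustrates the general idea only. It assumes three things: each class is summarized by a regular polygon centred at (class mean, 0); the radius equals the class population standard deviation; and a "mean polygon" is the vertex-wise average of several polygons. The names `regular_polygon`, `class_to_polygon` and `mean_polygon` are illustrative. They are not part of the psda API.

```python
import math
from statistics import mean, pstdev


def regular_polygon(center: float, radius: float, n_vertices: int) -> list[tuple[float, float]]:
    """Return the vertices of a regular polygon centred at (center, 0).

    Hypothetical construction: vertex k lies at angle 2*pi*k/n_vertices.
    """
    if n_vertices < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    return [
        (center + radius * math.cos(2 * math.pi * k / n_vertices),
         radius * math.sin(2 * math.pi * k / n_vertices))
        for k in range(n_vertices)
    ]


def class_to_polygon(values: list[float], n_vertices: int = 4) -> list[tuple[float, float]]:
    # Assumption: centre = class mean, radius = population standard deviation.
    return regular_polygon(mean(values), pstdev(values), n_vertices)


def mean_polygon(polygons: list[list[tuple[float, float]]]) -> list[tuple[float, float]]:
    # Vertex-wise average; every polygon must have the same number of vertices.
    n = len(polygons[0])
    if any(len(p) != n for p in polygons):
        raise ValueError("all polygons must have the same number of vertices")
    return [
        (mean(p[k][0] for p in polygons), mean(p[k][1] for p in polygons))
        for k in range(n)
    ]


# Usage with two made-up classes (e.g., test scores grouped by county).
county_a = class_to_polygon([250, 260, 270])
county_b = class_to_polygon([230, 240, 250])
average = mean_polygon([county_a, county_b])
```

Here the two classes share the same spread, so the averaged polygon is centred at 250 with the same radius. Using the same number of vertices for every class keeps vertex-wise operations well defined.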
Acknowledgements
This work was supported by the Brazilian funding agencies FACEPE and CNPq. The authors are grateful to the reviewers and the associate editor for their helpful comments and suggestions. Funding was provided by Fundação de Amparo à Ciência e Tecnologia do Estado de Pernambuco (Grant No. IBPG-1185-1.03/16) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (Grant Nos. 302767/2018-5, 304775/2018-5).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by V. Loia.
Appendix
This section details all results obtained with psda in the SAEB 2017 application.
Cite this article
Silva, W.J.F., Souza, R.M.C.R. & Cysneiros, F.J.A. psda: A tool for extracting knowledge from symbolic data with an application in Brazilian educational data. Soft Comput 25, 1803–1819 (2021). https://doi.org/10.1007/s00500-020-05252-5
DOI: https://doi.org/10.1007/s00500-020-05252-5