Abstract
Cluster Labeling Models apply Artificial Intelligence techniques to extract the key features of clustered data, providing a tool for interpreting clusters. For this purpose, we have applied different techniques such as Classification, Regression, Fuzzy Logic, and Data Discretization to identify the attributes that are essential for cluster formation and the ranges of values associated with them. This paper presents an improvement to the Regression-based Cluster Labeling Model: an attribute selection step, based on the coefficient of determination obtained by regression models, is integrated into the model to make it applicable to large datasets. The model was tested on the Iris, Breast Cancer, and Parkinson’s Disease datasets from the literature to evaluate labeling performance at different dimensionalities. The experimental results showed that the model is sound, providing specific labels for each cluster that represent between 99% and 100% of the cluster's elements in the datasets used.
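The attribute selection step is described above only at a high level, so the following is a minimal sketch rather than the paper's procedure. It assumes that each attribute is regressed on the remaining attributes with ordinary least squares, and that attributes whose coefficient of determination (R²) reaches a threshold are kept. The function names, the use of linear regression, the direction of the selection rule, and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np


def r2_per_attribute(X):
    """Regress each column on the remaining columns (least squares with an
    intercept) and return one coefficient of determination per column.
    Assumption: this is one plausible reading of the selection step."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    scores = np.empty(d)
    for j in range(d):
        y = X[:, j]
        # Predictors: every other attribute plus a constant column.
        A = np.column_stack([np.delete(X, j, axis=1), np.ones(n)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        # A constant attribute has no variance to explain; score it as 0.
        scores[j] = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return scores


def select_attributes(X, threshold=0.5):
    """Return the indices of attributes with R^2 >= threshold, plus all scores.
    The threshold of 0.5 is a hypothetical default, not taken from the paper."""
    scores = r2_per_attribute(X)
    selected = [j for j, s in enumerate(scores) if s >= threshold]
    return selected, scores


# Synthetic illustration: attributes 0 and 1 are strongly related,
# attribute 2 is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2.0 * a + 0.1 * rng.normal(size=200)
c = rng.normal(size=200)
demo = np.column_stack([a, b, c])

selected, scores = select_attributes(demo, threshold=0.5)
print("selected attribute indices:", selected)
print("R^2 per attribute:", np.round(scores, 3))
```

In this synthetic example the two related attributes obtain a high R² and the independent one a low R², so only the first two pass the threshold. Whether the model keeps well-predicted or poorly predicted attributes, and with which regressor, has to be checked against the full paper.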
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Silva, L.E.S., Machado, V.P., Araujo, S.S., de Lima, B.V.A., Veras, R.d.M.S. (2021). Using Regression Error Analysis and Feature Selection to Automatic Cluster Labeling. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds) Progress in Artificial Intelligence. EPIA 2021. Lecture Notes in Computer Science, vol 12981. Springer, Cham. https://doi.org/10.1007/978-3-030-86230-5_30
DOI: https://doi.org/10.1007/978-3-030-86230-5_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86229-9
Online ISBN: 978-3-030-86230-5
eBook Packages: Computer Science, Computer Science (R0)