Using Regression Error Analysis and Feature Selection to Automatic Cluster Labeling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12981)

Abstract

Cluster labeling models apply Artificial Intelligence techniques to extract the key features of clustered data, providing a tool for interpreting the clusters. For this purpose, we have applied techniques such as classification, regression, fuzzy logic, and data discretization to identify the attributes essential to cluster formation and the ranges of values associated with them. This paper presents an improvement to the Regression-based Cluster Labeling Model: an attribute selection step, based on the coefficient of determination obtained by the regression models, that makes the model applicable to large datasets. The model was tested on the Iris, Breast Cancer, and Parkinson's Disease datasets from the literature, evaluating labeling performance at different dimensionalities. The experiments showed that the model is sound: it provides specific labels for each cluster that represent between 99% and 100% of the cluster's elements for the datasets used.
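As a rough illustration of an attribute selection step driven by the coefficient of determination, the sketch below fits one least-squares linear model per attribute (predicting it from the remaining attributes) and keeps the attributes whose R² reaches a threshold. This is only one plausible reading of the abstract, not the authors' implementation: the use of plain linear regression, the choice to keep high-R² attributes, the threshold value, the toy data, and the helper names `solve`, `r_squared`, and `select_attributes` are all assumptions made for the example.

```python
from statistics import mean


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(rows, target):
    """R^2 of a linear model predicting column `target` from all other columns."""
    y = [row[target] for row in rows]
    # Design matrix: intercept term plus every attribute except the target.
    X = [[1.0] + [v for j, v in enumerate(row) if j != target] for row in rows]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, x)) for x in X]
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def select_attributes(rows, threshold):
    """Return the indices of attributes whose R^2 is at least `threshold` (assumed rule), plus all scores."""
    scores = [r_squared(rows, j) for j in range(len(rows[0]))]
    return [j for j, s in enumerate(scores) if s >= threshold], scores


# Toy data (hypothetical): attribute 1 is roughly 2 * attribute 0,
# attribute 2 is unrelated to both.
rows = [
    [1.0, 2.1, 4.0],
    [2.0, 3.9, 4.0],
    [3.0, 6.2, 1.0],
    [4.0, 7.8, 1.0],
    [5.0, 10.1, 4.0],
    [6.0, 11.9, 4.0],
]
selected, scores = select_attributes(rows, threshold=0.9)
print(selected)
```

On this toy data the first two attributes predict each other well and are kept, while the third is dropped. On a real dataset the threshold would have to be tuned, and nearly collinear attributes can make the normal equations ill-conditioned, so a library least-squares routine would be a safer choice than this hand-rolled solver.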

Author information

Corresponding author

Correspondence to Lucia Emilia Soares Silva.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Silva, L.E.S., Machado, V.P., Araujo, S.S., de Lima, B.V.A., Veras, R.d.M.S. (2021). Using Regression Error Analysis and Feature Selection to Automatic Cluster Labeling. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds) Progress in Artificial Intelligence. EPIA 2021. Lecture Notes in Computer Science, vol 12981. Springer, Cham. https://doi.org/10.1007/978-3-030-86230-5_30

  • DOI: https://doi.org/10.1007/978-3-030-86230-5_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86229-9

  • Online ISBN: 978-3-030-86230-5

  • eBook Packages: Computer Science, Computer Science (R0)
