Optimization of the Numeric and Categorical Attribute Weights in KAMILA Mixed Data Clustering Algorithm

Martarelli, Nádia Junqueira; Nagano, Marcelo Seido

doi:10.1007/978-3-030-33607-3_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11871)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Clustering algorithms for mixed data have been emerging slowly since the end of the last century. One of the most recent algorithms proposed for this data type is KAMILA (KAy-means for MIxed LArge data). Although KAMILA has outperformed earlier mixed data algorithms, it has some gaps. Among them is the definition of the numerical and categorical variable weights, which are either user-defined or, by default, equal to one for all features. Hence, we propose an optimization algorithm called Biased Random-Key Genetic Algorithm for Features Weighting (BRKGAFW) to weight the numerical and categorical variables in KAMILA. The experiment used six real-world mixed data sets and compared the proposal against two baselines: KAMILA with the default weights, and KAMILA with weights set by a traditional genetic algorithm. The results show that the proposed algorithm outperformed both baselines on all data sets.
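
The abstract does not give the details of BRKGAFW, so the following is only a minimal sketch of a generic biased random-key genetic algorithm applied to a weight vector, not the authors' implementation. It assumes that each random key in [0, 1) is used directly as a feature weight and that a higher fitness is better. The names `brkga_feature_weights` and `toy_fitness`, and all parameter values, are hypothetical. In a real experiment the fitness would run KAMILA with the candidate weights and score the resulting clustering; here a toy function stands in for that step.

```python
import random
from typing import Callable, List, Sequence, Tuple


def brkga_feature_weights(
    fitness: Callable[[Sequence[float]], float],
    n_features: int,
    pop_size: int = 30,
    elite_frac: float = 0.2,
    mutant_frac: float = 0.1,
    rho_elite: float = 0.7,
    generations: int = 50,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Maximise `fitness` over weight vectors in [0, 1)^n_features.

    Returns the best weight vector and the best fitness seen per generation.
    """
    rng = random.Random(seed)
    n_elite = max(1, int(elite_frac * pop_size))
    n_mutant = max(1, int(mutant_frac * pop_size))

    def random_keys() -> List[float]:
        # One random key per feature (assumption: key == weight).
        return [rng.random() for _ in range(n_features)]

    population = [random_keys() for _ in range(pop_size)]
    history: List[float] = []

    for _ in range(generations):
        # Rank the population, best individual first.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elite, non_elite = population[:n_elite], population[n_elite:]

        # Biased crossover: one elite parent, one non-elite parent;
        # each key comes from the elite parent with probability rho_elite.
        children = []
        for _ in range(pop_size - n_elite - n_mutant):
            elite_parent = rng.choice(elite)
            other_parent = rng.choice(non_elite)
            children.append([
                e if rng.random() < rho_elite else o
                for e, o in zip(elite_parent, other_parent)
            ])

        # Mutants are fresh random vectors, not perturbed copies.
        mutants = [random_keys() for _ in range(n_mutant)]

        # The elite set is copied unchanged into the next generation.
        population = elite + children + mutants

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


def toy_fitness(weights: Sequence[float]) -> float:
    # Hypothetical stand-in for "run KAMILA with these weights and score it".
    target = [0.9, 0.1, 0.5]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


best_weights, best_per_generation = brkga_feature_weights(toy_fitness, n_features=3)
```

Because the elite individuals are carried over unchanged, the best fitness in `best_per_generation` never decreases from one generation to the next. Drawing one parent from the elite set with `rho_elite` above 0.5 is what distinguishes the biased random-key variant from a traditional genetic algorithm, such as the second baseline mentioned in the abstract.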

This work was supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil, under grant numbers 306075/2017-2 and 430137/2018-4, and by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil.

Notes

1.
https://archive.ics.uci.edu/ml/index.htm.

Author information

Authors and Affiliations

Laboratory of Applied Operational Research, University of São Paulo, São Carlos, São Paulo, 13566-590, Brazil
Nádia Junqueira Martarelli & Marcelo Seido Nagano

Corresponding author

Correspondence to Nádia Junqueira Martarelli.

Editor information

Editors and Affiliations

University of Manchester, Manchester, UK
Hujun Yin
Technical University of Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino
University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros
University of Exeter, Exeter, UK
Ronaldo Menezes
University of Manchester, Manchester, UK
Richard Allmendinger

Cite this paper

Martarelli, N.J., Nagano, M.S. (2019). Optimization of the Numeric and Categorical Attribute Weights in KAMILA Mixed Data Clustering Algorithm. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A., Menezes, R., Allmendinger, R. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2019. Lecture Notes in Computer Science, vol 11871. Springer, Cham. https://doi.org/10.1007/978-3-030-33607-3_3

DOI: https://doi.org/10.1007/978-3-030-33607-3_3
Published: 18 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33606-6
Online ISBN: 978-3-030-33607-3
eBook Packages: Computer Science, Computer Science (R0)
