RETRACTED ARTICLE: Interblend fusing of genetic algorithm-based attribute selection for clustering heterogeneous data set

Dhayanithi, J.; Akilandeswari, J.

doi:10.1007/s00500-018-3669-9

Focus
Published: 03 December 2018

Volume 23, pages 2747–2759 (2019)

Soft Computing

J. Dhayanithi¹ &
J. Akilandeswari¹

193 Accesses
11 Citations

This article was retracted on 17 October 2024

This article has been updated

Abstract

Researchers have explored various clustering strategies for partitioning heterogeneous data sets with numeric, binary, categorical and ordinal attributes. Real-life application data sets are often heterogeneous, and converting them to a homogeneous representation leads to information loss. In this paper, we propose an interblend fusing of genetic algorithm-based attribute selection to increase clustering accuracy in credit risk assessment. The proposed technique groups similar objects together without changing the characteristics of the heterogeneous data set. The algorithm also identifies the importance of each attribute when clustering a large number of objects described by many attributes. The fusing technique yields a contextual distance measure for clustering the objects. The results presented in this paper give a clear interpretation of applying our methodology to the data sets, and the algorithm's performance compares favourably with methods reported in the related literature.
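The abstract does not spell out the fusing procedure or the contextual distance measure. As a rough illustration of the general idea only, the sketch below pairs a Gower-style mixed-type distance (a swapped-in, well-known technique, not the paper's measure) with a simple genetic algorithm that searches over binary attribute masks. The attribute kinds, the toy records, the GA parameters and the `separation` fitness function are all hypothetical assumptions made for this example.

```python
import random
from itertools import combinations

# Attribute kinds are an assumption for this sketch; ordinal attributes could be
# rank-encoded and treated as numeric.
NUMERIC, NOMINAL = "numeric", "nominal"


def mixed_distance(a, b, kinds, ranges, mask):
    """Gower-style distance using only the attributes switched on in `mask`.

    Numeric attributes contribute |a_i - b_i| / range_i; nominal (categorical or
    binary) attributes contribute 0 when equal and 1 otherwise. The mean over the
    selected attributes is returned, so the value lies in [0, 1].
    """
    total, used = 0.0, 0
    for x, y, kind, r, on in zip(a, b, kinds, ranges, mask):
        if not on:
            continue
        if kind == NUMERIC:
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else 0.0


def ga_select(n_attrs, fitness, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Generational GA over binary attribute masks that maximises `fitness`.

    Uses binary tournament selection, one-point crossover (needs n_attrs >= 2),
    bit-flip mutation and elitism (the best mask so far is always carried over).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_attrs - 1)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical toy records: (income, housing, noise) with a known class label.
rows = [(20, "rent", "x"), (22, "rent", "y"), (80, "own", "x"), (85, "own", "y")]
labels = [0, 0, 1, 1]
kinds = [NUMERIC, NOMINAL, NOMINAL]
ranges = [85 - 20, 0, 0]  # only the numeric range is actually used


def separation(mask):
    """Hypothetical fitness: mean between-class minus mean within-class distance."""
    within, between = [], []
    for i, j in combinations(range(len(rows)), 2):
        d = mixed_distance(rows[i], rows[j], kinds, ranges, mask)
        (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


best_mask = ga_select(n_attrs=3, fitness=separation)
print(best_mask)  # 1 = attribute kept, 0 = attribute dropped
```

Because the distance is averaged only over selected attributes, masks of different sizes remain comparable, which is what lets a GA treat attribute selection as a plain bit-string search. The label-based fitness is only a stand-in; an unsupervised clustering-quality score could be plugged in the same way.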

Change history

17 October 2024
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s00500-024-10194-3

Author information

Authors and Affiliations

Sona College of Technology, Salem, Tamilnadu, India
J. Dhayanithi & J. Akilandeswari

Corresponding author

Correspondence to J. Dhayanithi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Communicated by P. Pandian.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dhayanithi, J., Akilandeswari, J. RETRACTED ARTICLE: Interblend fusing of genetic algorithm-based attribute selection for clustering heterogeneous data set. Soft Comput 23, 2747–2759 (2019). https://doi.org/10.1007/s00500-018-3669-9

Published: 03 December 2018
Issue Date: 01 April 2019
DOI: https://doi.org/10.1007/s00500-018-3669-9
