RETRACTED ARTICLE: Interblend fusing of genetic algorithm-based attribute selection for clustering heterogeneous data set

  • Focus
Soft Computing

This article was retracted on 17 October 2024

This article has been updated

Abstract

Researchers have explored various clustering strategies for partitioning heterogeneous data sets with numeric, binary, categorical and ordinal attributes. Data sets in real-life applications are often heterogeneous in nature; converting them to a homogeneous form leads to information loss. In this paper, we propose an interblend fusing of genetic algorithm-based attribute selection to increase clustering accuracy in credit risk assessment. The proposed technique groups similar objects together without changing the characteristics of the heterogeneous data sets. The algorithm also identifies the importance of attributes when clustering a large number of objects with many attributes. The fusing technique yields a contextual distance measure for clustering the objects. The results presented in this paper provide a clear interpretation of applying our methodology to the data sets. The algorithm performs to a higher standard than the approaches in the related literature.
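To make the idea of comparing heterogeneous objects under a selected attribute subset concrete, the following is a minimal sketch only. It is not the method of the article, whose details are not given in the abstract. It assumes a Gower-style per-attribute dissimilarity (a swapped-in, well-known technique) and a binary mask of the kind a genetic algorithm chromosome might encode; the function name `mixed_distance`, the attribute kinds and the sample records are all hypothetical.

```python
def mixed_distance(a, b, kinds, ranges, mask):
    """Gower-style distance between two records, restricted to masked-in attributes.

    Assumption: numeric and ordinal values are scaled by a known range;
    binary and categorical values use simple 0/1 mismatch.
    """
    total = 0.0
    used = 0
    for x, y, kind, rng, keep in zip(a, b, kinds, ranges, mask):
        if not keep:  # attribute dropped by the (hypothetical) GA mask
            continue
        if kind in ("numeric", "ordinal"):
            d = abs(x - y) / rng if rng else 0.0
        else:  # "binary" or "categorical"
            d = 0.0 if x == y else 1.0
        total += d
        used += 1
    # Average over the attributes actually used; 0.0 if none were selected.
    return total / used if used else 0.0


# Hypothetical credit-style records: age, has_default, housing, education level.
kinds = ("numeric", "binary", "categorical", "ordinal")
ranges = (40, None, None, 4)
r1 = (30, 1, "own", 2)
r2 = (50, 0, "rent", 2)

all_attrs = mixed_distance(r1, r2, kinds, ranges, (1, 1, 1, 1))
subset = mixed_distance(r1, r2, kinds, ranges, (1, 0, 0, 1))
```

In a wrapper-style search, each candidate mask would be scored by the quality of the clustering it produces, so the mask and the distance are evaluated together; that scoring step is omitted here.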



Author information

Corresponding author

Correspondence to J. Dhayanithi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Communicated by P. Pandian.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been retracted. Please see the retraction notice for more detail: https://doi.org/10.1007/s00500-024-10194-3

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dhayanithi, J., Akilandeswari, J. RETRACTED ARTICLE: Interblend fusing of genetic algorithm-based attribute selection for clustering heterogeneous data set. Soft Comput 23, 2747–2759 (2019). https://doi.org/10.1007/s00500-018-3669-9

  • DOI: https://doi.org/10.1007/s00500-018-3669-9
