Abstract
Clustering, a well-known technique in data analysis and data mining, attempts to find valuable patterns in datasets. In this technique, a set of alternatives is partitioned into logical groups called clusters. The partitioning is based on predefined attributes, so that the alternatives within a cluster are more similar to each other than to alternatives in other clusters. In conventional methods, similarity is usually defined by a distance-based measure. In this study, we propose a new multi-attribute preference disaggregation method called DISclustering, in which a new measure named global utility is introduced for cluster similarity. In DISclustering, the global utility of each alternative is calculated by a feed-forward neural network whose parameters are determined using the simulated annealing (SA) algorithm. Each alternative is assigned to a cluster by comparing its global utility with the cluster boundaries, called utility thresholds, with the aim of minimizing the intra-cluster distances (ICD). For this purpose, all utility thresholds are estimated using the particle swarm optimization (PSO) algorithm. The performance of the proposed method is compared with 18 clustering algorithms on 14 real datasets based on F-measure and objective function values (ICD values computed using intra-cluster or Gower distances). The experimental results and statistical hypothesis tests indicate that DISclustering significantly improves clustering results on the F-measure criterion, outperforming almost 13 of the 18 compared algorithms. Note that DISclustering calculates cluster centroids differently from the other algorithms; hence, its ICD values are less suitable for a fair comparison.
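The assignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the network shape (one hidden tanh layer, sigmoid output), the function names, and the use of the cluster mean as centroid are all assumptions made for illustration, and the SA and PSO searches are replaced by fixed example parameters.

```python
import numpy as np

def global_utility(X, W1, b1, w2, b2):
    """Hypothetical feed-forward network: tanh hidden layer, sigmoid output in (0, 1).
    In the paper the parameters are tuned by SA; here they are simply given."""
    hidden = np.tanh(X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))

def assign_clusters(utilities, thresholds):
    """Assign each alternative to the interval between consecutive utility thresholds.
    In the paper the thresholds are estimated by PSO; here they are simply given."""
    return np.searchsorted(np.sort(thresholds), utilities, side="right")

def intra_cluster_distance(X, labels):
    """Sum of Euclidean distances to the cluster mean (assumed centroid definition;
    the abstract notes that DISclustering computes centroids differently)."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += np.linalg.norm(members - members.mean(axis=0), axis=1).sum()
    return float(total)

# Example with random data and arbitrary (untuned) parameters.
rng = np.random.default_rng(0)
X = rng.random((20, 4))                     # 20 alternatives, 4 attributes
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
w2, b2 = rng.normal(size=3), 0.0
u = global_utility(X, W1, b1, w2, b2)
labels = assign_clusters(u, thresholds=np.array([0.4, 0.6]))  # 3 clusters
print(labels, intra_cluster_distance(X, labels))
```

An optimizer would wrap these functions, searching over the network parameters and thresholds for the values that minimize the ICD.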







Acknowledgements
The authors would like to thank the referees for their helpful comments.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Communicated by V. Loia.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Esmaelian, M., Shahmoradi, H. & Nemati, F. A new preference disaggregation method for clustering problem: DISclustering. Soft Comput 24, 4483–4503 (2020). https://doi.org/10.1007/s00500-019-04210-0
DOI: https://doi.org/10.1007/s00500-019-04210-0