A new preference disaggregation method for clustering problem: DISclustering

  • Methodologies and Application
Soft Computing

Abstract

Clustering, a well-known technique in data analysis and data mining, attempts to find valuable patterns in datasets. It partitions a set of alternatives into logical groups called clusters. The partitioning is based on predefined attributes, so that the alternatives within a cluster are more similar to each other than to those in other clusters. Conventional methods usually define similarity through a distance-based measure. In this study, we propose a new multi-attribute preference disaggregation method called DISclustering, which introduces a new measure of cluster similarity named global utility. In DISclustering, the global utility of each alternative is calculated by a feed-forward neural network whose parameters are determined using the simulated annealing (SA) algorithm. Each alternative is assigned to a cluster by comparing its global utility with the cluster boundaries, called utility thresholds, with the aim of minimizing the intra-cluster distances (ICD). For this purpose, all utility thresholds are estimated using the particle swarm optimization (PSO) algorithm. The performance of the proposed method is compared with 18 clustering algorithms on 14 real datasets in terms of F-measure and objective function values (ICD values using intra-cluster or Gower distances). The experimental results and statistical hypothesis tests indicate that DISclustering significantly improves clustering results on the F-measure criterion, outperforming almost 13 of the 18 compared algorithms. Note that DISclustering calculates cluster centroids differently from the other algorithms; hence, its ICD values are less suitable for a fair comparison.
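
The assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the network shape (one sigmoid hidden layer), the function names `global_utility` and `assign_clusters`, the example weights, and the convention that a utility equal to a threshold falls into the higher cluster are all assumptions made for illustration. The SA search for the network parameters and the PSO search for the thresholds are not shown; the sketch simply takes both as given.

```python
import math
from bisect import bisect_right


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def global_utility(x, w_hidden, w_out):
    """Hypothetical feed-forward pass: one sigmoid hidden layer, sigmoid output.

    x        : attribute values of one alternative
    w_hidden : one weight row per hidden unit (assumed already tuned, e.g. by SA)
    w_out    : one output weight per hidden unit
    Returns a utility strictly between 0 and 1.
    """
    hidden = [_sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return _sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


def assign_clusters(utilities, thresholds):
    """Cluster index = number of thresholds less than or equal to the utility.

    thresholds are the utility thresholds (assumed already estimated, e.g. by PSO);
    k thresholds define k + 1 clusters, indexed from 0.
    """
    cuts = sorted(thresholds)
    return [bisect_right(cuts, u) for u in utilities]


# Illustrative values only (not taken from the paper).
labels = assign_clusters([0.10, 0.35, 0.80], thresholds=[0.3, 0.6])
```

With two thresholds the three example utilities land in three different clusters. In the method described above, an outer optimization loop would adjust the thresholds and network parameters to reduce the intra-cluster distances.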



Acknowledgements

The authors would like to thank the referees for their helpful comments.

Author information

Corresponding author

Correspondence to Majid Esmaelian.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by the author.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Communicated by V. Loia.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Esmaelian, M., Shahmoradi, H. & Nemati, F. A new preference disaggregation method for clustering problem: DISclustering. Soft Comput 24, 4483–4503 (2020). https://doi.org/10.1007/s00500-019-04210-0

  • DOI: https://doi.org/10.1007/s00500-019-04210-0
