1 Introduction

Clustering is an unsupervised data partitioning method that aims to divide the dataset according to characteristics intrinsic to each element, satisfying some criteria such that elements of the same cluster are more similar than those in different ones (Aggarwal and Reddy 2013). Among the unsupervised methods, clustering techniques are the most popular (Azevedo et al. 2024b). Due to its versatility, the clustering procedure is very useful in engineering, health sciences, humanities, economics, education, and other areas (Azevedo et al. 2023; Bi et al. 2020; Liu and Liu 2024; Shi et al. 2025; Tambunan et al. 2020); for this reason, several clustering techniques have been proposed over the years.

Two clustering problems are commonly reported in the literature. The first refers to premature convergence at local optimal points, and the second concerns the dependence on the initial parametrization, especially regarding the number of clusters (Azevedo et al. 2024b; Eesa and Orman 2020; Morimoto et al. 2021). In many cases, the number of clusters is difficult to estimate due to the lack of domain knowledge of the problem, to clusters that differ in shape, size, and density, and to clusters that overlap (Dutta et al. 2019; Zhao et al. 2024).

Determining the most suitable number of cluster partitions can be considered an optimization problem. Therefore, several studies propose to use nature-inspired metaheuristics to find a solution that maximizes the separation between different clusters and maximizes the cohesion between data elements in the same cluster (Azevedo et al. 2024b; Qaddoura et al. 2021). In turn, Ikotun and Ezugwu (2022) improved the k-means clustering algorithm by using the Symbiotic Organisms Search Algorithm as a global search metaheuristic for generating the optimal initial cluster centroids. Behera et al. (2022) presented a novel approach to define the optimal number of clusters using two hybrid Firefly Particle Swarm Optimization algorithms. Initially, the approach focused on searching for the optimal number of clusters and gradually moved towards global optimal cluster centers. Wang et al. (2024) used the k-means clustering algorithm combined with a self-adapting Genetic Algorithm and Particle Swarm Optimization to identify the optimal solution for a vehicle routing problem. Wadhwa et al. (2023) presented a modified Density-based spatial clustering of applications with noise (DBSCAN) clustering-based scheme where density-based clusters are formed using the DBSCAN, in which the algorithm parameters were estimated by the Bat Algorithm. All approaches were tested on several benchmark datasets and real-life problems, and the authors used several statistical tests to justify the effectiveness of the suggested approaches.

A good clustering algorithm should maintain high similarity within each cluster and high dissimilarity between distinct clusters. Several clustering methods have been proposed that integrate different distance measures to obtain the optimal clustering division. The idea is to maximize a distance measure between distinct clusters and, at the same time, minimize a similarity distance measure between the points in the same cluster. However, the weights for several distance measures are challenging to set (Liu et al. 2018), so a multi-objective optimization algorithm is a suitable strategy for this problem. Nevertheless, selecting an appropriate measure for the objective function is a nontrivial matter, and the outcome of clustering can significantly depend on this choice. Many works explore multi-objective algorithms using different objective functions, extracting patterns, and providing multiple partitions as solutions, as can be seen in Morimoto et al. (2021).

Dutta et al. (2019) proposed a Multi-objective Genetic Algorithm for automatic clustering, which takes advantage of the local search ability of k-means with the global search ability of the Genetic Algorithm to find the optimal k. The objective was to minimize the intra-cluster distance and maximize the inter-cluster distance. Kaur and Kumar (2022) presented a multi-objective clustering algorithm based on a vibrating particle system, considering as an objective function the intra-cluster variance and the connectedness; besides the vibrating particle system was used for optimizing the objectives to obtain good clustering results.

Binu Jose and Das (2022) presented a multi-objective approach for clustering to establish the relationship between inter-cluster and intra-cluster distances. Three objective functions were considered to simultaneously minimize the sum of the distance between the elements and their centroids, maximize the sum of the distance between the centroids, and minimize the sum of the distance between elements of the same cluster. Nevertheless, this approach needs the prior specification of the optimal number of centroids, and their final position is obtained randomly from the optimal centroid position that generates the minimum sum of the distance between the elements and their centroids. The algorithms were tested in different benchmark datasets and compared with single objective clustering algorithms, demonstrating superior performance.

The approach proposed in this work explores bio-inspired strategies and clustering techniques to achieve a robust clustering algorithm, named Multi-objective Clustering Algorithm (MCA), combined with the Nondominated Sorting Genetic Algorithm II (Kok et al. 2011), to define the optimal number of cluster sets and the partitioning of the elements, minimizing an intra-clustering measure and maximizing an inter-clustering one. To this end, 42 combinations between 6 intra-clustering measures and 7 inter-clustering measures were analyzed, and the most prominent ones were selected to be used in the MCA as a bi-objective function. Since a multi-objective approach is considered, the results of the selected combinations consist of a set of Pareto fronts. The elements of these Pareto fronts were analyzed in terms of dominance, generating a hybrid Pareto front composed of elements provided by different combinations of measures.

The main contributions of this work are the use of a multi-objective strategy and the combination of different solutions in the definition of the optimal number of cluster sets and the partitioning of their elements. Single-objective algorithms minimize a single measure at a time, which limits their ability to explore specific geometries, dimensions, amounts of data, or other characteristics, and can lead, in the eyes of the decision-maker, to a solution that is not suitable for the partitioning of the dataset. By providing a set of optimal solutions, the decision-maker has the variability and flexibility to choose the most appropriate solution according to his/her knowledge or preferences. Moreover, the hybrid Pareto front allows the consideration of different measures, which enriches the diversity and robustness of the solution set.

This paper is organized as follows. After the introduction, the clustering measures explored in the paper are described in Sect. 2; these measures are divided into two categories: intra-clustering measures and inter-clustering measures. After that, Sect. 3 defines the multi-objective concepts and the proposed algorithm, which uses the clustering measures to define the set of optimum solutions automatically. The results and discussion are presented in Sect. 4, and a comparison of the MCA results with the k-means, DBSCAN, Clustering Differential Evolution, and Game-Based k-means algorithms is presented in Sect. 5. Finally, the main conclusions of this research and future steps are presented in Sect. 6.

2 Clustering measures

To partition the dataset into different groups, it is necessary to establish some measures for computing the distances between the elements. The choice of distance measures is fundamental for the algorithm’s performance, as it strongly influences the clustering results. A multi-objective clustering algorithm considers different clustering measures to automatically define the optimal number of clusters by minimizing the intra-cluster distance and maximizing the inter-cluster distance. Based on these measures, the objective is to group data elements close to each other in the feature space, reflecting their similarity. Many well-known methods are explored in the literature, such as single linkage, complete linkage, and average linkage, among others (Institute; Sokal and Michener 1958; Sorensen 1948). The following sections present the intra- and inter-clustering measures, respectively. Before presenting these measures, consider the notation:

  • X is the dataset, in which \(X = \{x_1, x_2,...,x_m\}\) where \(x_i\) is an element of the dataset;

  • m is the number of elements x that compose the set X;

  • C defines the set of centroids of the form \(C=\{c_1, c_2,...,c_k\}\), where \(c_j\) defines the centroid j;

  • k is the number of centroids in which X is partitioned;

  • \(C_j\) defines the cluster j, in which \(C_j=\{x_1^j,x_2^j,...,x_i^j\}\);

  • \(x_i^j\) represents an element i that belongs to cluster j;

  • \(\#C_j\) is the number of elements of cluster \(C_j\);

  • \(D(\cdot ,\cdot )\) represents the Euclidean distance between two elements;

  • \(s^*\) is a solution vector, in which \(s^*=\{C_1^{*},C_2^{*},\ldots ,C_k^{*}\}\), where C refers to a cluster set.

2.1 Intra-clustering measures

Intra-cluster measures refer to the distance among elements of a given cluster. There are many ways to compute the intra-clustering measure. The ones considered in this work are presented below.

The sum of the distances between the elements \(x_i^j\) belonging to \(C_j\) and their centroid \(c_j\) is denoted by \(Sxc_j\), as presented in Eq. (1),

$$\begin{aligned} Sxc_j = \sum _{i=1}^{\#C_j} D(x_i^j,c_j) \ \ \ \text { for } \ \ \ j=1,...,k, \end{aligned}$$
(1)

where \(\#C_j\) represents the number of elements in the cluster \(C_j\).

Thus, the Sxc measure corresponds to the sum of \(Sxc_j\) over all clusters \(C_j\), as presented in Eq. (2),

$$\begin{aligned} Sxc = \sum _{j=1}^{k} Sxc_j. \end{aligned}$$
(2)

Figure 1a illustrates this measure, in which the points (blue, green, and magenta) describe a particular set of clusters \(C_j\), the red cross represents the centroids of the clusters, and the black lines represent the distance considered. Thereby, Sxc corresponds to the sum of the length of each black line.

Fig. 1 Intra-cluster distance (Color figure online)

The mean of Sxc is represented by Mxc, and calculated as defined by Eq. (3),

$$\begin{aligned} Mxc = \frac{Sxc}{k}. \end{aligned}$$
(3)

The measure SMxc represents the sum, over all clusters, of the mean distance from the elements of each cluster to its centroid, where the mean is taken with respect to the number of elements \(\#C_j\) belonging to each cluster set, as defined in Eq. (4),

$$\begin{aligned} SMxc = \sum _{j=1}^{k} \frac{Sxc_j}{\#C_j}. \end{aligned}$$
(4)

The measure MSMxc represents the mean of SMxc in terms of the number of clusters, as described in Eq. (5),

$$\begin{aligned} MSMxc =\frac{SMxc}{k}. \end{aligned}$$
(5)

Another intra-clustering measure considered is the sum of the furthest neighbor distances within the clusters (FNc). It evaluates, for each cluster \(C_j\), the largest distance between two elements \(x_i^j\) and \(x_l^j\) belonging to the same cluster j, and sums these values over all clusters. Figure 1b illustrates this measure, which is described in Eq. (6).

$$\begin{aligned} FNc = \sum _{j=1}^{k} \max \{ D(x_i^j, x_l^j) \} \ \text {for} \ \ {i=1,\ldots ,\#C_j, \ l=1,\ldots ,\#C_j, \ i\ne l} \end{aligned}$$
(6)

Thus, the mean of the furthest neighbor distance (MFNc) evaluates the mean of the furthest neighbor distance in terms of the number of clusters k, as presented in Eq. (7).

$$\begin{aligned} MFNc = \frac{FNc}{k} \end{aligned}$$
(7)
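
To make these measures concrete, the sketch below computes the six intra-clustering measures for a labeled partition. It is written in Python/NumPy purely as an illustration (the authors' implementation is in MATLAB), and the array names X, labels, and centroids are assumptions: labels assigns each element to a cluster index, and every cluster is assumed to be non-empty.

    import numpy as np

    def intra_measures(X, labels, centroids):
        # X: (m, d) elements; labels: (m,) cluster index per element;
        # centroids: (k, d) centroid coordinates. Illustrative sketch only.
        k = len(centroids)
        Sxc_j, FN_j, sizes = [], [], []
        for j in range(k):
            Cj = X[labels == j]
            sizes.append(len(Cj))
            # Eq. (1): sum of distances from the elements of cluster j to c_j
            Sxc_j.append(np.linalg.norm(Cj - centroids[j], axis=1).sum())
            # Eq. (6): furthest pair of elements inside cluster j
            pair = np.linalg.norm(Cj[:, None, :] - Cj[None, :, :], axis=2)
            FN_j.append(pair.max() if len(Cj) > 1 else 0.0)
        Sxc = sum(Sxc_j)                                  # Eq. (2)
        Mxc = Sxc / k                                     # Eq. (3)
        SMxc = sum(s / n for s, n in zip(Sxc_j, sizes))   # Eq. (4)
        MSMxc = SMxc / k                                  # Eq. (5)
        FNc = sum(FN_j)                                   # Eq. (6)
        MFNc = FNc / k                                    # Eq. (7)
        return {"Sxc": Sxc, "Mxc": Mxc, "SMxc": SMxc,
                "MSMxc": MSMxc, "FNc": FNc, "MFNc": MFNc}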

2.2 Inter-clustering measures

The inter-cluster measures define the distance between elements that belong to different clusters. The inter-cluster measures considered are presented below.

The measure Scc represents the sum of the distances between centroids (Sokal and Michener 1958); it is illustrated in Fig. 2a and defined in Eq. (8),

$$\begin{aligned} Scc = \displaystyle \sum _{\begin{array}{c} t,j=1, \\ t \ne j \end{array}}^{k} \ D(c_j, c_t). \end{aligned}$$
(8)

Thus, the mean of the distance between centroids (Mcc) is based on Scc, and it is presented in Eq. (9).

$$\begin{aligned} Mcc = \frac{Scc}{k} \end{aligned}$$
(9)

The sum of the furthest neighbor distances between elements of different clusters (FNcc), also known as complete linkage (Sorensen 1948), is illustrated in Fig. 2b and described in Eq. (10),

$$\begin{aligned} FNcc = \sum _{j=1}^{k} \sum _{t>j}^{k} \max \ \{D(x_{i}^j, x_{l}^t)\} \ \ \text {for} \ \ {i=1,\ldots ,\#C_j, \ l=1,\ldots ,\#C_t, \ i \ne l}. \end{aligned}$$
(10)

The mean of the sum of the furthest neighbor distances between elements of k different clusters (MFNcc) is described in Eq. (11),

$$\begin{aligned} MFNcc = \frac{FNcc}{k}. \end{aligned}$$
(11)

Another measure considered is the sum of the nearest neighbor distances between elements of different clusters (NNcc), known as single linkage (Sokal and Michener 1958); it is illustrated in Fig. 2c and defined in Eq. (12),

$$\begin{aligned} NNcc = \sum _{j=1}^{k} \sum _{t>j}^{k} \min \ \{D(x_{i}^j, x_{l}^t)\} \ \ \text {for} \ \ {i=1,\ldots ,\#C_j, \ l=1,\ldots ,\#C_t, \ i \ne l}. \end{aligned}$$
(12)

The mean of the nearest neighbor distances between elements of different clusters (MNNcc) is defined in Eq. (13).

$$\begin{aligned} MNNcc = \frac{NNcc}{k} \end{aligned}$$
(13)

Finally, the sum of the distances between each element i of a cluster j and all elements of the dataset that belong to a different cluster t (Sxx) is illustrated in Fig. 2d and expressed in Eq. (14),

$$\begin{aligned} Sxx = \frac{1}{2}\sum _{j=1}^{k}\sum _{t=1}^{k} \sum _{i=1}^{\#C_j} \sum _{l=1}^{\#C_t} \ D(x_i^j, x_l^t) \ \ \text {for} \ \ {t \ne j, \ l \ne i}. \end{aligned}$$
(14)
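
Analogously, the sketch below computes the inter-clustering measures of this section under the same assumed X, labels, and centroids arrays. It iterates over unordered pairs of clusters, so the 1/2 factor of Eq. (14) is already accounted for.

    import numpy as np
    from itertools import combinations

    def inter_measures(X, labels, centroids):
        k = len(centroids)
        clusters = [X[labels == j] for j in range(k)]
        # Eqs. (8)-(9): distances between centroids (all ordered pairs, as in Eq. (8))
        cc = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
        Scc = cc.sum()
        Mcc = Scc / k
        FNcc = NNcc = Sxx = 0.0
        for j, t in combinations(range(k), 2):       # unordered pairs, t > j
            d = np.linalg.norm(clusters[j][:, None, :] - clusters[t][None, :, :], axis=2)
            FNcc += d.max()                          # Eq. (10), complete linkage
            NNcc += d.min()                          # Eq. (12), single linkage
            Sxx += d.sum()                           # Eq. (14), each pair counted once
        MFNcc = FNcc / k                             # Eq. (11)
        MNNcc = NNcc / k                             # Eq. (13)
        return {"Scc": Scc, "Mcc": Mcc, "FNcc": FNcc, "MFNcc": MFNcc,
                "NNcc": NNcc, "MNNcc": MNNcc, "Sxx": Sxx}
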
Fig. 2 Inter-clustering distances (Color figure online)

3 Multi-objective approach

In this section, the main concepts of the multi-objective approach are presented, as well as the developed algorithm, the Multi-objective Clustering Algorithm (MCA) that uses the Nondominated Sorting Genetic Algorithm II (NSGA-II) (Deb et al. 2002).

3.1 Multi-objective concepts

Multi-objective optimization is an area of multiple-criteria decision-making concerning mathematical optimization problems that involve several objective functions to be minimized or maximized simultaneously. These objectives are conflicting, meaning there is a trade-off between them (Deb 2001).

In a single-objective optimization problem, there is only one objective function to be optimized, and the superiority of a solution is determined by comparing objective function values. Meanwhile, in a multi-objective optimization problem, multiple conflicting objectives (h objectives) are optimized simultaneously, i.e., a vector of objective functions. In this case, the quality of a solution is determined by the nondominated criterion (Deb 2001).

An unconstrained multi-objective optimization problem is defined in the form of Eq. (15),

$$\begin{aligned} \min _{\textrm{x}\in \mathbb {R}^d} F =\ \ \{f_{1}(\textrm{x}), f_{2}(\textrm{x}), \ldots , f_{h}(\textrm{x})\} \end{aligned}$$
(15)

in which the solution \(\textrm{x}\) is a vector of d decision variables, \(\textrm{x} = (\mathrm {x_1}, \mathrm {x_2},\ldots ,\mathrm {x_d})\).

The optimal solutions according to a multi-objective approach are specified based on the mathematical concept of partial ordering (Deb 2011). Thereby, multi-objective optimization algorithms use the concept of dominated and nondominated solutions, where the nondominated solutions constitute the Pareto front, representing the optimal set of solutions for a multi-objective optimization problem. Specifically, a decision variable vector \(\textrm{x}'\) \(\in \) S is called a dominated solution if there exists \(\bar{\textrm{x}}\) \(\in \) S such that \(f_{i}({\bar{\textrm{x}}})\) \(\le \) \(f_{i}({\textrm{x}}')\) for all i = \(1,\ldots ,h\) and \(f_{i}({\bar{\textrm{x}}}) < f_{i}({\textrm{x}}')\) for at least one i. Thus, a vector of objective function values is treated as optimal if none of its components can be improved without deteriorating at least one of the other objectives, and any solution that is not dominated by any other set member is known as a nondominated solution (Deb 2011).

Figure 3 illustrates this concept. The dark blue circles represent the nondominated solutions, constituting the Pareto front, and the light blue circles represent the dominated solutions, which are dominated by the solutions represented by the dark blue circles. A nondominated solution cannot be improved in any objective without sacrificing performance in another. Note that solution A is equally optimal to solution B, since A's first objective function value \((f_{1A})\) is smaller than B's first objective function value \((f_{1B})\), while A's second objective function value \((f_{2A})\) is higher than B's second objective function value \((f_{2B})\). Meanwhile, solution C is dominated by solutions A and B.
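
The dominance test and the extraction of the nondominated set can be sketched as follows; this is a minimal Python illustration of the concept, assuming all objectives are to be minimized, as in Eq. (15).

    import numpy as np

    def dominates(fa, fb):
        # fa dominates fb: no worse in every objective, strictly better in at least one
        fa, fb = np.asarray(fa), np.asarray(fb)
        return bool(np.all(fa <= fb) and np.any(fa < fb))

    def pareto_front(F):
        # indices of the nondominated rows of the (n, h) objective matrix F
        F = np.asarray(F)
        return [i for i in range(len(F))
                if not any(dominates(F[j], F[i]) for j in range(len(F)) if j != i)]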

Fig. 3 Pareto front in two-objective space. Adapted from Villa and Labayrade (2011) (Color figure online)

3.2 Nondominated Sorting Genetic Algorithm

The Nondominated Sorting Genetic Algorithm II (NSGA-II) is a popular bio-inspired algorithm for solving multi-objective optimization problems. It was developed by Deb et al. (2002) and is based on the evolutionary procedures of the Genetic Algorithm. Evolutionary algorithms explore a large solution space efficiently, making them well-suited for finding global optima in complex, multimodal landscapes (Yang and Gen 2010). Unlike deterministic algorithms, evolutionary algorithms, also known as metaheuristics, do not require derivative information of the objective function; this makes them suitable for optimization problems where analytical derivatives are unavailable, difficult to compute, or unreliable (Azevedo et al. 2024b). Besides, evolutionary algorithms tend to be robust in handling noisy, non-linear, and non-convex objective functions. The NSGA-II is an appropriate algorithm for this work since it has strong exploration capabilities, fast convergence, and computational efficiency.

The NSGA-II uses a fast nondominated sorting procedure to categorize solutions according to their level of nondomination and a crowding distance operator to preserve diversity in the evolutionary procedure. Moreover, elitism is achieved by controlling the elite members of the population as the algorithm progresses to maintain the diversity of the population until it converges to a Pareto-optimal front (Deb et al. 2002).

Basically, the NSGA-II algorithm starts by randomly initializing the population, in which each objective \(f_i\) (\(i \in \{1,\ldots ,h\}\)) is evaluated. The population is composed of N individuals, each represented by a candidate solution \(\textrm{x}\) (Kok et al. 2011). Afterward, each element \(\textrm{x}\) undergoes genetic operations, namely simulated binary crossover and polynomial mutation. The NSGA-II algorithm is applied iteratively until a specified stopping criterion is met (Kok et al. 2011). More details about the NSGA-II can be found in Deb et al. (2002) and Kok et al. (2011), and the algorithm is available in Matlab®, specifically through the gamultiobj function (MATLAB 2019).

3.3 Multi-objective clustering algorithm

The algorithm developed in this work, named Multi-objective Clustering Algorithm (MCA), evaluates intra- and inter-clustering measures to define the optimal number of centroids and their optimal position. This is achieved by simultaneously minimizing intra-clustering distances and maximizing inter-clustering distances by combining several pairs of intra- and inter-clustering measures.

A bi-objective programming problem can be defined in order to minimize an intra-clustering measure and maximize an inter-clustering measure as follows:

$$\begin{aligned} \min F=\{f_a,-g_b\} \end{aligned}$$

where \(f_a\), for \(a=1,\ldots ,6\), is the intra-clustering measure and \(g_b\), for \(b=1,\ldots ,7\), is the inter-clustering measure, defined in Sect. 2.
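
Assuming the measure functions sketched in Sect. 2, the bi-objective evaluation of a candidate partition can be expressed as follows; the names intra_measures and inter_measures refer to the illustrative sketches above and are not part of the original implementation.

    def bi_objective(X, labels, centroids, intra="SMxc", inter="Mcc"):
        # F = (f_a, -g_b): the intra measure is minimized and the inter measure
        # is maximized by minimizing its negative
        f_a = intra_measures(X, labels, centroids)[intra]
        g_b = inter_measures(X, labels, centroids)[inter]
        return (f_a, -g_b)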

The MCA can be described in 8 stages, as presented below. To better explain the MCA, consider a given dataset \(X=\{x_1, x_2, \ldots , x_m\}\) composed of m elements, where \(x_i \in \mathbb {R}^d\) (d is the number of variables of the dataset); the idea is to partition X into k optimal groups (clusters). Thus, the output of the MCA is a solution vector \(s^*=\{C_1^{*},C_2^{*},\ldots ,C_k^{*}\}\), in which C refers to a cluster set.

Stage 1—Input data: the algorithm starts with the input of the dataset X and the definition of the values \(k_{min}\) and \(k_{max}\), which represent the minimum and maximum number of centroids that can be assigned. The MCA automatically defines the optimal number of cluster partitions within this range of possible partitions. These values can be given by the user or set by default to \(k_{min}=2\) and \(k_{max}=[\sqrt{m}]\) (Pal and Bezdek 1995).

Stage 2—Measures selection: a pair of measures is selected, one intra-clustering measure and one inter-clustering measure, where \(f_a\) is an intra-clustering measure among \(f_1= Sxc\), \(f_2= Mxc\), \(f_3= SMxc\), \(f_4= MSMxc\), \(f_5= FNc\), and \(f_6= MFNc\), and \(g_b\) is an inter-clustering measure among \(g_1= Scc\), \(g_2= Mcc\), \(g_3= FNcc\), \(g_4= MFNcc\), \(g_5= NNcc\), \(g_6= MNNcc\), and \(g_7= Sxx\).

Stage 3—Centroids calculation: after the measures selection, the centroids need to be calculated. The Centroids Calculation (CC) iterative procedure randomly generates \(k_{max}\) ordered pairs, which are the candidates for the k centroids. Each candidate ordered pair is associated with a random value \(\omega \) belonging to [0, 1]. If \(\omega > \gamma \), for a fixed \(\gamma \), the candidate pair is considered as a centroid. If the dimension of C is smaller than \(k_{min}\), the centroid candidates with the largest \(\omega \) are added to C until \(\#C=k_{min}\) (Heris 2015). The next step is to calculate the Euclidean distance D between all the elements of X and each centroid j; the elements closest to each centroid \(c_j\) define a cluster set \(C_j\). To avoid small cluster sets, a minimum number of elements per cluster, \(\zeta \), is defined (Memarsadeghi et al. 2007). Thereby, the centroids \(c_j\) that have fewer than \(\zeta \) associated elements are automatically removed from the set of centroids, and their elements are reassigned to the closest remaining centroid in terms of Euclidean distance. By default, \(\zeta = [\sqrt{m}] \), with \(\zeta \in \mathbb {N}\) (Dutta et al. 2019). The set C contains all remaining centroids. To improve the algorithm's performance, the coordinates of each centroid \(c_j\) are updated to the barycenter of its cluster \(C_j\), composed of the elements \(x_{i}^{j}\).

For a better understanding of the centroids calculation, Algorithm 1 is presented.

Algorithm 1 Centroids Calculation
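
A minimal sketch of the Centroids Calculation procedure is given below. The \(\omega > \gamma \) filter, the minimum cluster size \(\zeta \), and the barycenter update follow the text; candidate generation inside the bounding box of the data and the single removal pass are simplifying assumptions of this illustration.

    import numpy as np

    def centroids_calculation(X, k_min, k_max, gamma=0.4, zeta=None, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        m, d = X.shape
        zeta = int(round(np.sqrt(m))) if zeta is None else zeta

        # k_max random candidate centroids; keep those with omega > gamma
        cand = rng.uniform(X.min(axis=0), X.max(axis=0), size=(k_max, d))
        omega = rng.random(k_max)
        keep = omega > gamma
        if keep.sum() < k_min:                      # top-up with the largest omegas
            keep[np.argsort(-omega)[:k_min]] = True
        C = cand[keep]

        # assign each element to its closest centroid (Euclidean distance)
        labels = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2).argmin(axis=1)

        # drop centroids with fewer than zeta elements, keeping at least k_min clusters
        sizes = np.bincount(labels, minlength=len(C))
        order = np.argsort(-sizes)
        n_keep = max(k_min, int((sizes >= zeta).sum()))
        kept = [j for j in order[:n_keep] if sizes[j] > 0]
        C = C[kept]
        labels = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2).argmin(axis=1)

        # each centroid assumes the barycenter of its cluster
        C = np.vstack([X[labels == j].mean(axis=0) for j in range(len(C))])
        return C, labels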

Stage 4—Optimization method: to identify the Pareto front associated with the bi-objective function of the problem, it is necessary to use a multi-objective algorithm. In this case, the MCA uses the NSGA-II (Deb et al. 2002), as defined in Sect. 3.2. This iterative evolutionary process revisits Stage 3 as many times as necessary, refining and advancing the algorithm's population. Stage 4 can be replaced with another multi-objective optimization process, such as Multi-objective Particle Swarm Optimization (Coello-Coello and Lechuga 2002), the Multi-objective Grey Wolf Optimizer (Mirjalili et al. 2016), or a Multi-objective Genetic Algorithm (Dutta et al. 2019), among others. Here, the NSGA-II was chosen due to its popularity and its high exploration ability.

Stage 5—Stopping criterion: Stages 2, 3, and 4 are repeated until all combinations of pairs of measures have been evaluated.

Stage 6—Normalization process: since the different measures have varying orders of magnitude, normalizing the values is essential to facilitate a comprehensive comparison and ensure a fair and meaningful analysis of the results. Each Pareto front generated is therefore normalized through the Min-Max scaling method (Müller and Guido 2016). That is, each solution s of the Pareto front is individually normalized between [0, 1], using Eq. (16),

$$\begin{aligned} f_{i}^{*}=\frac{f_{i} (s) - f_{i}^{min}}{f_{i}^{max} - f_{i}^{min}}, \end{aligned}$$
(16)

where \(f_i(s)\) represents a component of the bi-objective function F at the solution s, and \(f_i^{min}\) and \(f_i^{max}\) are, respectively, the smallest and the largest values of the function \(f_i\) over the Pareto front. Thus, \(f_{i}^{*}\) is the normalized value of \(f_i\) at the solution s, and each Pareto front, obtained for a given pair of measures, is normalized in this way.

Stage 7—Nondominated procedure evaluation: in this stage, all normalized Pareto fronts are evaluated according to the nondominated criterion, and the nondominated solutions are selected to compose a hybrid Pareto front (HPF).

Stage 8—Hybrid Pareto front: the HPF is the set of nondominated solutions, considering all the normalized solutions of the Pareto fronts obtained for each pair of measures.
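
Stages 6–8 can be summarized in the sketch below, which normalizes each Pareto front with Eq. (16) and pools the nondominated solutions into the HPF; it reuses the illustrative pareto_front function from Sect. 3.1.

    import numpy as np

    def normalize_front(F):
        # Eq. (16): Min-Max scale each objective of a Pareto front F (n, 2) to [0, 1]
        F = np.asarray(F, dtype=float)
        fmin, fmax = F.min(axis=0), F.max(axis=0)
        return (F - fmin) / np.where(fmax > fmin, fmax - fmin, 1.0)

    def hybrid_pareto_front(fronts):
        # Stages 7-8: pool the normalized fronts and keep only nondominated solutions
        pooled = np.vstack([normalize_front(F) for F in fronts])
        return pooled[pareto_front(pooled)]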

The pseudocode of the MCA algorithm is presented in Algorithm 2.

Algorithm 2 Multi-objective Clustering Algorithm

4 Results and discussion

To validate the proposed approach, 4 datasets are considered, as described in Table 1 in terms of the number of elements, features, number of clusters, and references.

Table 1 Datasets description

The dataset 1, named My data (Heris 2015), is used to illustrate the conflicts between distance measures and to test the behavior of the algorithm under different pairs of measures. The most prominent measures found were used to test the algorithm on the other datasets.

After that, the MCA algorithm is tested on two higher dimensional datasets, the dataset 2, named Thyroid (Fränti and Sieranoja 2018), and the dataset 3, named Breast (Fränti and Sieranoja 2018). Finally, the dataset 4, named MathE, is used to validate the approach on a completely unknown problem (Azevedo et al. 2024a) regarding the number of centroids. Note that, in the case of the MathE data, the number of clusters is completely unknown, and the decision is directly influenced by the decision-maker's knowledge, as will be discussed in Sect. 4.5.

The results presented in this paper were obtained using an Intel(R) i5(R) CPU @1.60 GHz with 8 GB of RAM using Matlab 2019a ® software (MATLAB 2019).

4.1 Conflict analysis

In order to identify the most prominent pairs of measures to be considered as objective functions, a simulation was performed to study the behavior of all 2-to-2 combinations between the 6 intra-clustering measures and the 7 inter-clustering measures presented in Sect. 2, considering the dataset 1. In this case, there are 42 possible combinations. Thus, to identify whether the objectives presented in Sect. 2 are conflicting, 3000 random elements of each pair of functions were evaluated. Figure 4 shows the obtained values, considering the 2-to-2 combinations.

Fig. 4 Clustering measures behavior (Color figure online)

Since a combination that minimizes the intra-clustering distance and simultaneously maximizes the inter-clustering measure is sought, it is necessary to select the pairs of measures whose behavior is closest to that of Fig. 3, since the second objective function is negated, as defined in Sect. 3.3.

Based on this, it is possible to select the most appropriate measures to be used as objective functions in the MCA. Thereby, the 2 intra-clustering measures, SMxc and FNc, and the 6 inter-clustering measures, Scc, Mcc, FNcc, NNcc, MFNcc, and MNNcc, were chosen because they presented an explicitly conflicting behavior between the measures. The mentioned combinations are highlighted in Fig. 4, which results in 12 combinations of pairs of measures.

Once the combinations were selected, the MCA was performed on four different datasets to validate the behavior observed in the previous simulation. Thus, the following parameters are considered: \(k_{min}=2\), \(k_{max}=[\sqrt{m}]\), and \(\gamma = 0.4\), with 10 runs for each measure combination. Regarding the parameters of the NSGA-II, a population size equal to 100 and a maximum number of generations equal to 400 were considered. For gamultiobj, the algorithm stops when the geometric mean of the relative variation of the spread value over 100 generations is less than \(10^{-4}\) and the final spread is less than the average spread over the past 100 generations, as defined in the documentation of the gamultiobj function (MATLAB 2019). The results for the four datasets are presented below.

4.2 Results for dataset 1—My data

The following two sections present the results for the SMxc and FNc measures, respectively, combined with the Scc, Mcc, FNcc, NNcc, MFNcc, and MNNcc inter-clustering measures. For the dataset 1, it is considered \(k_{max}=17=\lfloor \sqrt{300}\rfloor \), as 300 is the number of elements of the dataset. The Pareto fronts were analyzed before and after the normalization to validate the MCA process.

4.2.1 Results of the intra-clustering measure SMxc with inter-clustering measures

The SMxc represents the mean distance between the elements belonging to \(C_j\) and their centroid \(c_j\), in terms of the number of elements belonging to each cluster set \(\#C_j\). Each SMxc combination was executed 10 times by the MCA. The solutions generated by all executions were compared, and the nondominated solutions from all runs generated the Pareto front. Thereby, as six 2-to-2 combinations are being considered, the result consists of 6 different Pareto fronts, which are described in Fig. 5. The red Pareto front is the result of the combination SMxcScc, the blue one is SMxcMcc, the green one is SMxcFNcc, the yellow one is SMxcNNcc, the black one is SMxcMFNcc, and the magenta one is the combination provided by the SMxcMNNcc, as objective functions 1 and 2, respectively.

Fig. 5 Pareto fronts for each combination involving the SMxc considering dataset 1 (Color figure online)

The objective functions that define the Pareto fronts involve measures based on sums and means, which results in 6 Pareto fronts with different ranges. Thus, to have a fair comparison between the Pareto fronts, they were normalized by their range (MATLAB 2019), as defined in Sect. 3.3.

The normalization results are presented in Fig. 5b. In this way, it is possible to verify that the values generated by the combinations with Mcc, MFNcc, and MNNcc in the second objective function (inter-clustering measure) dominate those of the Scc, FNcc, and NNcc measures. Moreover, the hypervolume (Bringmann and Friedrich 2013) of each Pareto front was calculated. Thereby, the Pareto front with the highest hypervolume is considered optimal, and this measure was applied to evaluate the competing Pareto fronts (Ahmadi 2023; Bringmann and Friedrich 2013). The resulting values are described in Table 2. From this table, the Pareto fronts provided by the SMxc, as the first objective function, combined (individually) with the measures Mcc (blue), MFNcc (black), and MNNcc (magenta), as the second objective function, showed the highest values, so the Pareto fronts provided by the measures Mcc, MFNcc, and MNNcc are more appropriate.
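
For two normalized minimization objectives, the hypervolume can be computed with a simple sweep, as sketched below; the reference point (1.1, 1.1) is an assumption, since the paper does not state the one used.

    import numpy as np

    def hypervolume_2d(front, ref=(1.1, 1.1)):
        # front: (n, 2) objective values of a minimization Pareto front
        F = np.asarray(front, dtype=float)
        F = F[np.argsort(F[:, 0])]           # sweep along the first objective
        hv, prev_f2 = 0.0, ref[1]
        for f1, f2 in F:
            if f2 < prev_f2:                 # each nondominated point adds a box
                hv += (ref[0] - f1) * (prev_f2 - f2)
                prev_f2 = f2
        return hv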

Table 2 Hypervolume values for the SMxc Pareto front combinations considering Dataset 1

The solutions of the Pareto fronts present several possible clustering distributions of the elements of the dataset, in terms of distance measures and number of partitions. Table 3 presents the number of solutions that belong to each combination between SMxc and the means Mcc, MFNcc, and MNNcc, according to k, the number of clusters.

Table 3 Pareto front solutions generated by SMxc measure combined with the 3 inter-clustering measures considering dataset 1

As the dataset 1 is composed of 300 elements, k can vary between 2 and 17. However, the maximum number of clusters indicated by the algorithm was 12 in the Pareto fronts SMxcMcc and SMxcMFNcc and 14 in the Pareto front SMxcMNNcc. Table 3 shows 91 solutions in the SMxcMcc Pareto front, 78 solutions in the Pareto front SMxcMFNcc, and 97 in the SMxcMNNcc, which results in a total of 266 solutions. This table shows that \(k=3\) is the most frequent solution, which is the optimal number of clusters indicated by the dataset documentation.

Considering the results of the three Pareto fronts, there are 266 possible solutions for the decision-maker, making it a hard task to choose only one, especially in a situation where there is little prior information about the dataset. Furthermore, if the solutions of only one Pareto front were considered, a set of solutions that may be crucial to the decision-maker could be discarded. Analyzing the results of Fig. 5, none of the three Pareto fronts based on mean measures completely dominates the other two. Thus, it was decided not to select one of the three Pareto fronts but rather to compare their solutions in terms of dominance (the last step of the MCA). Taking this into account, a hybrid Pareto front is proposed, composed of the nondominated solutions with respect to all possible solutions of the three Pareto fronts.

Table 4 presents the nondominated solutions that appear in the hybrid Pareto front after the dominance assessment. Thereby, the hybrid Pareto front has 117 solutions in total, with 88 coming from SMxcMcc, 9 from SMxcMFNcc, and 20 from SMxcMNNcc. In this case, there was a reduction of approximately \(56\%\) in the number of solutions, which corresponds to filtering the most appropriate solutions for the problem. It is important to highlight that, even with this refinement, there is still a prevalence of solutions with \(k=3\) in the hybrid Pareto front.

Table 4 Hybrid Pareto front solutions generated by the SMxc combined with the 3 inter-clustering measure considering dataset 1

Figure 6a presents the results of the hybrid Pareto front, in which colors differentiate the solutions of each pair of measures, so the SMxcMcc solutions are in blue, the SMxcMFNcc solutions are in black, and the SMxcMNNcc solutions are in magenta. The number of partitions k indicated by each solution can vary between 2 and 12, as indicated by the z-axis of Fig. 6b. Note that, even if two solutions have the same partition value k, the distribution of the elements and the position of each centroid can be different, generating different values of the intra- and inter-clustering measures. Thus, multiple solutions for the same k may represent completely different solutions in terms of clustering measures.

Fig. 6 Hybrid Pareto front of dataset 1 for SMxc combinations considering dataset 1 (Color figure online)

Figures 7b–d illustrate three examples of cluster partitioning found in the hybrid Pareto front, as marked in Fig. 7a. In these cases, the elements of each cluster are differentiated by colors. Solution 1 (s1) and solution 3 (s3) are obtained at the extreme points of the hybrid Pareto front, whereas solution 2 (s2) represents a central solution. Solution 1, illustrated in Fig. 7b, proposes a partitioning of the dataset into 3 clusters and is obtained from the combination of measures SMxcMNNcc; this solution has the lowest value of the intra-clustering measure and also the lowest value of the inter-clustering measure. As solution 1 is an extreme point of the Pareto front, it gives the optimal value of objective function 1, representing the intra-clustering measure, and penalizes objective function 2, the inter-clustering measure. On the other hand, solution 2, provided by the SMxcMcc measures and illustrated in Fig. 7c, suggests a division into 7 clusters. This is a tradeoff solution in the hybrid Pareto front, which means that the values of objective functions 1 and 2 are more balanced than in the other solutions. Finally, solution 3, illustrated in Fig. 7d and obtained from the combination SMxcMFNcc, indicates a division into 12 clusters. This solution is also generated by an extreme of the Pareto front, in this case the extreme opposite to solution 1. Thus, solution 3 has the worst value of the intra-clustering distance and, at the same time, the best value of the inter-clustering distance. According to Heris (2015), the recommended solution for dataset 1 is \(k=3\), so solution 1 is the closest to the recommendation, although the association of the elements and the distance measures can differ even for the same k value, representing different solutions. On the other hand, solutions 2 and 3 are more partitioned than solution 1, which can be interesting for a decision-maker who wants to divide the dataset into smaller groups, ensuring greater similarity between the elements.

Fig. 7 Hybrid Pareto front solutions (SMxc combinations—dataset 1) (Color figure online)

4.2.2 Results of the intra-clustering measure FNc with inter-clustering measures

This section presents the results of the furthest neighbor distance (FNc) combined with the distance measures Scc, Mcc, FNcc, NNcc, MFNcc, and MNNcc. Again, each 2-to-2 combination generates a Pareto front with a different range, resulting in 6 different Pareto fronts, as presented in Fig. 8a. Each Pareto front was normalized in its range, and the results are illustrated in Fig. 8b. Thus, the red Pareto front is the result of the FNcScc combination, the blue one represents the FNcMcc combination, the green one is the FNcFNcc combination, the yellow one is the FNcNNcc combination, the black one is the FNcMFNcc combination, and the magenta one is the FNcMNNcc combination.

Fig. 8 Pareto fronts for each combination involving the FNc considering dataset 1 (Color figure online)

Table 5 Hypervolume values for the FNc combinations

The hypervolume of each Pareto front was evaluated, and the values are presented in Table 5. As can be seen, the Pareto fronts obtained with the measures Mcc, MFNcc, and MNNcc are the ones with the highest hypervolume values. These solutions were chosen to be analyzed regarding dominance, and they generated the hybrid Pareto front for the FNc combinations.

Table 6 Pareto front solutions generated by the FNc measure combined with the 3 inter-clustering measures

Table 6 describes the Pareto front solutions generated by the FNc measure combined with the Mcc, MFNcc, and MNNcc in terms of the number of clusters k. As can be seen, there are 69 solutions in the Pareto front of the FNcMcc measures, 61 in the FNcMFNcc Pareto front, and 60 in the combination of the FNcMNNcc measures.

These three Pareto fronts have 190 possible solutions that were compared in terms of dominance, and the results are presented in Table 7. In this case, 83 solutions remain in the hybrid Pareto front, 34 from the FNcMcc combination, 43 from the FNcMFNcc combination, and 6 from the FNcMNNcc, which represents a refinement of \(56\%\) in the number of solutions.

Table 7 Hybrid Pareto front solutions generated by the FNc measure combined with the 3 inter-clustering measures considering dataset 1
Fig. 9 Hybrid Pareto front solutions (FNc combinations—dataset 1) (Color figure online)

Figure 9a presents the hybrid Pareto front, composed of the solutions of the three mean measures, Mcc (blue), MFNcc (black), and MNNcc (magenta). As with SMxc, three solutions were chosen to illustrate the possibilities for partitioning into clusters. Solution 1 (s1) was provided by the combination FNcMNNcc and is the solution with the lowest number of cluster divisions, that is, 2, as can be seen in Fig. 9b. As it is located in an extreme region of the Pareto front, it gives priority to one objective function over the other, so solution 1 has the lowest intra-clustering measure and also the lowest value of the inter-clustering measure among all solutions. On the other hand, solution 3 has the opposite behavior to solution 1, since it is located at the opposite extreme of the Pareto front, so it has the highest intra-clustering measure and the highest inter-clustering measure among all solutions. Solution 3 was also generated by the combination FNcMNNcc and indicates a division of the dataset into 12 clusters, as seen in Fig. 9d. Finally, solution 2 is a tradeoff solution on the Pareto front, so the interests of both objective functions are balanced. Solution 2 indicates a division of the dataset into 8 clusters, provided by the combination of FNcMFNcc measures, as can be seen in Fig. 9c. As in the previous section, the hybrid Pareto front offers several possibilities to the decision-maker, who can use their knowledge of the data to choose the most appropriate solution according to their interests.

4.2.3 MCA sensitivity analysis

The sensitivity analysis evaluated the impact of the \(\gamma \) value on the algorithm. For this, four different values of \(\gamma \) were considered, with 10 runs for each pair of measure combinations.

Tables 8 and 9 present the results for the Mydata dataset, for the combinations of SMxc and FNc with the mean measures, respectively, considering different values of the \(\gamma \) parameter, the pair of measures combined, the minimum (\(k_{min}\)) and the maximum (\(k_{max}\)) number of clusters found by the MCA, the average time per execution (in seconds), and the number of solutions in each Pareto front.

Table 8 SMxc Sensitivity analysis for Mydata
Table 9 FNc sensitivity analysis for Mydata

As observed, four different \(\gamma \) values were considered, and the results are similar across all measure combinations presented in both tables in terms of the minimum and maximum number of centroids, execution time, and final number of solutions in each Pareto front. Therefore, based on multiple executions of the MCA for the same pair of measures (in this case, 10 executions), the final result is not impacted by the \(\gamma \) parameter. Consequently, following the recommendations of Heris (2015), \(\gamma = 0.4\) is used in this work.

4.3 Results for dataset 2—Thyroid

To evaluate the MCA on a higher-dimensional dataset, the Thyroid dataset, composed of 215 elements and 5 features, is used, and the results are presented in the next two sections. As the dataset 2 is composed of 215 elements, the parameters \(k_{min}=2\) and \(k_{max}=15\) are considered, as well as \(\gamma = 0.4\) and 10 runs for each measure combination.

4.3.1 Results of the intra-clustering measure SMxc with inter-clustering measures

Figure 10a presents the normalized Pareto fronts for the SMxc measure combined with the Scc, Mcc, FNcc, NNcc, MFNcc, and MNNcc inter-clustering measures. As in the previous dataset, the solutions of the normalized Pareto fronts provided by the mean measures (represented by the blue, black, and magenta colors) tend to dominate the solutions provided by the sum measures (represented by the red, green, and yellow colors). To support this observation, the hypervolume of each Pareto front was evaluated and is presented in Table 10. Therefore, the Pareto fronts provided by the measures Mcc, MFNcc, and MNNcc are the ones with the highest hypervolume values and can be considered the most appropriate to be used in this approach.

Fig. 10 Pareto fronts and hybrid Pareto front of SMxc and means measures, for dataset 2 (Color figure online)

Figure 10b presents the HPF obtained. This HPF is composed of 176 solutions, provided by the intra-clustering measure SMxc combined with the inter-clustering measures Mcc, MFNcc, and MNNcc.

Table 11 describes the 176 solutions that compose the HPF of Fig. 10b in terms of the number of clusters. Note that there is a predominance of solutions with \(k=2\), which also corresponds to the optimal solution indicated in the literature.

As this dataset is high-dimensional, it is not possible to illustrate the distribution of its elements among the clusters.

Table 10 Hypervolume values for the SMxc combinations considering dataset 2
Table 11 Hybrid Pareto front solutions generated by the SMxc measure combined with the 3 inter-clustering measures considering dataset 2

4.3.2 Results of the intra-clustering measure FNc with inter-clustering measures

Figure 11a presents the normalized Pareto fronts for the intra-clustering measure FNc combined with the Scc, Mcc, FNcc, NNcc, MFNcc, and MNNcc inter-clustering measures. Again, the solutions of the normalized Pareto fronts provided by the mean measures (represented by the blue, black, and magenta colors) tend to dominate the solutions provided by the sum measures (represented by the red, green, and yellow colors). Also, the Pareto fronts provided by the measures Mcc, MFNcc, and MNNcc are the ones with the highest hypervolume values, as presented in Table 12.

Fig. 11 Pareto fronts and hybrid Pareto front of FNc and means measures, for dataset 2 (Color figure online)

Table 12 Hypervolume values for the FNc combinations considering dataset 2

Figure 11b presents the HPF obtained. In this case, the HPF is composed of 131 solutions, provided by the intra-clustering measure FNc combined with the inter-clustering measures Mcc, MFNcc, and MNNcc, and described in Table 13.

Table 13 Hybrid Pareto front solutions generated by the FNc measure combined with the 3 inter-clustering measures considering dataset 2

4.4 Results for dataset 3—Breast dataset

The dataset 3, named Breast, is also evaluated by the MCA. This dataset is composed of 9 features and provides another high-dimensional dataset for the methodology evaluation. The dataset 3 is composed of 699 elements, resulting in the following MCA parameters: \(k_{min}=2\) and \(k_{max}=26\). It is also considered \(\gamma = 0.4\) and 10 runs for each measure combination.

4.4.1 Results of the intra-clustering measure SMxc with inter-clustering measures

The normalized Pareto fronts obtained with the SMxc measure combined with the 6 inter-clustering measures are presented in Fig. 12a, and the hypervolume of each Pareto front obtained is described in Table 14. Although the measure Mcc presents the lowest value among the mean measures, the set of three Pareto fronts generated by the means stands out with respect to the Pareto fronts generated by the sum measures.

Figure 12b presents the hybrid Pareto front considering only the nondominated solutions of the three Pareto fronts generated by the mean measures.

Analyzing the results of Table 15, the largest number of solutions is observed for \(k=2\), which is the optimal solution indicated in the literature.

Fig. 12 Hybrid Pareto front of SMxc and means measures, for dataset 3 (Color figure online)

Table 14 Hypervolume values for the SMxc combinations considering dataset 3
Table 15 Pareto front solutions generated by SMxc measure combined with the 3 inter-clustering measures considering dataset 3

4.4.2 Results of the intra-clustering measure FNc with inter-clustering measures

The normalized Pareto fronts obtained with the FNc measure combined with the 6 inter-clustering measures are presented in Fig. 13a, and the hypervolume of each Pareto front obtained is described in Table 16. Again, the measure Mcc presents the lowest value among the mean measures, but the set of three Pareto fronts generated by the means stands out in relation to the Pareto fronts generated by the sum measures.

Figure 13b presents the hybrid Pareto front considering only the nondominated solutions of the three Pareto fronts generated by the mean measures.

An analysis of the results in Table 17 shows that there is a greater number of solutions with \(k=2\), as occurs in the other datasets, indicating that the optimal solution reported in the literature is predominant in the HPF.

Fig. 13 Hybrid Pareto front of FNc and means measures, for dataset 3 (Color figure online)

4.5 Results for dataset 4

To test the approach on a real case study, the methodology previously presented was applied to the dataset 4. Dataset 4 is composed of 291 elements and 2 variables, and it is provided by the MathE project (Azevedo et al. 2022; Flamia Azevedo et al. 2022). The MathE project aims to provide students all over the world with an online platform to help them learn college mathematics and also to support students who want to deepen their knowledge of a multitude of mathematical topics at their own pace. More details about the MathE project are described in Azevedo et al. (2022, 2024a) and Flamia Azevedo et al. (2022), and can also be found on the platform website (mathe.ipb.pt).

Table 16 Hypervolume values for the FNc combinations considering dataset 3
Table 17 Pareto front solutions generated by FNc measure combined with the 3 inter-clustering measures considering dataset 3

One of the particularities of the MathE platform is the Student's Assessment section, which is composed of multiple-choice questions for the students to train and practice their skills. The answers given by each student over the 3 years the platform has been online define the dataset 4. Thus, each element of the dataset refers to one particular student using the Student's Assessment section. Thereby, the first feature represents the rate of correct answers (x-axis) given by the student in previous tests, and the second feature represents the number of questions answered by this student (y-axis) while a MathE user. To support the analysis of the results, the y-axis, which initially ranges from 1 to 42 (number of questions answered), has been normalized to the interval between 0 and 1.

Preliminary studies using a single-objective approach involving cluster categorization and MathE students' data did not show satisfactory results in terms of the number of clusters or the division of the elements (Azevedo et al. 2022; Flamia Azevedo et al. 2022), i.e., the extracted patterns did not provide the information needed by the project. This occurs because the single-objective procedure only provides a single solution, which, although feasible, is irrelevant to the decision-maker. For this reason, the dataset 4 is an excellent example to be analyzed with the MCA, since the choice of the optimal solution strongly depends on the sensitivity of the decision-maker and his/her prior information about the dataset.

Thus, the combinations of intra- and inter-clustering measures previously defined are appropriate to identify the optimal partitioning of the dataset 4. Thereby, the intra-clustering measure SMxc was combined 2-by-2 with the inter-clustering mean measures (Mcc, MFNcc, and MNNcc), since they were the ones that obtained the best results in the previous sections. After that, the results of each Pareto front obtained were analyzed in terms of dominance, resulting in a hybrid Pareto front that includes the results of the three mean measures, as shown in Fig. 14a. The same process was repeated for the intra-clustering measure FNc, and the resulting hybrid Pareto front is illustrated in Fig. 14b.

Fig. 14 Hybrid Pareto front of dataset 4 (Color figure online)

From the profile of the students enrolled in the MathE platform, it is known that there is a diversity of students with different backgrounds (country, age, course and university year attended, and level of difficulty in Mathematics, among others). Therefore, a division into a few groups is not a significant result for the project, given the diversity of the public, especially in terms of performance in Mathematics subjects, as already explored in previous works.

From the Pareto front of the SMxc measure (Fig. 14a), it is possible to select between 2 and 12 clustering divisions, whereas in the Pareto front involving the FNc measure (Fig. 14b), k varies from 2 to 13.

Considering this information, two solutions from each hybrid Pareto front were chosen to be analyzed in more detail. Thus, Fig. 15 presents the solutions chosen and highlighted in Fig. 14a, with Fig. 15a denoted as Solution 1 (s1) and Fig. 15b denoted as Solution 2 (s2).

Fig. 15 Solution selected from Fig. 14a (Color figure online)

Solution 1, illustrated in Fig. 15a, divides the dataset into 8 clusters, and some interesting conclusions can be drawn from this division. Considering that the minimum value of the y-axis is 1 and the maximum is 42, the y-axis values were normalized between 0 and 1. Analyzing the results in terms of the number of questions answered, clusters 1 to 5 are composed of students who answered few questions, while clusters 6 and 7 comprise students who answered a larger number of questions than the aforementioned groups. Finally, cluster 8 is made up of students who answered the highest number of questions on the platform. In terms of performance (rate of correct answers), considering clusters 1 to 5, the performance of the students gradually increases from cluster 1 up to cluster 5: in cluster 1 the students have a success rate equal to 0 (all answers given were incorrect), while in cluster 5 they have a success rate of 1 (all answers given were correct). At this point, it is important to point out that dataset 4 contains multiple equal entries (students with an equal number of questions answered and equal performance), which are superimposed in the figure. For this reason, cluster 5, although it seems to be composed of 1 student, actually includes 17 students, all with 1 answered question and 1 correct answer.

In clusters 6 and 7, the students use the platform more than the previous groups. In this case, the students in cluster 6 performed below 0.5, and those in cluster 7 above 0.5. Finally, in cluster 8, the students answered the largest number of questions, and their performance varies between 0.1 and 0.8, so little can be said about the overall performance of this group given the variability of solutions in terms of performance.

On the other hand, Fig. 15b shows the results of solution 2, indicating a solution composed of 5 clusters. In this case, each cluster is composed of many more students than the clusters presented in the solution of Fig. 15a. Thus, cluster 1 has a small number of students, who answered few questions, and all of their answers were incorrect. In cluster 2 there is a large number of students, some of them with many answers recorded and others with few; all of these students had a hit rate below 0.5. On the other hand, in cluster 4, most of the students had a hit rate above 0.5. Cluster 3 is composed of the students who answered the most questions; although some students of this cluster had a poor performance, most of them had a hit rate above 0.5. Finally, in cluster 5, there is a group of students who had the best performance, but they answered few questions in relation to the other students.

In turn, Fig. 16 illustrates two solutions taken from the Pareto front of the FNc measure combinations (Fig. 14b). Figure 16a presents solution 1 (s1), composed of 12 clusters. Similar to the results of Fig. 15a, clusters 1 to 4 contain the students who answered few questions; clusters 5 to 7 contain the students who answered a few more questions than the previous groups; and clusters 8 to 12 contain the students who answered more questions. Thus, cluster 11 is composed of students who answered a large number of questions and have a performance higher than 0.6; in this case, the students answered correctly more than half of the questions they attempted. In clusters 10 and 12, there are the students who answered many questions but, in most cases, got correct answers in less than half of them. In terms of performance, the students in cluster 8 presented an average performance, between 0.35 and 0.75, with many answers. Finally, the performance of the students in clusters 1, 2, 5, and 6 is surpassed by that of the students in clusters 3, 4, and 7.

Fig. 16 Solution selected from Fig. 14b (Color figure online)

On the other hand, Fig. 16b presents solution 2 (s2), composed of 4 clusters, selected from Fig. 14b. In this solution it is not possible to detail the students' characteristics as done for the results of Fig. 16a, since the number of clusters is three times smaller and each cluster is composed of many more students. Thus, cluster 1 has the students who answered few questions, all of them with a hit rate below 0.5. Cluster 2 is composed of the students who answered the most questions; although most of them had a poor performance, some students had a hit rate above 0.5. In cluster 3, most of the students had a hit rate above 0.5. Meanwhile, in cluster 4, there is a group of students who had the best performance, but they answered few questions in relation to the other students.

Overall, the conclusions that can be drawn offer a wide variety of details, which would be difficult to achieve with a solution composed of few clusters. Furthermore, this was only possible because the decision-maker had the power to choose among solutions and to use prior knowledge about the data.

5 Algorithm comparison results

To compare the MCA results with classical clustering algorithms from the literature, the four datasets considered were processed with two classical clustering algorithms, k-means (Arthur and Vassilvitskii 2007) and DBSCAN (Ester et al. 1996), and also with two hybrid approaches, the Clustering based on Differential Evolution Algorithm (CDE) (Heris 2015; Storn and Price 1997) and a novel approach named the Game-based k-means (GBK-means) algorithm (Jahangoshai Rezaee et al. 2021).

The k-means is one of the most well-known and simplest clustering algorithms. It consists of separating samples into groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares (Arthur and Vassilvitskii 2007). The k-means is not an automatic clustering algorithm; thus, it depends on the initial estimation of the parameter k, which represents the number of clusters. The four datasets were processed by k-means with 10 executions, considering k as indicated in the literature, that is, \(k=3\) for dataset 1, \(k=2\) for dataset 2, and \(k=2\) for dataset 3; for dataset 4, several values are possible, so \(k=4\) and \(k=8\) were considered, since these values were also presented previously in the MCA results. The k-means was able to handle all datasets, returning satisfactory results.
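For illustration only (this is a minimal sketch, not the implementation used in this work), runs such as the ones described above could be reproduced with scikit-learn; the dataset file name is hypothetical, and n_init = 10 is one possible reading of the "10 executions" mentioned above.

# Illustrative sketch (not the authors' implementation): k-means with the
# k values reported in the text, using scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

# X is assumed to be the (n_samples, n_features) matrix of one of the datasets.
X = np.loadtxt("dataset1.csv", delimiter=",")  # hypothetical file name

for k in (3,):  # k = 3 for dataset 1; k = 2, 2, and {4, 8} for the other datasets
    model = KMeans(n_clusters=k, n_init=10, random_state=0)  # 10 restarts per run
    labels = model.fit_predict(X)
    print(k, model.inertia_)  # inertia = within-cluster sum-of-squares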

In turn, the Density-based spatial clustering of applications with noise (DBSCAN) is a popular algorithm for clustering spatial data points based on their density distribution. DBSCAN is particularly effective in identifying clusters of arbitrary shapes and handling noise in the data. It operates by grouping together points that are closely packed, forming high-density regions, and defines clusters as contiguous regions of high density separated by regions of low density (Ester et al. 1996). This algorithm is based on a threshold for a neighborhood search radius \(\epsilon \) and a minimum number of neighbors minpoints required to identify a core point and define the clustering division (Ester et al. 1996).

Table 18 summarizes the results achieved with each parametrization, considering different values of \(\epsilon \) and minpoints. Note that the value \(-1\) indicates that DBSCAN considered all elements as outliers, that is, the algorithm was not able to define the clusters. The number inside the parentheses indicates the number of elements considered outliers for each parametrization.
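A minimal sketch of how such a parameter sweep could be produced is given below, assuming scikit-learn's DBSCAN implementation, in which min_samples plays the role of minpoints and the label -1 marks points treated as noise/outliers; the grids of \(\epsilon \) and minpoints values and the file name are illustrative assumptions, not the values used in the paper.

# Illustrative DBSCAN parameter sweep (not the authors' code).
import numpy as np
from sklearn.cluster import DBSCAN

X = np.loadtxt("dataset1.csv", delimiter=",")  # hypothetical file name

for eps in (0.25, 0.5, 1.0):             # illustrative epsilon values
    for min_samples in (5, 10, 20):      # min_samples corresponds to minpoints
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_outliers = int(np.sum(labels == -1))
        print(f"eps={eps}, minpoints={min_samples}: "
              f"{n_clusters} clusters, {n_outliers} outliers")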

Table 18 DBSCAN number of centroids results

The datasets were also processed with two hybrid approaches. The Clustering based on Differential Evolution Algorithm (Heris 2015; Storn and Price 1997) is a bio-inspired metaheuristic approach that automatically defines the optimal number of cluster partitions. The algorithm uses the Davies–Bouldin index (DB) (Davies and Bouldin 1979) as a clustering measure to define the number of clusters. In turn, the Game-based k-means (GBK-means) algorithm (Jahangoshai Rezaee et al. 2021) addresses the clustering problem from a novel perspective, incorporating a bargaining game approach into the k-means algorithm. It enables competition between cluster centers to attract the largest number of similar entities to their clusters: the centroids change their positions in such a way that they have minimum distances to the maximum possible number of entities compared to the other centroids. Thereby, the objective function combines the clustering distance and bargaining game measures, and a PSO algorithm is used to find the optimal solution. However, GBK-means requires the number of clusters to be known a priori. More details about the approach can be found in Jahangoshai Rezaee et al. (2021).
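The role of the Davies–Bouldin index as the clustering measure can be illustrated with the short sketch below: lower DB values indicate better partitions, so a search over candidate numbers of clusters can use it as the objective. Plain k-means is used here in place of the differential evolution search purely for illustration; this is an assumption for the example, not the CDE algorithm itself, and the file name is hypothetical.

# Illustrative use of the Davies-Bouldin index as a clustering objective
# (a stand-in for the measure minimized by the CDE; not the CDE itself).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

X = np.loadtxt("dataset1.csv", delimiter=",")  # hypothetical file name

best_k, best_db = None, np.inf
for k in range(2, 11):                          # candidate numbers of clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    db = davies_bouldin_score(X, labels)        # lower is better
    if db < best_db:
        best_k, best_db = k, db
print(f"best k by Davies-Bouldin index: {best_k} (DB = {best_db:.3f})")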

Figure 17 presents the four algorithms' results for dataset 1. The DBSCAN solution obtained with \(\epsilon = 0.5\) and \(minpoints = 10\) is presented in Fig. 17b, in which 58 elements were considered outliers and removed from the dataset. This solution was chosen since it is the closest to the one suggested in the literature (Heris 2015). Figure 17c, d present the hybrid algorithm results, for the CDE and the GBK-means, respectively.

Comparing the results of Fig. 17, the k-means and CDE presented similar results, which are identical to the solution found by the MCA, presented in Fig. 7b, and correspond to the literature solution. Although the DBSCAN and GBK-means algorithms could divide the dataset into 3 clusters, they do not present satisfactory results for dataset 1 compared to the other algorithms.

Regarding dataset 4, Fig. 18 presents the k-means (Fig. 18a, b), CDE (Fig. 18c, d), and GBK-means (Fig. 18e, f) results considering 4 and 8 clusters.

Fig. 17 Dataset 1 results for k-means, DBSCAN, CDE and GBK-means algorithms (Color figure online)

The k-means and CDE results are similar to the ones presented by the MCA and discussed in previous sections. The GBK-means presented alternative results that could be considered by the decision-maker, but the MCA, k-means, and CDE results outperform the GBK-means results on this dataset.

Regarding dataset 2, the k-means, CDE, and GBK-means were able to propose a clustering division, while DBSCAN considered all elements as outliers. Regarding dataset 3, all the algorithms considered could identify the clustering divisions.

Fig. 18 Dataset 4 results for k-means, CDE, and GBK-means algorithms (Color figure online)

6 Conclusions and future work

The advantage of using multi-objective strategies in the clustering task is to combine multiple objectives simultaneously. The MCA approach makes substantial advancements by employing a multi-objective strategy and combining diverse solutions to determine the optimal number of cluster sets and their element partitioning.

Unlike single-objective algorithms that minimize one measure at a time, the proposed approach overcomes limitations through a multi-objective approach, offering a set of optimal solutions. This characteristic empowers decision-makers to choose the most fitting solution according to their expertise or preferences.

In this way, this paper explored several clustering measures, namely intra- and inter-clustering measures, to develop a clustering algorithm that automatically defines the optimal number of cluster partitions and, consequently, classifies the elements of the dataset according to their similarities and dissimilarities. For this, 42 combinations of 6 intra-clustering and 7 inter-clustering measures were analyzed, with the aim of defining the most appropriate pairs of measures to be used in a multi-objective approach. Thereby, 2 intra-clustering measures (SMxc and FNc) and 6 inter-clustering measures (Scc, Mcc, FNcc, NNcc, MFNcc, and MNNcc) were selected. After that, each generated Pareto front was analyzed in terms of dominance and by the hypervolume method, establishing that the 3 mean-based measures (Mcc, MFNcc, and MNNcc) are the most appropriate to be used as inter-clustering measures. The solutions of the three Pareto fronts were then compared in terms of dominance to create a hybrid Pareto front composed of the nondominated solutions, considering different pairs of measure combinations. This methodology was tested on a benchmark dataset and also on a dataset from a real case study that describes the students' performance in the MathE project.
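As a hedged illustration of the dominance filtering behind the hybrid Pareto front, the sketch below pools solutions coming from different measure combinations and keeps only the nondominated ones. It assumes both objectives are expressed as minimization (e.g., with the inter-clustering measure negated), which may differ from the exact orientation used in the paper, and the numerical values are made up.

# Illustrative construction of a hybrid Pareto front (not the authors' code):
# solutions from different measure combinations are pooled and only the
# nondominated ones are kept. Both objectives are assumed to be minimized.
from typing import List, Tuple

Solution = Tuple[float, float, int]  # (intra measure, inter measure, k)

def dominates(a: Solution, b: Solution) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return (a[0] <= b[0] and a[1] <= b[1]) and (a[0] < b[0] or a[1] < b[1])

def hybrid_front(fronts: List[List[Solution]]) -> List[Solution]:
    pool = [s for front in fronts for s in front]   # merge all Pareto fronts
    return [s for s in pool
            if not any(dominates(t, s) for t in pool if t is not s)]

# Hypothetical fronts from two measure combinations (values are illustrative):
front_a = [(0.20, 0.50, 3), (0.35, 0.30, 4)]
front_b = [(0.25, 0.45, 3), (0.15, 0.60, 2)]
print(hybrid_front([front_a, front_b]))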

From the range and variability of each generated Pareto front, it is possible to perceive the impact of combining different measures to solve a problem. In Tables 4 and 7, for example, regarding dataset 1, the combinations \(SMxc-MFNcc\) and \(FNc-MFNcc\) have no solutions with \(k=3\), whereas the combination \(SMxc-MNNcc\) has solutions with \(k=4\) but no solution with \(k=2\). In this way, if only one Pareto front were considered, the solution would be restricted to the optimum provided by a single pair of measures and could be inappropriate for the decision-maker. So, combining the solutions of the Pareto fronts provided by different measures enriches the final solution.

A clustering algorithm should be able to define the optimal number of partitions and also the optimal distribution of the elements. The fact that the MCA indicates multiple solutions with equal k values does not mean that these solutions are all equal, since the elements can be distributed over different clusters. Often the values of the intra- and inter-clustering measures are not relevant to the decision-maker, so the origin of a solution, i.e., the measures that were used to generate it, is not a decisive factor in its choice; it is indifferent whether the solution was originated by 'a' or 'b' measures, so there is no penalty in presenting the results through a hybrid Pareto front. These results justify the proposed method of producing a hybrid Pareto front by considering solutions from different Pareto fronts. If the Pareto fronts are considered individually, the results may differ, so putting them together provides a result that is robust in terms of variability for the decision-maker.

According to Heris (2015), considering a single-objective strategy, the optimal solution for dataset 1 is \(k=3\), i.e. 3 clusters. In Table 4, which presents the solutions of the hybrid Pareto front, \(k=3\) is the most frequent solution for dataset 1, demonstrating the effectiveness of the proposed method on a benchmark problem.

Datasets 2 and 3 are examples of high-dimensional datasets, and the MCA could provide satisfactory solutions, as well as a range of possibilities in terms of the number of clusters and dataset division, whereas the other algorithms were not able to work properly with these datasets. Furthermore, DBSCAN requires a previous analysis of the parameters \(\epsilon \) and minpoints to achieve the optimal solution, and the k-means and GBK-means require the number of centroids to be indicated in advance.

In the case of dataset 4, the distribution of the data is more complex, since multiple elements overlap and the elements are not as well separated as in the first dataset (Dutta et al. 2019). Thus, for dataset 4, which describes a real problem, the multi-objective strategy is much more effective than the single-objective one, since it is possible to compare a set of optimal solutions and choose the one that meets the patterns the decision-maker wants to extract from the dataset.

Moreover, for dataset 4, the knowledge of the decision-maker is of great value in defining the best solution. Thus, the proposed method is an asset in situations where the single-objective approach is not enough, as occurs in Azevedo et al. (2022) and Flamia Azevedo et al. (2022) for dataset 4. Although the single-objective clustering algorithm presented a feasible solution, it is not good enough to extract relevant information for the MathE project.

The MCA results were compared with well-known methods such as k-means and DBSCAN and two hybrid approaches, the CDE and the GBK-means. It was possible to conclude that the k-means, DBSCAN, and GBK-means have some limitations, showing the advantage of the MCA. Although MCA is more computationally demanding, it offers a set of highly variable optimal solutions in terms of the number of clusters and the division of elements between them. This occurs because MCA is able to explore the solution search space more broadly, combining several measures in its methodology.

Although the k-means and CDE algorithms work well on diverse datasets, they provide only a single solution, and k-means requires the decision-maker to determine a fixed number of clusters. If the decision-maker needs a new solution with a different number of clusters, or if the obtained solution does not meet their requirements, the algorithm must be run again until a suitable solution is found, whereas the MCA provides all these options at once.

Regarding DBSCAN, although it works well on complex geometries, it tends to have difficulty with high-dimensional datasets. In addition, DBSCAN is highly dependent on the \(\epsilon \) and minpoints parameters, requiring the algorithm to be run with different parameter values until the most suitable solution is found. In the datasets presented in this work, DBSCAN failed several times to identify clusters and ended up categorizing the data as outliers.

In the future, to improve the MCA performance, it is expected to explore split-and-merge strategies considering the measures studied in this paper. The pairs of measures that promote larger partitions could be especially useful for proposing initial splits in clustering algorithms. Furthermore, it is expected to develop a strategy based on clustering validation indices to assist the decision-maker in the choice of the optimal solution, especially in situations where prior information about the data is scarce.