Towards multicriteria clustering: An extension of the k-means algorithm

doi:10.1016/j.ejor.2003.06.012

European Journal of Operational Research

Volume 158, Issue 2, 16 October 2004, Pages 390-398

Abstract

Research in the multicriteria classification field focuses mainly on the assignment of actions to pre-defined classes. Nevertheless, the construction of multicriteria categories remains a theoretical question that has not yet been studied in detail. To tackle this problem, we propose an extension of the well-known k-means algorithm to the multicriteria framework. This extension relies on the definition of a multicriteria distance based on the preference structure defined by the decision maker. Thus, two alternatives are considered similar if they are preferred to, indifferent to and incomparable with more or less the same actions. Armed with this multicriteria distance, we are able to partition the set of alternatives into classes that are meaningful from a multicriteria perspective. Finally, the country risk problem and the diagnosis of firms are treated as examples to illustrate the applicability of this method.

Introduction

In many situations, decision makers have to group alternatives (objects or actions) into homogeneous classes. Several examples can be cited:

• in finance, especially in business failure prediction, credit risk assessment or country risk assessment;
• in marketing, for the analysis of customer characteristics in order to design market penetration strategies;
• in medical diagnosis, for the classification of patients into disease groups on the basis of a set of symptoms.

The two most widely used techniques for grouping objects with similar properties are classification and clustering. The two are often confused, but there are important differences between them. Classification techniques use supervised learning, which means that objects are assigned to pre-defined classes. Clustering, on the contrary, is an unsupervised technique that finds potential groups in the data such that objects within the same cluster are more similar to each other than to objects in other clusters. Several approaches, such as expert systems, neural networks, mathematical programming and multicriteria decision aid (MCDA) [19], [20], …, have been explored to deal with this problem. This paper focuses on the MCDA approach.

In the MCDA methodology, grouping alternatives into homogeneous classes consists of assigning each alternative of a set A = {a₁, a₂, …, aₙ}, evaluated on m criteria {g₁, g₂, …, gₘ}, to one of the categories on the basis of its intrinsic value. The categories are defined beforehand by norms called profiles, which either separate them or play the role of central reference objects within them. The assignment of an alternative to a specific class results from comparing its evaluations on all criteria with the profiles that define the categories.
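To make the comparison with profiles concrete, here is a deliberately simplified sketch in the spirit of pessimistic ELECTRE TRI, not the full method: it has no discordance test and no indifference or preference thresholds, and the names `concordance`, `assign_pessimistic` and the cut-off `lam` are hypothetical. Each profile is assumed to be the lower limit of a category, and an alternative is placed in the highest category whose lower profile it "outranks", which here is assumed to mean that the criteria on which it is at least as good carry at least a share `lam` of the total weight.

```python
def concordance(a, b, weights):
    """Weighted share of criteria on which a is at least as good as b."""
    return sum(w for w, x, y in zip(weights, a, b) if x >= y) / sum(weights)


def assign_pessimistic(a, profiles, weights, lam=0.75):
    """Category index for alternative a (0 = worst).

    profiles[h] is taken as the lower limit of category h + 1, with profiles
    sorted from worst to best. Scanning from the best profile downwards, a is
    placed just above the first profile whose concordance with a reaches lam.
    (Simplified: concordance only, no discordance, no thresholds.)
    """
    for h in range(len(profiles) - 1, -1, -1):
        if concordance(a, profiles[h], weights) >= lam:
            return h + 1
    return 0  # a outranks no profile: lowest category


profiles = [(4, 4, 4), (7, 7, 7)]  # hypothetical limits between categories 0|1 and 1|2
weights = (1, 1, 1)
print(assign_pessimistic((8, 8, 3), profiles, weights, lam=0.6))  # 2
print(assign_pessimistic((5, 3, 5), profiles, weights, lam=0.6))  # 1
```

With equal weights and `lam = 0.6`, an alternative must be at least as good as a profile on two of the three criteria to clear it, so (8, 8, 3) clears the upper profile while (5, 3, 5) only clears the lower one.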

Among the MCDA methods developed to solve these kinds of problems, we can mention Trichotomic Segmentation [14], [17], [18], ELECTRE TRI [15], [21], [25], PROAFTN [3], filtering methods based on concordance and non-discordance [16], the UTADIS method [7], [11], which is based on a criteria aggregation model, and the rough set approach [22]. These procedures can be considered supervised learning methods. To our knowledge, the MCDA literature contains few unsupervised learning techniques [6]. This paper addresses that problem: grouping alternatives into a limited number of categories that are as homogeneous as possible. Zopounidis and Doumpos [26] recently reviewed the literature on MCDA classification and sorting methods, and we refer the interested reader to that study.

The k-means algorithm [12] is one of the most widely used unsupervised techniques. It groups alternatives into categories so that the distances between alternatives within the same category are as small as possible, while the distances between the centers of different categories are as large as possible. In a multicriteria setting, we need a notion of distance between alternatives that takes into account the multicriteria nature of the problem.
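For reference, the classical iteration that the paper extends can be sketched as Lloyd-style batch k-means on numeric points with squared Euclidean distance. This is the generic textbook procedure written in plain Python with hypothetical function names, not a reproduction of [12]: it alternates an assignment step and a mean-update step until no point changes cluster.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style batch k-means on numeric points; returns (centers, labels)."""
    rng = random.Random(seed)
    # Initial centers: k of the points drawn at random without replacement.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels


centers, labels = kmeans([(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)], k=2)
```

On these four points the two tight pairs end up in separate clusters. The only ingredient tied to numeric data is the distance and the mean; the multicriteria extension has to replace both.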

In this paper, we present an extension of the k-means algorithm to the multicriteria framework. The intuition behind the method is as follows: all actions within the same cluster should be preferred to, indifferent to and incomparable with more or less the same actions, according to the decision maker's preferences. Note that this resembles, at least in spirit, the sociometric idea of structural equivalence [24]. To quantify this similarity, we introduce a multicriteria distance based on the preference structure (P, I, J) defined by the decision maker. The originality of this approach lies in applying this new measure within the well-known k-means framework.
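This excerpt does not give the formula of the multicriteria distance. A minimal sketch, assuming one natural reading of the intuition above: each alternative is described by its relation towards every action x (P+ when it is preferred to x, P- when x is preferred to it, I or J), and the distance between two alternatives is the share of actions towards which these relations differ. The encoding and the names `relation_code`, `profile` and `mc_distance` are hypothetical; indifference is assumed reflexive and symmetric, and any pair in neither P nor I is treated as incomparable (J).

```python
def relation_code(a, x, P, I):
    """Relation of action a towards action x: 'P+', 'P-', 'I' or 'J'.

    P holds pairs (u, v) meaning u is strictly preferred to v; I holds
    indifferent pairs in either order. Anything else is incomparable (J).
    """
    if (a, x) in P:
        return "P+"  # a is preferred to x
    if (x, a) in P:
        return "P-"  # x is preferred to a
    if a == x or (a, x) in I or (x, a) in I:
        return "I"   # indifference, taken as reflexive and symmetric
    return "J"       # neither preference nor indifference


def profile(a, actions, P, I):
    """Relations of a towards every action, in a fixed order."""
    return [relation_code(a, x, P, I) for x in actions]


def mc_distance(a, b, actions, P, I):
    """Share of actions towards which a and b stand in different relations."""
    pa = profile(a, actions, P, I)
    pb = profile(b, actions, P, I)
    return sum(ra != rb for ra, rb in zip(pa, pb)) / len(actions)


actions = ["a", "b", "c", "d"]
P = {("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("c", "d")}
I = {("a", "b")}
print(mc_distance("a", "b", actions, P, I))  # 0.0
print(mc_distance("a", "c", actions, P, I))  # 0.75
```

Here a and b are indifferent to each other and both preferred to c and d, so they stand in identical relations to every action and their distance is 0, which is the structural-equivalence idea in miniature.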

The outline of this paper is as follows. Section 2 introduces the concepts needed to develop our MCDA clustering approach. The algorithm itself is presented in Section 3. Section 4 is dedicated to two applications: the country risk problem and the diagnosis of firms. Finally, we conclude with some general remarks about the method and directions for future research.

Section snippets

MCDA clustering

In this section, we describe our clustering method, an extension of the well-known k-means algorithm to the MCDA setting.

In short, the method starts with k prototypes (centroids) chosen at random among all the actions. Each alternative is then assigned to the cluster represented by its nearest prototype. To determine this assignment, we introduce a multicriteria distance based on the preference structure defined by the decision maker. Once this step has been
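The initialization and assignment steps described in this section can be sketched as follows, assuming that the distance between two relation profiles is the share of positions on which they differ. The preference structure is pre-encoded as a hypothetical matrix `R`, where `R[i][j]` is the relation of action i towards action j ('P+', 'P-', 'I' or 'J'), so row i is the profile of action i and an initial prototype is simply the row of a randomly chosen action. All names are illustrative.

```python
import random


def profile_distance(p, q):
    """Assumed distance: share of positions on which two relation profiles differ."""
    return sum(x != y for x, y in zip(p, q)) / len(p)


def init_prototypes(R, k, rng):
    """Draw k distinct actions at random; their rows of R become the prototypes."""
    return [list(R[i]) for i in rng.sample(range(len(R)), k)]


def assign(R, prototypes):
    """Index of the nearest prototype for every action (ties: lowest index)."""
    return [min(range(len(prototypes)),
                key=lambda c: profile_distance(R[i], prototypes[c]))
            for i in range(len(R))]


# Hypothetical 4-action relation matrix: actions 0 and 1 are indifferent and
# both preferred to 2 and 3; action 2 is preferred to action 3.
R = [["I", "I", "P+", "P+"],
     ["I", "I", "P+", "P+"],
     ["P-", "P-", "I", "P+"],
     ["P-", "P-", "P-", "I"]]
labels = assign(R, init_prototypes(R, 2, random.Random(0)))
```

Because the random draw decides which profiles act as the first prototypes, the first assignment can differ from one run to another.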

MCDA clustering algorithm

Algorithm 1 below presents, step by step, the procedure carried out by the multicriteria clustering method developed in the previous section.
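Algorithm 1 itself is not reproduced in this excerpt, which also does not describe how prototypes are updated. The sketch below therefore fills that step with an assumption of its own: each prototype's relation towards an action becomes the most frequent relation among the cluster's members (a majority vote, ties going to the relation seen first). Combined with an assumed share-of-differing-relations distance, this gives a runnable, hypothetical version of the overall loop on a relation matrix `R`; it should not be read as the authors' exact procedure.

```python
import random
from collections import Counter


def profile_distance(p, q):
    """Assumed distance: share of actions towards which two profiles differ."""
    return sum(x != y for x, y in zip(p, q)) / len(p)


def majority_profile(rows):
    """Assumed prototype update: for each action, keep the most frequent
    relation among the cluster's members (ties go to the first one seen)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]


def mc_kmeans(R, k, max_iter=50, seed=0):
    """Hypothetical multicriteria k-means on a relation matrix R.

    R[i][j] is the relation ('P+', 'P-', 'I' or 'J') of action i towards
    action j. Returns (labels, prototypes).
    """
    rng = random.Random(seed)
    n = len(R)
    # Initial prototypes: profiles of k distinct actions drawn at random.
    prototypes = [list(R[i]) for i in rng.sample(range(n), k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest prototype, ties going to the lowest index.
        new_labels = [min(range(k), key=lambda c: profile_distance(R[i], prototypes[c]))
                      for i in range(n)]
        if new_labels == labels:
            break  # no action changed cluster: stop
        labels = new_labels
        # Update step: rebuild each non-empty cluster's prototype by majority vote.
        for c in range(k):
            members = [R[i] for i in range(n) if labels[i] == c]
            if members:
                prototypes[c] = majority_profile(members)
    return labels, prototypes
```

A quick sanity check is k = 1: every action falls into the single cluster, and the prototype is the majority profile of all actions. Note that the prototypes produced by the vote are fictitious profiles, not necessarily rows of `R`.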

As an extension of the k-means method, this algorithm suffers from the same weaknesses. Among these, the results are not unique and the initial conditions may strongly influence the resulting class structure; see, for instance, Bradley and Fayyad [5] and Bottou and Bengio [4]. Moreover, some empirical tests have been
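A common way to soften the dependence on initial conditions, offered here as a generic illustration rather than as the remedy the authors adopt, is to run the randomly initialized procedure from several seeds and keep the partition with the smallest total distance between actions and their prototypes. In this sketch `run` stands for any clustering routine that takes a seed and returns `(labels, prototypes)`; the distance is assumed to be the share of differing relations, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Profile = List[str]


def profile_distance(p: Profile, q: Profile) -> float:
    """Assumed distance: share of actions towards which two profiles differ."""
    return sum(x != y for x, y in zip(p, q)) / len(p)


def within_cost(R: List[Profile], labels: List[int], prototypes: List[Profile]) -> float:
    """Total distance from each action's profile to its cluster prototype."""
    return sum(profile_distance(R[i], prototypes[c]) for i, c in enumerate(labels))


def best_of_restarts(run: Callable[[int], Tuple[List[int], List[Profile]]],
                     R: List[Profile],
                     n_restarts: int = 10) -> Tuple[float, List[int], List[Profile]]:
    """Call run(seed) for seeds 0..n_restarts-1 and keep the lowest-cost result."""
    best = None
    for seed in range(n_restarts):
        labels, prototypes = run(seed)
        cost = within_cost(R, labels, prototypes)
        if best is None or cost < best[0]:
            best = (cost, labels, prototypes)  # strictly better: earlier seed wins ties
    return best
```

Restarts do not remove the non-uniqueness of results; they only make a poor local solution less likely, at the price of running the algorithm several times.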

Applications

As mentioned in the introduction, unsupervised learning techniques have many possible applications in different fields. In this section, we study two applications to financial problems: the country risk problem and the diagnosis of firms.

Conclusions and future research

In this paper we developed an extension of the k-means algorithm to the multicriteria framework. The originality of this approach comes from the definition of a distance that takes into account the multicriteria nature of the problem.

The model introduced here is independent of the way the decision maker builds his preferences (outranking methods, AHP, utility theory, …). We stress once more that only outranking methods can lead to a non-empty incomparability relation J.
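The remark on J can be illustrated with a small example. The sketch contrasts a weighted-sum utility, which ranks every pair and can therefore only produce P or I, with a simplified concordance-only outranking test (no discordance and no thresholds; a stand-in for methods such as ELECTRE, not an implementation of them) that can leave pairs incomparable. Scores, weights, the cut-off `lam` and the function names are hypothetical.

```python
from itertools import combinations


def relations_from_utility(scores, weights, q=0.0):
    """Relations of a towards b (a < b) induced by a weighted-sum utility.

    Every pair is comparable: |u(a) - u(b)| <= q gives I, otherwise P+ or P-.
    """
    u = [sum(w * g for w, g in zip(weights, s)) for s in scores]
    rel = {}
    for a, b in combinations(range(len(scores)), 2):
        if abs(u[a] - u[b]) <= q:
            rel[(a, b)] = "I"
        else:
            rel[(a, b)] = "P+" if u[a] > u[b] else "P-"
    return rel


def relations_from_concordance(scores, weights, lam=0.7):
    """Relations of a towards b (a < b) from a concordance-only outranking test.

    a S b when the criteria on which a is at least as good as b carry at least
    a share lam of the total weight (simplified: no discordance test).
    """
    total = sum(weights)

    def outranks(a, b):
        support = sum(w for w, ga, gb in zip(weights, scores[a], scores[b]) if ga >= gb)
        return support / total >= lam

    # (a S b, b S a) -> relation of a towards b
    codes = {(True, True): "I", (True, False): "P+",
             (False, True): "P-", (False, False): "J"}
    return {(a, b): codes[(outranks(a, b), outranks(b, a))]
            for a, b in combinations(range(len(scores)), 2)}


scores = [(8, 2), (2, 8), (9, 9)]
print(relations_from_utility(scores, (1, 1)))
print(relations_from_concordance(scores, (1, 1)))
```

With equal weights, the utility model declares the first two alternatives indifferent (both sum to 10), whereas the concordance test finds that neither outranks the other, so that pair is incomparable (J). Both models agree that the third alternative is preferred to the other two.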

References (26)

A.I. Dimitras et al., A survey of business failures with an emphasis on prediction, European Journal of Operational Research (1996).
C. Zopounidis et al., Multicriteria classification and sorting methods: A literature review, European Journal of Operational Research (2002).
E.I. Altman, The Prediction of Corporate Bankruptcy: A Discriminant Analysis (1988).
E.I. Altman, Corporate Financial Distress and Bankruptcy (1993).
N. Belacel, Multicriteria assignment method PROAFTN: Methodology and medical application, European Journal of Operational Research (2000).
L. Bottou, Y. Bengio, Convergence properties of the k-means algorithms, Advances in Neural Information Processing...
P.S. Bradley, U.M. Fayyad, Refining initial points for the k-means clustering, in: Proceedings of the 15th...
Y. De Smet, F. Gilbart, A class definition method for country risk problems, Technical report IS-MG 2001/13,...
J.M. Devaud, G. Groussaud, Utadis: Une méthode de construction de fonctions d'utilité additives rendant compte de... [Utadis: a method for constructing additive utility functions that account for...]
A. Dimitras et al., Multicriteria decision aid method for the assessment of business failure risk, Foundations of Computing and Decision Sciences (1995).
F. Gilbart, Le risque pays dans le secteur bancaire: Approche multicritère [Country risk in the banking sector: A multicriteria approach] (2003).
E. Jacquet-Lagrèze, Advances in multicriteria analysis.
J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Le Cam, J. Neyman (Eds.),...
