Towards multicriteria clustering: An extension of the k-means algorithm

https://doi.org/10.1016/j.ejor.2003.06.012

Abstract

Research in the multicriteria classification field focuses mainly on the assignment of actions to pre-defined classes. Nevertheless, the construction of multicriteria categories remains a theoretical question that has not yet been studied in detail. To tackle this problem, we propose an extension of the well-known k-means algorithm to the multicriteria framework. This extension relies on the definition of a multicriteria distance based on the preference structure defined by the decision maker. Thus, two alternatives are similar if they are preferred, indifferent and incomparable to more or less the same actions. Armed with this multicriteria distance, we are able to partition the set of alternatives into classes that are meaningful from a multicriteria perspective. Finally, two examples, the country risk problem and the diagnosis of firms, illustrate the applicability of the method.

Introduction

In many situations, decision makers have to group alternatives (objects or actions) into homogeneous classes. Several examples can be cited:

  • in finance, especially in business failure prediction, credit risk assessment or country risk assessment;

  • in marketing, for the analysis of customer characteristics to design market penetration strategies;

  • in medical diagnosis, for the classification of patients into disease groups on the basis of a set of symptoms.


The two most widely used techniques for grouping objects with similar properties are classification and clustering. The two are often confused, but important differences exist between them. Classification techniques use supervised learning, which means that the objects are assigned to pre-defined classes. Clustering, by contrast, is an unsupervised technique that finds potential groups in the data such that the objects within the same cluster are more similar to each other than to objects in other clusters. Several approaches, such as expert systems, neural networks, mathematical programming and multicriteria decision aid (MCDA) [19], [20], have been explored to deal with this problem. This paper focuses on the MCDA approach.

In the MCDA methodology, grouping alternatives into homogeneous classes consists in assigning each alternative of a set A = {a_1, a_2, …, a_n}, evaluated on m criteria {g_1, g_2, …, g_m}, to one of the categories on the basis of its intrinsic value. The categories are pre-defined by norms called profiles, which either separate the categories or play the role of central reference objects within them. The assignment of an alternative to a specific class results from a comparison of its evaluations on all criteria with the profiles defining the categories.

Among the MCDA methods developed to solve these kinds of problems we can mention: Trichotomic Segmentation [14], [17], [18], ELECTRE TRI [15], [21], [25], PROAFTN [3], filtering methods based on concordance and non-discordance [16], the UTADIS method [7], [11], based on the criteria aggregation model, and the rough set approach [22]. These procedures can be considered supervised learning methods. As far as we know, the MCDA literature contains few unsupervised learning techniques [6]. In this paper we focus on this problem: grouping alternatives into a restricted number of categories that remain as homogeneous as possible. Zopounidis and Doumpos [26] recently published a literature review on MCDA classification and sorting methods, to which we refer the interested reader.

The k-means algorithm [12] is one of the most widely used unsupervised techniques. It groups the alternatives into categories in such a way that the distances between alternatives within the same category are as short as possible, while the distances between the centers of different categories are as large as possible. Following the multicriteria approach, we are interested in defining a notion of distance between alternatives that takes into account the multicriteria nature of the problem.
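For reference, the classical (single-criterion, numerical) k-means procedure that this paper extends can be sketched as follows. This is a minimal illustration of the standard alternating scheme, assuming points given as numeric tuples and the squared Euclidean distance; the function names `squared_distance` and `k_means` are hypothetical and do not come from the paper.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def k_means(points, k, iterations=100, seed=0):
    """Classical k-means on numeric tuples (illustrative sketch only)."""
    rng = random.Random(seed)
    # Initial centers are k distinct points drawn at random from the data.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # the partition is stable, so the procedure has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centers
```

The update step relies on averaging coordinates, which presupposes a numerical space; this is precisely the ingredient that has to be rethought when the only available information is a preference structure.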

In this paper, we present an extension of the k-means algorithm to the multicriteria framework. The intuition behind the method is the following: all actions within the same cluster are preferred, indifferent and incomparable to more or less the same actions, according to the decision maker's preferences. Note that this resembles, in spirit at least, the sociometric idea of structural equivalence [24]. To quantify this similarity we introduce a multicriteria distance based on the preference structure (P,I,J) defined by the decision maker. The originality of this approach lies in the application of this new measure within the well-known k-means framework.
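The intuition above can be made concrete with a small sketch. The excerpt reproduced here does not state the distance formula, so the code below rests on an assumption: the distance between two actions is taken to be the share of actions on which their preference profiles disagree. The labels `"P+"` (is preferred to), `"P-"` (is less preferred than), `"I"` and `"J"`, the `relation` dictionary, and both function names are hypothetical conventions chosen for this illustration.

```python
def profile(a, alternatives, relation):
    """Split all alternatives into four sets according to how `a` relates to them.

    `relation[(a, b)]` is assumed to hold one of "P+", "P-", "I", "J" for every
    ordered pair of distinct alternatives; `a` is treated as indifferent to itself.
    """
    sets = {"P+": set(), "P-": set(), "I": set(), "J": set()}
    for b in alternatives:
        label = "I" if a == b else relation[(a, b)]
        sets[label].add(b)
    return sets


def multicriteria_distance(a, b, alternatives, relation):
    """Share of alternatives that `a` and `b` do NOT relate to in the same way."""
    profile_a = profile(a, alternatives, relation)
    profile_b = profile(b, alternatives, relation)
    # Count the alternatives that fall in the same profile set for both actions.
    shared = sum(len(profile_a[key] & profile_b[key]) for key in profile_a)
    return 1 - shared / len(alternatives)
```

Under this assumed definition, two actions that are mutually indifferent and relate identically to every other action are at distance 0, whereas two actions whose profiles have nothing in common are at distance 1.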

The outline of this paper is as follows. In Section 2, we introduce some important concepts needed to develop our MCDA clustering approach. The algorithm itself is presented in Section 3. Section 4 is dedicated to the study of two applications: the country risk problem and the diagnosis of firms. Finally, we conclude with some general remarks about the method and directions for future research.

Section snippets

MCDA clustering

In this section, we describe our clustering method, which is an extension of the well-known k-means algorithm with an MCDA background.

In brief, the method starts with k prototypes (centroids) that are randomly chosen among all the actions. The alternatives are then assigned to the cluster represented by the nearest prototype. To determine this assignment we introduce a multicriteria distance based on the preference structure defined by the decision maker. Once this step has been
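The first two steps described in this excerpt (random choice of k prototypes among the actions, then assignment of every alternative to its nearest prototype) can be sketched as below. The excerpt breaks off before the prototype update is described, so that step is deliberately left out. The function name `assign_to_prototypes` and the `distance` callable are assumptions made for the illustration; any distance between actions, multicriteria or not, can be passed in.

```python
import random


def assign_to_prototypes(alternatives, k, distance, seed=0):
    """Pick k random prototypes and assign each alternative to the nearest one.

    `distance(a, b)` is any user-supplied callable returning a non-negative
    number; this sketch does not fix how it is computed.
    """
    rng = random.Random(seed)
    # Prototypes are chosen at random among the actions themselves.
    prototypes = rng.sample(alternatives, k)
    clusters = {p: [] for p in prototypes}
    for a in alternatives:
        # Ties are broken in favour of the prototype drawn first.
        nearest = min(prototypes, key=lambda p: distance(a, p))
        clusters[nearest].append(a)
    return clusters
```

Because the prototypes are drawn at random, two runs with different seeds may yield different partitions, which is the sensitivity to initial conditions generally associated with k-means-type methods.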

MCDA clustering algorithm

Algorithm 1 below shows, step by step, the procedure carried out by the multicriteria clustering method developed in the previous section.

Being an extension of the k-means method, the previous algorithm suffers from the same weaknesses. Among these are the fact that the results are not unique and that the initial conditions may strongly influence the class structure; see, for instance, Bradley and Fayyad [5] and Bottou and Bengio [4]. Moreover some empirical tests have been

Applications

As mentioned in the introduction, unsupervised learning techniques have many possible applications in different fields. In this section we study two applications dedicated to financial problems: the country risk problem and the diagnosis of firms.

Conclusions and future research

In this paper we developed an extension of the k-means algorithm to the multicriteria framework. The originality of this approach comes from the definition of a distance that takes into account the multicriteria nature of the problem.

The model introduced is independent of the way the decision maker builds his or her preferences (outranking methods, AHP, utility theory, …). Once again, let us stress that only outranking methods lead to a non-empty incomparability relation J.

References (26)

  • A.I. Dimitras et al.

    A survey of business failures with an emphasis on prediction

    European Journal of Operational Research

    (1996)
  • C. Zopounidis et al.

    Multicriteria classification and sorting methods: A literature review

    European Journal of Operational Research

    (2002)
  • E.I. Altman

    The Prediction of Corporate Bankruptcy: A Discriminant Analysis

    (1988)
  • E.I. Altman

    Corporate Financial Distress and Bankruptcy

    (1993)
  • N. Belacel

    Multicriteria assignment method PROAFTN: Methodology and medical application

    European Journal of Operational Research

    (2000)
  • L. Bottou, Y. Bengio, Convergence properties of the k-means algorithms, Advances in Neural Information Processing...
  • P.S. Bradley, U.M. Fayyad, Refining initial points for the k-means clustering, in: Proceedings of the 15th...
  • Y. De Smet, F. Gilbart, A class definition method for country risk problems, Technical report IS-MG 2001/13,...
  • J.M. Devaud, G. Groussaud, Utadis: Une méthode de construction de fonctions d'utilité additives rendant compte de...
  • A. Dimitras et al.

    Multicriteria decision aid method for the assessment of business failure risk

    Foundations of Computing and Decision Sciences

    (1995)
  • F. Gilbart

    Le risque pays dans le secteur bancaire: Approche multicritère

    (2003)
  • E. Jacquet-Lagrèze

    Advances in multicriteria analysis

  • J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Le Cam, J. Neyman (Eds.),...