Elsevier

Information Sciences

Volume 275, 10 August 2014, Pages 1-12

Evaluation of clustering algorithms for financial risk analysis using MCDM methods

https://doi.org/10.1016/j.ins.2014.02.137

Abstract

The evaluation of clustering algorithms is intrinsically difficult because of the lack of objective measures. Since the evaluation of clustering algorithms normally involves multiple criteria, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper presents an MCDM-based approach to rank a selection of popular clustering algorithms in the domain of financial risk analysis. An experimental study is designed to validate the proposed approach using three MCDM methods, six clustering algorithms, and eleven cluster validity indices over three real-life credit risk and bankruptcy risk data sets. The results demonstrate the effectiveness of MCDM methods in evaluating clustering algorithms and indicate that the repeated-bisection method leads to good 2-way clustering solutions on the selected financial risk data sets.

Introduction

Financial risks are uncertainties associated with any form of financing, including credit risk, business risk, investment risk, and operational risk. Financial data analysis, which is also called business intelligence [38], can help companies detect financial risks in advance, take appropriate actions to minimize defaults, and support better decision-making [22], [63]. Supervised and unsupervised learning methods are the two major techniques used in financial risk analysis. Though supervised learning methods may achieve high prediction accuracy (see, for example, [46]), they are inapplicable when financial data have no predefined class labels. Unsupervised learning methods can not only find underlying structures in unlabeled data, but also provide labeled data for supervised methods.

As one of the most important types of unsupervised learning methods, clustering algorithms have been widely used in financial risk analysis [58]. Brockett et al. [9] used Kohonen’s Self Organizing Feature Map (SOM) to uncover automobile bodily injury claims fraud. Cox [16] developed a fuzzy system for detecting anomalous behaviors in healthcare provider claims based on an unsupervised neural network and fuzzy logic. Moreau et al. [45] applied unsupervised neural networks to identify fraud in mobile communications. Williams and Huang [69] combined the k-means clustering method with a supervised method for insurance risk analysis. Yeo et al. [72] used a hierarchical clustering technique for risk prediction in the automobile insurance industry.

Performance evaluation of learning methods is an important topic in financial risk management. The algorithm evaluation problem in general is a central issue in fields like artificial intelligence, operations research, machine learning, and data mining and knowledge discovery [28], [32], [55], [66]. Whereas supervised learning methods can be assessed using measures such as accuracy and precision, the evaluation of clustering algorithms is much harder due to the very nature of cluster analysis [70] and has been studied for years (e.g. [26], [31], [40], [41], [42], [43], [44], [68]).

In 2010, Rokach [62] suggested that algorithm selection can be considered a multiple criteria decision making (MCDM) problem and that MCDM techniques can be used to select the best ensemble method for the problem at hand. Since the evaluation of clustering algorithms involves more than one criterion, such as entropy, Dunn’s index, and computation time, it can also be modeled as an MCDM problem. The objective of this paper is to propose an MCDM-based approach for evaluating clustering algorithms in the domain of financial risk analysis. Though there are many studies assessing the quality of clustering methods, few, if any, have analyzed this problem using a combination of multiple criteria. The experimental study of this paper, which selects six clustering algorithms, eleven selection criteria, three MCDM methods, and three real-life financial data sets, is designed to validate the proposed approach.
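To make the idea of ranking algorithms over several criteria concrete, the following is a minimal sketch that ranks three hypothetical clustering algorithms with TOPSIS. TOPSIS is used here only because it appears in the reference list; this excerpt does not name the three MCDM methods of the study. The algorithm names, the criterion values, and the equal weights are made-up assumptions for illustration, not results from the paper.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives (rows) over criteria (columns) with TOPSIS.

    matrix  : list of rows, one row of criterion values per alternative
    weights : one weight per criterion
    benefit : True where larger values are better, False where smaller are better
    Returns one closeness score in [0, 1] per alternative; higher is better.
    """
    n_crit = len(weights)
    # Vector-normalise each column (guarding against an all-zero column).
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    # Ideal and anti-ideal points, taken per criterion.
    cols = list(zip(*weighted))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)   # distance to the ideal point
        d_neg = math.dist(row, anti)    # distance to the anti-ideal point
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.5)
    return scores


# Hypothetical decision matrix (assumed values, not from the paper).
# Columns: entropy (lower is better), Dunn's index (higher is better),
# computation time in seconds (lower is better).
algorithms = ["algo_A", "algo_B", "algo_C"]
matrix = [
    [0.40, 0.30, 2.0],
    [0.55, 0.20, 1.0],
    [0.70, 0.10, 3.0],
]
weights = [1 / 3, 1 / 3, 1 / 3]      # equal weights, an assumption
benefit = [False, True, False]

scores = topsis(matrix, weights, benefit)
ranking = [name for _, name in sorted(zip(scores, algorithms), reverse=True)]
print(ranking)
```

In this toy matrix algo_C is worst on every criterion, so it coincides with the anti-ideal point and receives a score of zero. With real validity indices, the choice of weights and of normalisation can change the ranking, which is one reason to compare several MCDM methods.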

The rest of this paper is organized as follows: Section 2 describes the research approach, clustering algorithms, performance measures, and MCDM methods; Section 3 presents details of the experimental study that evaluates the clustering algorithms using three financial risk data sets; Section 4 concludes the paper with summaries and future research directions.

Research methodology

This paper proposes an MCDM-based approach to evaluate the clustering results in financial risk analysis. The empirical study chooses six clustering algorithms, eleven validity measures, and three MCDM methods to validate the evaluation approach (see Fig. 1). This section provides details of the proposed evaluation approach, clustering algorithms, performance measures, and MCDM methods.
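As a rough illustration of what validity measures compute, the sketch below implements one internal measure (Dunn's index) and one external measure (entropy) for a small 2-way clustering. The formulas are the commonly used textbook forms and are an assumption here, since this excerpt does not give the paper's exact definitions. The points, cluster labels, and class labels are invented toy data.

```python
import math
from collections import Counter
from itertools import combinations


def dunn_index(points, labels):
    """Internal measure: smallest distance between points of different
    clusters divided by the largest within-cluster diameter.
    Higher values indicate compact, well-separated clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    min_separation = min(
        math.dist(a, b)
        for m1, m2 in combinations(clusters.values(), 2)
        for a in m1
        for b in m2
    )
    return min_separation / max_diameter if max_diameter else math.inf


def clustering_entropy(labels, classes):
    """External measure: size-weighted average of the class entropy inside
    each cluster, normalised by log(number of classes) so it lies in [0, 1].
    Lower values mean clusters are purer with respect to the known classes."""
    n = len(labels)
    n_classes = len(set(classes))
    if n_classes < 2:
        return 0.0
    by_cluster = {}
    for label, cls in zip(labels, classes):
        by_cluster.setdefault(label, []).append(cls)
    total = 0.0
    for members in by_cluster.values():
        size = len(members)
        h = -sum((c / size) * math.log(c / size)
                 for c in Counter(members).values())
        total += (size / n) * (h / math.log(n_classes))
    return total


# Toy data (assumed): two well-separated groups, one point mislabelled
# relative to the known classes.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
classes = ["good", "good", "bad", "bad", "bad", "bad"]

dunn = dunn_index(points, labels)
entropy = clustering_entropy(labels, classes)
print(dunn, entropy)
```

Values such as these, computed per algorithm and per data set, would form the rows of the decision matrix that an MCDM method then ranks.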

Experiment

The experiment is designed to validate the proposed evaluation approach. The first part of this section describes the three real-life financial risk data sets. The second and third parts discuss the experimental design and results.

Conclusions

This paper proposed a new evaluation approach that utilizes MCDM methods to assess the quality of clustering algorithms in the domain of financial risk analysis. The approach first obtained clustering solutions using various clustering algorithms. The performances of clustering algorithms were measured using a collection of external and internal performance measures. MCDM methods were then used to rank clustering algorithms by taking into account all performance criteria. An experiment was

Acknowledgements

The authors are grateful to the editor-in-chief and the anonymous reviewers for their valuable suggestions, which helped improve the quality of this paper. This research has been partially supported by grants from the National Natural Science Foundation of China (#71222108, #71173028 and #71325001).

References (73)

  • S. Opricovic et al., Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS, Eur. J. Operat. Res. (2004)
  • I. Ozkan et al., MiniMax ε-stable cluster validity index for Type-2 fuzziness, Inform. Sci. (2012)
  • Y. Peng et al., FAMCDM: a fusion approach of MCDM methods to rank multiclass classification algorithms, OMEGA (2011)
  • Y. Peng et al., An empirical performance metric for classification algorithm selection in financial risk management, Appl. Soft Comput. (2011)
  • A. Raveh, Co-plot: a graphic display method for geometrical representations of MCDM, Eur. J. Operat. Res. (2000)
  • P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. (1987)
  • A. Abou-Rjeili, G. Karypis, Multilevel algorithms for partitioning power-law graphs, in: IEEE International Parallel &...
  • E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, J. Finance (1968)
  • P. Andersen et al., A procedure for ranking efficient units in data envelopment analysis, Manage. Sci. (1993)
  • L. Aristidis et al., The global k-means clustering algorithm, Pattern Recogn. (2003)
  • R. Banker et al., Some models for estimating technical and scale inefficiencies in data envelopment analysis, Manage. Sci. (1984)
  • P. Berkhin, A survey of clustering data mining techniques, Grouping Multidimensional Data In Grouping Multidimensional...
  • R. Bittman et al., Decision method for cluster analysis problems using visual approach, Expert Syst. (2007)
  • P. Brockett et al., Using Kohonen’s self organizing feature map to uncover automobile bodily injury claims fraud, J. Risk Insur. (1998)
  • A. Charnes et al., Evaluating program and managerial efficiency: an application of data envelopment analysis to program follow through, Manage. Sci. (1981)
  • CLUTO manual, 2003....
  • W.W. Cooper et al., Data envelopment analysis: history, models and interpretations
  • E. Cox, A fuzzy system for detecting anomalous behaviors in healthcare provider claims
  • A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. Ser. B Meth. (1977)
  • R.C. Dubes, Cluster analysis and related issues
  • J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern. (1973)
  • D. Ergu et al., Analytic network process in risk assessment and decision analysis, Comput. Oper. Res. (2011)
  • D.H. Fisher, Knowledge acquisition via incremental conceptual clustering, Mach. Learn. (1987)
  • C. Fraley et al., How many clusters? Which clustering method? Answers via model-based cluster analysis, Comput. J. (1998)
  • A. Frank, A. Asuncion, UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and...
  • J. Grabmeier et al., Techniques of cluster algorithms in data mining, Data Min. Knowl. Disc. (2002)