Evaluation of clustering algorithms for financial risk analysis using MCDM methods

doi:10.1016/j.ins.2014.02.137

Information Sciences

Volume 275, 10 August 2014, Pages 1-12

https://doi.org/10.1016/j.ins.2014.02.137

Abstract

The evaluation of clustering algorithms is intrinsically difficult because of the lack of objective measures. Since the evaluation of clustering algorithms normally involves multiple criteria, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper presents an MCDM-based approach to rank a selection of popular clustering algorithms in the domain of financial risk analysis. An experimental study is designed to validate the proposed approach using three MCDM methods, six clustering algorithms, and eleven cluster validity indices over three real-life credit risk and bankruptcy risk data sets. The results demonstrate the effectiveness of MCDM methods in evaluating clustering algorithms and indicate that the repeated-bisection method leads to good 2-way clustering solutions on the selected financial risk data sets.

Introduction

Financial risks are uncertainties associated with any form of financing, including credit risk, business risk, investment risk, and operational risk. Financial data analysis, also called business intelligence [38], can help companies detect financial risks in advance, take appropriate actions to minimize defaults, and make better decisions [22], [63]. Supervised and unsupervised learning methods are the two major techniques used in financial risk analysis. Although supervised learning may achieve high prediction accuracy (see, for example, [46]), it is inapplicable when financial data have no predefined class labels. Unsupervised learning methods can not only find underlying structures in unlabeled data but also provide labeled data for supervised methods.

As one of the most important types of unsupervised learning methods, clustering algorithms have been widely used in financial risk analysis [58]. Brockett et al. [9] used Kohonen’s Self-Organizing Feature Map (SOM) to uncover automobile bodily injury claims fraud. Cox [16] developed a fuzzy system, based on unsupervised neural networks and fuzzy logic, for detecting anomalous behaviors in healthcare provider claims. Moreau et al. [45] applied unsupervised neural networks to identify fraud in mobile communications. Williams and Huang [69] combined the k-means clustering method with a supervised method for insurance risk analysis. Yeo et al. [72] used hierarchical clustering for risk prediction in the automobile insurance industry.

Performance evaluation of learning methods is an important topic in financial risk management. Algorithm evaluation in general is a central issue in fields such as artificial intelligence, operations research, machine learning, and data mining and knowledge discovery [28], [32], [55], [66]. Whereas supervised learning methods can be assessed using measures such as accuracy and precision, clustering algorithms are much harder to evaluate because of the very nature of cluster analysis [70], and the problem has been studied for many years (e.g., [26], [31], [40], [41], [42], [43], [44], [68]).
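One concrete reason accuracy does not transfer to clustering is that cluster ids are arbitrary: a partition and the same partition with its ids swapped are equally good. When true class labels are available (the conclusions mention external as well as internal measures), entropy is a permutation-invariant alternative. The sketch below assumes the size-weighted definition used in CLUTO-style evaluations: for a cluster S_r of size n_r containing n_r^i members of class i out of q classes, E(S_r) = -(1/log q) Σ_i (n_r^i/n_r) log(n_r^i/n_r), and the overall score is Σ_r (n_r/n) E(S_r). The paper's exact formula is not shown in this excerpt, and `clustering_entropy` is a hypothetical name. Lower values mean purer clusters.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def clustering_entropy(cluster_labels: Sequence[Hashable],
                       class_labels: Sequence[Hashable]) -> float:
    """Size-weighted class entropy of a clustering (0 = every cluster is pure).

    Hypothetical helper: assumes E = sum_r (n_r / n) * E(S_r), where each
    cluster entropy E(S_r) is normalized by log(q) for q true classes.
    """
    if not cluster_labels or len(cluster_labels) != len(class_labels):
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(class_labels)
    q = len(set(class_labels))  # number of true classes
    members = defaultdict(list)  # cluster id -> class labels of its members
    for cluster, cls in zip(cluster_labels, class_labels):
        members[cluster].append(cls)
    total = 0.0
    for classes in members.values():
        n_r = len(classes)
        # Entropy of the class distribution inside this cluster.
        e_r = sum((c / n_r) * math.log(n_r / c) for c in Counter(classes).values())
        if q > 1:
            e_r /= math.log(q)  # scale each cluster entropy to [0, 1]
        total += (n_r / n) * e_r
    return total


# Cluster ids are arbitrary: swapping them does not change the score.
print(clustering_entropy([0, 0, 1, 1], ["good", "good", "bad", "bad"]))  # 0.0
print(clustering_entropy([1, 1, 0, 0], ["good", "good", "bad", "bad"]))  # 0.0
print(clustering_entropy([0, 0, 0, 0], ["good", "good", "bad", "bad"]))  # 1.0
```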

In 2010, Rokach [62] suggested that algorithm selection can be treated as a multiple criteria decision making (MCDM) problem and that MCDM techniques can be used to select the best ensemble method for the problem at hand. Because the evaluation of clustering algorithms involves more than one criterion, such as entropy, Dunn’s index, and computation time, it can also be modeled as an MCDM problem. The objective of this paper is to propose an MCDM-based approach for evaluating clustering algorithms in the domain of financial risk analysis. Although many studies have assessed the quality of clustering methods, few, if any, have analyzed this problem using a combination of multiple criteria. The experimental study, which uses six clustering algorithms, eleven selection criteria, three MCDM methods, and three real-life financial data sets, is designed to validate the proposed approach.
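Dunn’s index, one of the criteria named above, is an internal measure: it needs only the data and the partition, not class labels. A minimal sketch of Dunn's original ratio, assuming Euclidean distance: the smallest distance between points in different clusters divided by the largest cluster diameter. Higher is better, whereas entropy and computation time are lower-is-better; handling criteria that point in different directions is exactly what an MCDM method must do. Many variants of the index exist and the paper's variant is not shown in this excerpt; `dunn_index` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Dict, Hashable, List, Sequence

Point = Sequence[float]


def dunn_index(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Dunn's index: min between-cluster distance / max cluster diameter.

    Assumes Euclidean distance and single-linkage separation; higher is better.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    clusters: Dict[Hashable, List[Point]] = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    # Largest distance between two points of the same cluster (max diameter).
    max_diameter = max(
        (math.dist(a, b) for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        math.dist(a, b)
        for ca, cb in combinations(list(clusters.values()), 2)
        for a in ca for b in cb
    )
    return math.inf if max_diameter == 0 else min_separation / max_diameter


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(dunn_index(pts, [0, 0, 1, 1]))  # 10.0: compact, well-separated split
print(dunn_index(pts, [0, 1, 0, 1]))  # 0.1: clusters straddle the gap
```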

The rest of this paper is organized as follows: Section 2 describes the research approach, clustering algorithms, performance measures, and MCDM methods; Section 3 presents details of the experimental study that evaluates the clustering algorithms using three financial risk data sets; Section 4 concludes the paper with summaries and future research directions.


Research methodology

This paper proposes an MCDM-based approach to evaluate the clustering results in financial risk analysis. The empirical study chooses six clustering algorithms, eleven validity measures, and three MCDM methods to validate the evaluation approach (see Fig. 1). This section provides details of the proposed evaluation approach, clustering algorithms, performance measures, and MCDM methods.
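To make the MCDM step concrete, the following sketch ranks alternatives (clustering algorithms) against criteria (validity measures) with TOPSIS, a method cited in the reference list (Olson; Opricovic et al.). This excerpt does not state which three MCDM methods the paper uses or how the criteria were weighted, so the equal weights, the benefit/cost flags, and the function name `topsis` are illustrative assumptions, not the authors' configuration. Each alternative's score is its relative closeness d⁻/(d⁺ + d⁻) to the ideal solution after vector normalization and weighting.

```python
import math
from typing import List, Sequence


def topsis(matrix: Sequence[Sequence[float]],
           weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """TOPSIS closeness coefficient per alternative (row); higher is better.

    matrix[i][j] is the score of alternative i (e.g. a clustering algorithm)
    on criterion j (e.g. a validity measure); benefit[j] is True when larger
    values are better and False for cost criteria such as computation time.
    """
    m, n = len(matrix), len(weights)
    if any(len(row) != n for row in matrix) or len(benefit) != n:
        raise ValueError("matrix, weights and benefit disagree on criteria count")
    # 1. Vector-normalize each column, then apply the criterion weight.
    v = [[0.0] * n for _ in range(m)]
    for j in range(n):
        norm = math.sqrt(sum(row[j] ** 2 for row in matrix))
        for i in range(m):
            v[i][j] = weights[j] * (matrix[i][j] / norm if norm else 0.0)
    # 2. Ideal and anti-ideal solutions, respecting each criterion's direction.
    ideal: List[float] = []
    anti_ideal: List[float] = []
    for j in range(n):
        column = [v[i][j] for i in range(m)]
        best, worst = (max(column), min(column)) if benefit[j] else (min(column), max(column))
        ideal.append(best)
        anti_ideal.append(worst)
    # 3. Relative closeness to the ideal: d- / (d+ + d-).
    scores = []
    for row in v:
        d_plus = math.dist(row, ideal)
        d_minus = math.dist(row, anti_ideal)
        total = d_plus + d_minus
        scores.append(d_minus / total if total else 0.0)
    return scores


# Illustrative only: made-up scores for two hypothetical algorithms on
# (Dunn's index: benefit, computation time in seconds: cost).
scores = topsis([[1.0, 2.0], [0.5, 3.0]], weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)  # [1.0, 0.0] [0, 1]
```

Because the first hypothetical algorithm is better on both criteria, it coincides with the ideal solution and scores 1.0. Vector normalization keeps criteria measured on different scales (an index versus seconds) comparable before weighting.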

Experiment

The experiment is designed to validate the proposed evaluation approach. The first part of this section describes the three real-life financial risk data sets. The second and third parts discuss the experimental design and results.
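The abstract reports that the repeated-bisection method produced good 2-way solutions on these data sets. For readers unfamiliar with the idea, here is a minimal bisecting sketch: start with one cluster containing every record and keep splitting a cluster in two until k clusters exist. It rests on simplifying assumptions: each split uses plain 2-means with deterministic farthest-point seeding, and the largest cluster is always the one split. CLUTO's repeated-bisection method instead optimizes a configurable clustering criterion function at each bisection, so this is not the implementation used in the paper; `repeated_bisection` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points: Sequence[Point]) -> Tuple[float, ...]:
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def _two_means(points: Sequence[Point], max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Split points in two with Lloyd's 2-means; returns the two index lists."""
    c = _centroid(points)
    seed_a = max(points, key=lambda p: _sq_dist(p, c))       # farthest from centroid
    seed_b = max(points, key=lambda p: _sq_dist(p, seed_a))  # farthest from seed_a
    centers = [tuple(seed_a), tuple(seed_b)]
    assign: List[int] = []
    for _ in range(max_iter):
        new_assign = [0 if _sq_dist(p, centers[0]) <= _sq_dist(p, centers[1]) else 1
                      for p in points]
        if new_assign == assign:
            break  # assignments stable: converged
        assign = new_assign
        for g in (0, 1):
            members = [p for p, a in zip(points, assign) if a == g]
            if members:
                centers[g] = _centroid(members)
    left = [i for i, a in enumerate(assign) if a == 0]
    right = [i for i, a in enumerate(assign) if a == 1]
    return left, right


def repeated_bisection(points: Sequence[Point], k: int) -> List[int]:
    """Hypothetical bisecting clustering: split the largest cluster until k exist."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters: List[List[int]] = [list(range(len(points)))]
    while len(clusters) < k:
        target = max(clusters, key=len)
        if len(target) < 2:
            break
        left, right = _two_means([points[i] for i in target])
        if not left or not right:
            break  # all points identical; no further split possible
        clusters.remove(target)
        clusters += [[target[i] for i in left], [target[i] for i in right]]
    labels = [0] * len(points)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


# A 2-way solution, as in the paper's setting, is a single bisection.
print(repeated_bisection([(0, 0), (0, 1), (10, 0), (10, 1)], k=2))  # [0, 0, 1, 1]
```

For k = 2 the procedure reduces to one split of the full data set; the repeated aspect only matters for k > 2, where each further bisection refines one existing cluster.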

Conclusions

This paper proposed a new evaluation approach that utilizes MCDM methods to assess the quality of clustering algorithms in the domain of financial risk analysis. The approach first obtained clustering solutions using various clustering algorithms. The performances of clustering algorithms were measured using a collection of external and internal performance measures. MCDM methods were then used to rank clustering algorithms by taking into account all performance criteria. An experiment was

Acknowledgements

The authors are grateful to the Editor-in-Chief and the anonymous reviewers for their valuable suggestions, which helped improve the quality of this paper. This research has been partially supported by grants from the National Natural Science Foundation of China (#71222108, #71173028 and #71325001).

References (73)

S. Bandyopadhyay et al., Use of a fuzzy granulation–degranulation criterion for assessing cluster validity, Fuzzy Sets Syst. (2011).
A. Charnes et al., Foundations of data envelopment analysis for Pareto–Koopmans efficient empirical production functions, J. Econom. (1985).
A. Charnes et al., Measuring the efficiency of decision making units, Eur. J. Oper. Res. (1978).
M.T. Chu et al., Comparison among three analytical methods for knowledge communities group decision analysis, Expert Syst. Appl. (2007).
R.C. Dubes et al., Clustering techniques: the user’s dilemma, Pattern Recogn. (1976).
D. Ergu et al., A simple method to improve the consistency ratio of the pair-wise comparison matrix in ANP, Eur. J. Oper. Res. (2011).
R. Gelbard et al., Investigating diversity of clustering methods: an empirical comparison, Data Knowl. Eng. (2007).
J.H. Gennari et al., Models of incremental concept formation, Artif. Intell. (1989).
C. Hwang et al., A similarity measure of intuitionistic fuzzy sets based on the Sugeno integral with its application to pattern recognition, Inform. Sci. (2012).
D.L. Olson, Comparison of weights in TOPSIS models, Math. Comput. Model. (2004).
S. Opricovic et al., Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS, Eur. J. Oper. Res. (2004).
I. Ozkan et al., MiniMax ε-stable cluster validity index for Type-2 fuzziness, Inform. Sci. (2012).
Y. Peng et al., FAMCDM: a fusion approach of MCDM methods to rank multiclass classification algorithms, OMEGA (2011).
Y. Peng et al., An empirical performance metric for classification algorithm selection in financial risk management, Appl. Soft Comput. (2011).
A. Raveh, Co-plot: a graphic display method for geometrical representations of MCDM, Eur. J. Oper. Res. (2000).
P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. (1987).
A. Abou-Rjeili, G. Karypis, Multilevel algorithms for partitioning power-law graphs, in: IEEE International Parallel &...
E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, J. Finance (1968).
P. Andersen et al., A procedure for ranking efficient units in data envelopment analysis, Manage. Sci. (1993).
A. Likas et al., The global k-means clustering algorithm, Pattern Recogn. (2003).
R. Banker et al., Some models for estimating technical and scale inefficiencies in data envelopment analysis, Manage. Sci. (1984).
P. Berkhin, A survey of clustering data mining techniques, Grouping Multidimensional Data In Grouping Multidimensional...
R. Bittman et al., Decision method for cluster analysis problems using visual approach, Expert Syst. (2007).
P. Brockett et al., Using Kohonen’s self organizing feature map to uncover automobile bodily injury claims fraud, J. Risk Insur. (1998).
A. Charnes et al., Evaluating program and managerial efficiency: an application of data envelopment analysis to program follow through, Manage. Sci. (1981).
CLUTO manual, 2003....
W.W. Cooper et al., Data envelopment analysis: history, models and interpretations.
E. Cox, A fuzzy system for detecting anomalous behaviors in healthcare provider claims.
A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. Ser. B Meth. (1977).
R.C. Dubes, Cluster analysis and related issues.
J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern. (1973).
D. Ergu et al., Analytic network process in risk assessment and decision analysis, Comput. Oper. Res. (2011).
D.H. Fisher, Knowledge acquisition via incremental conceptual clustering, Mach. Learn. (1987).
C. Fraley et al., How many clusters? Which clustering method? Answers via model-based cluster analysis, Comput. J. (1998).
A. Frank, A. Asuncion, UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and...
J. Grabmeier et al., Techniques of cluster algorithms in data mining, Data Min. Knowl. Disc. (2002).

