Evaluation of clustering algorithms for financial risk analysis using MCDM methods
Introduction
Financial risks are uncertainties associated with any form of financing, including credit risk, business risk, investment risk, and operational risk. Financial data analysis, which is also called business intelligence [38], can help companies detect financial risks in advance, take appropriate actions to minimize defaults, and support better decision-making [22], [63]. Supervised and unsupervised learning methods are the two major techniques used in financial risk analysis. Though supervised learning methods may achieve high prediction accuracy (see, for example, [46]), they are inapplicable when financial data have no predefined class labels. Unsupervised learning methods can not only find underlying structures in unlabeled data, but also provide labeled data for supervised methods.
As one of the most important types of unsupervised learning methods, clustering algorithms have been widely used in financial risk analysis [58]. Brockett et al. [9] presented a study using Kohonen’s Self Organizing Feature Map (SOM) to uncover automobile bodily injury claims fraud. Cox [16] developed a fuzzy system for detecting anomalous behaviors in healthcare provider claims based on an unsupervised neural network and fuzzy logic. Moreau et al. [45] applied unsupervised neural networks to identify fraud in mobile communications. Williams and Huang [69] combined the k-means clustering method with a supervised method for insurance risk analysis. Yeo et al. [72] used a hierarchical clustering technique for risk prediction in the automobile insurance industry.
Performance evaluation of learning methods is an important topic in financial risk management. The algorithm evaluation problem in general is a central issue in fields like artificial intelligence, operations research, machine learning, and data mining and knowledge discovery [28], [32], [55], [66]. Whereas supervised learning methods can be assessed using measures such as accuracy and precision, the evaluation of clustering algorithms is much harder due to the very nature of cluster analysis [70] and has been studied for years (e.g. [26], [31], [40], [41], [42], [43], [44], [68]).
In 2010, Rokach [62] suggested that algorithm selection can be considered a multiple criteria decision making (MCDM) problem and that MCDM techniques can be used to select the best ensemble method for the problem at hand. Since the evaluation of clustering algorithms involves more than one criterion, such as entropy, Dunn’s index, and computation time, it can also be modeled as an MCDM problem. The objective of this paper is to propose an MCDM-based approach for evaluating clustering algorithms in the domain of financial risk analysis. Though there are many studies assessing the quality of clustering methods, few, if any, have analyzed this problem using a combination of multiple criteria. The experimental study of this paper, which selects six clustering algorithms, eleven selection criteria, three MCDM methods, and three real-life financial data sets, is designed to validate the proposed approach.
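To make two of the criteria named above concrete, the following is a minimal Python sketch of entropy (an external measure that needs class labels) and Dunn’s index (an internal measure that needs only the data). It assumes the common textbook definitions, which may differ in detail from the formulas used in the paper; the function names and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cluster_entropy(cluster_ids, class_labels):
    """Size-weighted average entropy of class labels per cluster (lower is better)."""
    n = len(cluster_ids)
    members = {}
    for cid, label in zip(cluster_ids, class_labels):
        members.setdefault(cid, []).append(label)
    total = 0.0
    for labels in members.values():
        size = len(labels)
        # Shannon entropy in bits, written as sum of p * log2(1 / p).
        h = sum((c / size) * math.log2(size / c) for c in Counter(labels).values())
        total += (size / n) * h
    return total


def dunn_index(points, cluster_ids):
    """Smallest between-cluster distance over largest cluster diameter (higher is better).

    Assumes at least two clusters and at least one cluster with two distinct points.
    """
    min_between = math.inf
    max_diameter = 0.0
    for (p, cp), (q, cq) in combinations(zip(points, cluster_ids), 2):
        d = math.dist(p, q)  # Euclidean distance, Python 3.8+
        if cp == cq:
            max_diameter = max(max_diameter, d)
        else:
            min_between = min(min_between, d)
    return min_between / max_diameter


# Toy data (made up): two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
cluster_ids = [0, 0, 1, 1]
class_labels = ["good", "good", "bad", "bad"]

entropy = cluster_entropy(cluster_ids, class_labels)  # pure clusters give 0.0
dunn = dunn_index(points, cluster_ids)                # 10 / 1 = 10.0
```

Because entropy is minimized while Dunn’s index is maximized, and computation time is measured on yet another scale, no single one of these numbers ranks the algorithms on its own, which is what motivates treating the evaluation as a multi-criteria problem.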
The rest of this paper is organized as follows: Section 2 describes the research approach, clustering algorithms, performance measures, and MCDM methods; Section 3 presents details of the experimental study that evaluates the clustering algorithms using three financial risk data sets; Section 4 concludes the paper with summaries and future research directions.
Research methodology
This paper proposes an MCDM-based approach to evaluate the clustering results in financial risk analysis. The empirical study chooses six clustering algorithms, eleven validity measures, and three MCDM methods to validate the evaluation approach (see Fig. 1). This section provides details of the proposed evaluation approach, clustering algorithms, performance measures, and MCDM methods.
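The ranking step can be sketched in Python as follows. This section does not name the three MCDM methods, so the sketch assumes TOPSIS, which appears in the reference list, purely as an illustration; the algorithm labels, criterion values, and weights are invented, and `topsis` is a hypothetical helper rather than the authors' implementation.

```python
import math


def topsis(matrix, weights, is_benefit):
    """Return one closeness score in [0, 1] per alternative (higher is better).

    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # The ideal point takes the best value per criterion, the anti-ideal the worst.
    columns = list(zip(*weighted))
    ideal = [max(col) if b else min(col) for col, b in zip(columns, is_benefit)]
    anti = [min(col) if b else max(col) for col, b in zip(columns, is_benefit)]
    scores = []
    for row in weighted:
        d_ideal = math.dist(row, ideal)
        d_anti = math.dist(row, anti)
        scores.append(d_anti / (d_ideal + d_anti))
    return scores


# Hypothetical decision matrix: rows are algorithms, columns are
# entropy (minimize), Dunn's index (maximize), computation time in seconds (minimize).
names = ["algorithm A", "algorithm B", "algorithm C"]
matrix = [
    [0.20, 0.80, 1.0],
    [0.60, 0.30, 3.0],
    [0.40, 0.50, 2.0],
]
weights = [0.4, 0.4, 0.2]          # assumed weights, summing to 1
is_benefit = [False, True, False]  # only Dunn's index is a benefit criterion

scores = topsis(matrix, weights, is_benefit)
ranking = [name for _, name in sorted(zip(scores, names), reverse=True)]
```

In this toy matrix algorithm A is best on every criterion and algorithm B is worst on every criterion, so they coincide with the ideal and anti-ideal points and the ranking is A, C, B. With real validity measures the criteria usually conflict, and the ranking then depends on the chosen weights and MCDM method.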
Experiment
The experiment is designed to validate the proposed evaluation approach. The first part of this section describes the three real-life financial risk data sets. The second and third parts discuss the experimental design and results.
Conclusions
This paper proposed a new evaluation approach that utilizes MCDM methods to assess the quality of clustering algorithms in the domain of financial risk analysis. The approach first obtained clustering solutions using various clustering algorithms. The performances of clustering algorithms were measured using a collection of external and internal performance measures. MCDM methods were then used to rank clustering algorithms by taking into account all performance criteria. An experiment was
Acknowledgements
The authors are grateful to the editor-in-chief and the anonymous reviewers for their valuable suggestions, which helped improve the quality of this paper. This research has been partially supported by grants from the National Natural Science Foundation of China (#71222108, #71173028 and #71325001).
References (73)
- et al. Use of a fuzzy granulation–degranulation criterion for assessing cluster validity. Fuzzy Sets Syst. (2011)
- et al. Foundations of data envelopment analysis for Pareto–Koopmans efficient empirical production functions. J. Econom. (1985)
- et al. Measuring the efficiency of decision making units. Eur. J. Oper. Res. (1978)
- et al. Comparison among three analytical methods for knowledge communities group decision analysis. Expert Syst. Appl. (2007)
- et al. Clustering techniques: the user’s dilemma. Pattern Recogn. (1976)
- et al. A simple method to improve the consistency ratio of the pair-wise comparison matrix in ANP. Eur. J. Oper. Res. (2011)
- et al. Investigating diversity of clustering methods: an empirical comparison. Data Knowl. Eng. (2007)
- et al. Models of incremental concept formation. Artif. Intell. (1989)
- et al. A similarity measure of intuitionistic fuzzy sets based on the Sugeno integral with its application to pattern recognition. Inform. Sci. (2012)
- Comparison of weights in TOPSIS models. Math. Comput. Model. (2004)
- Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS. Eur. J. Oper. Res.
- MiniMax ε-stable cluster validity index for Type-2 fuzziness. Inform. Sci.
- FAMCDM: a fusion approach of MCDM methods to rank multiclass classification algorithms. OMEGA
- An empirical performance metric for classification algorithm selection in financial risk management. Appl. Soft Comput.
- Co-plot: a graphic display method for geometrical representations of MCDM. Eur. J. Oper. Res.
- Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Comput. Appl. Math.
- Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J. Finance
- A procedure for ranking efficient units in data envelopment analysis. Manage. Sci.
- The global k-means clustering algorithm. Pattern Recogn.
- Some models for estimating technical and scale inefficiencies in data envelopment analysis. Manage. Sci.
- Decision method for cluster analysis problems using visual approach. Expert Syst.
- Using Kohonen’s self organizing feature map to uncover automobile bodily injury claims fraud. J. Risk Insur.
- Evaluating program and managerial efficiency: an application of data envelopment analysis to program follow through. Manage. Sci.
- Data envelopment analysis: history, models and interpretations
- A fuzzy system for detecting anomalous behaviors in healthcare provider claims
- Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B Meth.
- Cluster analysis and related issues
- A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J. Cybern.
- Analytic network process in risk assessment and decision analysis. Comput. Oper. Res.
- Knowledge acquisition via incremental conceptual clustering. Mach. Learn.
- How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput. J.
- Techniques of cluster algorithms in data mining. Data Min. Knowl. Disc.