Abstract
Support Vector Machine (SVM), an optimization technique, is well known in the data mining community. Many other optimization techniques have also been used effectively for data separation and analysis. Over the last 10 years, the author and his colleagues have proposed and extended a series of optimization-based classification models via Multiple Criteria Linear Programming (MCLP) and Multiple Criteria Quadratic Programming (MCQP). These methods differ from statistical approaches, decision tree induction, and neural networks. The purpose of this paper is to review the basic concepts and frameworks of these methods and to promote research interest in them within the data mining community. Following the evolution of multiple criteria programming, the paper starts with the basics of MCLP. It then discusses penalized MCLP, MCQP, Multiple Criteria Fuzzy Linear Programming (MCFLP), Multi-Class Multiple Criteria Programming (MCMCP), the kernel-based Multiple Criteria Linear Program, and MCLP-based regression. The paper also outlines several applications of multiple criteria optimization-based data mining methods, such as credit card risk analysis, classification of HIV-1-mediated neuronal dendritic and synaptic damage, network intrusion detection, firm bankruptcy prediction, and VIP e-mail behavior analysis.
Cite this article
Shi, Y. Multiple criteria optimization-based data mining methods and applications: a systematic survey. Knowl Inf Syst 24, 369–391 (2010). https://doi.org/10.1007/s10115-009-0268-1