Abstract
Case-based reasoning (CBR) is widely used in data mining for managerial applications because it often shows significant promise for improving the effectiveness of complex and unstructured decision making. Designing appropriate case indexing and retrieval mechanisms, including feature selection and feature weighting, remains difficult, however. Prior studies have pointed out that finding the optimal parameter k for the k-nearest neighbor (k-NN) algorithm is also one of the most important factors in designing an effective CBR system. Nonetheless, there have been few attempts to optimize the number of neighbors, especially using artificial intelligence (AI) techniques. This study proposes a genetic algorithm (GA) approach to optimize the number of neighbors to combine. We apply this model to two real-world cases involving stock market and online purchase prediction problems. Experimental results show that the GA-optimized k-NN approach may outperform traditional k-NN. The results also show that the proposed method performs as well as, or sometimes better than, other AI techniques.
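The idea of searching for the number of neighbors with a GA can be illustrated with a minimal sketch. It is not the procedure used in the study: the integer encoding of k, the tournament selection, the averaging crossover, the mutation steps, the leave-one-out fitness, and all function names and parameter values below are assumptions chosen for illustration, and the case base is synthetic toy data.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k cases closest to `query` (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(cases, k):
    """Leave-one-out accuracy of k-NN on the case base (assumed fitness function)."""
    hits = 0
    for i, (features, label) in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(cases)


def ga_optimize_k(cases, k_max, pop_size=8, generations=10,
                  mutation_rate=0.2, seed=0):
    """Search for a good k in [1, k_max] with a very small genetic algorithm."""
    rng = random.Random(seed)
    cache = {}  # fitness is deterministic, so evaluate each k only once

    def fitness(k):
        if k not in cache:
            cache[k] = loo_accuracy(cases, k)
        return cache[k]

    population = [rng.randint(1, k_max) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_pop = [best]  # elitism: the best k so far always survives
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of two random individuals
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            child = (p1 + p2) // 2  # averaging crossover on the integer gene
            if rng.random() < mutation_rate:
                child += rng.choice([-2, -1, 1, 2])  # small random step
            next_pop.append(min(max(child, 1), k_max))  # keep k in range
        population = next_pop
        best = max(population, key=fitness)
    return best, fitness(best)


def make_toy_cases(n_per_class=30, seed=1):
    """Two overlapping Gaussian clusters; purely synthetic illustration data."""
    rng = random.Random(seed)
    cases = []
    for label, center in ((0, 0.0), (1, 1.5)):
        for _ in range(n_per_class):
            point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            cases.append((point, label))
    return cases


if __name__ == "__main__":
    toy_cases = make_toy_cases()
    k, acc = ga_optimize_k(toy_cases, k_max=15)
    print(f"selected k = {k}, leave-one-out accuracy = {acc:.3f}")
```

Because the search space here is a single small integer, an exhaustive scan over k would be just as practical; the GA framing becomes more useful when k is optimized jointly with other parameters such as feature weights.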
Cite this article
Ahn, H., Kim, Kj. Using genetic algorithms to optimize nearest neighbors for data mining. Ann Oper Res 163, 5–18 (2008). https://doi.org/10.1007/s10479-008-0325-2
DOI: https://doi.org/10.1007/s10479-008-0325-2