ABSTRACT
The k-nearest-neighbor (kNN) method is a popular classification method in data mining because of its simple implementation and strong classification performance. However, kNN does not scale well to big datasets. In this paper, CLUKER, a novel kNN method based on hierarchical clustering, is proposed. CLUKER uses hierarchical clustering to divide the original dataset into several parts, effectively reducing the query scope of kNN. Moreover, to improve kNN's ability to handle imbalanced datasets, this paper proposes a novel weighting method based on the local data distribution, called the LD-Weighting method. Finally, integrating the two algorithms, this paper proposes an efficient kNN-based model for imbalanced dataset classification called CW-kNN. The experimental results show that the proposed methods perform well on different datasets.
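The idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: `fit_clusters`, `predict`, the choice of Ward linkage, and the inverse-class-frequency weighting (used here as a stand-in for the LD-Weighting method, whose details are not given in the abstract) are all assumptions for the sake of the example. The key structural point it shows is that each query searches only one cluster rather than the whole training set.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def fit_clusters(X, n_clusters):
    # Hierarchical (Ward) clustering partitions the training set so that
    # each query later searches one partition instead of all of X.
    Z = linkage(X, method="ward")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    uniq = np.unique(labels)
    centroids = np.stack([X[labels == c].mean(axis=0) for c in uniq])
    return labels, uniq, centroids

def predict(x, X, y, labels, uniq, centroids, k=3):
    # 1) Restrict the kNN search to the cluster with the nearest centroid.
    c = uniq[np.argmin(np.linalg.norm(centroids - x, axis=1))]
    idx = np.where(labels == c)[0]
    # 2) Plain kNN inside that cluster only.
    d = np.linalg.norm(X[idx] - x, axis=1)
    nn = idx[np.argsort(d)[:k]]
    # 3) Imbalance-aware voting (illustrative stand-in for LD-Weighting):
    #    each neighbor's vote is divided by its class's global frequency,
    #    so minority-class neighbors count for more.
    votes = {}
    for i in nn:
        w = 1.0 / np.sum(y == y[i])
        votes[y[i]] = votes.get(y[i], 0.0) + w
    return max(votes, key=votes.get)
```

In this sketch, reducing the query scope changes the per-query cost from scanning the full dataset to scanning one cluster, which is the scalability benefit the abstract claims for CLUKER.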
Index Terms
- CW-kNN: an efficient kNN-based model for imbalanced dataset classification