ABSTRACT
The k-nearest-neighbor (kNN) method is a popular classification method in data mining because of its simple implementation and strong classification performance. However, kNN does not scale well to big datasets. In this paper, CLUKER, a novel kNN method based on hierarchical clustering, is proposed. CLUKER uses hierarchical clustering to divide the original dataset into several parts, effectively reducing the query scope of kNN. Moreover, to improve kNN's ability to handle imbalanced datasets, this paper proposes a novel weighting method based on the local data distribution, called the LD-Weighting method. Finally, integrating the two algorithms, this paper proposes an efficient kNN-based model for imbalanced dataset classification called CW-kNN. The experimental results show that the proposed methods perform well on different datasets.
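The idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: `fit_clusters`, `predict`, the choice of Ward linkage, and the inverse-class-frequency weighting (used here as a stand-in for the LD-Weighting method, whose details are not given in the abstract) are all assumptions for the sake of the example. The key structural point it shows is that each query searches only one cluster rather than the whole training set.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def fit_clusters(X, n_clusters):
    # Hierarchical (Ward) clustering partitions the training set so that
    # each query later searches one partition instead of all of X.
    Z = linkage(X, method="ward")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    uniq = np.unique(labels)
    centroids = np.stack([X[labels == c].mean(axis=0) for c in uniq])
    return labels, uniq, centroids

def predict(x, X, y, labels, uniq, centroids, k=3):
    # 1) Restrict the kNN search to the cluster with the nearest centroid.
    c = uniq[np.argmin(np.linalg.norm(centroids - x, axis=1))]
    idx = np.where(labels == c)[0]
    # 2) Plain kNN inside that cluster only.
    d = np.linalg.norm(X[idx] - x, axis=1)
    nn = idx[np.argsort(d)[:k]]
    # 3) Imbalance-aware voting (illustrative stand-in for LD-Weighting):
    #    each neighbor's vote is divided by its class's global frequency,
    #    so minority-class neighbors count for more.
    votes = {}
    for i in nn:
        w = 1.0 / np.sum(y == y[i])
        votes[y[i]] = votes.get(y[i], 0.0) + w
    return max(votes, key=votes.get)
```

In this sketch, reducing the query scope changes the per-query cost from scanning the full dataset to scanning one cluster, which is the scalability benefit the abstract claims for CLUKER.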
Index Terms
- CW-kNN: an efficient kNN-based model for imbalanced dataset classification