DOI: 10.1145/3331453.3360963
Research article

An Improved Algorithm based on KNN and Random Forest

Published: 22 October 2019

Abstract

This paper proposes RFDKNN, an improved algorithm based on an enhanced KNN (K-Nearest Neighbor) classifier and random forest. RFDKNN first ranks features by importance using the Gini index computed by a random forest algorithm. It then deletes the least important features according to this ranking, in a proportion r. Finally, it applies an enhanced KNN algorithm that dynamically selects the optimal number of nearest neighbors and a distance function that brings the measured distance between two samples closer to its true value. Experiments on 20 data sets from the UCI Machine Learning Repository show that, among the tested values of r, r = 0.7 yields the most satisfactory classification accuracy. Compared with Naive Bayes, Adaboost, Random Forest, RRSB, W-KNN, dwh-FNN and LI-KNN, RFDKNN achieves higher classification accuracy on most data sets, especially on large data sets such as Pendigits and Letter.
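The pipeline the abstract describes (importance-based feature deletion followed by a KNN whose k is tuned on the data) can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: the stand-in importance score below (spread of per-class feature means) replaces the paper's Gini-index importance from a random forest, the proportion r is assumed to denote the fraction of features deleted, and k is chosen here by leave-one-out accuracy rather than the paper's per-sample dynamic selection. All function names are hypothetical.

```python
import math
from collections import Counter

def feature_ranking(X, y):
    """Rank feature indices from most to least important.

    Stand-in importance: how far the per-class means of a feature
    spread around their overall mean (the paper instead uses
    Gini-index importance computed by a random forest)."""
    classes = set(y)
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        means = []
        for c in classes:
            vals = [x[j] for x, t in zip(X, y) if t == c]
            means.append(sum(vals) / len(vals))
        overall = sum(means) / len(means)
        scores.append(sum((m - overall) ** 2 for m in means))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def reduce_features(X, ranking, r):
    """Delete the bottom proportion r of features; keep at least one."""
    keep = ranking[: max(1, round(len(ranking) * (1 - r)))]
    return [[x[j] for j in keep] for x in X], keep

def knn_predict(X_train, y_train, x, k):
    """Plain majority-vote KNN with Euclidean distance."""
    dists = sorted((math.dist(x, xt), yt) for xt, yt in zip(X_train, y_train))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def choose_k(X, y, candidates=(1, 3, 5, 7)):
    """Pick k by leave-one-out accuracy (a simple per-dataset proxy
    for the paper's dynamic neighbor-number selection)."""
    best_k, best_acc = candidates[0], -1
    for k in candidates:
        correct = sum(
            knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) == y[i]
            for i in range(len(X))
        )
        if correct > best_acc:
            best_acc, best_k = correct, k
    return best_k
```

With r = 0.7, `reduce_features` keeps roughly the top 30% of ranked features before the KNN stage, mirroring the dimensionality reduction the abstract attributes to the random-forest ranking.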




Published In

CSAE '19: Proceedings of the 3rd International Conference on Computer Science and Application Engineering
October 2019
942 pages
ISBN:9781450362948
DOI:10.1145/3331453

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Enhanced KNN
  2. Gini index
  3. Random forest
  4. Dimensionality reduction

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Science and Technology Planning Project of Guangdong Province
  • Natural Science Foundation of Guangdong Province
  • National Natural Science Foundation of China
  • The Creative Talents Project Fund of Guangdong Province Department of Education
  • The Ph.D. Start-up Fund of Natural Science Foundation of Guangdong Province

Conference

CSAE 2019

Acceptance Rates

Overall Acceptance Rate 368 of 770 submissions, 48%


Article Metrics

  • Downloads (Last 12 months)16
  • Downloads (Last 6 weeks)4
Reflects downloads up to 27 Feb 2025


Cited By

  • (2024) Classification of Stunting in Toddlers from Bandarharjo Using K-Nearest Neighbors and Random Forest Algorithms. 2024 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT), 10.1109/COMNETSAT63286.2024.10862317, 258-262. Online publication date: 28-Nov-2024.
  • (2023) A Machine Learning Application to Predict Customer Churn: A Case in Indonesian Telecommunication Company. Advanced Mathematical Applications in Data Science, 10.2174/9789815124842123010013, 144-161. Online publication date: 22-Aug-2023.
  • (2023) Forecasting a Fast-Moving Consumer Goods (FMCG) Company's Customer Repurchase Behavior via Classification Machine Learning Models. Proceedings of the 5th International Conference on Information Management & Machine Intelligence, 10.1145/3647444.3647840, 1-5. Online publication date: 23-Nov-2023.
  • (2023) Random Similarity Forests. Machine Learning and Knowledge Discovery in Databases, 10.1007/978-3-031-26419-1_4, 53-69. Online publication date: 17-Mar-2023.
  • (2022) Feature transfer learning by reinforcement learning for detecting software defect. Software: Practice and Experience, 53(2), 366-389, 10.1002/spe.3152. Online publication date: 21-Sep-2022.
  • (2021) Predicting Customer Churn: A Comparison of Eight Machine Learning Techniques: A Case Study in an Indonesian Telecommunication Company. 2021 International Conference on Data Analytics for Business and Industry (ICDABI), 10.1109/ICDABI53623.2021.9655790, 42-46. Online publication date: 25-Oct-2021.
