Abstract
With the arrival of the big-data era, data mining algorithms have become increasingly important. The k-nearest neighbor (KNN) algorithm is a representative data classification algorithm: a simple classification method that is widely used in many fields. However, several unreasonable parameter settings limit KNN. Some limit its scope of application, e.g., sample feature values must be numeric; some limit its classification efficiency, e.g., too many training samples or too high a feature dimension; and some limit its classification accuracy, e.g., an unreasonable choice of K, an unreasonable distance metric, or an unreasonable class-voting method. This paper proposes methods to rationalize these parameters: feature value quantification, dimension reduction, weighted distance, and a weighted voting function. Experimental results on benchmark data demonstrate their effect.
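The weighted-distance and weighted-voting ideas mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' exact formulation: the inverse-distance weighting function, the tie-breaking behavior, and all names are assumptions for the sketch.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Classify `query` by distance-weighted voting among its k nearest
    training samples. Each neighbor votes with weight 1/(distance + eps),
    so closer neighbors influence the result more than distant ones.
    (Inverse-distance weighting is one common choice, not necessarily the
    paper's exact weighting function.)"""
    # Euclidean distance from the query to every training sample
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    dists.sort(key=lambda t: t[0])
    # Accumulate distance-weighted votes from the k nearest neighbors
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)

# Toy example: two well-separated clusters in 2-D
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
y = ["a", "a", "a", "b", "b"]
print(weighted_knn_predict(X, y, (0.15, 0.15), k=3))  # → a
```

Compared with unweighted majority voting, the weighted scheme lets a single very close neighbor outvote several distant ones, which is the kind of rationalization of the voting parameter the paper argues for.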
Acknowledgement
This work was supported by the National Natural Science Foundation of China (Grant No. 61272513) and Beijing Municipal Science and Technology Project (Grant No. D151100004215003).
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Liu, J., Zhao, G., Zheng, Y. (2015). Rationalizing the Parameters of K-Nearest Neighbor Classification Algorithm. In: Qiang, W., Zheng, X., Hsu, CH. (eds) Cloud Computing and Big Data. CloudCom-Asia 2015. Lecture Notes in Computer Science(), vol 9106. Springer, Cham. https://doi.org/10.1007/978-3-319-28430-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28429-3
Online ISBN: 978-3-319-28430-9
eBook Packages: Computer Science (R0)