
Rationalizing the Parameters of K-Nearest Neighbor Classification Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9106)

Abstract

With the arrival of the big-data era, data mining algorithms are becoming increasingly important. The k-nearest neighbor (KNN) algorithm is a representative data classification algorithm: a simple method that is widely used in many fields. However, unreasonable parameter settings limit KNN in several ways. Some limit its scope of application, such as the requirement that sample feature values be numeric. Some limit its classification efficiency, such as too many training samples or too high a feature dimension. Others limit its classification accuracy, such as an unreasonable choice of the value K, an unreasonable distance metric, or an unreasonable class-voting scheme. This paper proposes methods to rationalize these parameters, including feature-value quantification, dimensionality reduction, weighted distance, and a weighted voting function, and demonstrates their effect with experimental results on benchmark data.
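The abstract mentions weighted distance and weighted voting as two of the proposed rationalizations. As a rough illustration of those two ideas (the paper's exact weighting functions are not given in this excerpt, so the feature-weighted Euclidean distance and inverse-distance voting below are assumptions, not the authors' method):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x, k=5, feature_weights=None):
    """Classify x with a weighted KNN: a feature-weighted Euclidean
    distance plus inverse-distance class voting (a common scheme)."""
    X_train = np.asarray(X_train, dtype=float)
    x = np.asarray(x, dtype=float)
    if feature_weights is None:
        feature_weights = np.ones(X_train.shape[1])
    # Weighted Euclidean distance from x to every training sample
    d = np.sqrt(((X_train - x) ** 2 * feature_weights).sum(axis=1))
    idx = np.argsort(d)[:k]  # indices of the k nearest neighbors
    votes = defaultdict(float)
    for i in idx:
        votes[y_train[i]] += 1.0 / (d[i] + 1e-9)  # closer neighbors vote more
    return max(votes, key=votes.get)

# Toy usage: two well-separated clusters
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ['a', 'a', 'a', 'b', 'b', 'b']
print(weighted_knn_predict(X, y, [0.5, 0.5], k=3))  # → 'a'
```

With uniform feature weights and k=3 this reduces to plain KNN with distance-weighted voting; unequal feature weights let informative dimensions dominate the distance, which is one way the paper's "weighted distance" idea can be realized.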



Acknowledgement

This work was supported by the National Natural Science Foundation of China (Grant No. 61272513) and Beijing Municipal Science and Technology Project (Grant No. D151100004215003).

Author information

Corresponding author: Jian Liu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, J., Zhao, G., Zheng, Y. (2015). Rationalizing the Parameters of K-Nearest Neighbor Classification Algorithm. In: Qiang, W., Zheng, X., Hsu, C.-H. (eds.) Cloud Computing and Big Data. CloudCom-Asia 2015. Lecture Notes in Computer Science, vol. 9106. Springer, Cham. https://doi.org/10.1007/978-3-319-28430-9_15


  • DOI: https://doi.org/10.1007/978-3-319-28430-9_15


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28429-3

  • Online ISBN: 978-3-319-28430-9

