
kNN Algorithm with Data-Driven k Value

  • Conference paper
Advanced Data Mining and Applications (ADMA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8933)

Abstract

This paper proposes a new k Nearest Neighbor (kNN) algorithm based on sparse learning, designed to overcome two drawbacks of previous kNN algorithms: the fixed k value for every test sample and the neglect of correlations among samples. Specifically, the paper reconstructs each test sample from the training samples to learn an optimal k value for that test sample, and then applies the kNN algorithm with the learnt k value to tasks such as classification, regression, and missing value imputation. The rationale of the proposed method is that different test samples should be assigned different k values, and that learning the optimal k value for each test sample should take the correlation of the data into account. To this end, in the reconstruction process, the proposed method minimizes the reconstruction error via a least square loss function and employs an ℓ1-norm regularization term to induce element-wise sparsity in the reconstruction coefficients, i.e., sparsity in the elements of the coefficient matrix. To further improve effectiveness, Locality Preserving Projection (LPP) is employed to preserve the local structure of the data. Finally, experimental results on real datasets show that the proposed kNN algorithm outperforms state-of-the-art algorithms on different learning tasks, such as classification, regression, and missing value imputation.
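The core idea of a data-driven k can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it reconstructs each test sample from the training samples with an ℓ1-penalized least-squares fit (Lasso) and takes the number of nonzero reconstruction coefficients as that sample's k. The LPP step is omitted, and the dataset, the `alpha` value, and the helper name `data_driven_k` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# A small benchmark dataset stands in for the paper's real datasets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

def data_driven_k(x_test, X_train, alpha=0.05):
    """Reconstruct one test sample as a sparse nonnegative combination of
    the training samples; the number of nonzero coefficients is that
    sample's k. (Hypothetical helper; alpha is an assumed setting.)"""
    # Columns of the design matrix are training samples: x ≈ X_train.T @ w
    lasso = Lasso(alpha=alpha, positive=True, max_iter=10000)
    lasso.fit(X_train.T, x_test)
    k = int(np.count_nonzero(lasso.coef_))
    return max(k, 1)  # guard against an all-zero coefficient vector

# Classify each test sample with its own learnt k.
preds = []
for x in X_test:
    k = data_driven_k(x, X_train)
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    preds.append(clf.predict(x.reshape(1, -1))[0])

accuracy = float(np.mean(np.array(preds) == y_test))
print(f"per-sample-k accuracy: {accuracy:.3f}")
```

Refitting a classifier per test sample is wasteful; it is done here only to make the per-sample k explicit. A practical version would query a single prebuilt neighbor index with a variable neighbor count.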




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Cheng, D., Zhang, S., Deng, Z., Zhu, Y., Zong, M. (2014). kNN Algorithm with Data-Driven k Value. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol. 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_39

  • DOI: https://doi.org/10.1007/978-3-319-14717-8_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14716-1

  • Online ISBN: 978-3-319-14717-8

  • eBook Packages: Computer Science (R0)
