
kNN Algorithm with Data-Driven k Value

  • Conference paper
Advanced Data Mining and Applications (ADMA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8933)

Abstract

This paper proposes a new k Nearest Neighbor (kNN) algorithm based on sparse learning, designed to overcome two drawbacks of previous kNN algorithms: the fixed k value for every test sample and the neglect of correlations among samples. Specifically, the paper reconstructs each test sample from the training samples to learn an optimal k value for that test sample, and then applies the kNN algorithm with the learnt k value to tasks such as classification, regression, and missing value imputation. The rationale of the proposed method is that different test samples should be assigned different k values, and that learning the optimal k value for each test sample should take the correlation of the data into account. To this end, in the reconstruction process, the proposed method minimizes the reconstruction error via a least square loss function and employs an ℓ1-norm regularization term to induce element-wise sparsity in the reconstruction coefficients, i.e., sparsity in the elements of the coefficient matrix. To further improve effectiveness, Locality Preserving Projection (LPP) is employed to preserve the local structure of the data. Finally, experimental results on real datasets show that the proposed kNN algorithm outperforms state-of-the-art algorithms on different learning tasks, such as classification, regression, and missing value imputation.
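The core idea of a data-driven k can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it reconstructs each test sample from the training samples with an ℓ1-penalized least-squares fit (Lasso) and takes the number of nonzero reconstruction coefficients as that sample's k. The LPP step is omitted, and the dataset, the `alpha` value, and the helper name `data_driven_k` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# A small benchmark dataset stands in for the paper's real datasets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

def data_driven_k(x_test, X_train, alpha=0.05):
    """Reconstruct one test sample as a sparse nonnegative combination of
    the training samples; the number of nonzero coefficients is that
    sample's k. (Hypothetical helper; alpha is an assumed setting.)"""
    # Columns of the design matrix are training samples: x ≈ X_train.T @ w
    lasso = Lasso(alpha=alpha, positive=True, max_iter=10000)
    lasso.fit(X_train.T, x_test)
    k = int(np.count_nonzero(lasso.coef_))
    return max(k, 1)  # guard against an all-zero coefficient vector

# Classify each test sample with its own learnt k.
preds = []
for x in X_test:
    k = data_driven_k(x, X_train)
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    preds.append(clf.predict(x.reshape(1, -1))[0])

accuracy = float(np.mean(np.array(preds) == y_test))
print(f"per-sample-k accuracy: {accuracy:.3f}")
```

Refitting a classifier per test sample is wasteful; it is done here only to make the per-sample k explicit. A practical version would query a single prebuilt neighbor index with a variable neighbor count.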




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Cheng, D., Zhang, S., Deng, Z., Zhu, Y., Zong, M. (2014). kNN Algorithm with Data-Driven k Value. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol. 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_39

  • DOI: https://doi.org/10.1007/978-3-319-14717-8_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14716-1

  • Online ISBN: 978-3-319-14717-8

  • eBook Packages: Computer Science (R0)
