Abstract
Nearest neighbor classification is a flexible classification method that works under weak assumptions. The basic concept is to classify a target value by weighted or unweighted sums over the class indicators of the observations in its neighborhood. Two modifications that improve performance are considered here. First, instead of using weights that are determined solely by the distances, we estimate the weights with a logit model; a selection procedure such as the lasso or boosting then automatically selects the relevant nearest neighbors. Second, building on this concept of estimation and selection, we extend the predictor space: in addition to nearest neighborhood counts, we include the original predictors themselves as well as neighborhood counts based on distances in sub-dimensions of the predictor space. The resulting classifiers combine the strengths of nearest neighbor methods with parametric approaches and, through the use of sub-dimensions, are able to select the relevant features. Simulations and real data sets demonstrate that the method yields lower misclassification rates than currently available nearest neighbor methods and is a strong, flexible competitor in classification problems.
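To illustrate the first modification, the following minimal sketch (a toy example under stated assumptions, not the authors' implementation) builds nearest neighborhood counts as features, appends the original predictors, and fits an L1-penalized logit model so that the lasso selects the relevant neighbor orders and predictors. The helper `neighborhood_counts`, the neighborhood size `K = 20`, and the penalty `C = 0.5` are illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

# Toy binary classification data with a few informative predictors.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

K = 20  # largest neighborhood order used as a feature (illustrative)

def neighborhood_counts(X_ref, y_ref, X_query, k_max, drop_self=False):
    """Feature k = number of class-1 observations among the k nearest
    reference points of each query point, for k = 1, ..., k_max."""
    n_nb = k_max + 1 if drop_self else k_max
    nn = NearestNeighbors(n_neighbors=n_nb).fit(X_ref)
    _, idx = nn.kneighbors(X_query)
    labels = y_ref[idx]               # class indicators of ordered neighbors
    if drop_self:
        labels = labels[:, 1:]        # a training point is its own 1st neighbor
    return np.cumsum(labels, axis=1)  # counts within growing neighborhoods

# Extended predictor space: neighborhood counts plus the original predictors.
Z_train = np.hstack([neighborhood_counts(X_train, y_train, X_train, K,
                                         drop_self=True), X_train])
Z_test = np.hstack([neighborhood_counts(X_train, y_train, X_test, K), X_test])

# L1-penalized logit model: the lasso shrinks the coefficients of irrelevant
# neighbor orders (and irrelevant predictors) to exactly zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(Z_train, y_train)
print("test accuracy:", clf.score(Z_test, y_test))
print("selected feature indices:", np.flatnonzero(clf.coef_))
```

In practice one would standardize the columns of Z before penalization, and the method described in the abstract additionally includes neighborhood counts computed from distances in sub-dimensions of the predictor space; both are omitted here for brevity.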
Cite this article
Tutz, G., Koch, D. Improved nearest neighbor classifiers by weighting and selection of predictors. Stat Comput 26, 1039–1057 (2016). https://doi.org/10.1007/s11222-015-9588-z