Abstract
In this paper, we propose six fast and efficient classification schemes for different types of images (digits, objects, characters) using the classical k-nearest neighbor (kNN) classifier. kNN is one of the most popular supervised classification algorithms. However, for large data collections, classification is time-consuming because the number of distance calculations grows with the number of training samples and with the data dimensionality. To reduce the number of training samples, we propose two techniques based on the notions of convex envelope and stratified sampling. For the convex envelope, only the data points forming the convex hull serve as prototypes (training samples), considerably reducing the computational burden. To reduce the dimensionality of the data, we consider auto-encoder networks and vector embeddings. The former learns a compact data representation through statistical machine learning, while the latter transforms the data itself, emphasizing how each sample is organized in space relative to the others. Experiments on multiple benchmark collections, such as MNIST, Fashion-MNIST, and the Lampung handwritten characters, show a considerable classification speed-up (up to 32×) with no significant drop in accuracy compared to results obtained using the complete data.
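The convex-envelope prototype selection described above can be illustrated with a minimal sketch (not the authors' implementation): for each class, the training set is reduced to the vertices of the class convex hull, and a standard kNN classifier is then fitted on those prototypes only. The helper name convex_hull_prototypes and the choice to compute the hull on a low-dimensional PCA projection (Qhull does not scale to raw 784-dimensional pixel vectors) are our own assumptions.

```python
# Sketch of convex-envelope prototype selection for kNN (assumptions noted above).
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def convex_hull_prototypes(X, y, n_dims=3):
    """Return indices of the training points lying on each class's convex hull."""
    # The hull is computed on a low-dimensional projection, since Qhull cannot
    # handle hundreds of dimensions (assumption: PCA to n_dims components).
    X_low = PCA(n_components=n_dims).fit_transform(X)
    keep = []
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        hull = ConvexHull(X_low[idx])      # vertices of the class envelope
        keep.extend(idx[hull.vertices])    # map back to indices in the full set
    return np.array(keep)

# Usage: X_train holds flattened images (e.g. MNIST), y_train the labels.
# keep = convex_hull_prototypes(X_train, y_train)
# knn = KNeighborsClassifier(n_neighbors=3).fit(X_train[keep], y_train[keep])
# predictions = knn.predict(X_test)
```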



Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances in Applied Image Processing and Pattern Recognition” guest edited by K C Santosh.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yepdjio Nkouanga, H., Vajda, S. Optimization Strategies for the k-Nearest Neighbor Classifier. SN COMPUT. SCI. 4, 47 (2023). https://doi.org/10.1007/s42979-022-01469-3