Abstract
In this paper, we propose six fast and efficient classification schemes for different types of images (digits, objects, characters) using the classical k-nearest neighbor (kNN) classifier. kNN is one of the most popular supervised classification algorithms. However, for large data collections, classification is time-consuming because the number of distance calculations grows with the number of training samples and with the data dimensionality. To reduce the number of training samples, we propose two techniques based on the notions of convex envelope and stratified sampling. For the convex envelope, only the data points forming the convex hull serve as prototypes (training samples), considerably reducing the computational burden. To reduce the dimensionality of the data, we consider auto-encoder networks and vector embeddings. The former learns a compact data representation through statistical machine learning, while the latter transforms the data itself, emphasizing how each sample is organized in space relative to the others. Experiments on multiple benchmark collections, such as MNIST, Fashion-MNIST, and the Lampung handwritten characters, show a considerable classification speed-up (up to 32×) with no significant drop in accuracy compared to results obtained using the complete data.
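The convex-envelope prototype selection described above can be illustrated with a minimal sketch (not the authors' implementation): for each class, the training set is reduced to the vertices of the class convex hull, and a standard kNN classifier is then fitted on those prototypes only. The helper name convex_hull_prototypes and the choice to compute the hull on a low-dimensional PCA projection (Qhull does not scale to raw 784-dimensional pixel vectors) are our own assumptions.

```python
# Sketch of convex-envelope prototype selection for kNN (assumptions noted above).
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def convex_hull_prototypes(X, y, n_dims=3):
    """Return indices of the training points lying on each class's convex hull."""
    # The hull is computed on a low-dimensional projection, since Qhull cannot
    # handle hundreds of dimensions (assumption: PCA to n_dims components).
    X_low = PCA(n_components=n_dims).fit_transform(X)
    keep = []
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        hull = ConvexHull(X_low[idx])      # vertices of the class envelope
        keep.extend(idx[hull.vertices])    # map back to indices in the full set
    return np.array(keep)

# Usage: X_train holds flattened images (e.g. MNIST), y_train the labels.
# keep = convex_hull_prototypes(X_train, y_train)
# knn = KNeighborsClassifier(n_neighbors=3).fit(X_train[keep], y_train[keep])
# predictions = knn.predict(X_test)
```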



Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances in Applied Image Processing and Pattern Recognition” guest edited by K C Santosh.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yepdjio Nkouanga, H., Vajda, S. Optimization Strategies for the k-Nearest Neighbor Classifier. SN COMPUT. SCI. 4, 47 (2023). https://doi.org/10.1007/s42979-022-01469-3