Abstract
The kNN (k-nearest-neighbors) search is applied in a wide range of fields, including data mining, multimedia, information retrieval, machine learning, and pattern recognition. Most solutions for this type of search are restricted to metric spaces or limited to low-dimensional data. In this work, we introduce a novel GPU-based exhaustive algorithm to solve kNN queries. The algorithm takes a set of values (or measures) as input and returns the K lowest values from that set, so it can be used with measures obtained from metric spaces, non-metric spaces, or high-dimensional databases. It is composed of two steps: the first uses pivots to reduce the search range, and the second uses a set of heaps as auxiliary structures to return the final results. We also extended the algorithm to run on a multi-GPU platform and on a multi-node/multi-GPU platform. To the best of our knowledge of the state-of-the-art literature, this work uses the largest database (in terms of data volume) to process a kNN query, with up to 13,189 million elements, and achieves a speed-up of up to 1843× on a 5-node/20-GPU platform.
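The two-step idea described in the abstract can be illustrated with a minimal sequential sketch in Python. It is not the authors' GPU implementation. The abstract does not say how pivots are chosen or how values are assigned to heaps, so the random pivot sampling, the round-robin heap assignment, and all names (`pivot_filter`, `heap_select`, `knn_lowest`, `num_pivots`, `num_heaps`) are assumptions made for illustration only.

```python
import heapq
import random


def pivot_filter(values, k, num_pivots=8, seed=0):
    """Step 1 (assumed form): drop values that cannot be among the k lowest."""
    if len(values) <= k:
        return list(values)
    rng = random.Random(seed)
    # Hypothetical pivot choice: a small random sample of the input values.
    pivots = sorted(rng.sample(values, min(num_pivots, len(values))))
    for p in pivots:
        survivors = [v for v in values if v <= p]
        # If at least k values are <= p, the k lowest values are all <= p,
        # so everything above p can be discarded safely.
        if len(survivors) >= k:
            return survivors
    return list(values)


def heap_select(values, k, num_heaps=4):
    """Step 2 (assumed form): several bounded max-heaps, merged at the end."""
    heaps = [[] for _ in range(num_heaps)]
    for i, v in enumerate(values):
        h = heaps[i % num_heaps]          # round-robin assignment (assumption)
        if len(h) < k:
            heapq.heappush(h, -v)         # negate: heapq is a min-heap
        elif v < -h[0]:                   # v beats the largest value kept so far
            heapq.heapreplace(h, -v)
    merged = [-x for h in heaps for x in h]
    return heapq.nsmallest(k, merged)     # ascending list of the k lowest


def knn_lowest(values, k):
    """Return the k lowest values: pivot-based reduction, then heap selection."""
    return heap_select(pivot_filter(values, k), k)


distances = [7.5, 1.0, 9.25, 3.0, 4.5, 0.5, 8.0, 2.0, 6.0, 5.0]
print(knn_lowest(distances, 3))  # [0.5, 1.0, 2.0]
```

In this sketch each heap holds at most K candidates, so the final merge touches at most `num_heaps * K` values regardless of the input size. On a GPU the heaps would presumably be processed in parallel rather than in a loop, which this sequential version does not attempt to model.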








Notes
OpenCL uses different names for the same constructs.
Acknowledgements
This research was funded by project FONDECYT REGULAR 2020 No 1200810 “Very Large Fingerprint Classification based on a Fast and Distributed Extreme Learning Machine”, and project FONDECYT DE INICIACIÓN No 11180881. Both projects are from Agencia Nacional de Investigación y Desarrollo, Ministerio de Ciencia, Tecnología, Conocimiento e Innovación, Gobierno de Chile.
Cite this article
Barrientos, R.J., Riquelme, J.A., Hernández-García, R. et al. Fast kNN query processing over a multi-node GPU environment. J Supercomput 78, 3045–3071 (2022). https://doi.org/10.1007/s11227-021-03975-2
DOI: https://doi.org/10.1007/s11227-021-03975-2