ABSTRACT
k-nearest neighbor (k-NN) search is one of the most commonly used queries in database systems. It has applications in many domains, such as data mining, decision support systems, information retrieval, and multimedia and spatial databases. When k-NN search is performed over large data sets, spatial indexing structures such as R-trees are commonly used to improve query efficiency. The best-first k-NN (BF-kNN) algorithm is the fastest known k-NN algorithm over R-trees. We present CBF-kNN, a concurrent BF-kNN for R-trees, which is, to our knowledge, the first concurrent k-NN algorithm for R-trees. CBF-kNN uses the mound, one of the most efficient known concurrent priority queues. CBF-kNN overcomes the concurrency limitations of priority queues by using a tree-parallel mode of execution. CBF-kNN has an estimated speedup of O(p/k) for p threads. Experimental results on various real datasets show that the speedup in practice is close to this estimate.
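The sequential best-first idea that CBF-kNN parallelizes can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `Node` class is a hypothetical, minimal in-memory R-tree node, points are assumed to be 2-D with Euclidean distance, and an ordinary binary heap (`heapq`) stands in for the concurrent mound. A single priority queue, ordered by distance to the query point, holds both tree nodes (keyed by the minimum distance to their bounding rectangle) and data points; the search stops as soon as k data points have been popped.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    """Hypothetical minimal R-tree node: leaves hold points, internal nodes hold children."""
    mbr: Rect
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def mindist(q: Point, r: Rect) -> float:
    """Euclidean distance from q to the closest point of rectangle r (0 if q lies inside)."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def best_first_knn(root: Node, q: Point, k: int) -> List[Tuple[float, Point]]:
    """Best-first k-NN: one distance-ordered queue holds both nodes and data points."""
    tie = itertools.count()  # unique tie-breaker so heapq never compares Node objects
    heap: List[Tuple[float, int, Optional[Node], Optional[Point]]] = [
        (mindist(q, root.mbr), next(tie), root, None)
    ]
    result: List[Tuple[float, Point]] = []
    while heap and len(result) < k:
        dist, _, node, point = heapq.heappop(heap)
        if point is not None:
            # A data point at the head of the queue is the next-nearest neighbor.
            result.append((dist, point))
        elif node.is_leaf:
            # Expand a leaf: enqueue its points keyed by exact distance.
            for p in node.points:
                heapq.heappush(heap, (math.dist(q, p), next(tie), None, p))
        else:
            # Expand an internal node: enqueue children keyed by MBR lower bound.
            for child in node.children:
                heapq.heappush(heap, (mindist(q, child.mbr), next(tie), child, None))
    return result


# Usage: a two-leaf tree queried at (4, 4) for its 2 nearest points.
leaf_a = Node(mbr=(0, 0, 2, 2), points=[(0, 0), (2, 2)])
leaf_b = Node(mbr=(5, 5, 8, 8), points=[(5, 5), (8, 8)])
root = Node(mbr=(0, 0, 8, 8), children=[leaf_a, leaf_b])
print([p for _, p in best_first_knn(root, (4, 4), 2)])  # → [(5, 5), (2, 2)]
```

Because a node's key is a lower bound on the distance of everything beneath it, a point popped from the queue can never be beaten by anything still queued, which is why results come out in distance order. In a concurrent setting this one queue becomes the shared bottleneck; the abstract's use of the mound and a tree-parallel mode of execution addresses that, which this sequential sketch does not attempt to model.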