Abstract
In recent times, large high-dimensional datasets have become ubiquitous; video and image repositories, financial data, and sensor data are just a few practical examples. Many applications that use such datasets require retrieving the data items most similar to a given query item, i.e., its nearest neighbors (NN or \(k\)-NN). Another common query is the retrieval of multiple sets of nearest neighbors, i.e., multi \(k\)-NN, for different query items on the same data. With commodity multi-core CPUs becoming increasingly widespread at lower cost, developing parallel algorithms for these search problems has grown in importance. While the core nearest neighbor search problem is relatively easy to parallelize, tuning it for optimality is challenging, because the various performance-specific algorithmic parameters, or “tuning knobs”, are inter-related and also depend on the data and query workloads. In this paper, we present (1) a detailed study of the various tuning knobs and their contributions to increasing query throughput for parallelized versions of the two most common classes of high-dimensional multi-NN search algorithms, linear scan and tree traversal, and (2) an offline auto-tuner that sets these knobs by iteratively measuring actual query execution times for a given workload and dataset. We show experimentally that our auto-tuner reaches near-optimal performance and significantly outperforms untuned versions of parallel multi-NN algorithms on real video repository data across a variety of multi-core platforms.
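As an illustration of the linear-scan class of algorithms mentioned in the abstract, the sketch below shows a minimal single-threaded \(k\)-NN scan over a flat array of \(d\)-dimensional points. This is our own hedged sketch, not the paper's implementation; the function and parameter names (`knn_scan`, `dist2`, etc.) are assumptions for illustration only.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Squared Euclidean distance between two d-dimensional points. */
static float dist2(const float *a, const float *b, size_t d) {
    float s = 0.0f;
    for (size_t i = 0; i < d; i++) {
        float t = a[i] - b[i];
        s += t * t;
    }
    return s;
}

/* Linear-scan k-NN: examine all n points and keep the k closest
 * to query q. out_idx/out_dist are filled in ascending distance
 * order via a small insertion-sort step per candidate. */
static void knn_scan(const float *pts, size_t n, size_t d,
                     const float *q, size_t k,
                     size_t *out_idx, float *out_dist) {
    for (size_t j = 0; j < k; j++) out_dist[j] = FLT_MAX;
    for (size_t i = 0; i < n; i++) {
        float dd = dist2(pts + i * d, q, d);
        if (dd >= out_dist[k - 1]) continue;   /* not among k best */
        size_t j = k - 1;
        while (j > 0 && out_dist[j - 1] > dd) {
            out_dist[j] = out_dist[j - 1];     /* shift worse entries down */
            out_idx[j]  = out_idx[j - 1];
            j--;
        }
        out_dist[j] = dd;
        out_idx[j]  = i;
    }
}
```

A parallel multi-\(k\)-NN version would partition either the queries or the data across cores; which partitioning (and at what granularity) performs best is exactly the kind of tuning knob the paper studies.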
Notes
This is just an analogy, not to be confused with SIMD instructions, such as SSE, which we discuss later.
Horizontal addition of a vector of \(4\) floats, an operation needed for distance computation via SIMD instructions, requires \(2\) SIMD instructions with SSE3, making it only about \(2\) times (rather than \(4\) times) faster than the scalar version of the same computation.
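To make the note above concrete, the sketch below isolates the horizontal-sum reduction step of a SIMD distance computation. This is our own illustration, not the paper's code: the SSE3 path uses `_mm_hadd_ps` (two instructions) when the compiler defines `__SSE3__`, and falls back to scalar additions otherwise.

```c
#include <assert.h>
#ifdef __SSE3__
#include <pmmintrin.h>
#endif

/* Horizontal sum of 4 floats: the reduction step at the end of a
 * SIMD distance computation. With SSE3 this takes 2 haddps
 * instructions; the scalar fallback uses explicit additions. */
static float hsum4(const float v[4]) {
#ifdef __SSE3__
    __m128 x = _mm_loadu_ps(v);
    x = _mm_hadd_ps(x, x);   /* lanes: (v0+v1, v2+v3, v0+v1, v2+v3) */
    x = _mm_hadd_ps(x, x);   /* every lane now holds v0+v1+v2+v3    */
    return _mm_cvtss_f32(x);
#else
    return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}
```

This reduction step is why the note above caps the SSE3 advantage for this operation at roughly \(2\times\) rather than the \(4\times\) one might expect from 4-wide vectors.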
If the blocks are sufficiently small, this may still be advantageous, since the index trees can then fit inside the L2 or L3 cache.
Cite this article
Gedik, B. Auto-tuning Similarity Search Algorithms on Multi-core Architectures. Int J Parallel Prog 41, 595–620 (2013). https://doi.org/10.1007/s10766-013-0239-8