DOI: 10.1145/1807167.1807206
research-article

FAST: fast architecture sensitive tree search on modern CPUs and GPUs

Published: 06 June 2010

ABSTRACT

In-memory tree-structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join and aggregation. However, unlike other primitives, tree search presents significant challenges due to irregular and unpredictable data accesses in tree traversal.

In this paper, we present FAST, an extremely fast architecture sensitive layout of the index tree. FAST is a binary tree logically organized to optimize for architecture features like page size, cache line size, and SIMD width of the underlying hardware. FAST eliminates the impact of memory latency, and exploits thread-level and data-level parallelism on both CPUs and GPUs to achieve 50 million (CPU) and 85 million (GPU) queries per second, 5X (CPU) and 1.7X (GPU) faster than the best previously reported performance on the same architectures. FAST supports efficient bulk updates by rebuilding index trees in less than 0.1 seconds for datasets as large as 64M keys and naturally integrates compression techniques, overcoming the memory bandwidth bottleneck and achieving a 6X performance improvement over uncompressed index search for large keys on CPUs.
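
The abstract describes the index as a binary tree whose nodes are grouped to match hardware units such as SIMD width, cache line size, and page size. The following is a minimal sketch of that general idea with a single level of blocking: every subtree of depth `d` is stored contiguously, so one block can be fetched and searched before jumping to the next. The function names, the single blocking parameter `d`, and the restriction to complete trees of 2^h - 1 keys are assumptions made for illustration. This is not the authors' implementation, and it uses no SIMD instructions.

```python
def build_blocked(sorted_keys, d):
    """Rearrange sorted keys so that each depth-d subtree is contiguous.

    Assumes len(sorted_keys) == 2**h - 1 (a complete binary search tree).
    `d` is a hypothetical blocking depth, e.g. chosen to fit a cache line.
    """
    n = len(sorted_keys)
    h = (n + 1).bit_length() - 1
    assert n == (1 << h) - 1, "sketch handles complete trees only"
    out = []

    def emit(lo, hi, height):
        # [lo, hi) is the slice of sorted_keys forming a complete subtree.
        if height == 0:
            return
        top = min(d, height)
        level = [(lo, hi)]
        # Write the top `top` levels of this subtree in breadth-first order.
        for _ in range(top):
            nxt = []
            for a, b in level:
                mid = (a + b) // 2
                out.append(sorted_keys[mid])
                nxt.append((a, mid))
                nxt.append((mid + 1, b))
            level = nxt
        # Then write each child subtree as its own block, left to right.
        for a, b in level:
            emit(a, b, height - top)

    emit(0, n, h)
    return out


def search_blocked(layout, d, key):
    """Return True if key is present in a layout built by build_blocked."""
    height = (len(layout) + 1).bit_length() - 1
    base = 0  # start offset of the current block
    while height > 0:
        top = min(d, height)
        i = 0  # breadth-first index local to the current block
        for _ in range(top):
            k = layout[base + i]
            if key == k:
                return True
            i = 2 * i + 1 + (key > k)  # left child or right child
        # i now identifies which child block to enter (0 .. 2**top - 1).
        child = i - ((1 << top) - 1)
        height -= top
        sub = (1 << height) - 1  # number of keys in each child subtree
        base += ((1 << top) - 1) + child * sub
    return False


keys = list(range(0, 30, 2))          # 15 sorted keys: 0, 2, ..., 28
layout = build_blocked(keys, d=2)     # blocks of 3 keys (depth-2 subtrees)
found = search_blocked(layout, 2, 12)
missing = search_blocked(layout, 2, 13)
```

With `d=2`, each block holds three keys, so a lookup over 15 keys touches at most two contiguous blocks instead of four scattered nodes. A fuller scheme would nest several such blocking levels, one per hardware unit, which this sketch does not attempt.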

      Published in

      SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
      June 2010
      1286 pages
      ISBN: 9781450300322
      DOI: 10.1145/1807167

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%
