VIP: A SIMD vectorized analytical query engine

  • Special Issue Paper
  • The VLDB Journal

Abstract

Query execution engines for analytics are continuously adapting to the underlying hardware in order to maximize performance. Wider SIMD registers and more complex SIMD instruction sets are emerging in mainstream CPUs and new processor designs such as the many-core Intel Xeon Phi CPUs that rely on SIMD vectorization to achieve high performance per core while packing a greater number of smaller cores per chip. In the database literature, using SIMD to optimize stand-alone operators with key–rid pairs is common, yet the state-of-the-art query engines rely on compilation of tightly coupled operators where hand-optimized individual operators become impractical. In this article, we extend a state-of-the-art analytical query engine design by combining code generation and operator pipelining with SIMD vectorization and show that the SIMD speedup is diminished when execution is dominated by random memory accesses. To better utilize the hardware features, we introduce VIP, an analytical query engine designed and built bottom up from pre-compiled column-oriented data parallel sub-operators and implemented entirely in SIMD. In our evaluation using synthetic and TPC-H queries on a many-core CPU, we show that VIP outperforms hand-optimized query-specific code without incurring the runtime compilation overhead, and highlight the efficiency of VIP at utilizing the hardware features of many-core CPUs.
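To make the phrase "column-oriented data parallel sub-operators" concrete, the following is a minimal sketch and not VIP's actual code. It is written in scalar C so it runs anywhere; the function names (`select_less_than`, `gather_i32`, `sum_i32`) and the example query shape are assumptions made for illustration. Each function processes a whole column array in a tight, branch-free loop, which is the kind of loop that SIMD instruction sets such as AVX-512 can express with compress-store and gather instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sub-operator 1: branch-free selection over one column.
 * Writes the row ids (rids) of values below `bound` to out_rids, which
 * must have room for n entries, and returns the number of matches. */
size_t select_less_than(const int32_t *col, size_t n, int32_t bound,
                        uint32_t *out_rids)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out_rids[k] = (uint32_t)i;  /* always store the candidate rid */
        k += (col[i] < bound);      /* advance only when the row qualifies */
    }
    return k;
}

/* Hypothetical sub-operator 2: gather the values of another column at the
 * selected rids. These are the random memory accesses that can dominate
 * execution time when the column does not fit in cache. */
void gather_i32(const int32_t *col, size_t col_len,
                const uint32_t *rids, size_t k, int32_t *out)
{
    for (size_t j = 0; j < k; j++) {
        assert(rids[j] < col_len);  /* debug-only bounds check */
        out[j] = col[rids[j]];
    }
}

/* Hypothetical sub-operator 3: sum a dense array of gathered values. */
int64_t sum_i32(const int32_t *vals, size_t k)
{
    int64_t total = 0;
    for (size_t j = 0; j < k; j++) {
        total += vals[j];
    }
    return total;
}
```

Chaining the three calls evaluates a query of the form `SELECT SUM(b) FROM t WHERE a < bound` one column at a time, with no query-specific code generated at runtime. The intermediate rid and value arrays are the price paid for keeping every loop simple enough to be pre-compiled and data parallel.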

Notes

  1. Based on Vectorization, Interpretation, and Partitioning.

  2. Block at a time is termed vectorized in earlier work; here we use the term vectorized to denote SIMD vectorized.

References

  1. Abadi, D., Myers, D., DeWitt, D., Madden, S.: Materialization strategies in a column-oriented DBMS. In: ICDE, pp. 466–475 (2007)

  2. Balkesen, C., Alonso, G., Teubner, J., Ozsu, M.T.: Multicore, main-memory joins: sort vs. hash revisited. PVLDB 7(1), 85–96 (2013)

  3. Balkesen, C., Teubner, J., Alonso, G., Ozsu, M.T.: Main-memory hash joins on multi-core CPUs: tuning to the underlying hardware. In: ICDE, pp. 362–373 (2013)

  4. Blanas, S., Li, Y., Patel, J.: Design and evaluation of main memory hash join algorithms for multi-core CPUs. In: SIGMOD, pp. 37–48 (2011)

  5. Boncz, P., Manegold, S., Kersten, M.: Database architecture optimized for the new bottleneck: memory access. In: VLDB, pp. 54–65 (1999)

  6. Boncz, P.A., Zukowski, M., Nes, N.: MonetDB/X100: hyper-pipelining query execution. In: CIDR (2005)

  7. Cheng, X., He, B., Du, X., Lau, C.T.: A study of main-memory hash joins on many-core processor: a case with Intel Knights Landing architecture. In: CIKM, pp. 657–666 (2017)

  8. Chhugani, J., Nguyen, A.D., Lee, V.W., Macy, W., Hagog, M., Chen, Y.-K., Baransi, A., Kumar, S., Dubey, P.: Efficient implementation of sorting on multi-core SIMD CPU architecture. In: VLDB, pp. 1313–1324 (2008)

  9. Costea, A., Ionescu, A., Răducanu, B., Switakowski, M., Bârca, C., Sompolski, J., Luszczak, A., Szafrański, M., de Nijs, G., Boncz, P.: VectorH: taking SQL-on-Hadoop to the next level. In: SIGMOD, pp. 1105–1117 (2016)

  10. Dageville, B., Cruanes, T., Zukowski, M., Antonov, V., Avanes, A., Bock, J., Claybaugh, J., Engovatov, D., Hentschel, M., Huang, J., Lee, A.W., Motivala, A., Munir, A.Q., Pelley, S., Povinec, P., Rahn, G., Triantafyllis, S., Unterbrunner, P.: The Snowflake elastic data warehouse. In: SIGMOD, pp. 215–226 (2016)

  11. Fang, Z., Zheng, B., Weng, C.: Interleaved multi-vectorizing. PVLDB 13(3), 226–238 (2019)

  12. Flajolet, P., Martin, G.N.: Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)

  13. Fowler, G., Noll, L.C., Vo, K.-P., Eastlake, D.: The FNV non-cryptographic hash algorithm. Technical report (2017). http://www.ietf.org/internet-drafts/draft-eastlake-fnv-13.txt

  14. Graefe, G.: Volcano: an extensible and parallel query evaluation system. TKDE 6(1), 120–135 (1994)

  15. Gupta, A., Agarwal, D., Tan, D., Kulesza, J., Pathak, R., Stefani, S., Srinivasan, V.: Amazon Redshift and the case for simpler data warehouses. In: SIGMOD, pp. 1917–1923 (2015)

  16. Inoue, H., Moriyama, T., Komatsu, H., Nakatani, T.: AA-sort: a new parallel sorting algorithm for multi-core SIMD processors. In: PACT, pp. 189–198 (2007)

  17. Inoue, H., Ohara, M., Taura, K.: Faster set intersection with SIMD instructions by reducing branch mispredictions. PVLDB 8(3), 293–304 (2014)

  18. Inoue, H., Taura, K.: SIMD- and cache-friendly algorithm for sorting an array of structures. PVLDB 8(11), 1274–1285 (2015)

  19. Jha, S., He, B., Lu, M., Cheng, X., Huynh, H.P.: Improving main memory hash joins on Intel Xeon Phi processors: an experimental approach. PVLDB 8(6), 642–653 (2015)

  20. Kim, C., Kaldewey, T., Lee, V.W., Sedlar, E., Nguyen, A.D., Satish, N., Chhugani, J., Di Blas, A., Dubey, P.: Sort vs. hash revisited: fast join implementation on modern multi-core CPUs. PVLDB 2(2), 1378–1389 (2009)

  21. Krikellas, K., Viglas, S., Cintra, M.: Generating code for holistic query evaluation. In: ICDE, pp. 613–624 (2010)

  22. Lang, H., Kipf, A., Passing, L., Boncz, P., Neumann, T., Kemper, A.: Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines. In: DaMoN (2018)

  23. Lang, H., Mühlbauer, T., Funke, F., Boncz, P.A., Neumann, T., Kemper, A.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD, pp. 311–326 (2016)

  24. Lang, H., Neumann, T., Kemper, A., Boncz, P.: Performance-optimal filtering: Bloom overtakes cuckoo at high throughput. PVLDB 12(5), 502–515 (2019)

  25. Leis, V., Boncz, P., Kemper, A., Neumann, T.: Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In: SIGMOD, pp. 743–754 (2014)

  26. Lemire, D., et al.: Decoding billions of integers per second through vectorization. Softw. Pract. Exp. 45(1), 1–29 (2015)

  27. Li, Y., Patel, J.M.: BitWeaving: fast scans for main memory data processing. In: SIGMOD, pp. 289–300 (2013)

  28. Li, Y., Patel, J.M.: WideTable: an accelerator for analytical data processing. PVLDB 7(10), 907–918 (2014)

  29. Manegold, S., Boncz, P., Kersten, M.: Optimizing database architecture for the new bottleneck: memory access. J. VLDB 9(3), 231–246 (2000)

  30. Manegold, S., Boncz, P., Kersten, M.: What happens during a join? Dissecting CPU and memory optimization effects. In: VLDB, pp. 339–350 (2000)

  31. Manegold, S., Boncz, P., Kersten, M.: Optimizing main-memory join on modern hardware. TKDE 14(4), 709–730 (2002)

  32. Menon, P., Mowry, T.C., Pavlo, A.: Relaxed operator fusion for in-memory databases: making compilation, vectorization, and prefetching work together at last. In: PVLDB (2017)

  33. Neumann, T.: Efficiently compiling efficient query plans for modern hardware. PVLDB 4(9), 539–550 (2011)

  34. Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51(2), 122–144 (2004)

  35. Pirk, H., Moll, O., Zaharia, M., Madden, S.: Voodoo—a vector algebra for portable database performance on modern hardware. PVLDB 9(14), 1707–1718 (2016)

  36. Polychroniou, O., Raghavan, A., Ross, K.A.: Rethinking SIMD vectorization for in-memory databases. In: SIGMOD, pp. 1493–1508 (2015)

  37. Polychroniou, O., Ross, K.A.: High throughput heavy hitter aggregation for modern SIMD processors. In: DaMoN (2013)

  38. Polychroniou, O., Ross, K.A.: A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In: SIGMOD, pp. 755–766 (2014)

  39. Polychroniou, O., Ross, K.A.: Vectorized Bloom filters for advanced SIMD processors. In: DaMoN (2014)

  40. Polychroniou, O., Ross, K.A.: Efficient lightweight compression alongside fast scans. In: DaMoN (2015)

  41. Polychroniou, O., Ross, K.A.: Towards practical vectorized analytical query engines. In: DaMoN (2019)

  42. Raman, V., Attaluri, G., Barber, R., Chainani, N., Kalmuk, D., KulandaiSamy, V., Leenstra, J., Lightstone, S., Liu, S., Lohman, G.M., Malkemus, T., Mueller, R., Pandis, I., Schiefer, B., Sharpe, D., Sidle, R., Storm, A., Zhang, L.: DB2 with BLU acceleration: so much more than just a column store. PVLDB 6(11), 1080–1091 (2013)

  43. Ross, K.A.: Selection conditions in main memory. TODS 29(1), 132–161 (2004)

  44. Ross, K.A.: Efficient hash probes on modern processors. In: ICDE, pp. 1297–1301 (2007)

  45. Roy, P., Teubner, J., Alonso, G.: Efficient frequent item counting in multi-core hardware. In: KDD, pp. 1451–1459 (2012)

  46. Satish, N., Kim, C., Chhugani, J., Nguyen, A.D., Lee, V.W., Kim, D., Dubey, P.: Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In: SIGMOD, pp. 351–362 (2010)

  47. Schlegel, B., Karnagel, T., Kiefer, T., Lehner, W.: Scalable frequent itemset mining on many-core processors. In: DaMoN (2013)

  48. Schuh, S., Chen, X., Dittrich, J.: An experimental comparison of thirteen relational equi-joins in main memory. In: SIGMOD, pp. 1961–1976 (2016)

  49. Sirin, U., Tözün, P., Porobic, D., Ailamaki, A.: Micro-architectural analysis of in-memory OLTP. In: SIGMOD, pp. 387–402 (2016)

  50. Sitaridi, E., Polychroniou, O., Ross, K.A.: SIMD-accelerated regular expression matching. In: DaMoN (2016)

  51. Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-Store: a column-oriented DBMS. In: VLDB, pp. 553–564 (2005)

  52. Ungethüm, A., Pietrzyk, J., Damme, P., Krause, A., Habich, D., Lehner, W., Focht, E.: Hardware-oblivious SIMD parallelism for in-memory column-stores. In: CIDR (2020)

  53. Wassenberg, J., Sanders, P.: Engineering a multi core radix sort. In: EuroPar, pp. 160–169 (2011)

  54. Willhalm, T., Popovici, N., Boshmaf, Y., Plattner, H., Zeier, A., Schaffner, J.: SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB 2(1), 385–394 (2009)

  55. Zhou, J., Ross, K.A.: Implementing database operations using SIMD instructions. In: SIGMOD, pp. 145–156 (2002)

Author information

Corresponding author

Correspondence to Orestis Polychroniou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is an extension of earlier published work [41], done while the first author was affiliated with Columbia University, and supported by NSF Grant IIS-1422488 and an Oracle gift.

About this article

Cite this article

Polychroniou, O., Ross, K.A. VIP: A SIMD vectorized analytical query engine. The VLDB Journal 29, 1243–1261 (2020). https://doi.org/10.1007/s00778-020-00621-w

  • DOI: https://doi.org/10.1007/s00778-020-00621-w
