VIP: A SIMD vectorized analytical query engine

Polychroniou, Orestis; Ross, Kenneth A.

doi:10.1007/s00778-020-00621-w

Special Issue Paper
Published: 13 July 2020

The VLDB Journal, Volume 29, pages 1243–1261 (2020)

Abstract

Query execution engines for analytics continuously adapt to the underlying hardware to maximize performance. Wider SIMD registers and more complex SIMD instruction sets are emerging both in mainstream CPUs and in new processor designs, such as the many-core Intel Xeon Phi CPUs, which pack a greater number of smaller cores per chip and rely on SIMD vectorization to achieve high performance per core. In the database literature, using SIMD to optimize stand-alone operators on key–rid pairs is common, yet state-of-the-art query engines rely on compiling tightly coupled operators, which makes hand-optimizing individual operators impractical. In this article, we extend a state-of-the-art analytical query engine design by combining code generation and operator pipelining with SIMD vectorization, and show that the SIMD speedup diminishes when execution is dominated by random memory accesses. To better utilize the hardware features, we introduce VIP, an analytical query engine designed and built bottom-up from pre-compiled, column-oriented, data-parallel sub-operators and implemented entirely in SIMD. In our evaluation using synthetic and TPC-H queries on a many-core CPU, we show that VIP outperforms hand-optimized query-specific code without incurring runtime compilation overhead, and we highlight how efficiently VIP utilizes the hardware features of many-core CPUs.
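As a rough illustration of what composing pre-compiled, column-oriented sub-operators can look like, the C sketch below evaluates a query of the form `SUM(price) WHERE qty < bound` block by block, calling three fixed primitives in sequence, so no query-specific code is generated at runtime. The column names, the primitive names (`select_lt`, `gather_i64`, `sum_i64`), and the block size are hypothetical and are not taken from VIP; the primitives are written as portable scalar loops, whereas VIP implements its sub-operators entirely in SIMD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size; not a value reported for VIP. */
#define BLOCK 1024

/* Hypothetical sub-operator 1: branch-free selection.
   Stores the positions i with col[i] < bound into sel and returns how many
   qualified. sel must have room for n entries. */
static size_t select_lt(const int32_t *col, size_t n, int32_t bound, uint32_t *sel) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        sel[k] = (uint32_t)i;   /* always store the candidate position ... */
        k += (col[i] < bound);  /* ... but advance only if it qualifies */
    }
    return k;
}

/* Hypothetical sub-operator 2: fetch the selected positions of another column. */
static void gather_i64(const int64_t *col, const uint32_t *sel, size_t k, int64_t *out) {
    for (size_t j = 0; j < k; j++) out[j] = col[sel[j]];
}

/* Hypothetical sub-operator 3: sum a dense array of values. */
static int64_t sum_i64(const int64_t *vals, size_t k) {
    int64_t s = 0;
    for (size_t j = 0; j < k; j++) s += vals[j];
    return s;
}

/* Interpreted pipeline: walk the columns one block at a time and chain the
   pre-compiled sub-operators through small block-sized buffers. */
static int64_t sum_price_where_qty_lt(const int32_t *qty, const int64_t *price,
                                      size_t n, int32_t bound) {
    uint32_t sel[BLOCK];
    int64_t buf[BLOCK];
    int64_t total = 0;
    for (size_t off = 0; off < n; off += BLOCK) {
        size_t len = (n - off < BLOCK) ? n - off : BLOCK;
        size_t k = select_lt(qty + off, len, bound, sel);
        gather_i64(price + off, sel, k, buf);
        total += sum_i64(buf, k);
    }
    return total;
}
```

For example, with `qty = {3, 10, 1, 7, 2}`, `price = {100, 200, 300, 400, 500}` and `bound = 5`, positions 0, 2 and 4 qualify and the pipeline returns 900. The branch-free selection (always store the position, conditionally advance the output cursor) is a design choice of this sketch to avoid data-dependent branches; it is not a detail stated in the abstract.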

Fig. 5

Fig. 6

Notes

The name VIP is derived from Vectorization, Interpretation, and Partitioning.
Block-at-a-time execution is termed "vectorized" in earlier work; here we use the term "vectorized" to denote SIMD vectorized execution.
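The distinction drawn in this note can be sketched in C. Below, `count_lt_block` is a block-at-a-time primitive in the earlier sense (as in MonetDB/X100): it is invoked on a whole block of values but handles one value per loop iteration. `count_lt_simd` instead compares four 32-bit values per vector operation using GCC/Clang vector extensions, which is the sense of "vectorized" used in this article. Both function names are hypothetical, and the 128-bit vector width is an arbitrary choice for portability; many-core Xeon Phi CPUs offer 512-bit SIMD registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four 32-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Block-at-a-time: the primitive receives a whole block of values,
   but each loop iteration processes a single value. */
static int32_t count_lt_block(const int32_t *col, size_t n, int32_t bound) {
    int32_t c = 0;
    for (size_t i = 0; i < n; i++) c += (col[i] < bound);
    return c;
}

/* SIMD vectorized: each vector comparison checks four values at once.
   A true lane compares to -1 and a false lane to 0. */
static int32_t count_lt_simd(const int32_t *col, size_t n, int32_t bound) {
    v4si acc = {0, 0, 0, 0};
    v4si b = {bound, bound, bound, bound};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4si x;
        memcpy(&x, col + i, sizeof x);  /* unaligned-safe vector load */
        acc -= (x < b);                 /* subtracting -1 adds 1 per match */
    }
    int32_t c = acc[0] + acc[1] + acc[2] + acc[3];  /* horizontal sum */
    for (; i < n; i++) c += (col[i] < bound);       /* scalar tail */
    return c;
}
```

Both functions return the same count; for example, on `{1, 9, 2, 8, 3, 7, 4, 6, 5, 0}` with `bound = 5` each returns 5. Only the second maps the work onto SIMD lanes.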

References

Abadi, D., Myers, D., DeWitt, D., Madden, S.: Materialization strategies in a column-oriented DBMS. In: ICDE, pp. 466–475 (2007)
Balkesen, C., Alonso, G., Teubner, J., Ozsu, M.T.: Multicore, main-memory joins: sort vs. hash revisited. PVLDB 7(1), 85–96 (2013)
Balkesen, C., Teubner, J., Alonso, G., Ozsu, M.T.: Main-memory hash joins on multi-core CPUs: tuning to the underlying hardware. In: ICDE, pp. 362–373 (2013)
Blanas, S., Li, Y., Patel, J.: Design and evaluation of main memory hash join algorithms for multi-core CPUs. In: SIGMOD, pp. 37–48 (2011)
Boncz, P., Manegold, S., Kersten, M.: Database architecture optimized for the new bottleneck: memory access. In: VLDB, pp. 54–65 (1999)
Boncz, P.A., Zukowski, M., Nes, N.: MonetDB/X100: hyper-pipelining query execution. In: CIDR (2005)
Cheng, X., He, B., Du, X., Lau, C.T.: A study of main-memory hash joins on many-core processor: a case with intel knights landing architecture. In: CIKM, pp. 657–666 (2017)
Chhugani, J., Nguyen, A.D., Lee, V.W., Macy, W., Hagog, M., Chen, Y.-K., Baransi, A., Kumar, S., Dubey, P.: Efficient implementation of sorting on multi-core SIMD CPU architecture. In: VLDB, pp. 1313–1324 (2008)
Costea, A., Ionescu, A., Răducanu, B., Switakowski, M., Bârca, C., Sompolski, J., Luszczak, A., Szafrański, M., de Nijs, G., Boncz, P.: VectorH: taking SQL-on-Hadoop to the next level. In: SIGMOD, pp. 1105–1117 (2016)
Dageville, B., Cruanes, T., Zukowski, M., Antonov, V., Avanes, A., Bock, J., Claybaugh, J., Engovatov, D., Hentschel, M., Huang, J., Lee, A.W., Motivala, A., Munir, A.Q., Pelley, S., Povinec, P., Rahn, G., Triantafyllis, S., Unterbrunner, P.: The snowflake elastic data warehouse. In: SIGMOD, pp. 215–226 (2016)
Fang, Z., Zheng, B., Weng, C.: Interleaved multi-vectorizing. PVLDB 13(3), 226–238 (2019)
Flajolet, P., Martin, G.N.: Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)
Fowler, G., Noll, L.C., Vo, K.-P., Eastlake, D.: The FNV non-cryptographic hash algorithm. Technical report (2017). http://www.ietf.org/internet-drafts/draft-eastlake-fnv-13.txt
Graefe, G.: Volcano: an extensible and parallel query evaluation system. TKDE 6(1), 120–135 (1994)
Gupta, A., Agarwal, D., Tan, D., Kulesza, J., Pathak, R., Stefani, S., Srinivasan, V.: Amazon redshift and the case for simpler data warehouses. In: SIGMOD, pp. 1917–1923 (2015)
Inoue, H., Moriyama, T., Komatsu, H., Nakatani, T.: AA-sort: a new parallel sorting algorithm for multi-core SIMD processors. In: PACT, pp. 189–198 (2007)
Inoue, H., Ohara, M., Taura, K.: Faster set intersection with SIMD instructions by reducing branch mispredictions. PVLDB 8(3), 293–304 (2014)
Inoue, H., Taura, K.: SIMD- and cache-friendly algorithm for sorting an array of structures. PVLDB 8(11), 1274–1285 (2015)
Jha, S., He, B., Lu, M., Cheng, X., Huynh, H.P.: Improving main memory hash joins on Intel Xeon Phi processors: an experimental approach. PVLDB 8(6), 642–653 (2015)
Kim, C., Kaldewey, T., Lee, V.W., Sedlar, E., Nguyen, A.D., Satish, N., Chhugani, J., Di Blas, A., Dubey, P.: Sort vs. hash revisited: fast join implementation on modern multi-core CPUs. PVLDB 2(2), 1378–1389 (2009)
Krikellas, K., Viglas, S., Cintra, M.: Generating code for holistic query evaluation. In: ICDE, pp. 613–624 (2010)
Lang, H., Kipf, A., Passing, L., Boncz, P., Neumann, T., Kemper, A.: Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines. In: DaMoN (2018)
Lang, H., Mühlbauer, T., Funke, F., Boncz, P.A., Neumann, T., Kemper, A.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD, pp. 311–326 (2016)
Lang, H., Neumann, T., Kemper, A., Boncz, P.: Performance-optimal filtering: Bloom overtakes cuckoo at high throughput. PVLDB 12(5), 502–515 (2019)
Leis, V., Boncz, P., Kemper, A., Neumann, T.: Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In: SIGMOD, pp. 743–754 (2014)
Lemire, D., et al.: Decoding billions of integers per second through vectorization. Softw. Pract. Exp. 45(1), 1–29 (2015)
Li, Y., Patel, J.M.: Bitweaving: fast scans for main memory data processing. In: SIGMOD, pp. 289–300 (2013)
Li, Y., Patel, J.M.: Widetable: an accelerator for analytical data processing. PVLDB 7(10), 907–918 (2014)
Manegold, S., Boncz, P., Kersten, M.: Optimizing database architecture for the new bottleneck: memory access. J. VLDB 9(3), 231–246 (2000)
Manegold, S., Boncz, P., Kersten, M.: What happens during a join? Dissecting CPU and memory optimization effects. In: VLDB, pp. 339–350 (2000)
Manegold, S., Boncz, P., Kersten, M.: Optimizing main-memory join on modern hardware. TKDE 14(4), 709–730 (2002)
Menon, P., Mowry, T.C., Pavlo, A.: Relaxed operator fusion for in-memory databases: making compilation, vectorization, and prefetching work together at last. In: PVLDB (2017)
Neumann, T.: Efficiently compiling efficient query plans for modern hardware. PVLDB 4(9), 539–550 (2011)
Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51(2), 122–144 (2004)
Pirk, H., Moll, O., Zaharia, M., Madden, S.: Voodoo—a vector algebra for portable database performance on modern hardware. PVLDB 9(14), 1707–1718 (2016)
Polychroniou, O., Raghavan, A., Ross, K.A.: Rethinking SIMD vectorization for in-memory databases. In: SIGMOD, pp. 1493–1508 (2015)
Polychroniou, O., Ross, K.A.: High throughput heavy hitter aggregation for modern SIMD processors. In: DaMoN (2013)
Polychroniou, O., Ross, K.A.: A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In: SIGMOD, pp. 755–766 (2014)
Polychroniou, O., Ross, K.A.: Vectorized Bloom filters for advanced SIMD processors. In: DaMoN (2014)
Polychroniou, O., Ross, K.A.: Efficient lightweight compression alongside fast scans. In: DaMoN (2015)
Polychroniou, O., Ross, K.A.: Towards practical vectorized analytical query engines. In: DaMoN (2019)
Raman, V., Attaluri, G., Barber, R., Chainani, N., Kalmuk, D., KulandaiSamy, V., Leenstra, J., Lightstone, S., Liu, S., Lohman, G.M., Malkemus, T., Mueller, R., Pandis, I., Schiefer, B., Sharpe, D., Sidle, R., Storm, A., Zhang, L.: DB2 with BLU acceleration: so much more than just a column store. PVLDB 6(11), 1080–1091 (2013)
Ross, K.A.: Selection conditions in main memory. TODS 29(1), 132–161 (2004)
Ross, K.A.: Efficient hash probes on modern processors. In: ICDE, pp. 1297–1301 (2007)
Roy, P., Teubner, J., Alonso, G.: Efficient frequent item counting in multi-core hardware. In: KDD, pp. 1451–1459 (2012)
Satish, N., Kim, C., Chhugani, J., Nguyen, A.D., Lee, V.W., Kim, D., Dubey, P.: Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In: SIGMOD, pp. 351–362 (2010)
Schlegel, B., Karnagel, T., Kiefer, T., Lehner, W.: Scalable frequent itemset mining on many-core processors. In: DaMoN (2013)
Schuh, S., Chen, X., Dittrich, J.: An experimental comparison of thirteen relational equi-joins in main memory. In: SIGMOD, pp. 1961–1976 (2016)
Sirin, U., Tözün, P., Porobic, D., Ailamaki, A.: Micro-architectural analysis of in-memory OLTP. In: SIGMOD, pp. 387–402 (2016)
Sitaridi, E., Polychroniou, O., Ross, K.A.: SIMD-accelerated regular expression matching. In: DaMoN (2016)
Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-store: a column-oriented DBMS. In: VLDB, pp. 553–564 (2005)
Ungethüm, A., Pietrzyk, J., Damme, P., Krause, A., Habich, D., Lehner, W., Focht, E.: Hardware-oblivious SIMD parallelism for in-memory column-stores. In: CIDR (2020)
Wassenberg, J., Sanders, P.: Engineering a multi core radix sort. In: EuroPar, pp. 160–169 (2011)
Willhalm, T., Popovici, N., Boshmaf, Y., Plattner, H., Zeier, A., Schaffner, J.: SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB 2(1), 385–394 (2009)
Zhou, J., Ross, K.A.: Implementing database operations using SIMD instructions. In: SIGMOD, pp. 145–156 (2002)

Author information

Authors and Affiliations

Amazon Web Services, Palo Alto, USA
Orestis Polychroniou
Columbia University, New York, USA
Kenneth A. Ross

Corresponding author

Correspondence to Orestis Polychroniou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article extends earlier published work [41] (Polychroniou and Ross, DaMoN 2019). That work was done while the first author was affiliated with Columbia University and was supported by NSF Grant IIS-1422488 and an Oracle gift.

About this article

Cite this article

Polychroniou, O., Ross, K.A. VIP: A SIMD vectorized analytical query engine. The VLDB Journal 29, 1243–1261 (2020). https://doi.org/10.1007/s00778-020-00621-w

Received: 27 January 2020
Revised: 10 June 2020
Accepted: 22 June 2020
Published: 13 July 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s00778-020-00621-w
