Abstract
Query execution engines for analytics continuously adapt to the underlying hardware to maximize performance. Wider SIMD registers and more complex SIMD instruction sets are emerging both in mainstream CPUs and in new processor designs, such as the many-core Intel Xeon Phi CPUs, which rely on SIMD vectorization to achieve high performance per core while packing a greater number of smaller cores per chip. In the database literature, using SIMD to optimize stand-alone operators over key–rid pairs is common, yet state-of-the-art query engines rely on compiling tightly coupled operators, which makes hand-optimizing individual operators impractical. In this article, we extend a state-of-the-art analytical query engine design by combining code generation and operator pipelining with SIMD vectorization, and we show that the SIMD speedup diminishes when execution is dominated by random memory accesses. To better utilize the hardware features, we introduce VIP, an analytical query engine designed and built bottom-up from pre-compiled, column-oriented, data-parallel sub-operators and implemented entirely in SIMD. In our evaluation using synthetic and TPC-H queries on a many-core CPU, we show that VIP outperforms hand-optimized query-specific code without incurring runtime compilation overhead, and we highlight how efficiently VIP utilizes the hardware features of many-core CPUs.
Notes
The name VIP is based on Vectorization, Interpretation, and Partitioning.
Block-at-a-time execution is termed vectorized in earlier work; here we use the term vectorized to denote SIMD vectorized.
References
Abadi, D., Myers, D., DeWitt, D., Madden, S.: Materialization strategies in a column-oriented DBMS. In: ICDE, pp. 466–475 (2007)
Balkesen, C., Alonso, G., Teubner, J., Özsu, M.T.: Multicore, main-memory joins: sort vs. hash revisited. PVLDB 7(1), 85–96 (2013)
Balkesen, C., Teubner, J., Alonso, G., Özsu, M.T.: Main-memory hash joins on multi-core CPUs: tuning to the underlying hardware. In: ICDE, pp. 362–373 (2013)
Blanas, S., Li, Y., Patel, J.: Design and evaluation of main memory hash join algorithms for multi-core CPUs. In: SIGMOD, pp. 37–48 (2011)
Boncz, P., Manegold, S., Kersten, M.: Database architecture optimized for the new bottleneck: memory access. In: VLDB, pp. 54–65 (1999)
Boncz, P.A., Zukowski, M., Nes, N.: MonetDB/X100: hyper-pipelining query execution. In: CIDR (2005)
Cheng, X., He, B., Du, X., Lau, C.T.: A study of main-memory hash joins on many-core processor: a case with Intel Knights Landing architecture. In: CIKM, pp. 657–666 (2017)
Chhugani, J., Nguyen, A.D., Lee, V.W., Macy, W., Hagog, M., Chen, Y.-K., Baransi, A., Kumar, S., Dubey, P.: Efficient implementation of sorting on multi-core SIMD CPU architecture. In: VLDB, pp. 1313–1324 (2008)
Costea, A., Ionescu, A., Răducanu, B., Switakowski, M., Bârca, C., Sompolski, J., Luszczak, A., Szafrański, M., de Nijs, G., Boncz, P.: VectorH: taking SQL-on-Hadoop to the next level. In: SIGMOD, pp. 1105–1117 (2016)
Dageville, B., Cruanes, T., Zukowski, M., Antonov, V., Avanes, A., Bock, J., Claybaugh, J., Engovatov, D., Hentschel, M., Huang, J., Lee, A.W., Motivala, A., Munir, A.Q., Pelley, S., Povinec, P., Rahn, G., Triantafyllis, S., Unterbrunner, P.: The Snowflake elastic data warehouse. In: SIGMOD, pp. 215–226 (2016)
Fang, Z., Zheng, B., Weng, C.: Interleaved multi-vectorizing. PVLDB 13(3), 226–238 (2019)
Flajolet, P., Martin, G.N.: Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)
Fowler, G., Noll, L.C., Vo, K.-P., Eastlake, D.: The FNV non-cryptographic hash algorithm. Technical report (2017). http://www.ietf.org/internet-drafts/draft-eastlake-fnv-13.txt
Graefe, G.: Volcano: an extensible and parallel query evaluation system. TKDE 6(1), 120–135 (1994)
Gupta, A., Agarwal, D., Tan, D., Kulesza, J., Pathak, R., Stefani, S., Srinivasan, V.: Amazon Redshift and the case for simpler data warehouses. In: SIGMOD, pp. 1917–1923 (2015)
Inoue, H., Moriyama, T., Komatsu, H., Nakatani, T.: AA-sort: a new parallel sorting algorithm for multi-core SIMD processors. In: PACT, pp. 189–198 (2007)
Inoue, H., Ohara, M., Taura, K.: Faster set intersection with SIMD instructions by reducing branch mispredictions. PVLDB 8(3), 293–304 (2014)
Inoue, H., Taura, K.: SIMD- and cache-friendly algorithm for sorting an array of structures. PVLDB 8(11), 1274–1285 (2015)
Jha, S., He, B., Lu, M., Cheng, X., Huynh, H.P.: Improving main memory hash joins on Intel Xeon Phi processors: an experimental approach. PVLDB 8(6), 642–653 (2015)
Kim, C., Kaldewey, T., Lee, V.W., Sedlar, E., Nguyen, A.D., Satish, N., Chhugani, J., Di Blas, A., Dubey, P.: Sort vs. hash revisited: fast join implementation on modern multi-core CPUs. PVLDB 2(2), 1378–1389 (2009)
Krikellas, K., Viglas, S., Cintra, M.: Generating code for holistic query evaluation. In: ICDE, pp. 613–624 (2010)
Lang, H., Kipf, A., Passing, L., Boncz, P., Neumann, T., Kemper, A.: Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines. In: DaMoN (2018)
Lang, H., Mühlbauer, T., Funke, F., Boncz, P.A., Neumann, T., Kemper, A.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD, pp. 311–326 (2016)
Lang, H., Neumann, T., Kemper, A., Boncz, P.: Performance-optimal filtering: Bloom overtakes cuckoo at high throughput. PVLDB 12(5), 502–515 (2019)
Leis, V., Boncz, P., Kemper, A., Neumann, T.: Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In: SIGMOD, pp. 743–754 (2014)
Lemire, D., et al.: Decoding billions of integers per second through vectorization. Softw. Pract. Exp. 45(1), 1–29 (2015)
Li, Y., Patel, J.M.: BitWeaving: fast scans for main memory data processing. In: SIGMOD, pp. 289–300 (2013)
Li, Y., Patel, J.M.: WideTable: an accelerator for analytical data processing. PVLDB 7(10), 907–918 (2014)
Manegold, S., Boncz, P., Kersten, M.: Optimizing database architecture for the new bottleneck: memory access. J. VLDB 9(3), 231–246 (2000)
Manegold, S., Boncz, P., Kersten, M.: What happens during a join? Dissecting CPU and memory optimization effects. In: VLDB, pp. 339–350 (2000)
Manegold, S., Boncz, P., Kersten, M.: Optimizing main-memory join on modern hardware. TKDE 14(4), 709–730 (2002)
Menon, P., Mowry, T.C., Pavlo, A.: Relaxed operator fusion for in-memory databases: making compilation, vectorization, and prefetching work together at last. In: PVLDB (2017)
Neumann, T.: Efficiently compiling efficient query plans for modern hardware. PVLDB 4(9), 539–550 (2011)
Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51(2), 122–144 (2004)
Pirk, H., Moll, O., Zaharia, M., Madden, S.: Voodoo—a vector algebra for portable database performance on modern hardware. PVLDB 9(14), 1707–1718 (2016)
Polychroniou, O., Raghavan, A., Ross, K.A.: Rethinking SIMD vectorization for in-memory databases. In: SIGMOD, pp. 1493–1508 (2015)
Polychroniou, O., Ross, K.A.: High throughput heavy hitter aggregation for modern SIMD processors. In: DaMoN (2013)
Polychroniou, O., Ross, K.A.: A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In: SIGMOD, pp. 755–766 (2014)
Polychroniou, O., Ross, K.A.: Vectorized Bloom filters for advanced SIMD processors. In: DaMoN (2014)
Polychroniou, O., Ross, K.A.: Efficient lightweight compression alongside fast scans. In: DaMoN (2015)
Polychroniou, O., Ross, K.A.: Towards practical vectorized analytical query engines. In: DaMoN (2019)
Raman, V., Attaluri, G., Barber, R., Chainani, N., Kalmuk, D., KulandaiSamy, V., Leenstra, J., Lightstone, S., Liu, S., Lohman, G.M., Malkemus, T., Mueller, R., Pandis, I., Schiefer, B., Sharpe, D., Sidle, R., Storm, A., Zhang, L.: DB2 with BLU acceleration: so much more than just a column store. PVLDB 6(11), 1080–1091 (2013)
Ross, K.A.: Selection conditions in main memory. TODS 29(1), 132–161 (2004)
Ross, K.A.: Efficient hash probes on modern processors. In: ICDE, pp. 1297–1301 (2007)
Roy, P., Teubner, J., Alonso, G.: Efficient frequent item counting in multi-core hardware. In: KDD, pp. 1451–1459 (2012)
Satish, N., Kim, C., Chhugani, J., Nguyen, A.D., Lee, V.W., Kim, D., Dubey, P.: Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In: SIGMOD, pp. 351–362 (2010)
Schlegel, B., Karnagel, T., Kiefer, T., Lehner, W.: Scalable frequent itemset mining on many-core processors. In: DaMoN (2013)
Schuh, S., Chen, X., Dittrich, J.: An experimental comparison of thirteen relational equi-joins in main memory. In: SIGMOD, pp. 1961–1976 (2016)
Sirin, U., Tözün, P., Porobic, D., Ailamaki, A.: Micro-architectural analysis of in-memory OLTP. In: SIGMOD, pp. 387–402 (2016)
Sitaridi, E., Polychroniou, O., Ross, K.A.: SIMD-accelerated regular expression matching. In: DaMoN (2016)
Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-Store: a column-oriented DBMS. In: VLDB, pp. 553–564 (2005)
Ungethüm, A., Pietrzyk, J., Damme, P., Krause, A., Habich, D., Lehner, W., Focht, E.: Hardware-oblivious SIMD parallelism for in-memory column-stores. In: CIDR (2020)
Wassenberg, J., Sanders, P.: Engineering a multi core radix sort. In: EuroPar, pp. 160–169 (2011)
Willhalm, T., Popovici, N., Boshmaf, Y., Plattner, H., Zeier, A., Schaffner, J.: SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB 2(1), 385–394 (2009)
Zhou, J., Ross, K.A.: Implementing database operations using SIMD instructions. In: SIGMOD, pp. 145–156 (2002)
Additional information
This article is an extension of earlier published work [41], done while the first author was affiliated with Columbia University, and supported by NSF Grant IIS-1422488 and an Oracle gift.
About this article
Cite this article
Polychroniou, O., Ross, K.A. VIP: A SIMD vectorized analytical query engine. The VLDB Journal 29, 1243–1261 (2020). https://doi.org/10.1007/s00778-020-00621-w
DOI: https://doi.org/10.1007/s00778-020-00621-w