Abstract
Array database management systems (array databases) are specialized software systems that streamline multi-dimensional data processing. Because multi-dimensional data applications (e.g., images and time series) are data-hungry, array databases should ideally provide linear speedup when running on a multi-processing system. However, on non-uniform memory access (NUMA) machines, array databases may require massive data movement across the NUMA nodes during processing, resulting in a severe performance impact. This paper investigates the performance impact of five well-known thread pinning strategies when running array filtering operations on two different NUMA architectures. To identify the maximum potential performance improvement, we perform an in-width analysis that evaluates all possible thread pinning combinations. Our experiments report execution metrics for two array databases, namely SAVIME and SciDB. We observed a maximum speedup of \(2.25\times \) in SAVIME, with a \(5\times \) reduction in remote memory accesses. For SciDB, we observed a speedup of up to \(5.83\times \) and a \(4.1\times \) reduction in remote memory accesses. Our main finding is that well-known static thread pinning strategies yield only 48% of the potential speedup (and 26% of the energy reduction), opening multiple opportunities for improvement.
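To make the notion of a static thread pinning strategy concrete, the following minimal sketch computes thread-to-core maps for two commonly described placements. The strategy names (`compact`, `scatter`), the function names, and the two-node, four-cores-per-node topology are illustrative assumptions; the abstract does not state which five strategies were evaluated, and this is not the authors' implementation.

```python
import os
from typing import Dict, List


def compact_pinning(num_threads: int, nodes: List[List[int]]) -> Dict[int, int]:
    """Fill all cores of one NUMA node before using the next node."""
    cores = [core for node in nodes for core in node]
    if num_threads > len(cores):
        raise ValueError("more threads than available cores")
    return {thread: cores[thread] for thread in range(num_threads)}


def scatter_pinning(num_threads: int, nodes: List[List[int]]) -> Dict[int, int]:
    """Distribute threads round-robin across NUMA nodes."""
    if num_threads > sum(len(node) for node in nodes):
        raise ValueError("more threads than available cores")
    next_slot = [0] * len(nodes)  # next unused core index within each node
    mapping: Dict[int, int] = {}
    node = 0
    for thread in range(num_threads):
        # Skip nodes whose cores are already all assigned.
        while next_slot[node] >= len(nodes[node]):
            node = (node + 1) % len(nodes)
        mapping[thread] = nodes[node][next_slot[node]]
        next_slot[node] += 1
        node = (node + 1) % len(nodes)
    return mapping


def pin_caller_to_core(core: int) -> None:
    """Restrict the caller to a single core (Linux only; no-op elsewhere)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {core})


# Hypothetical topology: two NUMA nodes with four cores each.
TOPOLOGY = [[0, 1, 2, 3], [4, 5, 6, 7]]

if __name__ == "__main__":
    print("compact:", compact_pinning(4, TOPOLOGY))
    print("scatter:", scatter_pinning(4, TOPOLOGY))
```

With four threads on this assumed topology, the compact map keeps every thread on the first node, while the scatter map alternates between the two nodes. Which placement causes fewer remote memory accesses depends on where the data resides, which is the kind of trade-off the exhaustive evaluation of pinning combinations is meant to expose.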
Data Availability Statement
The data, queries, and other information used in our experiments are available at https://github.com/Simone-Dominico/array_database_teste.
Acknowledgements
We would like to thank UFRGS. Some experiments in this work used the PCAD infrastructure (http://gppd-hpc.inf.ufrgs.br) at INF/UFRGS.
Funding
This work was supported by the Serrapilheira Institute (Grant Number Serra-1709-16621) and CAPES.
Cite this article
Dominico, S., de Almeida, E.C. & Alves, M.A.Z. On the performance limits of thread placement for array databases in non-uniform memory architectures. Computing 105, 1059–1075 (2023). https://doi.org/10.1007/s00607-021-01043-4
DOI: https://doi.org/10.1007/s00607-021-01043-4