On the performance limits of thread placement for array databases in non-uniform memory architectures

Dominico, Simone; de Almeida, Eduardo C.; Alves, Marco A. Zanata

doi:10.1007/s00607-021-01043-4

Special Issue Article
Published: 17 January 2022

Computing, Volume 105, pages 1059–1075 (2023)

Abstract

Array database management systems (array databases) are specialized software that streamlines multi-dimensional data processing. Due to the data-hungry nature of multi-dimensional data applications (e.g., images and time series), array databases should ideally provide linear speedup when running on a multi-processing system. However, on non-uniform memory access (NUMA) machines, array databases may require massive data movement across the NUMA nodes during processing, resulting in a severe performance impact. This paper investigates the performance impact of five well-known thread pinning strategies running array filtering operations on two different NUMA architectures. To identify the maximum potential performance improvement, we perform an in-width analysis evaluating all possible thread pinning combinations. Our experiments report execution metrics for two array databases, namely SAVIME and SciDB. We observe a maximum speedup of \(2.25\times\) in SAVIME, with a \(5\times\) reduction in remote memory access. For SciDB, we observe a speedup of up to \(5.83\times\) and a \(4.1\times\) reduction in remote memory access. Our main finding is that well-known static thread pinning strategies yield only 48% of the potential speedup (and 26% of the energy reduction), opening multiple opportunities for improvement.
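As a rough illustration of what evaluating all possible thread pinning combinations involves, the sketch below enumerates core placements on a small hypothetical NUMA topology and contrasts two generic static placements. Everything in it is an assumption made for illustration: the topology (2 nodes with 4 cores each, numbered contiguously), the helper names, and the "dense" and "sparse" placements, which are not claimed to be among the five strategies evaluated in the paper. Threads are treated as interchangeable, so a placement is simply a set of cores.

```python
from itertools import combinations


def make_topology(nodes, cores_per_node):
    """Map each NUMA node id to its core ids (hypothetical contiguous numbering)."""
    return {n: list(range(n * cores_per_node, (n + 1) * cores_per_node))
            for n in range(nodes)}


def dense(topology, n_threads):
    """Illustrative static placement: fill one node before using the next."""
    cores = [c for node in sorted(topology) for c in topology[node]]
    return cores[:n_threads]


def sparse(topology, n_threads):
    """Illustrative static placement: spread threads round-robin across nodes."""
    per_node = [topology[node] for node in sorted(topology)]
    longest = max(len(cores) for cores in per_node)
    order = [cores[i] for i in range(longest)
             for cores in per_node if i < len(cores)]
    return order[:n_threads]


def all_placements(topology, n_threads):
    """Every set of distinct cores that n_threads threads could be pinned to."""
    cores = sorted(c for node in topology.values() for c in node)
    return combinations(cores, n_threads)


def nodes_used(topology, placement):
    """NUMA nodes touched by a placement (a proxy for remote-access exposure)."""
    core_to_node = {c: n for n, cs in topology.items() for c in cs}
    return {core_to_node[c] for c in placement}


if __name__ == "__main__":
    topo = make_topology(nodes=2, cores_per_node=4)
    print("dense :", dense(topo, 4), "nodes", nodes_used(topo, dense(topo, 4)))
    print("sparse:", sparse(topo, 4), "nodes", nodes_used(topo, sparse(topo, 4)))
    print("placements to evaluate:", sum(1 for _ in all_placements(topo, 4)))
```

The number of placements grows combinatorially with the core count, which is why an exhaustive search is useful for establishing an upper bound but is not practical as a runtime policy. On Linux, a chosen core could then be applied with `os.sched_setaffinity(0, {core})`; that call is omitted from the sketch to keep it portable.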

Data Availability Statement

The data, queries, and other information used in our experiments are available at https://github.com/Simone-Dominico/array_database_teste.

Notes

http://geoserver.geo-solutions.it/edu/en/multidim/netcdf/netcdf_basics.html.

Acknowledgements

We would like to thank UFRGS: some experiments in this work used the PCAD infrastructure (http://gppd-hpc.inf.ufrgs.br) at INF/UFRGS.

Funding

This work was supported by the Serrapilheira Institute (Grant Number Serra-1709-16621) and CAPES.

Author information

Authors and Affiliations

Department of Informatics, Federal University of Paraná, Curitiba, Brazil
Simone Dominico, Eduardo C. de Almeida & Marco A. Zanata Alves

Corresponding author

Correspondence to Simone Dominico.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dominico, S., de Almeida, E.C. & Alves, M.A.Z. On the performance limits of thread placement for array databases in non-uniform memory architectures. Computing 105, 1059–1075 (2023). https://doi.org/10.1007/s00607-021-01043-4

Received: 01 May 2021
Accepted: 09 December 2021
Published: 17 January 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s00607-021-01043-4
