On the performance limits of thread placement for array databases in non-uniform memory architectures

  • Special Issue Article
Computing

Abstract

Array database management systems (array databases) are specialized software for streamlining multi-dimensional data processing. Because multi-dimensional data applications (e.g., images and time series) are data-hungry, array databases should ideally provide linear speedup on multi-processing systems. On non-uniform memory access (NUMA) machines, however, array databases may require massive data movement across NUMA nodes during processing, which severely degrades performance. This paper investigates the performance impact of five well-known thread pinning strategies when running array filtering operations on two different NUMA architectures. To identify the maximum potential performance improvement, we perform an in-width analysis that evaluates all possible thread pinning combinations. Our experiments report execution metrics for two array databases, namely SAVIME and SciDB. We observe a maximum speedup of \(2.25\times\) in SAVIME, with a \(5\times\) reduction in remote memory access. For SciDB, we observe a speedup of up to \(5.83\times\) and a \(4.1\times\) reduction in remote memory access. Our main finding is that well-known static thread pinning strategies yield only 48% of the potential speedup (and 26% of the energy reduction), leaving multiple opportunities for improvement.
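
To make the idea of comparing static pinning strategies against an exhaustive search concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not name the five strategies, so the two shown here (`compact` and `scatter`) are illustrative assumptions, as are the two-node, eight-core `TOPOLOGY` and all function names.

```python
from itertools import combinations

# Hypothetical topology: 2 NUMA nodes with 4 cores each (not taken from the paper).
TOPOLOGY = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}


def compact(topology, n_threads):
    """Assumed strategy: fill one NUMA node before moving to the next."""
    cores = [c for node in sorted(topology) for c in topology[node]]
    return cores[:n_threads]


def scatter(topology, n_threads):
    """Assumed strategy: place threads round-robin across NUMA nodes."""
    nodes = sorted(topology)
    depth = max(len(topology[n]) for n in nodes)
    order = [topology[n][i] for i in range(depth) for n in nodes
             if i < len(topology[n])]
    return order[:n_threads]


def all_placements(topology, n_threads):
    """Enumerate every distinct set of cores that n_threads could be pinned to."""
    cores = sorted(c for cs in topology.values() for c in cs)
    return list(combinations(cores, n_threads))


def nodes_spanned(topology, placement):
    """Count how many NUMA nodes a placement touches."""
    core_to_node = {c: n for n, cs in topology.items() for c in cs}
    return len({core_to_node[c] for c in placement})


if __name__ == "__main__":
    for name, strategy in (("compact", compact), ("scatter", scatter)):
        placement = strategy(TOPOLOGY, 4)
        print(name, placement, "nodes:", nodes_spanned(TOPOLOGY, placement))
    print("candidate placements:", len(all_placements(TOPOLOGY, 4)))
```

In this toy setting, four threads on eight cores already give 70 candidate core sets, which suggests why an exhaustive evaluation is costly on larger machines. On Linux, a chosen core could then be applied to the calling thread with `os.sched_setaffinity(0, {core})`; measuring each placement against a real array database is outside the scope of this sketch.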

Data Availability Statement

The data, queries, and other information used in our experiments are available at https://github.com/Simone-Dominico/array_database_teste.

Notes

  1. http://geoserver.geo-solutions.it/edu/en/multidim/netcdf/netcdf_basics.html.

Acknowledgements

We would like to thank UFRGS; some experiments in this work used the PCAD infrastructure (http://gppd-hpc.inf.ufrgs.br) at INF/UFRGS.

Funding

This work was supported by Serrapilheira Institute (Grant Number Serra-1709-16621) and CAPES.

Author information

Corresponding author

Correspondence to Simone Dominico.

About this article

Cite this article

Dominico, S., de Almeida, E.C. & Alves, M.A.Z. On the performance limits of thread placement for array databases in non-uniform memory architectures. Computing 105, 1059–1075 (2023). https://doi.org/10.1007/s00607-021-01043-4

  • DOI: https://doi.org/10.1007/s00607-021-01043-4
