ABSTRACT
In this paper, we revisit the results of "Staring into the Abyss [...] of Concurrency Control with [1000] Cores" [27] and analyse in-memory DBMSs on today's large hardware. Contrary to the original authors' assumption, today we do not see single-socket CPUs with 1000 cores; instead, multi-socket hardware has made its way into production data centres. We therefore follow up on this prior work with an evaluation of the characteristics of concurrency control schemes on real production multi-socket hardware with 1568 cores. This evaluation led to several surprising findings, which we report in this paper.
REFERENCES
[1] Tiemo Bang, Ismail Oukid, Norman May, Ilia Petrov, and Carsten Binnig. 2020. Robust Performance of Main Memory Data Structures by Configuration. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20). ACM, New York, NY, USA, 16 pages.
[2] Philip A. Bernstein and Nathan Goodman. 1981. Concurrency Control in Distributed Database Systems. ACM Comput. Surv. 13, 2 (June 1981), 185--221.
[3] Philip A. Bernstein and Nathan Goodman. 1983. Multiversion Concurrency Control---Theory and Algorithms. ACM Trans. Database Syst. 8, 4 (Dec. 1983), 465--483.
[4] Trevor Brown, Alex Kogan, Yossi Lev, and Victor Luchangco. 2016. Investigating the Performance of Hardware Transactions on a Multi-Socket Machine. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. ACM, 121--132.
[5] Austin T. Clements, M. Frans Kaashoek, and Nickolai Zeldovich. 2012. Scalable Address Spaces using RCU Balanced Trees. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 199--210.
[6] Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. 2013. Everything You Always Wanted to Know About Synchronization But Were Afraid to Ask. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 33--48.
[7] Ulrich Drepper. 2007. What Every Programmer Should Know About Memory. https://people.freebsd.org/~lstewart/articles/cpumemory.pdf.
[8] Alireza Farshin, Amir Roozbeh, Gerald Q. Maguire, and Dejan Kostić. 2019. Make the Most out of Last Level Cache in Intel Processors. In Proceedings of the Fourteenth EuroSys Conference 2019. ACM, Article 8.
[9] Hubertus Franke, Rusty Russell, and Matthew Kirkwood. 2002. Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux. In AUUG Conference Proceedings, Vol. 85. AUUG, Inc., Kensington, NSW, Australia, 479--495.
[10] Hewlett Packard Enterprise. 2018. The Unique Modular Architecture of HPE Superdome Flex: How it Works and Why It Matters. https://community.hpe.com/t5/Servers-The-Right-Compute/The-unique-modular-architecture-of-HPE-Superdome-Flex-How-it/ba-p/7001330#.XnsMbEBFyAg.
[11] Hewlett Packard Enterprise Development LP. 2018. HPE Superdome Flex, Intel Processors Scale SAP HANA. https://www.intel.com/content/www/us/en/big-data/hpe-superdome-flex-sap-hana-wp.html.
[12] Hewlett Packard Enterprise Development LP. 2020. HPE Superdome Flex Server Architecture and RAS. https://assets.ext.hpe.com/is/content/hpedam/documents/a00036000-6999/a00036491/a00036491enw.pdf.
[13] Yihe Huang, William Qian, Eddie Kohler, Barbara Liskov, and Liuba Shrira. 2020. Opportunities for Optimism in Contended Main-Memory Multicore Transactions. Proc. VLDB Endow. 13, 5 (Jan. 2020), 629--642.
[14] Intel Corporation. 2019. Intel® 64 and IA-32 Architectures Software Developer's Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D and 4.
[15] Christopher Jonathan, Umar Farooq Minhas, James Hunter, Justin Levandoski, and Gor Nishanov. 2018. Exploiting Coroutines to Attack the "Killer Nanoseconds". Proc. VLDB Endow. 11, 11 (2018), 1702--1714.
[16] Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, and Daniel J. Abadi. 2008. H-Store: A High-Performance, Distributed Main Memory Transaction Processing System. Proc. VLDB Endow. 1, 2 (Aug. 2008), 1496--1499.
[17] Timo Kersten, Viktor Leis, Alfons Kemper, Thomas Neumann, Andrew Pavlo, and Peter Boncz. 2018. Everything You Always Wanted to Know about Compiled and Vectorized Queries but Were Afraid to Ask. Proc. VLDB Endow. 11, 13 (Sept. 2018), 2209--2222.
[18] Hideaki Kimura. 2015. FOEDUS: OLTP Engine for a Thousand Cores and NVRAM. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 691--706.
[19] H. T. Kung and John T. Robinson. 1981. On Optimistic Methods for Concurrency Control. ACM Trans. Database Syst. 6, 2 (June 1981), 213--226.
[20] Tirthankar Lahiri and Markus Kissling. 2015. Oracle's In-Memory Database Strategy for OLTP and Analytics. https://www.doag.org/formes/pubfiles/7378967/2015-K-DB-Tirthankar_Lahiri-Oracle_s_In-Memory_Database_Strategy_for_Analytics_and_OLTP-Manuskript.pdf.
[21] Danica Porobic, Ippokratis Pandis, Miguel Branco, Pinar Tözün, and Anastasia Ailamaki. 2016. Characterization of the Impact of Hardware Islands on OLTP. The VLDB Journal 25, 5 (2016), 625--650.
[22] Georgios Psaropoulos, Thomas Legler, Norman May, and Anastasia Ailamaki. 2019. Interleaving with Coroutines: A Systematic and Practical Approach to Hide Memory Latency in Index Joins. The VLDB Journal 28, 4 (2019), 451--471.
[23] Iraklis Psaroudakis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. 2015. Scaling Up Concurrent Main-Memory Column-Store Scans: Towards Adaptive NUMA-aware Data and Task Placement. Proc. VLDB Endow. 8, 12 (2015), 1442--1453.
[24] The Transaction Processing Council. 2007. TPC-C Benchmark (Revision 5.9.0). http://www.tpc.org/tpcc/spec/tpcc_current.pdf.
[25] Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy Transactions in Multicore In-Memory Databases. In ACM SIGOPS 24th Symposium on Operating Systems Principles (SOSP '13), Farmington, PA, USA, November 3-6, 2013. ACM, 18--32.
[26] Vish Viswanathan, Karthik Kumar, Thomas Willhalm, Patrick Lu, Blazej Filipiak, and Sri Sakthivelu. 2020. Intel Memory Latency Checker v3.8. https://software.intel.com/en-us/articles/intelr-memory-latency-checker.
[27] Xiangyao Yu, George Bezerra, Andrew Pavlo, Srinivas Devadas, and Michael Stonebraker. 2014. Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores. Proc. VLDB Endow. 8, 3 (Nov. 2014), 209--220.
[28] Xiangyao Yu, Andrew Pavlo, Daniel Sanchez, and Srinivas Devadas. 2016. TicToc: Time Traveling Optimistic Concurrency Control. In Proceedings of the 2016 International Conference on Management of Data. ACM, New York, NY, USA, 1629--1642.