Scaling Up Concurrent Analytical Workloads on Multi-Core Servers

Psaroudakis, Iraklis

doi:10.5075/epfl-thesis-7222

Psaroudakis, Iraklis

2016

Download

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Abstract

Today, an ever-increasing number of researchers, businesses, and data scientists collect and analyze massive amounts of data in database systems. The database system needs to process the resulting highly concurrent analytical workloads by exploiting modern multi-socket multi-core processor systems with non-uniform memory access (NUMA) architectures and increasing memory sizes. Conventional execution engines, however, are not designed for many cores, and neither scale nor perform efficiently on modern multi-core NUMA architectures. Firstly, their query-centric approach, where each query is optimized and evaluated independently, can result in unnecessary contention for hardware resources due to redundant work found across queries in highly concurrent workloads. Secondly, they are unaware of the non-uniform memory access costs and the underlying hardware topology, incurring unnecessarily expensive memory accesses and bandwidth saturation. In this thesis, we show how these scalability and performance impediments can be solved by exploiting sharing among concurrent queries and incorporating NUMA-aware adaptive task scheduling and data placement strategies in the execution engine. Regarding sharing, we identify and categorize state-of-the-art techniques for sharing data and work across concurrent queries at run-time into two categories: reactive sharing, which shares intermediate results across common query sub-plans, and proactive sharing, which builds a global query plan with shared operators to evaluate queries. We integrate the original research prototypes that introduce reactive and proactive sharing, perform a sensitivity analysis, and show how and when each technique benefits performance. Our most significant finding is that reactive and proactive sharing can be combined to exploit the advantages of both sharing techniques for highly concurrent analytical workloads. Regarding NUMA-awareness, we identify, implement, and compare various combinations of task scheduling and data placement strategies under a diverse set of highly concurrent analytical workloads. We develop a prototype based on a commercial main-memory column-store database system. Our most significant finding is that there is no single strategy for task scheduling and data placement that is best for all workloads. In specific, inter-socket stealing of memory-intensive tasks can hurt overall performance, and unnecessary partitioning of data across sockets involves an overhead. For this reason, we implement algorithms that adapt task scheduling and data placement to the workload at run-time. Our experiments show that both sharing and NUMA-awareness can significantly improve the performance and scalability of highly concurrent analytical workloads on modern multi-core servers. Thus, we argue that sharing and NUMA-awareness are key factors for supporting faster processing of big data analytical applications, fully exploiting the hardware resources of modern multi-core servers, and for more responsive user experience.

Details

Title Scaling Up Concurrent Analytical Workloads on Multi-Core Servers

Author(s) Psaroudakis, Iraklis

Advisor(s)

Ailamaki, Anastasia

Pagination 178

Date 2016

Publisher Lausanne, EPFL

Keywords

Database management systems; Analytical processing systems; Multi-socket multi-core servers; Non-uniform hardware topologies; Sharing data and work; Task scheduling; Data placement; NUMA-awareness

Language English

DOI https://doi.org/10.5075/epfl-thesis-7222

Other identifier(s) urn: urn:nbn:ch:bel-epfl-thesis7222-9

Laboratories DIAS

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > DIAS - Data-Intensive Applications and Systems Laboratory
Scientific production and competences > EPFL Theses
Work produced at EPFL
Published
Theses

Record creation date 2016-11-09

Files

Abstract

Details

PDF