ABSTRACT
Modern databases can run application logic defined in stored procedures inside the database server to improve application speed. The SQL standard specifies how to call external stored routines implemented in programming languages such as C, C++, or Java to complement declarative SQL-based application logic. This is beneficial for scientific and analytical algorithms, which are usually too complex to implement entirely in SQL. At the same time, database applications such as matrix calculations or data mining algorithms benefit from multi-threading to parallelize compute-intensive operations. Multi-threaded application code, however, introduces competition for resources between the application's threads and the threads of the database task scheduler. In this paper, we show that multi-threaded application code can render the database's workload scheduling ineffective and decrease the throughput of core database operations by up to 50%. We present a general approach to address this issue by integrating shared memory programming solutions into the task schedulers of databases. In particular, we describe the integration of OpenMP into databases. We implement and evaluate our approach using SAP HANA. Our experiments show that our integration does not introduce overhead and can improve the throughput of core database operations by up to 15%.
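The general idea of routing application threads through the database scheduler can be illustrated with a minimal sketch. It is not the paper's implementation and does not use OpenMP or any SAP HANA API: the names `CoreBudget` and `parallel_sum` are hypothetical, and the policy of granting at most the currently free cores is an assumption made for illustration. The sketch only shows the accounting; it does not measure speedup.

```python
import threading


class CoreBudget:
    """Hypothetical scheduler-side accounting of worker threads (illustrative only)."""

    def __init__(self, total_cores):
        self.total = total_cores
        self.in_use = 0
        self._lock = threading.Lock()

    def acquire(self, requested):
        # Grant at most the free cores, but always at least one so the
        # caller can make progress (this may briefly exceed the budget).
        with self._lock:
            free = self.total - self.in_use
            granted = max(1, min(requested, free))
            self.in_use += granted
            return granted

    def release(self, granted):
        with self._lock:
            self.in_use -= granted


def parallel_sum(budget, data, requested_threads):
    """Sum `data` with as many threads as the scheduler grants, not as many as requested."""
    granted = budget.acquire(requested_threads)
    try:
        chunks = [data[i::granted] for i in range(granted)]
        partial = [0] * granted

        def work(i):
            partial[i] = sum(chunks[i])

        threads = [threading.Thread(target=work, args=(i,)) for i in range(granted)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sum(partial), granted
    finally:
        budget.release(granted)
```

With a budget of four cores of which three are already busy with database tasks, a request for eight application threads is reduced to one, so the application code does not oversubscribe the machine.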
Index Terms
- Extending database task schedulers for multi-threaded application code