ABSTRACT
Modern Datalog engines are employed in industrial applications such as graph databases, networks, and static program analysis. To cope with vast amounts of data, Datalog engines must employ parallel execution strategies, for which specialized concurrent data structures are of paramount importance.
In this paper, we introduce a specialized B-tree data structure for an open-source Datalog compiler written in C++. Our data structure has been specialized for Datalog workloads running on shared-memory multi-core computers. It (1) features an optimistic locking protocol for scalability, (2) is highly tuned, and (3) uses the notion of "hints" to reuse the results of previously performed tree traversals, exploiting data ordering properties exhibited by Datalog evaluation. In parallel micro-benchmarks, the new data structure achieves up to 59× higher performance than state-of-the-art industrial standards; integrated into a Datalog engine, it accounts for 3× higher overall system performance.
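The idea behind "hints" can be illustrated with a small standard-library analogue. The sketch below is not the B-tree described in this paper: it wraps `std::set`, and the class name `HintedSet` and its members are hypothetical. It only shows the general principle of remembering where the previous operation ended so that the next operation on a nearby key can skip most of the traversal.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical illustration (not the paper's data structure): a set that
// remembers where the last insertion landed and offers that position as a
// hint for the next insertion.
class HintedSet {
public:
    HintedSet() : hint_(data_.end()) {}

    // The stored iterator refers to data_, so copying is disabled.
    HintedSet(const HintedSet&) = delete;
    HintedSet& operator=(const HintedSet&) = delete;

    // Inserts `value`; returns true if it was not already present.
    bool insert(int value) {
        const std::size_t before = data_.size();
        // std::set::insert(hint, value) runs in amortized constant time when
        // the new element belongs immediately before `hint`; a poor hint
        // still yields a correct insertion, only without the speed-up.
        auto pos = data_.insert(hint_, value);
        assert(*pos == value);
        // For ascending input, the next value belongs right after `pos`.
        hint_ = std::next(pos);
        return data_.size() != before;
    }

    bool contains(int value) const { return data_.count(value) != 0; }
    std::size_t size() const { return data_.size(); }

private:
    std::set<int> data_;
    std::set<int>::iterator hint_;
};
```

When keys arrive in mostly ascending order, each insertion in this sketch starts from the remembered position instead of the root; out-of-order keys remain correct and simply fall back to a regular search.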