ABSTRACT
Deterministic database systems have been shown to yield high throughput on a cluster of commodity machines while ensuring strong consistency between replicas, provided that the data can be well partitioned across these machines. However, in real-world applications data partitioning can be suboptimal for many reasons. In this paper, we present T-Part, a transaction execution engine that partitions transactions in a deterministic database system to handle unforeseeable workloads, or workloads whose data are hard to partition. By modeling the dependencies between transactions as a T-graph and continuously partitioning that graph, T-Part allows each transaction to know which later transactions on other machines will read its writes, so that it can push those writes forward to them immediately after committing. This forward-pushing reduces the chance that later transactions stall because remote data are unavailable. We implement a prototype of T-Part, and the results of extensive experiments demonstrate its effectiveness.
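The idea of deriving push targets from a partitioned dependency graph can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper. Each transaction's read and write sets are assumed to be known before execution. The T-graph is modeled with read-after-write edges only. Placement uses a simple streaming greedy rule that is a stand-in, not T-Part's own partitioning algorithm. The names `Txn`, `build_t_graph`, `greedy_partition`, `push_plan`, `data_home`, and `capacity` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction model: read/write sets are known before execution."""
    tid: int              # position in the deterministic total order
    reads: frozenset
    writes: frozenset


def build_t_graph(txns):
    """Return read-after-write edges (writer_tid, reader_tid, key).

    Each key a transaction reads gets an edge from the latest earlier
    transaction (in total order) that wrote that key, if any.
    """
    last_writer = {}
    edges = []
    for t in txns:  # txns must already be sorted in the total order
        for key in sorted(t.reads):
            if key in last_writer:
                edges.append((last_writer[key], t.tid, key))
        for key in t.writes:  # update after reads: a txn reads before it writes
            last_writer[key] = t.tid
    return edges


def greedy_partition(txns, edges, data_home, num_machines, capacity):
    """Assign each transaction to a machine with a simple streaming greedy rule.

    Stand-in heuristic (assumption, not T-Part's algorithm): prefer the machine
    that already hosts most of the transaction's inputs, i.e. its predecessor
    transactions plus stored records it reads that no earlier txn wrote.
    """
    assert capacity * num_machines >= len(txns), "not enough capacity"
    preds = defaultdict(list)  # reader tid -> [(writer tid, key)]
    for w, r, key in edges:
        preds[r].append((w, key))
    assignment, load = {}, [0] * num_machines
    for t in txns:
        score = [0] * num_machines
        for w, _ in preds[t.tid]:
            score[assignment[w]] += 1
        covered = {key for _, key in preds[t.tid]}
        for key in t.reads - covered:
            score[data_home[key]] += 1
        candidates = [m for m in range(num_machines) if load[m] < capacity]
        # Highest score wins; ties go to the less loaded, then lower-numbered machine.
        best = max(candidates, key=lambda m: (score[m], -load[m], -m))
        assignment[t.tid] = best
        load[best] += 1
    return assignment


def push_plan(edges, assignment):
    """For each writer, list (reader machine, reader tid, key) to push after commit.

    Only cross-machine edges need a push; same-machine readers see the write locally.
    """
    plan = defaultdict(list)
    for w, r, key in edges:
        if assignment[w] != assignment[r]:
            plan[w].append((assignment[r], r, key))
    return {w: sorted(targets) for w, targets in plan.items()}


if __name__ == "__main__":
    data_home = {"a": 0, "b": 1, "c": 0, "d": 1}  # hypothetical initial placement
    txns = [
        Txn(1, frozenset({"a"}), frozenset({"a"})),
        Txn(2, frozenset({"b"}), frozenset({"b"})),
        Txn(3, frozenset({"a", "b"}), frozenset({"c"})),
        Txn(4, frozenset({"c", "d"}), frozenset({"d"})),
    ]
    edges = build_t_graph(txns)
    assignment = greedy_partition(txns, edges, data_home, num_machines=2, capacity=2)
    print(push_plan(edges, assignment))  # prints {2: [(0, 3, 'b')], 3: [(1, 4, 'c')]}
```

In this toy run, transaction 3 is co-located with transaction 1 on machine 0, so the write to `a` needs no push. Transaction 2 must still push `b` to machine 0. The capacity cap forces transaction 4 onto machine 1, so transaction 3 must push `c` there right after it commits. The example therefore shows how placement decisions determine which writes have to be forwarded.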
Index Terms
T-Part: Partitioning of Transactions for Forward-Pushing in Deterministic Database Systems