Efficient and stable quorum-based log replication and replay for modern cluster-databases

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Modern in-memory databases (IMDBs) can support highly concurrent on-line transaction processing (OLTP) workloads and generate massive volumes of transactional logs per second. Quorum-based replication protocols such as Paxos and Raft are widely used in distributed databases to provide high availability and fault tolerance. However, replicating an IMDB is non-trivial because the high transaction rate brings new challenges. First, the leader node in quorum replication should adapt to varying transaction arrival rates and to the processing capability of the follower nodes. Second, followers must replay logs under high concurrency to catch up with the state of the leader and reduce the visibility gap. Third, modern databases are often built on a cluster of commodity machines connected by low-end networks, where network anomalies occur frequently; performance then degrades significantly because a follower node falls into a long exception-handling process (e.g., fetching lost logs from the leader). To this end, we build QuorumX, an efficient and stable quorum-based replication framework for IMDBs under heavy OLTP workloads. QuorumX combines critical-path-based batching and pipelined batching to provide an adaptive log propagation scheme that delivers stable, high performance across various settings. Further, we propose a safe and coordination-free log replay scheme that minimizes the visibility gap between the leader and follower IMDBs, and we carefully design the follower-side process to alleviate the influence of an unreliable network on replication performance. Our evaluation with YCSB, TPC-C, and a realistic microbenchmark demonstrates that QuorumX achieves performance close to that of asynchronous primary-backup replication while always providing stable service with data consistency and a small visibility gap.
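To make the two core mechanisms above concrete, the sketch below illustrates in Go the general shape of (i) leader-side batching combined with pipelining, where batch size adapts because each batch absorbs whatever accumulated while earlier batches were in flight, and (ii) coordination-free parallel replay on a follower via hash partitioning by key. This is a minimal sketch of the generic techniques the abstract names, not the authors' QuorumX implementation; all identifiers (LogRecord, sendBatch, keyOf, replicateLoop, replayPartitioned) are hypothetical.

package replication

import "sync"

// LogRecord is a single transactional log entry (hypothetical type,
// not taken from the paper).
type LogRecord struct {
	LSN     uint64
	Payload []byte
}

// replicateLoop keeps up to maxInFlight batches outstanding. Each new
// batch absorbs every record that arrived while earlier batches were
// on the wire, so batch size grows with the transaction arrival rate
// and with follower acknowledgement latency.
func replicateLoop(in <-chan LogRecord, sendBatch func([]LogRecord) <-chan struct{}, maxInFlight int) {
	slots := make(chan struct{}, maxInFlight)
	for i := 0; i < maxInFlight; i++ {
		slots <- struct{}{} // all pipeline slots start free
	}
	for {
		<-slots                    // wait until a pipeline slot is free
		batch := []LogRecord{<-in} // block until at least one record exists
	drain:
		for {
			select {
			case r := <-in:
				batch = append(batch, r) // absorb the backlog into this batch
			default:
				break drain // nothing else queued; ship what we have
			}
		}
		ack := sendBatch(batch) // asynchronous quorum round for this batch
		go func() {
			<-ack               // a quorum of followers acknowledged
			slots <- struct{}{} // free the slot for the next batch
		}()
	}
}

// replayPartitioned replays committed records on a follower without
// cross-worker coordination: records are hash-partitioned by key and
// each worker applies its partition in log (LSN) order, which
// preserves per-key ordering without locks or barriers.
func replayPartitioned(records []LogRecord, keyOf func(LogRecord) uint64, apply func(LogRecord), workers int) {
	parts := make([][]LogRecord, workers)
	for _, r := range records { // records are assumed to arrive in LSN order
		p := int(keyOf(r) % uint64(workers))
		parts[p] = append(parts[p], r)
	}
	var wg sync.WaitGroup
	for _, part := range parts {
		wg.Add(1)
		go func(part []LogRecord) {
			defer wg.Done()
			for _, r := range part {
				apply(r) // sequential within a partition
			}
		}(part)
	}
	wg.Wait()
}

The intuition behind this shape: pipelining keeps the network busy while each batch waits for its quorum, with the slot count bounding memory and follower pressure, while partitioned replay trades a small scheduling cost on the follower for the removal of replay-time synchronization, the usual lever for keeping the visibility gap small under high log volume.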

Acknowledgements

This work was partially supported by the National Key R&D Program of China (2018YFB1003404), the NSFC (Grant Nos. 61972149 and 61977026), and the ECNU Academic Innovation Promotion Program for Excellent Doctoral Students.

Author information

Corresponding author

Correspondence to Peng Cai.

Additional information

Donghui Wang is a PhD candidate in the School of Data Science and Engineering at East China Normal University (ECNU), China. She received her bachelor's degree in computer science and technology from Zhejiang Normal University, China in 2016. Her research interests include high-performance transaction processing in database management systems and high availability in distributed systems.

Peng Cai is a researcher in the School of Data Science and Engineering at East China Normal University (ECNU), China. He received his PhD degree in computer science and technology from ECNU in 2011. He joined ECNU in 2015; prior to that, he worked at the IBM China Research Lab and Baidu. His work has been published in leading conferences such as ICDE, SIGIR, and ACL. His main research interests include in-memory transaction processing and building adaptive systems using machine learning techniques.

Weining Qian is a professor and Dean of the School of Data Science and Engineering, East China Normal University, China. He received his MS and PhD degrees in computer science from Fudan University, China in 2001 and 2004, respectively. He serves as a standing committee member of the Database Technology Committee of the China Computer Federation (CCF) and as a committee member of the ACM SIGMOD China Chapter. His research interests include scalable transaction processing, benchmarking big data systems, and the management and analysis of massive datasets.

Aoying Zhou is a professor and Vice President of East China Normal University, China. He received his bachelor's and master's degrees in computer science from Sichuan University, China in 1985 and 1988, respectively, and his PhD degree from Fudan University, China in 1993. He is a winner of the National Science Fund for Distinguished Young Scholars supported by the National Natural Science Foundation of China (NSFC). He is a CCF Fellow and the Vice Director of the Database Technology Committee of the CCF. He served as Vice PC Chair of ICDE 2009 and ICDE 2012, and as PC Co-chair of VLDB 2014. His research interests include Web data management, data management for data-intensive computing, in-memory cluster computing, distributed transaction processing, and benchmarking for big data.

About this article

Cite this article

Wang, D., Cai, P., Qian, W. et al. Efficient and stable quorum-based log replication and replay for modern cluster-databases. Front. Comput. Sci. 16, 165612 (2022). https://doi.org/10.1007/s11704-020-0210-y
