Efficient and stable quorum-based log replication and replay for modern cluster-databases

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Modern in-memory databases (IMDBs) can support highly concurrent on-line transaction processing (OLTP) workloads and generate massive volumes of transactional logs per second. Quorum-based replication protocols such as Paxos and Raft are widely used in distributed databases to provide high availability and fault tolerance. However, replicating an IMDB is non-trivial because the high transaction rate brings new challenges. First, the leader node in quorum replication should adapt to varying transaction arrival rates and to the processing capability of the follower nodes. Second, followers must replay logs under high concurrency to catch up with the state of the leader and reduce the visibility gap. Third, modern databases are often built on a cluster of commodity machines connected by low-end networks, where network anomalies occur frequently; performance then degrades significantly because a follower node falls into a long exception-handling process (e.g., fetching lost logs from the leader). To this end, we build QuorumX, an efficient and stable quorum-based replication framework for IMDBs under heavy OLTP workloads. QuorumX combines critical-path-based batching and pipelined batching to provide an adaptive log propagation scheme that delivers stable, high performance across various settings. Further, we propose a safe and coordination-free log replay scheme that minimizes the visibility gap between the leader and follower IMDBs, and we carefully design the follower-side process to alleviate the influence of an unreliable network on replication performance. Our evaluation with YCSB, TPC-C, and a realistic microbenchmark demonstrates that QuorumX achieves performance close to that of asynchronous primary-backup replication while always providing stable service with data consistency and a small visibility gap.
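To make the two core mechanisms above concrete, the sketch below illustrates in Go the general shape of (i) leader-side batching combined with pipelining, where batch size adapts because each batch absorbs whatever accumulated while earlier batches were in flight, and (ii) coordination-free parallel replay on a follower via hash partitioning by key. This is a minimal sketch of the generic techniques the abstract names, not the authors' QuorumX implementation; all identifiers (LogRecord, sendBatch, keyOf, replicateLoop, replayPartitioned) are hypothetical.

package replication

import "sync"

// LogRecord is a single transactional log entry (hypothetical type,
// not taken from the paper).
type LogRecord struct {
	LSN     uint64
	Payload []byte
}

// replicateLoop keeps up to maxInFlight batches outstanding. Each new
// batch absorbs every record that arrived while earlier batches were
// on the wire, so batch size grows with the transaction arrival rate
// and with follower acknowledgement latency.
func replicateLoop(in <-chan LogRecord, sendBatch func([]LogRecord) <-chan struct{}, maxInFlight int) {
	slots := make(chan struct{}, maxInFlight)
	for i := 0; i < maxInFlight; i++ {
		slots <- struct{}{} // all pipeline slots start free
	}
	for {
		<-slots                    // wait until a pipeline slot is free
		batch := []LogRecord{<-in} // block until at least one record exists
	drain:
		for {
			select {
			case r := <-in:
				batch = append(batch, r) // absorb the backlog into this batch
			default:
				break drain // nothing else queued; ship what we have
			}
		}
		ack := sendBatch(batch) // asynchronous quorum round for this batch
		go func() {
			<-ack               // a quorum of followers acknowledged
			slots <- struct{}{} // free the slot for the next batch
		}()
	}
}

// replayPartitioned replays committed records on a follower without
// cross-worker coordination: records are hash-partitioned by key and
// each worker applies its partition in log (LSN) order, which
// preserves per-key ordering without locks or barriers.
func replayPartitioned(records []LogRecord, keyOf func(LogRecord) uint64, apply func(LogRecord), workers int) {
	parts := make([][]LogRecord, workers)
	for _, r := range records { // records are assumed to arrive in LSN order
		p := int(keyOf(r) % uint64(workers))
		parts[p] = append(parts[p], r)
	}
	var wg sync.WaitGroup
	for _, part := range parts {
		wg.Add(1)
		go func(part []LogRecord) {
			defer wg.Done()
			for _, r := range part {
				apply(r) // sequential within a partition
			}
		}(part)
	}
	wg.Wait()
}

The intuition behind this shape: pipelining keeps the network busy while each batch waits for its quorum, with the slot count bounding memory and follower pressure, while partitioned replay trades a small scheduling cost on the follower for the removal of replay-time synchronization, the usual lever for keeping the visibility gap small under high log volume.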

Acknowledgements

This work was partially supported by the National Key R&D Program of China (2018YFB1003404), the NSFC (Grant Nos. 61972149 and 61977026), and the ECNU Academic Innovation Promotion Program for Excellent Doctoral Students.

Author information

Corresponding author

Correspondence to Peng Cai.

Additional information

Donghui Wang is a PhD candidate in the School of Data Science and Engineering at East China Normal University (ECNU), China. She received her bachelor's degree in computer science and technology from Zhejiang Normal University, China in 2016. Her research interests include high-performance transaction processing in database management systems and high availability in distributed systems.

Peng Cai is a researcher in the School of Data Science and Engineering at East China Normal University (ECNU), China. He received his PhD degree in computer science and technology from ECNU in 2011. He joined ECNU in 2015; prior to that, he worked at the IBM China Research Lab and Baidu. His work has been published in leading conferences such as ICDE, SIGIR, and ACL. His main research interests include in-memory transaction processing and building adaptive systems using machine learning techniques.

Weining Qian is a professor and Dean of the School of Data Science and Engineering, East China Normal University, China. He received his MS and PhD degrees in computer science from Fudan University, China in 2001 and 2004, respectively. He serves as a standing committee member of the Database Technology Committee of the China Computer Federation (CCF) and as a committee member of the ACM SIGMOD China Chapter. His research interests include scalable transaction processing, benchmarking big data systems, and the management and analysis of massive datasets.

Aoying Zhou is a professor and Vice President of East China Normal University, China. He received his bachelor's and master's degrees in computer science from Sichuan University, China in 1985 and 1988, respectively, and his PhD degree from Fudan University, China in 1993. He is a winner of the National Science Fund for Distinguished Young Scholars supported by the National Natural Science Foundation of China (NSFC). He is a CCF Fellow and the Vice Director of the Database Technology Committee of the CCF. He served as Vice PC Chair of ICDE 2009 and ICDE 2012, and as PC Co-chair of VLDB 2014. His research interests include Web data management, data management for data-intensive computing, in-memory cluster computing, distributed transaction processing, and benchmarking for big data.

About this article

Cite this article

Wang, D., Cai, P., Qian, W. et al. Efficient and stable quorum-based log replication and replay for modern cluster-databases. Front. Comput. Sci. 16, 165612 (2022). https://doi.org/10.1007/s11704-020-0210-y
