Skip to main content
Log in

Scalable and adaptive log manager in distributed systems

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

On-line transaction processing (OLTP) systems rely on transaction logging and quorum-based consensus protocol to guarantee durability, high availability and strong consistency. This makes the log manager a key component of distributed database management systems (DDBMSs). The leader of DDBMSs commonly adopts a centralized logging method to writing log entries into a stable storage device and uses a constant log replication strategy to periodically synchronize its state to followers. With the advent of new hardware and high parallelism of transaction processing, the traditional centralized design of logging limits scalability, and the constant trigger condition of replication can not always maintain optimal performance under dynamic workloads.

In this paper, we propose a new log manager named Salmo with scalable logging and adaptive replication for distributed database systems. The scalable logging eliminates centralized contention by utilizing a highly concurrent data structure and speedy log hole tracking. The kernel of adaptive replication is an adaptive log shipping method, which dynamically adjusts the number of log entries transmitted between leader and followers based on the real-time workload. We implemented and evaluated Salmo in the open-sourced transaction processing systems Cedar and DBx1000. Experimental results show that Salmo scales well by increasing the number of working threads, improves peak throughput by 1.56× and reduces latency by more than 4× over log replication of Raft, and maintains efficient and stable performance under dynamic workloads all the time.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Stonebraker M. Concurrency control and consistency of multiple copies of data in distributed INGRES. IEEE Transactions on Software Engineering, 1979, SE-5(3): 188–194

    Article  Google Scholar 

  2. Corbett J C, Dean J, Epstein M, Fikes A, Frost C, et al. Spanner: Google’s globally distributed database. ACM Transactions on Computer Systems, 2013, 31(3): 8

    Article  Google Scholar 

  3. Lamport L. The part-time parliament. ACM Transactions on Computer Systems, 1998, 16(2): 133–169

    Article  Google Scholar 

  4. Ongaro D, Ousterhout J. In search of an understandable consensus algorithm. In: Proceedings of 2014 USENIX Conference on USENIX Annual Technical Conference. 2014, 305–320

  5. Gray J, McJones P, Blasgen M, Lindsay B, Lorie R, Price T, Putzolu F, Traiger I. The recovery manager of the system R database manager. ACM Computing Surveys, 1981, 13(2): 223–243

    Article  Google Scholar 

  6. Mohan C, Haderle D, Lindsay B, Pirahesh H, Schwarz P. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems, 1992, 17(1): 94–162

    Article  Google Scholar 

  7. Diaconu C, Freedman C, Ismert E, Larson P A, Mittal P, Stonecipher R, Verma N, Zwilling M. Hekaton: SQL server’s memory-optimized OLTP engine. In: Proceedings of 2013 ACM SIGMOD International Conference on Management of Data. 2013, 1243–1254

  8. Levandoski J J, Lomet D B, Sengupta S, Stutsman R, Wang R. High performance transactions in deuteronomy. In: Proceedings of the 7th Biennial Conference on Innovative Data Systems Research. 2015

  9. Lim H, Kaminsky M, Andersen D G. Cicada: dependably fast multicore in-memory transactions. In: Proceedings of 2017 ACM International Conference on Management of Data. 2017, 21–35

  10. Johnson R, Pandis I, Stoica R, Athanassoulis M, Ailamaki A. Aether: a scalable approach to logging. Proceedings of the VLDB Endowment, 2010, 3(1–2): 681–692

    Article  Google Scholar 

  11. Johnson R, Pandis I, Stoica R, Athanassoulis M, Ailamaki A. Scalability of write-ahead logging on multicore and multisocket hardware. The VLDB Journal, 2012, 21(2): 239–263

    Article  Google Scholar 

  12. Kim K, Wang T, Johnson R, Pandis I. ERMIA: fast memory-optimized database system for heterogeneous workloads. In: Proceedings of 2016 International Conference on Management of Data. 2016, 1675–1687

  13. Jung H, Han H, Kang S. Scalable database logging for multicores. Proceedings of the VLDB Endowment, 2017, 11(2): 135–148

    Article  Google Scholar 

  14. Tu S, Zheng W, Kohler E, Liskov B, Madden S. Speedy transactions in multicore in-memory databases. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 18–32

  15. Zheng W, Tu S, Kohler E, Liskov B. Fast databases with fast durability and recovery through multicore parallelism. In: Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation. 2014, 465–477

  16. Kimura H. FOEDUS: OLTP engine for a thousand cores and NVRAM. In: Proceedings of 2015 ACM SIGMOD International Conference on Management of Data. 2015, 691–706

  17. Kim J, Jang H, Son S, Han H, Kang S, Jung H. Border-collie: a wait-free, read-optimal algorithm for database logging on multicore hardware. In: Proceedings of 2019 International Conference on Management of Data. 2019, 723–740

  18. Xia Y, Yu X, Pavlo A, Devadas S. Taurus: lightweight parallel logging for in-memory database management systems. Proceedings of the VLDB Endowment, 2020, 14(2): 189–201

    Article  Google Scholar 

  19. Haubenschild M, Sauer C, Neumann T, Leis V. Rethinking logging, checkpoints, and recovery for high-performance storage engines. In: Proceedings of 2020 ACM SIGMOD International Conference on Management of Data. 2020, 877–892

  20. Zhou H, Guo J, Hu H, Qian W, Zhou X, Zhou A. Plover: parallel logging for replication systems. Frontiers of Computer Science, 2020, 14(4): 144606

    Article  Google Scholar 

  21. Hong C, Zhou D, Yang M, Kuo C, Zhang L, Zhou L. KuaFu: closing the parallelism gap in database replication. In: Proceedings of the 29th International Conference on Data Engineering. 2013, 1186–1195

  22. Poke M, Hoefler T. DARE: high-performance state machine replication on RDMA networks. In: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing. 2015, 107–118

  23. Qin D, Brown A D, Goel A. Scalable replay-based replication for fast databases. Proceedings of the VLDB Endowment, 2017, 10(13): 2025–2036

    Article  Google Scholar 

  24. Wang T, Johnson R, Pandis I. Query fresh: log shipping on steroids. Proceedings of the VLDB Endowment, 2017, 11(4): 406–419

    Article  Google Scholar 

  25. Guo J, Chu J, Cai P, Zhou M, Zhou A. Low-overhead Paxos replication. Data Science and Engineering, 2017, 2(2): 169–177

    Article  Google Scholar 

  26. Arora V, Mittal T, Agrawal D, El-Abbadi A, Xue X, Zhi Y, Zhu J. Leader or majority: why have one when you can have both? Improving read scalability in raft-like consensus protocols. In: Proceedings of the 9th USENIX Conference on Hot Topics in Cloud Computing. 2017

  27. Cao W, Liu Z, Wang P, Chen S, Zhu C, Zheng S, Wang Y, Ma G. PolarFS: an ultra-low latency and failure resilient distributed file system for shared storage cloud database. Proceedings of the VLDB Endowment, 2018, 11(12): 1849–1862

    Article  Google Scholar 

  28. Zhang Z, Hu H, Yu Y, Qian W, Shu K. Dependency preserved raft for transactions. In: Proceedings of the 25th International Conference of Database Systems for Advanced Applications. 2020, 228–245

  29. Pan Y, Li Y, Zhang C, Zhang R, Hong D. The design and implementation of an efficient order management system based on CEDAR. Journal of East China Normal University (Natural Science), 2018, (3): 88–96

  30. Helland P, Sammer H, Lyon J, Carr R, Garrett P, Reuter A. Group commit timers and high volume transaction systems. In: Proceedings of the 2nd International Workshop on High Performance Transaction Systems. 1987, 301–329

  31. Spiro P M, Joshi A M, Rengarajan T K. Designing an optimized transaction commit protocol. Digital Technical Journal, 1991, 3(1): 1–32

    Google Scholar 

  32. Howard H. ARC: analysis of raft consensus. Technical Report UCAM-CL-TR-857. Cambridge: University of Cambridge, 2014

    Google Scholar 

  33. Huang D, Liu Q, Cui Q, Fang Z, Ma X, Xu F, Shen L, Tang L, Zhou Y, Huang M, Wei W, Liu C, Zhang J, Li J, Wu X, Song L, Sun R, Yu S, Zhao L, Cameron N, Pei L, Tang X. TiDB: a raft-based HTAP database. Proceedings of the VLDB Endowment, 2020, 13(12): 3072–3084

    Article  Google Scholar 

  34. Wang T, Johnson R. Scalable logging through emerging non-volatile memory. Proceedings of the VLDB Endowment, 2014, 7(10): 865–876

    Article  Google Scholar 

  35. Li K, Han F. Memory transaction engine of OceanBase. Journal of East China Normal University (Natural Science), 2014, 5: 149–163

    Google Scholar 

  36. Cooper B F, Silberstein A, Tam E, Ramakrishnan R, Sears R. Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing. 2010, 143–154

  37. Zhu T, Zhao Z, Li F, Qian W, Zhou A, Xie D, Stutsman R, Li H, Hu H. SolarDB: toward a shared-everything database on distributed log-structured storage. ACM Transaction on Storage, 2019, 15(2): 11

    Google Scholar 

  38. Gray J. Notes on data base operating systems. In: Proceedings of Operating Systems, an Advanced Course. 1978, 393–481

  39. Harizopoulos S, Abadi D J, Madden S, Stonebraker M. OLTP through the looking glass, and what we found there. In: Proceedings of 2008 ACM SIGMOD International Conference on Management of Data. 2008, 981–992

  40. Huang J, Schwan K, Qureshi M K. NVRAM-aware logging in transaction systems. Proceedings of the VLDB Endowment, 2014, 8(4): 389–400

    Article  Google Scholar 

  41. Arulraj J, Perron M, Pavlo A. Write-behind logging. Proceedings of the VLDB Endowment, 2016, 10(4): 337–348

    Article  Google Scholar 

  42. Hagmann R. Reimplementing the cedar file system using logging and group commit. In: Proceedings of the Eleventh ACM Symposium on Operating Systems Principles. 1987, 155–162

  43. Lamport L. Paxos made simple, fast, and byzantine. In: Proceedings of the 6th International Conference on Principles of Distributed Systems. 2002, 7–9

  44. Lamport L. Fast Paxos. Distributed Computing, 2006, 19(2): 79–103

    Article  MathSciNet  Google Scholar 

  45. Burrows M. The Chubby lock service for loosely-coupled distributed systems. In: Proceedings of the 7th Symposium on Operating Systems Design and Implementation. 2006, 335–350

  46. Baker J, Bond C, Corbett J C, Furman J J, Khorlin A, Larson J, Leon J M, Li Y, Lloyd A, Yushprakh V. Megastore: providing scalable, highly available storage for interactive services. In: Proceedings of the 5th Biennial Conference on Innovative Data Systems Research. 2011, 223–234

  47. Shute J, Vingralek R, Samwel B, Handy B, Whipkey C, Rollins E, Oancea M, Littlefield K, Menestrina D, Ellner S, Cieslewicz J, Rae I, Stancescu T, Apte H. F1: a distributed SQL database that scales. Proceedings of the VLDB Endowment, 2013, 6(11): 1068–1079

    Article  Google Scholar 

  48. Zheng J, Lin Q, Xu J, Wei C, Zeng C, Yang P, Zhang Y. PaxosStore: high-availability storage made practical in WeChat. Proceedings of the VLDB Endowment, 2017, 10(12): 1730–1741

    Article  Google Scholar 

  49. Rao J, Shekita E J, Tata S. Using Paxos to build a scalable, consistent, and highly available datastore. Proceedings of the VLDB Endowment, 2011, 4(4): 243–254

    Article  Google Scholar 

  50. Ousterhout J, Agrawal P, Erickson D, Kozyrakis C, Leverich J, Mazières D, Mitra S, Narayanan A, Parulkar G, Rosenblum M, Rumble S M, Stratmann E, Stutsman R. The case for RAMClouds: scalable high-performance storage entirely in DRAM. ACM SIGOPS Operating Systems Review, 2010, 43(4): 92–105

    Article  Google Scholar 

Download references

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62002119, 61977026, 62072180, and 61772202) supported by the Fundamental Research Funds for the Central Universities, Southwest Minzu University (2021PTJS23), and supported by the Open Fund of Shanghai Engineering Research Center on Big Data Management System. We also thank FCS reviewers for valuable comments which greatly helped to improve this paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Huan Zhou.

Additional information

Huan Zhou is a lecturer of the School of Computer Science and Engineering, Southwest Minzu University (SWU), China. She received her BS in computer science and technology from Sichuan Normal University (SICNU), China in 2013 and her PhD in software engineering from East China Normal University (ECNU), China in 2019. Before she joined SWU in 2021, she had worked as a postdoctoral researcher in the School of Data Science and Engineering, ECNU, China. Her research interests include database management system and data mining.

Weining Qian is a professor and dean of the School of Data Science and Engineering, East China Normal University (ECNU), China. He received his BS, MS and PhD in computer science from Fudan University, China in 1998, 2001 and 2004, respectively. He is now serving as a standing committee member of Technical Committee on Databases of China Computer Federation (CCF), a member of Expert Group on Artificial Intelligence Science and Technology Innovation of MoE and a vice president of Data Science and Knowledge Systems Engineering Committee of Systems Engineering Society of China (SESC). His research interests include scalable transaction processing, benchmarking on big data systems, big data analysis and processing, big data application and computational education.

Xuan Zhou is a professor and a vice dean of the School of Data Science and Engineering, East China Normal University (ECNU), China. He obtained his BS from Fudan University, China in 2001 and his PhD from the National University of Singapore, Singapore in 2005, both in computer science. Since his graduation, he had worked as a scientist at the L3S Research Centre (Germany) and the CSIRO ICT Centre (Australia) until the end of 2010. Before he joined ECNU in 2017, he spent six years in Renmin University of China, as an associate professor. He is the winner of the Program for New Century Excellent Talents in University of Ministry of Education (MoE). His research interests include database system and information retrieval.

Qiwen Dong is a professor of the School of Data Science and Engineering, East China Normal University (ECNU), China. He received his BS and MS in aerospace engineering and mechanics from Harbin Institute of Technology, China in 2000 and 2002, respectively, and his PhD in computer science from Harbin Institute of Technology, China in 2008. He was a postdoctoral researcher in the School of Computer Science of Fudan University, China from 2008 to 2010, and subsequently served as an associate professor there until 2016. In the meantime, he visited the University of Michigan, USA. His research interests include bioinformatics, data mining, big-data. etc.

Aoying Zhou, Vice President of East China Normal University (ECNU), Vice President of Guizhou University (GZU), Founding Dean of School of Data Science and Engineering (DaSE) ECNU, Professor. He got his BS and MS in computer science from Sichuan University, China in 1985 and 1988, respectively, and he won his PhD from Fudan University, China in 1993. He is the winner of the National Science Fund for Distinguished Young Scholars supported by the National Natural Science Foundation of China (NSFC). He is CCF (China Computer Federation) Fellow and Associate Editor-in-Chief of Chinese Journal of Computer. He served General Chair of ER’2004, Vice PC Chair of ICDE’2009 and ICDE’2012, PC Co-chair of VLDB’2014. His research interests include Web data management, data management for data-intensive computing, inmemory cluster computing, distributed transaction processing, benchmarking for big data and performance.

Wenrong Tan is a professor and dean of the School of Computer Science and Engineering, Southwest Minzu University (SWU), China. She received her BS from Sichuan University, China in 1988 and her MS from Chongqing University, China in 1991, both in computer science. She is now serving as a vice president of Sichuan Province Computer Federation and a vice president of Sichuan Institute of Artificial Intelligence. Her research interests include internet of things and natural language processing.

Electronic Supplementary Material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhou, H., Qian, W., Zhou, X. et al. Scalable and adaptive log manager in distributed systems. Front. Comput. Sci. 17, 172205 (2023). https://doi.org/10.1007/s11704-022-1357-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-022-1357-5

Keywords

Navigation