Optimizing Inter-data-center Large-Scale Database Parallel Replication with Workload-Driven Partitioning

Gao, Zhen; Min, Hong; Li, Xiao; Huang, Jie; Jin, Yi; Lei, An; Bourbonnais, Serge; Zheng, Miao; Fuh, Gene

doi:10.1007/978-3-662-49214-7_6


Chapter
First Online: 07 January 2016


Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9510)

Abstract

Many enterprises deploy geographically distributed data centers for non-stop business operations. In the event of a disaster, ongoing workloads must be failed over from the current data center to another active one within a few seconds to maintain continuous service availability. Software-based parallel database replication techniques are designed to deliver very high throughput with near-real-time latency, and understanding workload characteristics is one of the key factors in improving replication performance. In this paper, we propose a workload-driven method that optimizes database replication latency and minimizes transaction splits while using a minimal number of parallel replication consistency groups. Our two-phase approach comprises (1) a log-based mechanism for discovering workload patterns and (2) a history-based algorithm for pattern analysis, database partitioning, and partition adjustment. Experimental results from a real banking batch workload and a benchmark OLTP workload demonstrate the effectiveness of the solution, even when partitioning thousands of database tables in very large workloads. Finally, we develop and verify an algorithm that automates the cyclic flow of workload profile capture and partition readjustment.
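The abstract does not spell out the partitioning algorithm, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the captured log has already been reduced to one set of table names per transaction (a hypothetical input format) and uses the number of transactions touching a table as a crude load proxy. It swaps in a simple greedy heuristic: merge the most frequently co-updated tables first, subject to a per-group load cap, then pack the resulting clusters onto `k` consistency groups largest-first (LPT-style list scheduling). A transaction counts as split when the tables it touches land in more than one group. The functions `co_access_weights`, `table_load`, `partition_tables`, `count_splits`, the `slack` parameter, and the table names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_access_weights(transactions):
    """Count how often each pair of tables is modified in the same transaction."""
    weights = Counter()
    for tables in transactions:
        for a, b in combinations(sorted(set(tables)), 2):
            weights[(a, b)] += 1
    return weights


def table_load(transactions):
    """Crude load proxy (an assumption): number of transactions touching each table."""
    load = Counter()
    for tables in transactions:
        for t in set(tables):
            load[t] += 1
    return load


def partition_tables(transactions, k, slack=1.25):
    """Greedy sketch: merge co-accessed tables (heaviest pairs first) under a load cap,
    then pack the merged clusters onto k groups, heaviest cluster first."""
    load = table_load(transactions)
    cap = slack * sum(load.values()) / k  # hypothetical per-group load limit
    parent = {t: t for t in load}
    cluster_load = dict(load)  # keyed by union-find root

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    # Phase 1: union table pairs in descending co-access weight if the cap allows.
    weights = co_access_weights(transactions)
    for (a, b), _w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        ra, rb = find(a), find(b)
        if ra != rb and cluster_load[ra] + cluster_load[rb] <= cap:
            parent[rb] = ra
            cluster_load[ra] += cluster_load.pop(rb)

    # Phase 2: assign each cluster, heaviest first, to the least-loaded group.
    clusters = {}
    for t in load:
        clusters.setdefault(find(t), []).append(t)
    group_load = [0] * k
    assignment = {}
    for root in sorted(clusters, key=lambda r: (-cluster_load[r], r)):
        g = min(range(k), key=lambda i: (group_load[i], i))
        group_load[g] += cluster_load[root]
        for t in clusters[root]:
            assignment[t] = g
    return assignment


def count_splits(transactions, assignment):
    """A transaction is split if its tables map to more than one consistency group."""
    return sum(1 for tables in transactions if len({assignment[t] for t in tables}) > 1)


if __name__ == "__main__":
    # Toy workload (hypothetical tables): two pairs always updated together,
    # plus one transaction that crosses the pairs.
    txns = [{"ACCT", "TXN"}, {"ACCT", "TXN"}, {"CUST", "ADDR"},
            {"CUST", "ADDR"}, {"ACCT", "CUST"}]
    groups = partition_tables(txns, k=2)
    print({t: groups[t] for t in sorted(groups)})  # {'ACCT': 0, 'ADDR': 1, 'CUST': 1, 'TXN': 0}
    print(count_splits(txns, groups))  # 1
```

On this toy workload, each frequently co-updated pair stays in one group and only the single cross-pair transaction is split. A greedy heuristic like this is easy to reason about but offers no quality guarantee; multilevel graph partitioners such as METIS are a common, more robust choice when thousands of tables are involved.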

Y. Jin: Work done while employed by IBM.



Acknowledgements

We would like to thank Austin D’Costa and James Z. Teng for their insights.

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Hong Min
IBM Silicon Valley Lab, San Jose, CA, USA
Xiao Li & Serge Bourbonnais
School of Software Engineering, Tongji University, Shanghai, China
Zhen Gao, Jie Huang & An Lei
Pivotal Inc., Beijing, China
Yi Jin
IBM System and Technology Group, New York, USA
Miao Zheng & Gene Fuh


Corresponding authors

Correspondence to Zhen Gao or Hong Min.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner
Universidad Politécnica de Valencia, Valencia, Spain
Hendrik Decker
Czech Technical University, Prague, Czech Republic
Lenka Lhotska
University of Auckland, Auckland, New Zealand
Sebastian Link


About this chapter

Cite this chapter

Gao, Z. et al. (2016). Optimizing Inter-data-center Large-Scale Database Parallel Replication with Workload-Driven Partitioning. In: Hameurlain, A., Küng, J., Wagner, R., Decker, H., Lhotska, L., Link, S. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIV. Lecture Notes in Computer Science, vol 9510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49214-7_6


DOI: https://doi.org/10.1007/978-3-662-49214-7_6
Published: 07 January 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49213-0
Online ISBN: 978-3-662-49214-7
eBook Packages: Computer Science, Computer Science (R0)
