Neuroshard: towards automatic multi-objective sharding with deep reinforcement learning
Article No.: 1, Pages 1 - 12
Abstract
Large databases whose data does not fit on a single server must shard their rows across multiple database instances. Distributed transactions are significantly more expensive than local transactions, so a popular approach is to collect a trace of past accesses to the database, model it as a graph (or a hypergraph), and solve an NP-hard partitioning problem whose objective is to minimize the fanout, that is, the number of database instances that need to participate in each query. Because of the large amount of data that needs to be sharded, this problem cannot be solved optimally; databases therefore use heuristic partitioning algorithms, which can be fairly effective in practice. However, fanout is only one objective that affects performance. Other important objectives include load balancing, which ensures that no single database instance becomes overloaded, and equalizing the write traffic across databases to avoid lock contention and I/O amplification. Designing heuristics for more than one objective is difficult and error-prone.
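To make the two objectives concrete, the following is a minimal sketch, not taken from the paper, that treats each traced query as a hyperedge over the rows it touches and scores a candidate shard assignment on average fanout and on load imbalance. The function names, the toy trace, and the choice of "busiest shard divided by mean load" as the imbalance measure are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Row = Hashable
Query = Sequence[Row]  # one hyperedge: the rows a traced query touched


def average_fanout(queries: List[Query], assignment: Dict[Row, int]) -> float:
    """Mean number of distinct shards that participate in each query."""
    per_query = [len({assignment[row] for row in q}) for q in queries]
    return sum(per_query) / len(per_query)


def load_imbalance(queries: List[Query], assignment: Dict[Row, int],
                   num_shards: int) -> float:
    """Accesses on the busiest shard divided by the mean accesses per shard.

    A value of 1.0 means perfectly even load (an assumed, simple metric).
    """
    load: Counter = Counter()
    for q in queries:
        for row in q:
            load[assignment[row]] += 1
    mean_load = sum(load.values()) / num_shards
    return max(load.values()) / mean_load


# Hypothetical trace of four queries over rows a..d, sharded two ways.
trace = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["d"]]
co_located = {"a": 0, "b": 0, "c": 0, "d": 1}  # favors low fanout
spread_out = {"a": 0, "b": 0, "c": 1, "d": 1}  # favors even load

for name, plan in [("co_located", co_located), ("spread_out", spread_out)]:
    print(name, average_fanout(trace, plan), load_imbalance(trace, plan, 2))
```

On this toy trace the co-located plan keeps every query on one shard but concentrates most accesses there, while the spread-out plan evens the load at the cost of higher fanout, which is the tension between objectives that the abstract describes.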
We present Neuroshard, the first system that learns shard assignments directly from the workload and optimizes for multiple sharding objectives simultaneously. Neuroshard represents past queries as a neural hypergraph and uses deep reinforcement learning with multi-task learning to generate a learned partitioner that can optimize for multiple objectives in parallel. We implement Neuroshard on a distributed database that uses MariaDB and obtain promising initial results showing that this approach can achieve our versatility and scalability goals, in contrast to baseline approaches that optimize for only one objective, which can work well in one context but perform poorly in another.
Published In
June 2022
53 pages
ISBN:9781450393775
DOI:10.1145/3533702
- Conference Chairs:
- Rajesh Bordawekar,
- Oded Shmueli,
- Yael Amsterdamer,
- Donatella Firmani,
- Ryan Marcus
Copyright © 2022 ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 11 August 2022
Qualifiers
- Research-article
Funding Sources
- DiDi Faculty Research Award
- J.P. Morgan Faculty Research Award
- NSF
- ONR
- ARO
- Accenture Research Award
Conference
SIGMOD/PODS '22
Sponsor:
SIGMOD/PODS '22: International Conference on Management of Data
June 17, 2022
Philadelphia, Pennsylvania
Acceptance Rates
Overall Acceptance Rate 19 of 26 submissions, 73%