research-article

Neuroshard: towards automatic multi-objective sharding with deep reinforcement learning

Authors:

Zhengneng Chen,

Junfeng Yang

aiDM '22: Proceedings of the Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management

Article No.: 1, Pages 1 - 12

https://doi.org/10.1145/3533702.3534908

Published: 11 August 2022

Abstract

Large databases whose data does not fit on a single server must shard their rows across multiple database instances. Because distributed transactions are significantly more expensive than local ones, a popular approach is to collect a trace of past accesses to the database, model it as a graph (or a hypergraph), and solve an NP-hard partitioning problem whose objective is to minimize the fanout, that is, the number of database instances that need to participate in each query. Given the large amount of data to be sharded, this problem cannot be solved optimally, so databases use heuristic partitioning algorithms, which can be fairly effective in practice. However, fanout is only one of the objectives that affect performance. Other important objectives include load balancing, which ensures that no single database instance becomes overloaded, and equalizing the write traffic across databases to avoid lock contention and I/O amplification. Designing heuristics for more than one objective is difficult and error-prone.
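To make the competing objectives concrete, here is a minimal sketch, not taken from the paper, that treats each query in a trace as a hyperedge over the rows it touches and then scores a shard assignment by average fanout and by load imbalance. The trace, the row keys, the two example assignments, and the imbalance definition (maximum shard load divided by mean shard load) are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical access trace: each query is the set of row keys it touches,
# i.e. one hyperedge per query.
trace = [
    {"a", "b"},
    {"a", "b", "c"},
    {"c", "d"},
    {"d"},
]


def fanout(query, assignment):
    """Number of distinct shards that must participate in one query."""
    return len({assignment[row] for row in query})


def average_fanout(trace, assignment):
    """Mean fanout over all queries in the trace."""
    return sum(fanout(q, assignment) for q in trace) / len(trace)


def load_imbalance(trace, assignment, num_shards):
    """Max shard load divided by mean shard load (1.0 = perfectly balanced).

    Load is counted as the number of row accesses that land on each shard;
    this is an assumed metric chosen for illustration.
    """
    load = Counter()
    for q in trace:
        for row in q:
            load[assignment[row]] += 1
    mean = sum(load.values()) / num_shards
    return max(load[s] for s in range(num_shards)) / mean


# Two hypothetical assignments of rows to 2 shards.
colocated = {"a": 0, "b": 0, "c": 0, "d": 1}  # favors low fanout
spread = {"a": 0, "b": 1, "c": 0, "d": 1}     # favors balanced load

for name, assignment in [("colocated", colocated), ("spread", spread)]:
    print(name,
          average_fanout(trace, assignment),
          load_imbalance(trace, assignment, num_shards=2))
```

On this toy trace the colocated assignment has the lower average fanout but the worse load imbalance, while the spread assignment is perfectly balanced at the cost of higher fanout. A heuristic tuned for only one of the two metrics would pick a different assignment than one tuned for the other.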

We present Neuroshard, the first system that learns shard assignments directly from the workload and optimizes for multiple sharding objectives simultaneously. Neuroshard represents past queries as a neural hypergraph and uses deep reinforcement learning with multi-task learning to produce a learned partitioner that can optimize for multiple objectives in parallel. We implement Neuroshard on a distributed database that uses MariaDB and obtain promising initial results showing that this approach can achieve our versatility and scalability goals, in contrast to baseline approaches that optimize for only one objective, which can work well in one context but perform poorly in another.

Published In

aiDM '22: Proceedings of the Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management

June 2022

53 pages

ISBN:9781450393775

DOI:10.1145/3533702

Conference Chairs:
Rajesh Bordawekar (IBM T. J. Watson Research Center),
Oded Shmueli (Technion - Israel Institute of Technology),
Yael Amsterdamer (Bar-Ilan University),
Donatella Firmani (Sapienza University of Rome),
Ryan Marcus (MIT CSAIL)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

DiDi Faculty Research Award
J.P. Morgan Faculty Research Award
NSF
ONR
ARO
Accenture Research Award

Conference

SIGMOD/PODS '22

Sponsor:

SIGMOD

SIGMOD/PODS '22: International Conference on Management of Data

June 17, 2022

Philadelphia, Pennsylvania

Acceptance Rates

Overall Acceptance Rate 19 of 26 submissions, 73%
