Abstract
Modern high-load applications store data across multiple database instances. Such an architecture must keep data consistent and distribute it evenly among nodes. Load balancing is used to achieve these goals.
Hashing is the backbone of virtually all load balancing systems. Since the introduction of classic Consistent Hashing, many algorithms have been devised for this purpose.
One purpose of the load balancer is to keep the storage cluster scalable. For the performance of the whole system, it is crucial to transfer as few data records as possible when a node is added or removed. The hashing algorithm used by the load balancer has the greatest impact on this process.
In this paper, we experimentally evaluate several hashing algorithms used for load balancing, conducting experiments both in simulation and on a real system. To evaluate algorithm performance, we have developed a benchmark suite based on Unidata MDM, a scalable toolkit for various Master Data Management (MDM) applications. For the assessment, we have employed three criteria: uniformity of the produced distribution, the number of moved records, and computation speed. Based on the results of our experiments, we have compiled a table in which each algorithm is rated against these criteria.
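To make two of these criteria concrete, here is a minimal Python sketch that measures the fraction of moved records and the uniformity of the distribution when a cluster grows from 10 to 11 nodes. It is an illustration, not the benchmark suite described in the paper. The key format, the MD5-based key hash, the node counts, and all function names are assumptions chosen for the example. Jump consistent hash (Lamping and Veach) stands in for a consistent hashing scheme; the abstract does not say which algorithms the paper evaluates.

```python
import hashlib


def key_hash(key: str) -> int:
    """Stable 64-bit hash of a key (MD5-based; an illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def modulo_node(key: str, n: int) -> int:
    """Naive placement: hash modulo the number of nodes."""
    return key_hash(key) % n


def jump_node(key: str, n: int) -> int:
    """Jump consistent hash: maps a key to a node index in [0, n)."""
    k = key_hash(key)
    b, j = -1, 0
    while j < n:
        b = j
        # 64-bit linear congruential step, truncated to 64 bits.
        k = (k * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (float(1 << 31) / float((k >> 33) + 1)))
    return b


def moved_fraction(assign, keys, n_old: int, n_new: int) -> float:
    """Fraction of keys whose node changes when the cluster is resized."""
    moved = sum(1 for k in keys if assign(k, n_old) != assign(k, n_new))
    return moved / len(keys)


def max_load_ratio(assign, keys, n: int) -> float:
    """Load of the fullest node divided by the ideal load (1.0 is perfect)."""
    counts = [0] * n
    for k in keys:
        counts[assign(k, n)] += 1
    return max(counts) / (len(keys) / n)


# Hypothetical record identifiers used only for this illustration.
keys = [f"record-{i}" for i in range(20000)]

for name, assign in (("modulo", modulo_node), ("jump", jump_node)):
    print(
        name,
        "moved:", round(moved_fraction(assign, keys, 10, 11), 3),
        "max load ratio:", round(max_load_ratio(assign, keys, 11), 3),
    )
```

With modulo placement, most records change nodes when one node is added, whereas a consistent scheme is expected to move roughly one record in eleven, all of them onto the new node. Computation speed, the third criterion, could be measured by timing the assignment functions in the same loop.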
Acknowledgments
We would like to thank Anna Smirnova for her help with the preparation of the paper.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Slesarev, A., Mikhailov, M., Chernishev, G. (2022). Benchmarking Hashing Algorithms for Load Balancing in a Distributed Database Environment. In: Fournier-Viger, P., et al. Advances in Model and Data Engineering in the Digitalization Era. MEDI 2022. Communications in Computer and Information Science, vol 1751. Springer, Cham. https://doi.org/10.1007/978-3-031-23119-3_8
DOI: https://doi.org/10.1007/978-3-031-23119-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23118-6
Online ISBN: 978-3-031-23119-3
eBook Packages: Computer Science, Computer Science (R0)