
Benchmarking Hashing Algorithms for Load Balancing in a Distributed Database Environment

  • Conference paper
Advances in Model and Data Engineering in the Digitalization Era (MEDI 2022)

Abstract

Modern high-load applications store data across multiple database instances. Such an architecture must keep data consistent, and it must distribute data evenly among nodes. Load balancing is used to achieve these goals.

Hashing is the backbone of virtually all load-balancing systems. Since the introduction of classic Consistent Hashing, many algorithms have been devised for this purpose.
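To make the idea of consistent hashing concrete, here is a minimal Python sketch of the classic ring scheme (Karger et al.) with virtual nodes. It is not the implementation benchmarked in the paper. The class name `ConsistentHashRing`, the BLAKE2b-based 64-bit hash, and the default of 100 virtual nodes per server are illustrative assumptions.

```python
import bisect
import hashlib


def _hash64(key: str) -> int:
    """Map a string to a 64-bit integer (BLAKE2b with an 8-byte digest)."""
    return int.from_bytes(hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest(), "big")


class ConsistentHashRing:
    """Hypothetical minimal consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes          # ring positions per physical node (assumed value)
        self._points = []             # sorted hash positions on the ring
        self._owner = {}              # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place `vnodes` points for this node on the ring.
        for i in range(self.vnodes):
            point = _hash64(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        # Remove this node's points; its keys fall through to the next point clockwise.
        for i in range(self.vnodes):
            point = _hash64(f"{node}#{i}")
            idx = bisect.bisect_left(self._points, point)
            if idx < len(self._points) and self._points[idx] == point:
                self._points.pop(idx)
            self._owner.pop(point, None)

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key's hash, wrapping around to index 0.
        idx = bisect.bisect_right(self._points, _hash64(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing(["db1", "db2", "db3"])
print(ring.get_node("customer:42"))
```

When a node is added, a key changes owner only if one of the new node's points lands between the key and its previous successor, so keys move only to the new node. Virtual nodes smooth out the share of the ring each server receives.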

One purpose of the load balancer is to make the storage cluster scalable. For the performance of the whole system, it is crucial to move as few data records as possible when a node is added or removed. The load balancer's hashing algorithm has the greatest impact on this process.
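You can see the cost of rescaling by counting how many keys change owner when a cluster grows from 10 to 11 nodes. The Python sketch below compares naive modulo placement (`hash(key) % n`) with Jump Consistent Hash by Lamping and Veach, following the C++ listing in their paper. The helper names and the BLAKE2b-based 64-bit key hash are assumptions made for illustration, not the setup used in this paper.

```python
import hashlib

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit overflow


def key_hash(key: str) -> int:
    """64-bit hash of a record key (BLAKE2b, 8-byte digest)."""
    return int.from_bytes(hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest(), "big")


def modulo_bucket(h: int, num_buckets: int) -> int:
    """Naive placement: changing num_buckets reshuffles most keys."""
    return h % num_buckets


def jump_bucket(h: int, num_buckets: int) -> int:
    """Jump Consistent Hash (Lamping and Veach, 2014)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        h = (h * 2862933555777941757 + 1) & MASK64  # 64-bit linear congruential step
        j = int((b + 1) * (float(1 << 31) / float((h >> 33) + 1)))
    return b


def moved_fraction(bucket_fn, hashes, n_before: int, n_after: int) -> float:
    """Share of keys whose bucket changes when the cluster is resized."""
    moved = sum(1 for h in hashes if bucket_fn(h, n_before) != bucket_fn(h, n_after))
    return moved / len(hashes)


hashes = [key_hash(f"record-{i}") for i in range(50_000)]
print("modulo:", moved_fraction(modulo_bucket, hashes, 10, 11))
print("jump:  ", moved_fraction(jump_bucket, hashes, 10, 11))
```

With uniformly hashed keys, modulo placement should move about 10/11 of the records when going from 10 to 11 nodes. Jump hashing should move roughly 1/11, and every moved key goes to the new bucket 10.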

In this paper, we experimentally evaluate several hashing algorithms used for load balancing, conducting both simulated and real-system experiments. To evaluate algorithm performance, we have developed a benchmark suite based on Unidata MDM, a scalable toolkit for various Master Data Management (MDM) applications. We assess the algorithms against three criteria: uniformity of the produced distribution, the number of moved records, and computation speed. Based on the results of our experiments, we have compiled a table that rates each algorithm according to these criteria.
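A benchmark along these three criteria can be sketched as follows. This is a hypothetical, simplified Python harness, not the Unidata-based suite from the paper. The metrics are assumptions, and the paper may define them differently:

- Uniformity is the coefficient of variation of per-node record counts.
- Rebalancing cost is the fraction of keys that change node when one node is added.
- Speed is the mean wall-clock time per lookup.

Rendezvous (highest-random-weight) hashing and modulo placement serve as example algorithms under test.

```python
import hashlib
import statistics
import time
from collections import Counter
from typing import Callable, Dict, Sequence

Assign = Callable[[str, Sequence[str]], str]  # (key, nodes) -> chosen node


def h64(data: str) -> int:
    """64-bit hash of a string (BLAKE2b with an 8-byte digest)."""
    return int.from_bytes(hashlib.blake2b(data.encode("utf-8"), digest_size=8).digest(), "big")


def modulo(key: str, nodes: Sequence[str]) -> str:
    """Naive placement: hash the key and take it modulo the node count."""
    return nodes[h64(key) % len(nodes)]


def rendezvous(key: str, nodes: Sequence[str]) -> str:
    """Rendezvous hashing: the node with the largest h(node, key) wins."""
    return max(nodes, key=lambda node: h64(f"{node}:{key}"))


def evaluate(assign: Assign, keys: Sequence[str], nodes: Sequence[str], new_node: str) -> Dict[str, float]:
    # Speed: mean seconds per lookup over all keys.
    start = time.perf_counter()
    placement = {k: assign(k, nodes) for k in keys}
    seconds_per_key = (time.perf_counter() - start) / len(keys)

    # Uniformity: coefficient of variation of per-node loads (0 means perfectly even).
    counts = Counter(placement.values())
    loads = [counts.get(n, 0) for n in nodes]
    uniformity_cv = statistics.pstdev(loads) / statistics.mean(loads)

    # Rebalancing cost: fraction of keys whose node changes after adding one node.
    grown = list(nodes) + [new_node]
    moved = sum(1 for k in keys if assign(k, grown) != placement[k])

    return {
        "uniformity_cv": uniformity_cv,
        "moved_fraction": moved / len(keys),
        "seconds_per_key": seconds_per_key,
    }


if __name__ == "__main__":
    keys = [f"record-{i}" for i in range(10_000)]
    nodes = [f"node-{i}" for i in range(10)]
    for name, algo in [("modulo", modulo), ("rendezvous", rendezvous)]:
        print(name, evaluate(algo, keys, nodes, "node-10"))
```

Both example algorithms should spread keys evenly. Rendezvous hashing trades a per-lookup cost linear in the number of nodes for moving only about 1/(n+1) of the keys on growth. This is the kind of trade-off a per-criterion comparison table is meant to expose.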



Acknowledgments

We would like to thank Anna Smirnova for her help with the preparation of the paper.

Author information

Corresponding author

Correspondence to George Chernishev.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Slesarev, A., Mikhailov, M., Chernishev, G. (2022). Benchmarking Hashing Algorithms for Load Balancing in a Distributed Database Environment. In: Fournier-Viger, P., et al. Advances in Model and Data Engineering in the Digitalization Era. MEDI 2022. Communications in Computer and Information Science, vol 1751. Springer, Cham. https://doi.org/10.1007/978-3-031-23119-3_8


  • DOI: https://doi.org/10.1007/978-3-031-23119-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23118-6

  • Online ISBN: 978-3-031-23119-3

  • eBook Packages: Computer Science, Computer Science (R0)
