Abstract
As the volume of data generated by social networks and cloud computing has grown, the term “Big data” has emerged, along with the challenge of storing immense datasets. Many tools and algorithms have been developed to address this challenge. NoSQL databases, such as Cassandra and MongoDB, are built on data management designs that can store and process huge volumes of data. Partitioning data across nodes is considered one of the critical challenges in NoSQL database design. In this paper, a MapReduce Rendezvous Hashing-Based Virtual Hierarchies (MR-RHVH) framework is proposed for scalable partitioning of the Cassandra NoSQL database. The MapReduce framework is used to implement MR-RHVH on Cassandra to enhance its performance in highly distributed environments. MR-RHVH assigns nodes to rendezvous regions using a proposed Adopted Virtual Hierarchies strategy, so that each region is responsible for a set of nodes. In addition, a proposed bloom filter evaluator is used to ensure that keys are allocated accurately to the nodes in each region. A number of experiments were performed to evaluate the performance of the MR-RHVH framework, using YCSB for database benchmarking. The results show that MR-RHVH achieves higher scalability and lower execution time than several recent systems.
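
For readers unfamiliar with the underlying technique, the following is a minimal Python sketch of rendezvous (highest-random-weight) hashing, the partitioning idea that MR-RHVH builds on. It is an illustration under stated assumptions, not the MR-RHVH implementation: the SHA-256 weight function, the names `weight`, `pick_node`, and `pick_in_region`, and the two-level "region, then node" selection are all hypothetical choices made here, and neither the MapReduce layer nor the bloom filter evaluator is modeled.

```python
import hashlib


def weight(node: str, item_key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair.

    Assumption: SHA-256 is used only for illustration; any well-mixed
    hash function would serve the same purpose.
    """
    digest = hashlib.sha256(f"{node}|{item_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(nodes, item_key: str) -> str:
    """Rendezvous choice: the candidate with the highest weight owns the key."""
    if not nodes:
        raise ValueError("at least one node is required")
    return max(nodes, key=lambda node: weight(node, item_key))


def pick_in_region(regions, item_key: str):
    """Hypothetical two-level variant: choose a region, then a node inside it.

    `regions` maps a region name to the list of node names it contains.
    This is a simplification, not the paper's Adopted Virtual Hierarchies.
    """
    region = pick_node(list(regions), item_key)
    return region, pick_node(regions[region], item_key)
```

A useful property of this scheme is that removing a node only remaps the keys that node owned; every other key keeps its previous owner, because its highest-weight node is still present among the candidates.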

















Elghamrawy, S.M., Hassanien, A.E. A partitioning framework for Cassandra NoSQL database using Rendezvous hashing. J Supercomput 73, 4444–4465 (2017). https://doi.org/10.1007/s11227-017-2027-5
DOI: https://doi.org/10.1007/s11227-017-2027-5