Abstract
The proliferation of Web 2.0 applications has increased the volume, velocity, and variety of data sources, which now exceed the limits and expected use cases of traditional relational DBMSs. Cloud-serving NoSQL datastores address these concerns and provide replication mechanisms to ensure fault tolerance, high availability, and improved scalability. In this paper, we empirically explore the impact of replication on the performance of the Cassandra and MongoDB NoSQL datastores. We compare replicated clusters against non-replicated clusters of equal size, all hosted in a private cloud environment. Our benchmarking experiments cover read-heavy and write-heavy workloads under different access distributions and tunable consistency levels. Our results demonstrate that replication must be taken into account in empirical and modelling studies in order to evaluate the performance of these datastores accurately.
Notes
1. An arbiter node does not replicate data and only exists to break ties when electing a new primary if necessary.
2. Available at https://github.com/thumbtack-technology/ycsb.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Haughian, G., Osman, R., Knottenbelt, W.J. (2016). Benchmarking Replication in Cassandra and MongoDB NoSQL Datastores. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9828. Springer, Cham. https://doi.org/10.1007/978-3-319-44406-2_12
DOI: https://doi.org/10.1007/978-3-319-44406-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44405-5
Online ISBN: 978-3-319-44406-2
eBook Packages: Computer Science, Computer Science (R0)