Abstract
With the recent trend towards big data, a number of scalable data management systems, namely NoSQL and NewSQL systems, have been developed to manage massive data effectively. The algorithms used in the architectural design of a data management system determine the response time of an application, and the behavior and performance of different NoSQL and NewSQL systems vary with these architectural aspects. Architectural assessment of a data management system is therefore vital for understanding its strengths and weaknesses. This paper assesses the architecture of some well-known NoSQL and NewSQL systems in detail. To make the discussion and analysis clearer, we identify significant architectural features and group logically related ones into feature vectors (FVs). Feature vectors covering transactional properties, fault tolerance, data storage, and data handling are designed and used in the architectural assessment. The selected systems are analyzed, compared, and discussed in depth with respect to these feature vectors. The discussion describes the algorithms each system uses to implement a particular architectural feature and analyzes their suitability in various scenarios. Finally, guidelines are presented that help in filtering candidate data management systems on the basis of application requirements.
Chaudhry, N., Yousaf, M.M. Architectural assessment of NoSQL and NewSQL systems. Distrib Parallel Databases 38, 881–926 (2020). https://doi.org/10.1007/s10619-020-07310-1
DOI: https://doi.org/10.1007/s10619-020-07310-1