ABSTRACT
The Big Data paradigm has placed pressure on well-established relational databases during the last decade. Both academia and industry have proposed several alternative database schemes to model the captured data more efficiently. Among these approaches, graph databases seem the most promising candidate to supplement relational schemes. This study compares Neo4j, one of the leading graph databases, with Apache Spark, a unified engine for distributed large-scale data processing, in terms of processing limits. The results reveal that Neo4j is limited by the physical RAM of the processing environment. Until this limit is reached, however, the processing engine of Neo4j outperforms the Apache Spark engine.