Assessing the computational limits of GraphDBs’ engines - A comparison study between Neo4j and Apache Spark

Authors:
Ioannis Ballas, University of Peloponnese, Greece
Vassilios Tsakanikas, University of Peloponnese, Greece
Evaggelos Pefanis, University of Peloponnese, Greece
Vassilios Tampakas, University of Peloponnese, Greece

PCI '20: Proceedings of the 24th Pan-Hellenic Conference on Informatics, November 2020, Pages 428–433
https://doi.org/10.1145/3437120.3437356

Published: 04 March 2021

ABSTRACT

The Big Data paradigm has placed pressure on well-established relational databases during the last decade. Both academia and industry have proposed several alternative database schemes to model the captured data more efficiently. Among these approaches, graph databases seem the most promising candidate to supplement relational schemes. This study compares the processing limits of Neo4j, one of the leading graph databases, and Apache Spark, a unified engine for distributed large-scale data processing. The results reveal that Neo4j is limited by the physical RAM of the processing environment. Until this limit is reached, however, the processing engine of Neo4j outperforms the Apache Spark engine.

Published in
PCI '20: Proceedings of the 24th Pan-Hellenic Conference on Informatics
November 2020
433 pages
ISBN:9781450388979
DOI:10.1145/3437120
Editors:
Nikitas N. Karanikolas,
Athanasios Voulodimos,
Cleo Sgouropoulou,
Mara Nikolaidou,
Stefanos Gritzalis
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
PageRank
Apache Spark
graph database
Neo4j
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 190 of 390 submissions, 49%