DOI: 10.1145/3437120.3437356
research-article

Assessing the computational limits of GraphDBs’ engines - A comparison study between Neo4j and Apache Spark

Published: 04 March 2021

ABSTRACT

The Big Data paradigm has put pressure on well-established relational databases over the last decade. Both academia and industry have proposed several alternative database schemes to model captured data more efficiently. Among these approaches, graph databases seem the most promising candidate to supplement relational schemes. This study compares Neo4j, one of the leading graph databases, with Apache Spark, a unified engine for distributed large-scale data processing, in terms of processing limits. The results reveal that Neo4j is limited by the physical RAM of the processing environment. Yet, until this limit is reached, the processing engine of Neo4j outperforms the Apache Spark engine.

  • Published in

    PCI '20: Proceedings of the 24th Pan-Hellenic Conference on Informatics
    November 2020
    433 pages

    Copyright © 2020 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 190 of 390 submissions, 49%