DOI: 10.1145/3579142.3594293
research-article

Design of Highly Scalable Graph Database Systems without Exponential Performance Degradation

Published: 18 June 2023

ABSTRACT

The main challenge faced by today's graph database systems is that they sacrifice performance (computation) for scalability (storage). Such systems may be able to store a large amount of data across many instances, but they cannot offer enough graph-computing power to traverse dynamic graph datasets deeply in real time. A seemingly simple and intuitive graph query, such as a K-hop traversal or finding all shortest paths, may require deep traversal of a large amount of graph data. This tends to force a typical BSP (Bulk Synchronous Parallel) system to exchange large volumes of data among its distributed instances, causing significant latency. This paper proposes three schools of architectural design for distributed, horizontally scalable graph databases that still achieve highly performant graph data processing. The first school, coined HTAP, augments the Raft distributed consensus algorithm with vector-based computing acceleration to achieve fast online data ingestion and real-time deep-data traversal in a hybrid TP and AP mode. The second school, named GRID, relies on human intelligence for data partitioning while preserving the HTAP data processing capabilities across all partitioned clusters. The last school incorporates SHARD and advanced GQL optimization techniques to fully automate data partitioning, while striving for lower latency through a minimum-I/O-cost data migration model when queries span multiple clusters.
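To make the latency argument concrete, the following is a minimal sketch, not the paper's implementation, of a K-hop traversal run as BSP (Pregel-style) supersteps over a partitioned graph. It assumes a toy adjacency dictionary and a hypothetical `partition_of` rule that hash-partitions vertices by id; both are illustrative choices. The sketch counts how many frontier messages cross partition boundaries in each superstep, which models the inter-instance exchange that grows with traversal depth.

```python
from collections import defaultdict


def partition_of(vertex: int, num_partitions: int) -> int:
    """Hypothetical placement rule (assumption): hash-partition vertices by id."""
    return vertex % num_partitions


def bsp_k_hop(adjacency, source, k, num_partitions):
    """Return (vertices within k hops of source, cross-partition messages per superstep).

    Each loop iteration is one superstep that expands the frontier by one hop.
    A message sent along an edge whose endpoints live on different partitions
    stands in for network traffic between distributed instances.
    """
    visited = {source}
    frontier = {source}
    remote_messages = []
    for _ in range(k):
        # Outgoing messages grouped by destination partition; the set
        # deduplicates vertices on receipt, like a simple combiner.
        outbox = defaultdict(set)
        remote = 0
        for u in frontier:
            src_part = partition_of(u, num_partitions)
            for v in adjacency.get(u, ()):
                # One message per edge traversal; count those that leave the partition.
                if partition_of(v, num_partitions) != src_part:
                    remote += 1
                outbox[partition_of(v, num_partitions)].add(v)
        remote_messages.append(remote)
        # Global barrier: every partition drains its inbox to form the next frontier.
        next_frontier = set()
        for inbox in outbox.values():
            next_frontier |= inbox - visited
        visited |= next_frontier
        frontier = next_frontier
        if not frontier:
            break
    return visited, remote_messages


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
    reached, traffic = bsp_k_hop(graph, source=0, k=3, num_partitions=2)
    print(sorted(reached), traffic)  # prints [0, 1, 2, 3, 4, 5] [1, 1, 1]
```

On this toy graph the traffic is small, but on real graphs the frontier tends to grow rapidly with each hop, so both the number of cross-partition messages and the cost of each synchronization barrier grow with K.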


        • Published in

          BiDEDE '23: Proceedings of the International Workshop on Big Data in Emergent Distributed Environments
          June 2023
          56 pages
          ISBN: 9798400700934
          DOI: 10.1145/3579142

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          BiDEDE '23 Paper Acceptance Rate: 7 of 15 submissions, 47%. Overall Acceptance Rate: 25 of 47 submissions, 53%.