ABSTRACT
The main challenge faced by today's graph database systems is that they sacrifice performance (computation) for scalability (storage). Such systems can typically store large amounts of data across many instances but cannot offer adequate graph-computing power to deeply penetrate dynamic graph datasets in real time. A seemingly simple and intuitive graph query, such as a K-hop traversal or finding all shortest paths, may lead to deep traversal of a large amount of graph data, which tends to cause a typical BSP (Bulk Synchronous Parallel) system to exchange data heavily among its distributed instances, causing significant latency. This paper proposes three schools of architectural design for distributed, horizontally scalable graph databases that still achieve highly performant graph data processing. The first school, coined HTAP, augments the distributed consensus algorithm Raft with vector-based computing acceleration to achieve fast online data ingestion and real-time deep-data traversal in a hybrid TP and AP mode. The second school, named GRID, leverages human intelligence for data partitioning while preserving the HTAP data processing capabilities across all partitioned clusters. The last school incorporates SHARD and advanced GQL optimization techniques to fully automate data partitioning, while striving for lower latency via a minimum-I/O-cost data migration model when queries span multiple clusters.
Index Terms
- Design of Highly Scalable Graph Database Systems without Exponential Performance Degradation