DOI: 10.1145/3597926.3598046
research-article

GDsmith: Detecting Bugs in Cypher Graph Database Engines

Published: 13 July 2023

Abstract

Graph database engines stand out in the era of big data for their efficiency in modeling and processing linked data. To assure the quality of graph database engines, it is critical to generate tests for them automatically, for example through random test generation, the most commonly adopted approach in practice. However, random test generation faces the challenge of generating complex inputs (i.e., property graphs and queries) that produce non-empty query results; generating such inputs is especially important for detecting wrong-result bugs. To address this challenge, we propose GDsmith, the first approach for testing Cypher graph database engines. GDsmith ensures that each randomly generated query satisfies the semantic requirements. To increase the probability of producing complex queries that return non-empty results, GDsmith includes two new techniques: graph-guided generation of complex pattern combinations and data-guided generation of complex conditions. Our evaluation results demonstrate that GDsmith is effective and efficient at producing complex queries that return non-empty results for bug detection, and that it substantially outperforms the baselines. GDsmith successfully detects 28 bugs in the released versions of three highly popular open-source graph database engines and has received positive feedback from their developers.
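The general idea of data-guided condition generation combined with differential testing can be illustrated with a minimal Python sketch. This is not GDsmith's implementation: the toy graph, the function names, the three comparison operators, and the deliberately faulty `buggy_engine` are all assumptions made for illustration. A condition is built from a property value that already exists in the graph, so at least one node satisfies it, and the same query is then evaluated by several engines whose results are compared.

```python
import operator
import random

# Hypothetical toy property graph (node id -> properties); not from the paper.
NODES = {
    0: {"name": "a", "age": 30},
    1: {"name": "b", "age": 41},
    2: {"name": "c", "age": 25},
}

OPS = {"=": operator.eq, "<=": operator.le, ">=": operator.ge}


def data_guided_condition(nodes, rng):
    """Pick an existing node and property, then build a predicate it satisfies."""
    node_id = rng.choice(sorted(nodes))
    key = rng.choice(sorted(nodes[node_id]))
    value = nodes[node_id][key]
    op = rng.choice(sorted(OPS))  # every operator here holds for value vs. itself
    return key, op, value


def to_cypher(cond):
    """Render the condition as a Cypher query string (for reporting only)."""
    key, op, value = cond
    literal = f"'{value}'" if isinstance(value, str) else str(value)
    return f"MATCH (n) WHERE n.{key} {op} {literal} RETURN id(n)"


def reference_engine(nodes, cond):
    """Straightforward evaluation of the condition over all nodes."""
    key, op, value = cond
    return sorted(i for i, p in nodes.items() if key in p and OPS[op](p[key], value))


def buggy_engine(nodes, cond):
    """Stand-in for an engine under test; deliberately evaluates >= as >."""
    key, op, value = cond
    compare = operator.gt if op == ">=" else OPS[op]
    return sorted(i for i, p in nodes.items() if key in p and compare(p[key], value))


def differential_test(nodes, engines, rounds, seed=0):
    """Run the same generated queries on every engine and collect disagreements."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(rounds):
        cond = data_guided_condition(nodes, rng)
        results = [engine(nodes, cond) for engine in engines]
        if any(r != results[0] for r in results[1:]):
            mismatches.append((to_cypher(cond), results))
    return mismatches
```

For example, `reference_engine(NODES, ("age", ">=", 41))` returns `[1]`, while `buggy_engine` returns `[]` for the same condition, so `differential_test` reports the query as a discrepancy. Because the condition was derived from stored data, the expected result is non-empty, which is what makes this kind of wrong-result bug observable.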

Published In

ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2023
1554 pages
ISBN: 9798400702211
DOI: 10.1145/3597926
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cypher
  2. Differential testing
  3. Graph database systems

Qualifiers

  • Research-article

Conference

ISSTA '23

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Cited By

  • (2024) Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database Construction. Proceedings of the VLDB Endowment 17:8 (1884-1897). DOI: 10.14778/3659437.3659445. Online publication date: 31-May-2024
  • (2024) Testing Graph Database Systems via Graph-Aware Metamorphic Relations. Proceedings of the VLDB Endowment 17:4 (836-848). DOI: 10.14778/3636218.3636236. Online publication date: 5-Mar-2024
  • (2024) Testing Gremlin-Based Graph Database Systems via Query Disassembling. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1695-1707). DOI: 10.1145/3650212.3680392. Online publication date: 11-Sep-2024
  • (2024) Testing Graph Database Systems with Graph-State Persistence Oracle. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (666-677). DOI: 10.1145/3650212.3680311. Online publication date: 11-Sep-2024
  • (2024) Differential Optimization Testing of Gremlin-Based Graph Database Systems. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST) (25-36). DOI: 10.1109/ICST60714.2024.00012. Online publication date: 27-May-2024
