Comparison of Approaches for Querying Chemical Compounds

  • Conference paper
Heterogeneous Data Management, Polystores, and Analytics for Healthcare (DMAH 2019, Poly 2019)

Abstract

Chemical compounds form a database with specific features that can be exploited for more efficient query processing. So far, no study has compared the performance and memory usage of the most efficient approaches on the same data set. In this paper, we fill this gap by creating an unbiased benchmark of the most popular index-building methods for subgraph querying of chemical databases. In addition, we compare the results with the performance of an SQL database and a graph database, whose efficiency for this task has so far been the subject of unconfirmed hypotheses.

This work was partially supported by the Charles University project PROGRES Q48.

Author information

Corresponding author

Correspondence to Martin Svoboda.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Šípek, V., Holubová, I., Svoboda, M. (2019). Comparison of Approaches for Querying Chemical Compounds. In: Gadepally, V., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2019, Poly 2019. Lecture Notes in Computer Science, vol 11721. Springer, Cham. https://doi.org/10.1007/978-3-030-33752-0_15

  • DOI: https://doi.org/10.1007/978-3-030-33752-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33751-3

  • Online ISBN: 978-3-030-33752-0

  • eBook Packages: Computer Science, Computer Science (R0)
