Abstract
Chemical compounds form a database with specific features that can be exploited for more efficient query processing. However, no existing study compares the performance and memory usage of the most efficient approaches on the same data set. In this paper, we address this gap by creating an unbiased benchmark of the most popular index-building methods for subgraph querying of chemical databases. In addition, we compare the results with the performance of an SQL database and a graph database, for which various unconfirmed hypotheses about their efficiency exist.
This work was partially supported by the Charles University project PROGRES Q48.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Šípek, V., Holubová, I., Svoboda, M. (2019). Comparison of Approaches for Querying Chemical Compounds. In: Gadepally, V., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2019, Poly 2019. Lecture Notes in Computer Science, vol 11721. Springer, Cham. https://doi.org/10.1007/978-3-030-33752-0_15
DOI: https://doi.org/10.1007/978-3-030-33752-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33751-3
Online ISBN: 978-3-030-33752-0
eBook Packages: Computer Science (R0)