Abstract
Chemical space networks (CSNs) have recently been introduced as a conceptual alternative to coordinate-based representations of chemical space. CSNs were initially designed as threshold networks using the Tanimoto coefficient as a continuous similarity measure. The analysis of CSNs generated from sets of bioactive compounds revealed that many statistical properties were strongly dependent on their edge density. While it was difficult to compare CSNs at pre-defined similarity threshold values, CSNs with constant edge density were directly comparable. In the current study, alternative CSN representations were constructed by applying the matched molecular pair (MMP) formalism as a substructure-based similarity criterion. For more than 150 compound activity classes, MMP-based CSNs (MMP-CSNs) were compared to corresponding threshold CSNs (THR-CSNs) at a constant edge density by applying different parameters from network science, measures of community structure distributions, and indicators of structure–activity relationship (SAR) information content. MMP-CSNs were found to be an attractive alternative to THR-CSNs, yielding low edge densities and well-resolved topologies. MMP-CSNs and corresponding THR-CSNs often had similar topology and closely corresponding community structures, although there was only limited overlap in similarity relationships. The homophily principle from network science was shown to affect MMP-CSNs and THR-CSNs in different ways, despite the presence of conserved topological features. Moreover, activity cliff distributions in alternative CSN designs markedly differed, which has important implications for SAR analysis.
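As an illustration of the threshold design described above, the following minimal sketch builds a THR-CSN from Tanimoto similarities and picks the threshold that yields a chosen edge density. It is not the authors' implementation: the fingerprints are toy sets of on-bit indices rather than real molecular fingerprints, and the helper names (`tanimoto`, `threshold_csn`, `edge_density`, `threshold_for_density`) are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def threshold_csn(fps, threshold):
    """Return edges (i, j) whose Tanimoto similarity reaches the threshold."""
    return [(i, j) for i, j in combinations(range(len(fps)), 2)
            if tanimoto(fps[i], fps[j]) >= threshold]


def edge_density(n_nodes, n_edges):
    """Fraction of all possible undirected edges that are present."""
    max_edges = n_nodes * (n_nodes - 1) / 2
    return n_edges / max_edges if max_edges else 0.0


def threshold_for_density(fps, target_density):
    """Similarity of the k-th most similar pair, with k set by the target density.

    Assumption: ties at the cut-off are all kept, so the realized density
    can slightly exceed the target.
    """
    sims = sorted((tanimoto(fps[i], fps[j])
                   for i, j in combinations(range(len(fps)), 2)),
                  reverse=True)
    k = max(1, round(target_density * len(sims)))
    return sims[k - 1]


# Toy fingerprints (hypothetical data, for illustration only).
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}, {8, 9, 10, 11}]

thr = threshold_for_density(fps, target_density=0.5)
edges = threshold_csn(fps, thr)
density = edge_density(len(fps), len(edges))
print(thr, edges, density)
```

Fixing the edge density rather than the threshold means that networks built from different compound sets have the same proportion of edges, which is the condition under which the abstract reports CSNs to be directly comparable.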
Acknowledgments
The authors thank Ye Hu for help with data set collection and Dilyana Dimova for MMP routines. BZ is supported by the China Scholarship Council.
Cite this article
Zhang, B., Vogt, M., Maggiora, G.M. et al. Comparison of bioactive chemical space networks generated using substructure- and fingerprint-based measures of molecular similarity. J Comput Aided Mol Des 29, 595–608 (2015). https://doi.org/10.1007/s10822-015-9852-5
DOI: https://doi.org/10.1007/s10822-015-9852-5