Abstract
The analysis of structure–activity relationships (SARs) becomes rather challenging when large and heterogeneous compound data sets are studied. In such cases, many different compounds and their activities need to be compared, which quickly goes beyond the capacity of subjective assessments. For a comprehensive large-scale exploration of SARs, computational analysis and visualization methods are required. Herein, we introduce a two-layered SAR visualization scheme specifically designed for increasingly large compound data sets. The approach combines a new compound pair-based variant of generative topographic mapping (GTM), a machine learning approach for nonlinear mapping, with chemical space networks (CSNs). The GTM component provides a global view of the activity landscapes of large compound data sets, in which informative local SAR environments are identified, augmented by a numerical SAR scoring scheme. Prioritized local SAR regions are then projected into CSNs that resolve these regions at the level of individual compounds and their relationships. Analysis of CSNs makes it possible to distinguish between regions having different SAR characteristics and select compound subsets that are rich in SAR information.
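The second layer described above, in which compounds become network nodes and local regions are compared by their SAR content, can be illustrated with a minimal sketch. The abstract does not specify the similarity criterion, the edge threshold, or the scoring function, so everything here is an assumption made for illustration: Tanimoto similarity on hypothetical feature sets, a fixed threshold of 0.55, connected components as "regions", and the mean potency difference across edges as a crude stand-in for a SAR score. It is not the authors' implementation, and it omits the GTM layer entirely.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_csn(features: dict, threshold: float) -> dict:
    """Adjacency map of a network: compounds are nodes, and an edge joins
    every pair whose similarity meets the (assumed) threshold."""
    graph = {name: set() for name in features}
    for u, v in combinations(features, 2):
        if tanimoto(features[u], features[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def components(graph: dict) -> list:
    """Connected components of the network, found by depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result


def mean_edge_potency_gap(graph: dict, potency: dict, comp: set) -> float:
    """Mean absolute potency difference over the edges inside one component.
    Hypothetical score: large values suggest discontinuous SARs."""
    gaps = [abs(potency[u] - potency[v])
            for u in comp for v in graph[u] if u < v]  # u < v counts each edge once
    return sum(gaps) / len(gaps) if gaps else 0.0


# Toy data (invented): feature sets and pKi-like potency values.
features = {
    "c1": frozenset({1, 2, 3, 4}),
    "c2": frozenset({1, 2, 3, 5}),
    "c3": frozenset({1, 2, 3, 4, 5}),
    "c4": frozenset({7, 8, 9}),
    "c5": frozenset({7, 8, 9, 10}),
}
potency = {"c1": 5.0, "c2": 8.0, "c3": 6.0, "c4": 7.0, "c5": 7.0}

csn = build_csn(features, threshold=0.55)
clusters = components(csn)
scores = {tuple(sorted(c)): mean_edge_potency_gap(csn, potency, c)
          for c in clusters}
```

With this toy input the network splits into two components. The one containing `c1`, `c2`, and `c3` has a larger mean potency gap than the one containing `c4` and `c5`, which is the kind of contrast that would mark it as the more SAR-informative subset under this assumed score.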









Acknowledgements
S.K. is supported by a Ph.D. fellowship from Region Alsace. We thank Martin Vogt for helpful discussions. The authors are grateful to OpenEye Scientific Software, Inc., for the free academic license of the OpenEye Toolkits.
Cite this article
Kayastha, S., Kunimoto, R., Horvath, D. et al. From bird’s eye views to molecular communities: two-layered visualization of structure–activity relationships in large compound data sets. J Comput Aided Mol Des 31, 961–977 (2017). https://doi.org/10.1007/s10822-017-0070-1
DOI: https://doi.org/10.1007/s10822-017-0070-1