From bird’s eye views to molecular communities: two-layered visualization of structure–activity relationships in large compound data sets

Journal of Computer-Aided Molecular Design

Abstract

The analysis of structure–activity relationships (SARs) becomes challenging when large and heterogeneous compound data sets are studied. In such cases, many different compounds and their activities must be compared, which quickly exceeds the capacity of subjective assessment. Comprehensive large-scale exploration of SARs therefore requires computational analysis and visualization methods. Herein, we introduce a two-layered SAR visualization scheme specifically designed for increasingly large compound data sets. The approach combines chemical space networks (CSNs) with a new compound pair-based variant of generative topographic mapping (GTM), a machine learning approach for nonlinear mapping. The GTM component provides a global view of the activity landscape of a large compound data set, in which informative local SAR environments are identified with the support of a numerical SAR scoring scheme. Prioritized local SAR regions are then projected into CSNs that resolve these regions at the level of individual compounds and their relationships. Analysis of the CSNs makes it possible to distinguish between regions with different SAR characteristics and to select compound subsets that are rich in SAR information.
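The global GTM layer rests on two standard operations of an already trained map: each data point is assigned a posterior probability (responsibility) for every node of the 2D latent grid, and node-wise averages of activity weighted by these responsibilities yield an activity landscape. The following minimal sketch illustrates these operations under stated assumptions. It follows the standard GTM formulation of Bishop and co-workers with a uniform node prior. The trained node means and inverse variance `beta` are hypothetical inputs, map training is not shown, and the plain descriptor vectors stand in for whatever objects are mapped. The abstract does not describe how the compound pair-based variant encodes pairs, so this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def responsibilities(t: Vector, node_means: List[Vector], beta: float) -> List[float]:
    """Posterior probability of each latent node given data point t (uniform node prior)."""
    # Log of the isotropic Gaussian density, dropping the constant shared by all nodes.
    logs = [-0.5 * beta * sum((ti - yi) ** 2 for ti, yi in zip(t, y)) for y in node_means]
    shift = max(logs)  # log-sum-exp shift keeps exp() from underflowing to all zeros
    weights = [math.exp(v - shift) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def project(t: Vector, latent_grid: List[Tuple[float, float]],
            node_means: List[Vector], beta: float) -> Tuple[float, float]:
    """Mean 2D position of t: responsibility-weighted average of the grid coordinates."""
    r = responsibilities(t, node_means, beta)
    x = sum(rk * g[0] for rk, g in zip(r, latent_grid))
    y = sum(rk * g[1] for rk, g in zip(r, latent_grid))
    return (x, y)


def activity_landscape(data: List[Vector], activities: List[float],
                       node_means: List[Vector], beta: float) -> List[float]:
    """Responsibility-weighted mean activity of every node (NaN for unsupported nodes)."""
    num = [0.0] * len(node_means)
    den = [0.0] * len(node_means)
    for t, a in zip(data, activities):
        for k, rk in enumerate(responsibilities(t, node_means, beta)):
            num[k] += rk * a
            den[k] += rk
    return [n / d if d > 1e-12 else float("nan") for n, d in zip(num, den)]


# Toy map: two nodes whose (hypothetical) trained means sit at 0 and 10 in a 1D descriptor space.
grid = [(0.0, 0.0), (1.0, 0.0)]
means = [[0.0], [10.0]]
print(project([5.0], grid, means, beta=1.0))  # -> (0.5, 0.0), halfway between the nodes
print(activity_landscape([[0.0], [10.0]], [6.0, 8.0], means, beta=1.0))
```

Coloring each grid node by its landscape value gives the kind of bird's-eye view described above. Regions where neighboring nodes carry very different mean activities are natural candidates for closer inspection.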
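At the second layer, a CSN represents compounds as nodes and connects pairs that meet a similarity criterion, so that individual compound relationships become visible. The following sketch is illustrative and rests on several assumptions. Edges are defined by a Tanimoto similarity threshold on set-based fingerprints. Edges are scored with the structure–activity landscape index (SALI) of Guha and Van Drie, a widely used pairwise measure that serves here only as a stand-in, because the abstract does not specify the numerical SAR score used by the authors. The function names, fingerprints, threshold, and cutoff are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str, float]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_csn(fps: Dict[str, FrozenSet[int]], threshold: float) -> List[Edge]:
    """Connect every compound pair whose similarity reaches the threshold."""
    edges = []
    for u, v in combinations(sorted(fps), 2):
        s = tanimoto(fps[u], fps[v])
        if s >= threshold:
            edges.append((u, v, s))
    return edges


def sali(pot_u: float, pot_v: float, sim: float, eps: float = 1e-3) -> float:
    """SALI: potency difference divided by dissimilarity (eps guards identical fingerprints)."""
    return abs(pot_u - pot_v) / max(1.0 - sim, eps)


def sar_rich_subset(edges: List[Edge], potency: Dict[str, float], cutoff: float) -> List[str]:
    """Compounds that take part in at least one edge with SALI >= cutoff."""
    keep = set()
    for u, v, s in edges:
        if sali(potency[u], potency[v], s) >= cutoff:
            keep.update((u, v))
    return sorted(keep)


# Hypothetical fingerprints and pKi-style potencies for three compounds.
fps = {"A": frozenset({1, 2, 3, 4}), "B": frozenset({1, 2, 3, 5}), "C": frozenset({7, 8})}
potency = {"A": 8.0, "B": 5.0, "C": 6.0}
edges = build_csn(fps, threshold=0.5)
print(edges)                                        # [('A', 'B', 0.6)]
print(sar_rich_subset(edges, potency, cutoff=5.0))  # ['A', 'B']
```

Because `build_csn` compares all compound pairs, its cost grows quadratically with the number of compounds. This is one reason to build networks only for the local regions prioritized on the global map rather than for an entire large data set.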

Acknowledgements

S.K. is supported by a Ph.D. fellowship from Region Alsace. We thank Martin Vogt for helpful discussions. The authors are grateful to OpenEye Scientific Software, Inc., for the free academic license of the OpenEye Toolkits.

Author information

Corresponding authors

Correspondence to Alexandre Varnek or Jürgen Bajorath.

About this article

Cite this article

Kayastha, S., Kunimoto, R., Horvath, D. et al. From bird’s eye views to molecular communities: two-layered visualization of structure–activity relationships in large compound data sets. J Comput Aided Mol Des 31, 961–977 (2017). https://doi.org/10.1007/s10822-017-0070-1

  • DOI: https://doi.org/10.1007/s10822-017-0070-1
