Ligand-based approaches to activity prediction for the early stage of structure–activity–relationship progression

Maeda, Itsuki; Sato, Akinori; Tamura, Shunsuke; Miyao, Tomoyuki

doi:10.1007/s10822-022-00449-2

Published: 29 March 2022

Volume 36, pages 237–252, (2022)

Journal of Computer-Aided Molecular Design

Abstract

The retrospective evaluation of virtual screening approaches and activity prediction models is important for methodological development. However, for fair comparison, evaluation data sets must be carefully prepared. In this research, we compiled structure–activity–relationship matrix-based data sets for 15 biological targets along with many diverse inactive compounds, assuming the early stage of structure–activity–relationship progression. To make use of a large number of diverse inactive compounds and a limited number of active compounds, we propose similarity profiles (SPs) as a set of molecular descriptors. Using these highly imbalanced data sets, we evaluated various approaches, including SPs, under-sampling, support vector machine (SVM), and message passing neural networks. We found that, among the under-sampling approaches, cluster-based sampling was better than random sampling. For virtual screening, SPs with inactive reference compounds and the under-sampling SVM also performed well. For classification, SPs with many inactive references performed as well as the under-sampling SVM trained on a balanced data set. Although the performances of SPs and the under-sampling SVM were comparable, SPs with many inactive references were preferable for selecting compounds structurally distinct from the active training compounds.
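The abstract does not spell out how an SP is computed, so the following is only a minimal sketch of one plausible reading: each compound is described by the vector of its similarities to a fixed set of active and inactive reference compounds. The use of Tanimoto similarity on binary fingerprints, the function names `tanimoto_matrix` and `similarity_profile`, and the toy 4-bit fingerprints are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def tanimoto_matrix(queries, references):
    """Pairwise Tanimoto similarity between rows of two binary fingerprint matrices."""
    q = np.asarray(queries, dtype=np.int64)
    r = np.asarray(references, dtype=np.int64)
    common = q @ r.T                         # number of bits set in both fingerprints
    q_bits = q.sum(axis=1)[:, None]          # bits set in each query
    r_bits = r.sum(axis=1)[None, :]          # bits set in each reference
    union = q_bits + r_bits - common         # bits set in either fingerprint
    # Assumption: similarity is defined as 0 when both fingerprints are empty.
    return np.divide(common, union,
                     out=np.zeros(common.shape, dtype=float),
                     where=union > 0)


def similarity_profile(fps, active_refs, inactive_refs):
    """Hypothetical SP: similarities to active references, then to inactive references."""
    return np.hstack([tanimoto_matrix(fps, active_refs),
                      tanimoto_matrix(fps, inactive_refs)])


# Toy 4-bit fingerprints (illustrative only).
active_refs = [[1, 1, 0, 0], [1, 0, 1, 0]]
inactive_refs = [[0, 0, 1, 1]]
compounds = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]

sp = similarity_profile(compounds, active_refs, inactive_refs)
print(sp.shape)  # one row per compound, one column per reference compound
```

Under this reading, the descriptor length equals the number of reference compounds, so adding more inactive references lengthens the profile without requiring more active compounds. The resulting matrix could then be passed to any standard classifier.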

Data availability

All data sets used in this study are available in an open-access deposition on the ZENODO platform [33].

References

Stumpfe D, Bajorath J (2020) Current trends, overlooked issues, and unmet challenges in virtual screening. J Chem Inf Model 60:4112–4115. https://doi.org/10.1021/acs.jcim.9b01101
Škuta C, Cortés-Ciriano I, Dehaen W et al (2020) QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping. J Cheminform 12:1–16. https://doi.org/10.1186/s13321-020-00443-6
Wassermann AM, Heikamp K, Bajorath J (2011) Potency-directed similarity searching using support vector machines. Chem Biol Drug Des 77:30–38. https://doi.org/10.1111/j.1747-0285.2010.01059.x
Jing Y, Bian Y, Hu Z et al (2018) Correction to: deep learning for drug design: an artificial intelligence paradigm for drug discovery in the big data era. AAPS J 20:1–1. https://doi.org/10.1208/s12248-018-0243-4
Sakai M, Nagayasu K, Shibui N et al (2021) Prediction of pharmacological activities from chemical structures with graph convolutional neural networks. Sci Rep 11:525. https://doi.org/10.1038/s41598-020-80113-7
Li X, Fourches D (2020) Inductive transfer learning for molecular activity prediction: next-Gen QSAR Models with MolPMoFiT. J Cheminform 12:1–15. https://doi.org/10.1186/s13321-020-00430-x
Tsou LK, Yeh SH, Ueng SH et al (2020) Comparative study between deep learning and QSAR classifications for TNBC inhibitors and novel GPCR agonist discovery. Sci Rep 10:1–11. https://doi.org/10.1038/s41598-020-73681-1
Yonchev D, Vogt M, Bajorath J (2020) From SAR diagnostics to compound design: development chronology of the compound optimization monitor (COMO) method. Mol Inform 39:2000046. https://doi.org/10.1002/minf.202000046
Kunimoto R, Miyao T, Bajorath J (2018) Computational method for estimating progression saturation of analog series. RSC Adv 8:5484–5492. https://doi.org/10.1039/c7ra13748f
Lipinski CA (2010) Overview of hit to lead: the medicinal chemist’s role from HTS retest to lead optimization hand off. In: Hayward MM (ed) Lead-seeking approaches. Springer, New York, pp 1–24
Hawkins PCD, Skillman AG, Nicholls A (2007) Comparison of shape-matching and docking as virtual screening tools. J Med Chem 50:74–82. https://doi.org/10.1021/jm0603365
Sato T, Yuki H, Takaya D et al (2012) Application of support vector machine to three-dimensional shape-based virtual screening using comprehensive three-dimensional molecular shape overlay with known inhibitors. J Chem Inf Model 52:1015–1026. https://doi.org/10.1021/ci200562p
Sato A, Miyao T, Jasial S, Funatsu K (2021) Comparing predictive ability of QSAR/QSPR models using 2D and 3D molecular representations. J Comput Aided Mol Des 35:179–193. https://doi.org/10.1007/s10822-020-00361-7
Wassermann AM, Haebel P, Weskamp N, Bajorath J (2012) SAR matrices: automated extraction of information-rich SAR tables from large compound data sets. J Chem Inf Model 52:1769–1776. https://doi.org/10.1021/ci300206e
Mendez D, Gaulton A, Bento AP et al (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47:D930–D940. https://doi.org/10.1093/nar/gky1075
Kim S, Chen J, Cheng T et al (2021) PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res 49:D1388–D1395. https://doi.org/10.1093/nar/gkaa971
Kenny PW, Sadowski J (2005) Structure modification in chemical databases. In: Oprea TI (ed) Chemoinformatics in drug discovery. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, pp 271–285
MolProp TK, version 2.5.4; OpenEye Scientific Software Inc, Santa Fe
Wawer M, Bajorath J (2011) Local structural changes, global data views: graphical substructure-activity relationship trailing. J Med Chem 54:2944–2951. https://doi.org/10.1021/jm200026b
Matsumoto K, Miyao T, Funatsu K (2021) Ranking-oriented quantitative structure-activity relationship modeling combined with assay-wise data integration. ACS Omega 6:11964–11973. https://doi.org/10.1021/acsomega.1c00463
Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50:742–754. https://doi.org/10.1021/ci100050t
Jones E, Oliphant T, Peterson P (2021) SciPy: Open source scientific tools for python. https://www.scipy.org. Accessed 31 Oct 2021
Vapnik VN (2000) The nature of statistical learning theory. Springer-Verlag, New York
Gilmer J, Schoenholz SS, Riley PF et al (2017) Neural message passing for quantum chemistry. In: 34th International Conference on Machine Learning. PMLR, pp 2053–2070
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory. ACM, pp 144–152
Ralaivola L, Swamidass SJ, Saigo H, Baldi P (2005) Graph kernels for chemical informatics. Neural Netw 18:1093–1110. https://doi.org/10.1016/j.neunet.2005.07.009
Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830
OEChem TK, version 3.0.0; OpenEye Scientific Software Inc, Santa Fe
Tang B, Kramer ST, Fang M et al (2020) A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility. J Cheminform 12:1–9. https://doi.org/10.1186/s13321-020-0414-z
Paszke A, Gross S, Chintala S et al (2017) Automatic differentiation in pytorch. In: 31st Conference on Neural Information Processing Systems
Akiba T, Sano S, Yanase T et al (2019) Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, pp 2623–2631
Park HS, Jun CH (2009) A simple and fast algorithm for K-medoids clustering. Expert Syst Appl 36:3336–3341. https://doi.org/10.1016/j.eswa.2008.01.039
Maeda I, Sato A, Tamura S, Miyao T, Compound activity data sets for 15 biological targets compiled from the ChEMBL and PubChem databases. https://doi.org/10.5281/zenodo.5748597

Acknowledgements

We thank Dr. Ryo Kunimoto at Daiichi Sankyo Company, Limited, for helping us with data set preparation. This work was financially supported by the Grant-in-Aid for Transformative Research Areas (A) 21A204 Digitalization-driven Transformative Organic Synthesis (Digi-TOS) from the Ministry of Education, Culture, Sports, Science & Technology, Japan.

Author information

Authors and Affiliations

Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
Itsuki Maeda, Akinori Sato, Shunsuke Tamura & Tomoyuki Miyao
Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
Tomoyuki Miyao

Corresponding author

Correspondence to Tomoyuki Miyao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 5623 KB)

About this article

Cite this article

Maeda, I., Sato, A., Tamura, S. et al. Ligand-based approaches to activity prediction for the early stage of structure–activity–relationship progression. J Comput Aided Mol Des 36, 237–252 (2022). https://doi.org/10.1007/s10822-022-00449-2

Received: 02 December 2021
Accepted: 07 March 2022
Published: 29 March 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10822-022-00449-2
