Abstract
Deep learning (DL) has become increasingly popular in the field of drug discovery. A large variety of end-to-end DL methods for chemical compounds have recently been proposed in the literature, potentially eliminating the need for expert-designed compound representations. This study aims to determine which types of representations and DL algorithms are most suitable for the specific problem of anti-cancer drug response prediction. A newly developed chemoinformatics package called DeepMol was used to benchmark 12 different compound representation methods on 5 anti-cancer drug sensitivity datasets. We found that DL models that are able to learn compound representations directly from SMILES strings or molecular graphs can perform as well as or even better than models trained on molecular fingerprints, even on smaller datasets. We also conclude that popular molecular fingerprints might not always be the best choice and less well-known fingerprints might be worth exploring in future drug response prediction studies.
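The two families of inputs compared above, raw SMILES strings fed to an end-to-end model versus fixed-length fingerprint vectors, can be illustrated with a toy sketch. This is not the DeepMol API and not the procedure used in the study. The function names (`tokenize_smiles`, `encode_tokens`, `hashed_ngram_bits`) and the aspirin example input are hypothetical and chosen only for illustration. The hashed n-gram vector is a text-based stand-in. Real molecular fingerprints are computed from the molecular graph with a chemoinformatics toolkit.

```python
import hashlib


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, keeping bracket atoms and Cl/Br whole."""
    tokens = []
    i = 0
    while i < len(smiles):
        if smiles[i] == "[":
            # Bracket atoms such as [Na+] are kept as a single token.
            j = smiles.index("]", i)
            tokens.append(smiles[i:j + 1])
            i = j + 1
        elif smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def encode_tokens(tokens, vocab, max_len):
    """Map tokens to integer ids (0 = padding, 1 = unknown), padded or truncated to max_len."""
    ids = [vocab.get(tok, 1) for tok in tokens[:max_len]]
    return ids + [0] * (max_len - len(ids))


def hashed_ngram_bits(tokens, n_bits=64, n=2):
    """Toy fixed-length bit vector: hash each token n-gram into one of n_bits positions."""
    bits = [0] * n_bits
    for k in range(len(tokens) - n + 1):
        gram = "".join(tokens[k:k + n])
        digest = hashlib.md5(gram.encode("utf-8")).hexdigest()
        bits[int(digest, 16) % n_bits] = 1
    return bits


# Illustrative input only (aspirin); not taken from the benchmark datasets.
smiles = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize_smiles(smiles)

# Sequence input for a model that learns its own representation from SMILES.
vocab = {tok: idx + 2 for idx, tok in enumerate(sorted(set(tokens)))}
sequence = encode_tokens(tokens, vocab, max_len=32)

# Fixed-length vector input, analogous in shape to a precomputed fingerprint.
bits = hashed_ngram_bits(tokens)
```

In the first case the downstream model receives `sequence` and must learn which token patterns matter. In the second it receives `bits`, where the features were fixed before training.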
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Baptista, D., Correia, J., Pereira, B., Rocha, M. (2022). A Comparison of Different Compound Representations for Drug Sensitivity Prediction. In: Rocha, M., Fdez-Riverola, F., Mohamad, M.S., Casado-Vara, R. (eds) Practical Applications of Computational Biology & Bioinformatics, 15th International Conference (PACBB 2021). PACBB 2021. Lecture Notes in Networks and Systems, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-86258-9_15
DOI: https://doi.org/10.1007/978-3-030-86258-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86257-2
Online ISBN: 978-3-030-86258-9
eBook Packages: Intelligent Technologies and Robotics (R0)