A Comparison of Different Compound Representations for Drug Sensitivity Prediction

Baptista, Delora; Correia, João; Pereira, Bruno; Rocha, Miguel

doi:10.1007/978-3-030-86258-9_15

Conference paper
First Online: 28 August 2021

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 325)

Abstract

Deep learning (DL) has become increasingly popular in the field of drug discovery. A large variety of end-to-end DL methods for chemical compounds have recently been proposed in the literature, potentially eliminating the need for expert-designed compound representations. This study aims to determine which types of representations and DL algorithms are most suitable for the specific problem of anti-cancer drug response prediction. A newly developed chemoinformatics package called DeepMol was used to benchmark 12 different compound representation methods on 5 anti-cancer drug sensitivity datasets. We found that DL models that are able to learn compound representations directly from SMILES strings or molecular graphs can perform as well as or even better than models trained on molecular fingerprints, even on smaller datasets. We also conclude that popular molecular fingerprints might not always be the best choice and less well-known fingerprints might be worth exploring in future drug response prediction studies.
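To make the contrast in the abstract concrete, the following is a minimal, illustrative sketch of the two kinds of model input being compared: a fixed-length bit vector computed by a hand-written rule, and a raw token sequence from which an end-to-end model would learn its own representation. It is not DeepMol code and does not reproduce any of the 12 benchmarked methods. The function names (`toy_hashed_fingerprint`, `build_vocab`, `encode_smiles`), the bit length, and the character-level tokenization are all assumptions made for illustration. Real fingerprints such as ECFP hash atom environments of the molecular graph rather than characters of the SMILES string.

```python
import hashlib
from typing import Dict, List


def toy_hashed_fingerprint(smiles: str, n_bits: int = 64, max_n: int = 3) -> List[int]:
    """Fold character n-grams of a SMILES string into a fixed-length bit vector.

    Toy stand-in for an expert-designed fingerprint (hypothetical, not ECFP).
    """
    bits = [0] * n_bits
    for n in range(1, max_n + 1):
        for i in range(len(smiles) - n + 1):
            # hashlib gives a hash that is stable across Python processes.
            digest = hashlib.md5(smiles[i:i + n].encode("utf-8")).digest()
            bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Assign an integer index to every character seen in the SMILES strings."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for ch in sorted({ch for s in smiles_list for ch in s}):
        vocab[ch] = len(vocab)
    return vocab


def encode_smiles(smiles: str, vocab: Dict[str, int], max_len: int = 16) -> List[int]:
    """Turn a SMILES string into a padded sequence of token indices.

    Character-level tokens are a simplification: multi-character atoms
    such as "Cl" would be split in two.
    """
    ids = [vocab.get(ch, vocab["<unk>"]) for ch in smiles[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Ethanol, benzene, acetic acid.
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles_list)

# Fixed representation: the same rule is applied to every compound.
fingerprints = [toy_hashed_fingerprint(s) for s in smiles_list]

# Learned representation: only token indices are fixed; an embedding layer
# and the rest of the network would be trained on the prediction task.
token_ids = [encode_smiles(s, vocab) for s in smiles_list]
```

In the first case the feature vector is fully determined before training. In the second, the model receives only the symbols and must learn which patterns matter for drug response, which is the property the study evaluates.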

Author information

Authors and Affiliations

Centre of Biological Engineering, University of Minho, Campus of Gualtar, Braga, Portugal
Delora Baptista, João Correia, Bruno Pereira & Miguel Rocha

Corresponding author

Correspondence to Delora Baptista.

Editor information

Editors and Affiliations

Departamento de Informática, Universidade do Minho, Braga, Portugal
Miguel Rocha
Escuela Superior de Ingeniería Informática, Universidade de Vigo, Ourense, Spain
Florentino Fdez-Riverola
Department of Genetics and Genomics, United Arab Emirates University, Abu Dhabi, United Arab Emirates
Mohd Saberi Mohamad
BISITE, Digital Innovation Hub, University of Salamanca, Salamanca, Spain
Roberto Casado-Vara

About this paper

Cite this paper

Baptista, D., Correia, J., Pereira, B., Rocha, M. (2022). A Comparison of Different Compound Representations for Drug Sensitivity Prediction. In: Rocha, M., Fdez-Riverola, F., Mohamad, M.S., Casado-Vara, R. (eds) Practical Applications of Computational Biology & Bioinformatics, 15th International Conference (PACBB 2021). PACBB 2021. Lecture Notes in Networks and Systems, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-86258-9_15

DOI: https://doi.org/10.1007/978-3-030-86258-9_15
Published: 28 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86257-2
Online ISBN: 978-3-030-86258-9
eBook Packages: Intelligent Technologies and Robotics (R0)
