
A Comparison of Different Compound Representations for Drug Sensitivity Prediction

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 325)

Abstract

Deep learning (DL) has become increasingly popular in the field of drug discovery. Many end-to-end DL methods for chemical compounds have recently been proposed in the literature, potentially eliminating the need for expert-designed compound representations. This study aims to determine which types of representations and DL algorithms are most suitable for the specific problem of anti-cancer drug response prediction. A newly developed chemoinformatics package called DeepMol was used to benchmark 12 different compound representation methods on 5 anti-cancer drug sensitivity datasets. We found that DL models that learn compound representations directly from SMILES strings or molecular graphs can perform as well as, or even better than, models trained on molecular fingerprints, even on smaller datasets. We also conclude that popular molecular fingerprints may not always be the best choice, and that less well-known fingerprints may be worth exploring in future drug response prediction studies.
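DL models that learn compound representations directly from SMILES strings usually start by splitting each string into tokens and encoding those tokens as integer indices or one-hot vectors. The following is a minimal, standard-library-only sketch of that preprocessing step, not the DeepMol implementation or the paper's pipeline. The tokenizer regex (a simplified version of a pattern commonly used in SMILES language models), the `<pad>` token, and the names `tokenize_smiles`, `build_vocab` and `one_hot_encode` are all hypothetical. The code also performs no chemical validation; a toolkit such as RDKit would normally be used to sanitize molecules first.

```python
import re
from typing import Dict, List

# Hypothetical, simplified SMILES tokenizer (an assumption, not DeepMol's).
# Bracket atoms such as [nH] or [O-] are kept as single tokens, and the
# two-letter halogens Br/Cl are listed before single-letter atoms so that
# regex alternation matches them as one token.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; raise if any character is unmatched."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # findall silently skips unmatched characters, so compare the reassembled string.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map each distinct token to an integer index; index 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for smi in smiles_list:
        for tok in tokenize_smiles(smi):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def one_hot_encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Return a max_len x len(vocab) one-hot matrix, truncated or right-padded with <pad>."""
    ids = [vocab[tok] for tok in tokenize_smiles(smiles)][:max_len]
    ids += [vocab["<pad>"]] * (max_len - len(ids))
    matrix = []
    for idx in ids:
        row = [0] * len(vocab)
        row[idx] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    smiles = ["CC(=O)Oc1ccccc1C(=O)O", "c1ccc2[nH]ccc2c1"]  # aspirin, indole
    vocab = build_vocab(smiles)
    print(tokenize_smiles(smiles[1]))
    x = one_hot_encode(smiles[0], vocab, max_len=30)
    print(len(x), len(x[0]))  # rows = max_len, columns = vocabulary size
```

A matrix like this (or the integer indices alone, fed to an embedding layer) is the kind of input a 1D-convolutional or recurrent SMILES model consumes. Fingerprint-based models instead receive a fixed-length vector computed by a cheminformatics toolkit, and graph-based models receive atom and bond features.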

Author information

Correspondence to Delora Baptista.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Baptista, D., Correia, J., Pereira, B., Rocha, M. (2022). A Comparison of Different Compound Representations for Drug Sensitivity Prediction. In: Rocha, M., Fdez-Riverola, F., Mohamad, M.S., Casado-Vara, R. (eds) Practical Applications of Computational Biology & Bioinformatics, 15th International Conference (PACBB 2021). PACBB 2021. Lecture Notes in Networks and Systems, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-86258-9_15
