Machine Learning to Predict Toxicity of Compounds

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11139)

Abstract

Toxicology studies raise several concerns that make early detection of the potential toxicity of chemical compounds important. Toxicity is currently evaluated either through in vitro assays that assess bioactivity or through costly and ethically questionable in vivo tests on animals. We therefore investigate the prediction of the bioactivity of chemical compounds from their physico-chemical structure, and propose automating it with machine learning (ML) techniques trained on data from the in vitro assessment of several hundred chemical compounds. We report the results of tests of this approach with several ML techniques, on both a restricted dataset and a larger one. Because the available empirical data are unbalanced, we also use data augmentation techniques to improve classification accuracy, and present the resulting improvements.
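As an illustration of what augmenting an unbalanced dataset can look like, the sketch below generates synthetic minority-class samples by interpolating between existing ones (a SMOTE-style scheme). The abstract does not say which augmentation technique the authors used, so this is an assumed stand-in rather than their method; the function names, the two-feature toy descriptors, and the label convention (1 = active, 0 = inactive) are all hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return synthetic


def balance(X, y, minority_label=1, seed=0):
    """Add synthetic minority samples until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote_like_oversample(minority, n_new, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (assumed): 8 inactive compounds, 3 active ones, 2 descriptors each.
X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4),
     (0.4, 0.2), (0.1, 0.5), (0.5, 0.1), (0.3, 0.2),
     (2.0, 2.1), (2.2, 1.9), (1.8, 2.0)]
y = [0] * 8 + [1] * 3

X_bal, y_bal = balance(X, y)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In practice the synthetic samples should be generated from the training split only, so that the test set still reflects the original class distribution.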

Author information

Corresponding author

Correspondence to Ingrid Grenet.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Grenet, I., Yin, Y., Comet, JP., Gelenbe, E. (2018). Machine Learning to Predict Toxicity of Compounds. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds) Artificial Neural Networks and Machine Learning – ICANN 2018. ICANN 2018. Lecture Notes in Computer Science, vol 11139. Springer, Cham. https://doi.org/10.1007/978-3-030-01418-6_33

  • DOI: https://doi.org/10.1007/978-3-030-01418-6_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01417-9

  • Online ISBN: 978-3-030-01418-6

  • eBook Packages: Computer Science, Computer Science (R0)
