Machine learning of chemical reactivity from databases of organic reactions

Journal of Computer-Aided Molecular Design

Abstract

Databases of chemical reactions contain knowledge about the reactivity of specific reagents. Although information is in general only explicitly available for compounds reported to react, it is possible to derive information about substructures that do not react in the reported reactions. Both types of information (positive and negative) can be used to train machine learning techniques to predict if a compound reacts or not with a specific reagent. The whole process was implemented with two databases of reactions, one involving BuNH2 as the reagent, and the other NaCNBH3. Negative information was derived using MOLMAP molecular descriptors, and classification models were developed with Random Forests also based on MOLMAP descriptors. MOLMAP descriptors were based exclusively on calculated physicochemical features of molecules. Correct predictions were achieved for ∼90% of independent test sets. While NaCNBH3 is a selective reducing reagent widely used in organic synthesis, BuNH2 is a nucleophile that mimics the reactivity of the lysine side chain (involved in an initiating step of the mechanism leading to skin sensitization).
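The idea of extracting negative information from a reported reaction can be pictured with a deliberately simplified sketch. It assumes each compound is reduced to a multiset of bond-type labels; the labels, the toy reaction, and the function name `split_reactive_unreactive` are hypothetical. The paper encodes bonds through MOLMAP descriptors built from calculated physicochemical features, which this sketch does not attempt to reproduce.

```python
from collections import Counter


def split_reactive_unreactive(reactant_bonds, product_bonds):
    """Split a reactant's bond-type counts into consumed and surviving parts.

    Both arguments are iterables of (hypothetical) bond-type labels.
    """
    reactant = Counter(reactant_bonds)
    product = Counter(product_bonds)
    # Bond types present in the reactant but missing from the product:
    # evidence of reactivity (positive information).
    consumed = reactant - product
    # Bond types found on both sides of the reaction: substructures that
    # were exposed to the reagent but did not react (negative information).
    surviving = reactant & product
    return consumed, surviving


# Toy example (assumed, not taken from the paper's data sets):
# an imine C=N is reduced while an ester group is left untouched.
reactant = ["C=N", "C=O(ester)", "C-O(ester)", "C-C", "C-C"]
product = ["C-N", "C=O(ester)", "C-O(ester)", "C-C", "C-C"]

consumed, surviving = split_reactive_unreactive(reactant, product)
print("consumed:", dict(consumed))
print("surviving:", dict(surviving))
```

In a workflow of the kind the abstract describes, the consumed and surviving parts would supply the positive and negative training examples for a classifier such as a Random Forest.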

Abbreviations

MOLMAP: MOLecular maps of atom-level properties

BuNH2: Butylamine

RF: Random forest

VOC: Volatile organic compounds

QSAR: Quantitative structure activity relationship

OOB: Out of bag

SVM: Support vector machines

ROC: Receiver operating characteristic

SOM: Self organizing maps

HTS: High-throughput screening

Acknowledgments

G.C. and S.G. acknowledge Fundação para a Ciência e Tecnologia (Lisbon, Portugal) for financial support under grants SFRH/BD/18354/2004 and SFRH/BPD/14475/2003. Molecular Networks GmbH (Erlangen, Germany) and Infochem (Munich, Germany) are acknowledged for access to the PETRA program and to subsets of chemical reactions from the SPRESI database, respectively.

Author information

Corresponding author

Correspondence to João Aires-de-Sousa.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 80 kb)

About this article

Cite this article

Carrera, G.V.S.M., Gupta, S. & Aires-de-Sousa, J. Machine learning of chemical reactivity from databases of organic reactions. J Comput Aided Mol Des 23, 419–429 (2009). https://doi.org/10.1007/s10822-009-9275-2

  • DOI: https://doi.org/10.1007/s10822-009-9275-2
