Structure–reactivity modeling using mixture-based representation of chemical reactions

Journal of Computer-Aided Molecular Design

Abstract

We describe a novel approach to reaction representation in which a reaction is treated as a combination of two mixtures: a mixture of reactants and a mixture of products. Each mixture, in turn, can be encoded using the previously reported simplex descriptors (SiRMS). The feature vector representing the two mixtures is obtained either by concatenating the product and reactant descriptors or by taking the difference between the descriptors of products and reactants. This representation does not require explicit labeling of the reaction center. A rigorous “product-out” cross-validation (CV) strategy is proposed. Unlike the naïve “reaction-out” CV approach, which is based on random selection of items, the proposed strategy provides a more realistic estimate of prediction accuracy for reactions that yield novel products. The new methodology has been applied to modeling the rate constants of E2 reactions. The fragment control applicability domain approach was shown to significantly increase the prediction accuracy of the models. The models obtained with the new “mixture” approach performed better than those requiring either explicit (Condensed Graph of Reaction) or implicit (reaction fingerprints) reaction center labeling.
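
The two feature-vector constructions and the grouping idea behind "product-out" CV can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the toy count vectors stand in for SiRMS descriptors, summing per-molecule vectors is only one assumed way to encode a mixture, and "product-out" is assumed here to mean that all reactions sharing a product are held out together.

```python
from collections import defaultdict


def mixture_descriptor(component_vectors):
    """Combine per-molecule descriptor vectors into one mixture vector.

    Assumption: a plain element-wise sum; the actual SiRMS mixture
    encoding is not specified in the abstract.
    """
    return [sum(column) for column in zip(*component_vectors)]


def reaction_features(reactants, products, mode="concat"):
    """Build a reaction feature vector from the two mixtures."""
    r = mixture_descriptor(reactants)
    p = mixture_descriptor(products)
    if mode == "concat":
        # Product descriptors followed by reactant descriptors.
        return p + r
    if mode == "diff":
        # Products minus reactants, element by element.
        return [pi - ri for pi, ri in zip(p, r)]
    raise ValueError(f"unknown mode: {mode!r}")


def product_out_folds(product_keys, n_folds):
    """Split reaction indices into folds so that reactions sharing a
    product key (e.g. a canonical product string) never end up in
    different folds."""
    groups = defaultdict(list)
    for index, key in enumerate(product_keys):
        groups[key].append(index)
    folds = [[] for _ in range(n_folds)]
    # Largest groups first, each into the currently smallest fold.
    for key in sorted(groups, key=lambda k: (-len(groups[k]), k)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(groups[key])
    return folds


# Toy example: two reactant molecules, one product molecule.
reactants = [[1, 0, 2], [0, 1, 0]]
products = [[1, 1, 1]]
concat_vector = reaction_features(reactants, products, mode="concat")
diff_vector = reaction_features(reactants, products, mode="diff")

# Five reactions, three distinct products (hypothetical labels).
folds = product_out_folds(["A", "A", "B", "C", "B"], n_folds=2)
```

In a random "reaction-out" split, reactions 0 and 1 (both giving product A) could fall into different folds, so the model would be tested on a product it had already seen during training. Grouping by product avoids this.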

Acknowledgements

This work was supported by the Russian Science Foundation, Grant No. 14-43-00024.

Author information

Correspondence to Pavel Polishchuk, Timur Madzhidov or Alexandre Varnek.

Cite this article

Polishchuk, P., Madzhidov, T., Gimadiev, T. et al. Structure–reactivity modeling using mixture-based representation of chemical reactions. J Comput Aided Mol Des 31, 829–839 (2017). https://doi.org/10.1007/s10822-017-0044-3
