Abstract
In this study, we combine machine learning algorithms with QM-derived COSMO-RS descriptors and Morgan fingerprints to predict the absolute solubility of drug-like compounds. The QM-derived descriptors capture the molecular properties of the solute, i.e., the solute–solute interactions in an artificial liquid state (super-cooled liquid) and the solute–solvent interactions in solution. We employ two main approaches to predict solubility. (i) A hypothetical pathway that involves melting the solute at room temperature T = T¯ (\({\Delta }_{fus}{G}_{A}^{\ominus }\)) and mixing the artificially liquid solute into the solvent (\({\Delta }_{m}{G}_{\left(A:B\right)}^{\ominus }\)); in this approach, \({\Delta }_{fus}{G}_{A}^{\ominus }\) is predicted using machine learning models, and \({\Delta }_{m}{G}_{\left(A:B\right)}^{\ominus }\) is obtained from COSMO-RS calculations. (ii) Direct solubility prediction using machine learning algorithms. The models were trained on a large number of Bayer in-house compounds for which water solubility data are available at a physiological pH of 6.5 and ambient temperature. We also evaluated our models on external datasets from a solubility challenge. For both the in-house and the external datasets, our models substantially improve on the absolute solubility prediction of the QSAR model for the artificial liquid state as implemented in the COSMOtherm software. We furthermore demonstrate that QM-derived descriptors outperform cheminformatics descriptors. Finally, we present low-cost alternative models based on fragment-based COSMOquick calculations, with only a marginal reduction in the quality of the predicted solubility.
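The hypothetical pathway in approach (i) can be illustrated with a minimal Python sketch. It assumes that the two free-energy contributions simply add and that the sum converts to a decadic log solubility through the standard relation log10 S = -ΔG / (RT ln 10). The function name, the units (kcal/mol), the default temperature of 298.15 K, and the example values are illustrative assumptions, not values or code from the study. The concentration scale of the result (mole fraction or molar) depends on the reference-state convention behind the input free energies.

```python
import math

# Gas constant in kcal/(mol K); the unit choice is an assumption for this sketch.
R_KCAL = 1.987204e-3


def log10_solubility(dg_fus, dg_mix, temperature=298.15):
    """Combine a fusion and a mixing free energy into log10 solubility.

    dg_fus: free energy of melting the solute into the artificial liquid
            state (in the study, this term is predicted by an ML model).
    dg_mix: free energy of mixing the liquid solute into the solvent
            (in the study, this term comes from COSMO-RS calculations).
    Both inputs are in kcal/mol; the additive cycle is the assumption here.
    """
    dg_total = dg_fus + dg_mix
    return -dg_total / (R_KCAL * temperature * math.log(10))


# Hypothetical input values, chosen only to show the call.
example = log10_solubility(dg_fus=2.0, dg_mix=3.0)
print(f"log10 S = {example:.2f}")
```

In this picture, a more positive fusion free energy (a more stable solid) or a more positive mixing free energy (poorer solvation) both lower the predicted solubility, which is why errors in either term propagate directly into the final value.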
Data availability
The online version contains supplementary material available at …
Acknowledgements
None.
Funding
The work was funded by Bayer AG.
Author information
Contributions
SK did the QM calculations, created the ML models and wrote the manuscript. AB prepared the datasets. TG and AG developed the concept and guided the work. All authors reviewed the manuscript.
Ethics declarations
Competing interest
The authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
All authors have read and understood the publishing policy, and this manuscript is submitted in accordance with this policy.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gheta, S.K.O., Bonin, A., Gerlach, T. et al. Predicting absolute aqueous solubility by applying a machine learning model for an artificially liquid-state as proxy for the solid-state. J Comput Aided Mol Des 37, 765–789 (2023). https://doi.org/10.1007/s10822-023-00538-w
DOI: https://doi.org/10.1007/s10822-023-00538-w