Abstract
Measuring the soil adsorption coefficient (logKoc) of compounds with traditional methods is expensive and time-consuming, and some existing models show low accuracy. To address these problems, this paper proposes a deep learning (DL) method based on an undirected graph recursive neural network (UG-RNN). First, molecular structures are represented as directed acyclic graphs (DAGs) using a recursive neural network (RNN) model. Second, a number of such networks are bundled together to form a multi-level, weight-sharing deep neural network that extracts molecular features. Third, the logKoc values of the compounds are predicted with a back-propagation neural network. The experimental results show that the UG-RNN model predicts better than some shallow models. Under five-fold cross-validation, the root mean square error (RMSE) is 0.46, the average absolute error (AAE) is 0.35, and the squared correlation coefficient (R²) is 0.86.
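As a rough illustration of the evaluation quantities named in the abstract, the following sketch computes RMSE, AAE, and a squared correlation coefficient for a list of measured and predicted logKoc values, and builds a simple five-fold index split. The function names, the interleaved fold assignment, and the reading of R² as the squared Pearson correlation are assumptions made here for illustration; the abstract does not state how the authors computed these values or partitioned the data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def aae(y_true, y_pred):
    """Average absolute error between measured and predicted values."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def five_fold_indices(n_samples, k=5):
    """Assign sample indices to k folds in an interleaved fashion
    (hypothetical split; a real study would typically shuffle first)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


if __name__ == "__main__":
    # Toy values, not data from the paper.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.0]
    print(rmse(measured, predicted))
    print(aae(measured, predicted))
    print(r_squared(measured, predicted))
    print(five_fold_indices(10))
```

In a five-fold setting, each fold would serve once as the held-out test set while the model is trained on the remaining four, and the three metrics would then be computed on the held-out predictions.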
Additional information
The article is published in the original.
About this article
Cite this article
Shi, X., Tian, S., Yu, L. et al. Prediction of soil adsorption coefficient based on deep recursive neural network. Aut. Control Comp. Sci. 51, 321–330 (2017). https://doi.org/10.3103/S0146411617050066
DOI: https://doi.org/10.3103/S0146411617050066