Abstract
Wrapper methods search for a subset of the features or variables in a data set such that the selected features are the most relevant for predicting a target value. In the chemoinformatics context, determining the most significant set of descriptors is of great importance because of its contribution to improving ADMET prediction models. In this paper, a comprehensive analysis of descriptor selection aimed at physicochemical property prediction is presented. In addition, we propose an evolutionary approach in which different fitness functions are compared. The comparison consists of establishing which method selects the subset of descriptors that best predicts a given property while keeping the cardinality of the subset to a minimum. The performance of the proposal was assessed for predicting hydrophobicity, using an ensemble of neural networks for the prediction task. The results showed that the evolutionary approach using a nonlinear fitness function constitutes a novel and promising technique for this bioinformatics application.
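The general wrapper idea described in the abstract (an evolutionary search over descriptor subsets, scored by how well a model predicts the target while keeping the subset small) can be illustrated with a minimal sketch. It is not the authors' method, and several pieces are assumptions: a boolean mask encodes a descriptor subset; the fitness is prediction error plus a linear cardinality penalty weighted by a hypothetical `alpha` (the paper compares several fitness functions, which are not reproduced here); a leave-one-out k-nearest-neighbour regressor is a hypothetical stand-in for the neural network ensemble; and binary tournament selection, one-point crossover, bit-flip mutation and elitism are generic GA operators that the authors may not use. All function names are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple

Mask = Tuple[bool, ...]  # mask[j] is True when descriptor j is selected


def knn_loo_error(X: Sequence[Sequence[float]], y: Sequence[float],
                  mask: Mask, k: int = 3) -> float:
    """Leave-one-out mean absolute error of k-NN regression on the selected columns.

    Hypothetical stand-in for the predictor used inside the wrapper's fitness.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return float("inf")  # an empty subset cannot predict anything
    total = 0.0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample, using selected columns only.
        dists = sorted(
            (sum((xi[j] - xm[j]) ** 2 for j in cols), y[m])
            for m, xm in enumerate(X) if m != i
        )
        neighbours = dists[:k]
        pred = sum(v for _, v in neighbours) / len(neighbours)
        total += abs(pred - y[i])
    return total / len(X)


def wrapper_fitness(X, y, mask: Mask, alpha: float = 0.02) -> float:
    """Higher is better: penalise prediction error and subset size (assumed linear penalty)."""
    return -(knn_loo_error(X, y, mask) + alpha * sum(mask))


def ga_select(X, y, pop_size: int = 20, generations: int = 25,
              p_mut: float = 0.1, alpha: float = 0.02, seed: int = 0):
    """Evolve descriptor masks; return (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    cache: Dict[Mask, float] = {}  # each distinct subset is evaluated only once

    def score(m: Mask) -> float:
        if m not in cache:
            cache[m] = wrapper_fitness(X, y, m, alpha)
        return cache[m]

    def tournament() -> Mask:
        a, b = rng.sample(pop, 2)  # binary tournament selection on the current population
        return a if score(a) >= score(b) else b

    pop: List[Mask] = [tuple(rng.random() < 0.5 for _ in range(n_feat))
                       for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        children: List[Mask] = [best]  # elitism: carry the best subset forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_feat)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each descriptor with probability p_mut.
            child = tuple((not g) if rng.random() < p_mut else g for g in child)
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    data_rng = random.Random(42)
    X = [[data_rng.random() for _ in range(6)] for _ in range(30)]
    y = [2 * row[0] - row[1] for row in X]  # synthetic target: only descriptors 0 and 1 matter
    mask, fit = ga_select(X, y)
    print("selected:", [j for j, keep in enumerate(mask) if keep], "fitness:", round(fit, 3))
```

Caching the fitness per mask matters in a wrapper setting because every evaluation retrains the predictor; with a real neural network ensemble this cost dominates the search. Swapping `wrapper_fitness` for a different function is how one would compare alternative fitness definitions in the spirit of the paper.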
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Soto, A.J., Cecchini, R.L., Vazquez, G.E., Ponzoni, I. (2008). A Wrapper-Based Feature Selection Method for ADMET Prediction Using Evolutionary Computing. In: Marchiori, E., Moore, J.H. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2008. Lecture Notes in Computer Science, vol 4973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78757-0_17
DOI: https://doi.org/10.1007/978-3-540-78757-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78756-3
Online ISBN: 978-3-540-78757-0
eBook Packages: Computer Science; Computer Science (R0)