Abstract
Modeling non-linear descriptor-target activity/property relationships with many dependent descriptors has been a long-standing challenge in the design of biologically active molecules. In an effort to address this problem, we couple the supervised self-organizing map with the genetic algorithm. Although self-organizing maps are non-linear, topology-preserving techniques that hold great potential for modeling and decoding relationships, the large number of descriptors in a typical quantitative structure–activity relationship or quantitative structure–property relationship analysis may lead to spurious correlations and to models that are difficult to interpret. To reduce the number of descriptors to a manageable size, we chose the genetic algorithm for descriptor selection because of its flexibility and efficiency in solving complex problems. Feasibility studies were conducted using six datasets of moderate-to-large size and moderate-to-great diversity, each with a different biological endpoint. Since favorable training set statistics do not necessarily indicate a highly predictive model, the quality of all models was confirmed by withholding a portion of each dataset for external validation. We also address the variability introduced into modeling by dataset partitioning and by the stochastic nature of the combined genetic algorithm/supervised self-organizing map method, using the z-score and other tests. Experiments show that the combined method achieves accuracy comparable to that of the supervised self-organizing map alone while using significantly fewer descriptors in the models generated. It also gave consistently better results than partial least squares models. We conclude that the combination of genetic algorithms with the supervised self-organizing map shows great potential as a quantitative structure–activity/property relationship modeling tool.
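As a rough illustration of the workflow summarized in the abstract (genetic-algorithm descriptor selection followed by a check on withheld data), the sketch below may help. It is not the authors' implementation: a 1-nearest-neighbour predictor stands in for the supervised self-organizing map, the data are synthetic, and every name and parameter (`knn_rmse`, `select_descriptors`, `penalty`, `p_mut`, the population size and generation count) is a hypothetical choice made only for this example.

```python
import random

def knn_rmse(train, test, mask):
    """RMSE of a 1-nearest-neighbour predictor using only the masked descriptors.
    Assumption: this simple model stands in for the supervised SOM."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return float("inf")  # an empty descriptor set cannot predict anything
    sq_err = 0.0
    for x, y in test:
        nearest = min(train, key=lambda row: sum((row[0][j] - x[j]) ** 2 for j in cols))
        sq_err += (nearest[1] - y) ** 2
    return (sq_err / len(test)) ** 0.5

def fitness(mask, train, valid, penalty=0.01):
    # Lower is better: validation error plus an assumed small cost per descriptor.
    return knn_rmse(train, valid, mask) + penalty * sum(mask)

def select_descriptors(train, valid, n_desc, pop_size=16, generations=15,
                       p_mut=0.05, seed=0):
    """Evolve boolean masks over descriptors (requires n_desc >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_desc)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, train, valid))
        parents = pop[: pop_size // 2]           # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_desc)       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda m: fitness(m, train, valid))

def make_data(n, n_desc, rng):
    # Synthetic data: only descriptors 0 and 1 influence the target, non-linearly in x0.
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(n_desc)]
        rows.append((x, x[0] ** 2 + x[1]))
    return rows

rng = random.Random(42)
data = make_data(120, 8, rng)
# The external set is never seen by the genetic algorithm.
train, valid, external = data[:60], data[60:90], data[90:]

best = select_descriptors(train, valid, n_desc=8)
rmse_selected = knn_rmse(train, external, best)
rmse_full = knn_rmse(train, external, [True] * 8)
print("selected descriptors:", [j for j, keep in enumerate(best) if keep])
print("external RMSE (selected / all):", rmse_selected, rmse_full)
```

The external set plays the role of the withheld portion described above: it is used only once, after selection, so its error is not biased by the search. Because the search is stochastic, repeating the run with different `seed` values and different partitions would give a spread of results, which is the kind of variability the abstract says was assessed with the z-score and other tests.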
Cite this article
Bayram, E., Santago, P., Harris, R. et al. Genetic algorithms and self-organizing maps: a powerful combination for modeling complex QSAR and QSPR problems. J Comput Aided Mol Des 18, 483–493 (2004). https://doi.org/10.1007/s10822-004-5321-2
DOI: https://doi.org/10.1007/s10822-004-5321-2