Abstract
In this paper we propose a new method for generating 2D-QSAR models that predict the activity values of chemicals. Maximum common substructures extracted from the data set are used to classify the molecules in a tree, where the nodes represent molecules or structures common to groups of molecules, and the arcs represent the non-isomorphic substructures between two nodes. All paths between pairs of leaf nodes are used to build the equation system that serves as the representational space for the QSAR model. The proposed model is based on combining non-isomorphic structures, using molecular descriptors to calculate path lengths, and classifying the data set by maximum common substructures; it considerably improves the generation of QSAR models compared with the classical model based only on a set of molecular descriptors. Optimization algorithms based on genetic algorithm and differential evolution approaches have also been used to improve and refine the equations obtained.
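To make the idea of "paths between pairs of leaf nodes" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy tree whose node names and arc weights are entirely hypothetical; in the paper the arc lengths would come from molecular descriptors of the non-isomorphic substructures, which are not reproduced here. The sketch only shows how one path length per pair of leaf molecules can be enumerated.

```python
from itertools import combinations

# Hypothetical toy tree, stored as child -> (parent, arc_weight).
# Internal nodes stand for common substructures, leaves for molecules.
# The weights are made-up stand-ins for descriptor-based arc lengths.
TREE = {
    "mcs_a": ("root", 1.0),
    "mcs_b": ("root", 2.0),
    "mol_1": ("mcs_a", 0.5),
    "mol_2": ("mcs_a", 1.5),
    "mol_3": ("mcs_b", 0.25),
}


def arcs_to_root(node):
    """Map each node on the way up to the root to the weight of its parent arc."""
    arcs = {}
    while node in TREE:
        parent, weight = TREE[node]
        arcs[node] = weight
        node = parent
    return arcs


def leaf_nodes():
    """Leaves are nodes that never appear as a parent."""
    parents = {parent for parent, _ in TREE.values()}
    return sorted(n for n in TREE if n not in parents)


def path_length(a, b):
    """Sum the arc weights on the unique tree path between nodes a and b."""
    up_a, up_b = arcs_to_root(a), arcs_to_root(b)
    # Arcs shared by both root paths lie above the common ancestor
    # and are not part of the path between a and b.
    on_path = set(up_a) ^ set(up_b)
    return sum(up_a.get(n, 0.0) + up_b.get(n, 0.0) for n in on_path)


# One entry per pair of leaf molecules; each entry would correspond
# to one equation of the system described in the abstract.
pairwise = {(a, b): path_length(a, b) for a, b in combinations(leaf_nodes(), 2)}

for pair, length in pairwise.items():
    print(pair, length)
```

With n leaf molecules this yields n(n-1)/2 path lengths, so the number of equations grows quadratically with the size of the data set.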
Cite this article
Palacios-Bejarano, B., Cerruela García, G., Luque Ruiz, I. et al. QSAR model based on weighted MCS trees approach for the representation of molecule data sets. J Comput Aided Mol Des 27, 185–201 (2013). https://doi.org/10.1007/s10822-013-9637-7
DOI: https://doi.org/10.1007/s10822-013-9637-7