QSAR model based on weighted MCS trees approach for the representation of molecule data sets

Journal of Computer-Aided Molecular Design

Abstract

In this paper we propose a new method for generating 2D-QSAR models that predict the activity values of chemicals. Maximum common substructures (MCS) extracted from the data set are used to classify the molecules in a tree: each node represents either a molecule or a structure common to a group of molecules, and each arc represents the non-isomorphic substructures between the two nodes it joins. The paths between every pair of leaf nodes define the equation system that serves as the representational space for building the QSAR model. The proposed model combines non-isomorphic structures, molecular descriptors for calculating path lengths, and a classification of the data set based on maximum common substructures; it considerably improves on the classical model, which relies only on a set of molecular descriptors. Optimization algorithms based on genetic algorithms and differential evolution have also been used to improve and refine the equations obtained.
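
The idea of taking path lengths between pairs of leaf nodes in a weighted tree can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the node names, the tree shape, and the arc weights are invented, and the weights merely stand in for descriptor-derived lengths. The abstract does not specify how the authors compute them or how the resulting equation system is assembled.

```python
from itertools import combinations

# Hypothetical toy tree, stored as child -> (parent, arc_weight).
# Internal nodes stand for common substructures, leaves for molecules.
# The weights are made-up numbers, not values from the paper.
PARENT = {
    "mcs_a": ("root", 1.0),
    "mcs_b": ("root", 2.0),
    "mol_1": ("mcs_a", 0.5),
    "mol_2": ("mcs_a", 1.5),
    "mol_3": ("mcs_b", 1.0),
}


def distances_to_ancestors(node):
    """Map each ancestor of `node` (and the node itself) to its distance from `node`."""
    out = {node: 0.0}
    total = 0.0
    while node in PARENT:
        parent, weight = PARENT[node]
        total += weight
        out[parent] = total
        node = parent
    return out


def leaf_path_length(a, b):
    """Length of the unique tree path between a and b.

    With non-negative weights, the shared ancestor that minimises the
    summed distance is the lowest common ancestor.
    """
    da, db = distances_to_ancestors(a), distances_to_ancestors(b)
    return min(da[n] + db[n] for n in da if n in db)


def leaves():
    """Nodes that are never a parent of another node."""
    parents = {parent for parent, _ in PARENT.values()}
    return sorted(n for n in PARENT if n not in parents)


def pairwise_leaf_paths():
    """Path length for every unordered pair of leaves."""
    return {(a, b): leaf_path_length(a, b) for a, b in combinations(leaves(), 2)}


if __name__ == "__main__":
    for pair, length in pairwise_leaf_paths().items():
        print(pair, length)
```

In this toy setting each leaf pair yields one path length; a model in the spirit of the abstract would relate such quantities to activity values, but that step is not sketched here because the abstract gives no detail on it.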



Author information

Correspondence to Gonzalo Cerruela García.

About this article

Cite this article

Palacios-Bejarano, B., Cerruela García, G., Luque Ruiz, I. et al. QSAR model based on weighted MCS trees approach for the representation of molecule data sets. J Comput Aided Mol Des 27, 185–201 (2013). https://doi.org/10.1007/s10822-013-9637-7

  • DOI: https://doi.org/10.1007/s10822-013-9637-7
