Abstract
The COMBINE method was designed to study congeneric series of compounds by incorporating structural information from ligand–protein complexes. Although very successful, the method has not received the same level of attention as other approaches to Quantitative Structure–Activity Relationships (QSAR), mainly because of the lack of ways to measure the uncertainty of its predictions and the need for large datasets. Active learning, a semi-supervised learning approach that uses uncertainty to enhance model performance while reducing the size of the training set, has been used in this work to address both problems. We propose two estimators of uncertainty: the pool of regressors and the distance to the training set. The performance of the methods has been evaluated by testing the resulting active learning workflows on three diverse datasets: HIV-1 protease inhibitors, taxol derivatives and BRD4 inhibitors. The proposed strategies were successful in 80% of the cases for the taxol derivatives and BRD4 inhibitors, and outperformed random selection in the time-split of the HIV-1 protease inhibitors. Our results suggest that AL-COMBINE might be an effective way of producing consistently superior QSAR models with a limited number of samples.
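The two uncertainty estimators named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-neighbour regressor is a toy stand-in for the PLS or SVMR models, the bootstrap resampling used to build the pool is an assumption, and all function names (`nearest_neighbour_model`, `pool_uncertainty`, `distance_uncertainty`, `select_next`) are hypothetical. Descriptors are assumed to be plain numeric tuples.

```python
import math
import random
import statistics


def nearest_neighbour_model(X, y):
    """Toy regressor (assumption): predict the activity of the closest training sample."""
    def predict(x):
        i = min(range(len(X)), key=lambda j: math.dist(x, X[j]))
        return y[i]
    return predict


def pool_uncertainty(X_train, y_train, x, n_models=20, rng=None):
    """Spread of predictions across a pool of models fitted on bootstrap resamples."""
    rng = rng or random.Random(0)
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        model = nearest_neighbour_model([X_train[i] for i in idx],
                                        [y_train[i] for i in idx])
        preds.append(model(x))
    return statistics.pstdev(preds)


def distance_uncertainty(X_train, x):
    """Euclidean distance from x to its closest training sample."""
    return min(math.dist(x, t) for t in X_train)


def select_next(X_train, y_train, X_pool, estimator="distance"):
    """Return the index of the pool sample with the highest estimated uncertainty."""
    rng = random.Random(0)

    def score(x):
        if estimator == "pool":
            return pool_uncertainty(X_train, y_train, x, rng=rng)
        return distance_uncertainty(X_train, x)

    return max(range(len(X_pool)), key=lambda i: score(X_pool[i]))
```

In an active learning loop under these assumptions, the sample returned by `select_next` would be measured, moved from the pool to the training set, and the model refitted. Random selection, the baseline mentioned in the abstract, would simply replace `select_next` with a random index.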



Abbreviations
- AL: Active learning
- PLS: Partial least squares
- SVMR: Support vector machine regression
- QSAR: Quantitative structure–activity relationships
- COMBINE: COMparative binding energy analysis
- cMMISMSA: Classic molecular mechanism implicit solvent model surface access
- HIV: Human immunodeficiency virus
- BRD4-BD1: Bromodomain-containing protein 4 N-terminal bromodomain
Acknowledgements
In memoriam Dr. Angel Ramirez Ortiz (1966–2008). We thank Prof. Dr. Federico Gago for providing the historical HIV-PR and taxanes data sets.
Cite this article
Fusani, L., Cabrera, A.C. Active learning strategies with COMBINE analysis: new tricks for an old dog. J Comput Aided Mol Des 33, 287–294 (2019). https://doi.org/10.1007/s10822-018-0181-3
DOI: https://doi.org/10.1007/s10822-018-0181-3