Abstract
Computational toxicology is emerging as a promising alternative to experimental testing. The Molecular Libraries Screening Center Network (MLSCN), part of the NIH Molecular Libraries Roadmap, has recently started generating large and diverse screening datasets, which are publicly available in PubChem. In this report, we investigate various aspects of developing computational models to predict cell toxicity based on cell proliferation screening data generated in the MLSCN. By capturing feature-based information in those datasets, such predictive models would be useful in evaluating cell-based screening results in general (for example, from reporter assays) and could be used as an aid to identify and eliminate potentially undesired compounds. Specifically, we present the results of random forest ensemble models developed using different cell proliferation datasets and highlight protocols that take into account their extremely imbalanced nature. Depending on the nature of the datasets and the descriptors employed, we were able to achieve percentage correct classification rates between 70% and 85% on the prediction set, though the accuracy rate dropped significantly when the models were applied to in vivo data. In this context, we also compare the MLSCN cell proliferation results with animal acute toxicity data to investigate to what extent animal toxicity can be correlated with, and potentially predicted by, proliferation results. Finally, we present a visualization technique that allows one to compare a new dataset to the training set of the models to decide whether the new dataset may be reliably predicted.
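As an illustration of why imbalanced screening data needs special handling, the following minimal Python sketch builds balanced training subsets by undersampling the majority class, combines per-model predictions by majority vote, and reports per-class rates alongside the overall percentage correct. It is an assumption-laden sketch, not the protocol used in this work: the function names (balanced_subsets, majority_vote, class_rates) and the toy labels are hypothetical, and any base classifier, such as a random forest, could be trained on each subset.

```python
import random
from collections import Counter


def balanced_subsets(labels, n_models, seed=0):
    """Return n_models index lists, each holding every minority-class index
    plus an equally sized random draw from the remaining (majority) compounds."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_label = min(counts, key=counts.get)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    subsets = []
    for _ in range(n_models):
        drawn = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + drawn))
    return subsets


def majority_vote(predictions):
    """Combine the class predictions of several models for one compound."""
    return Counter(predictions).most_common(1)[0][0]


def class_rates(y_true, y_pred, positive=1):
    """Overall percentage correct plus sensitivity and specificity."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    return {
        "percent_correct": 100.0 * (tp + tn) / len(y_true),
        "sensitivity": 100.0 * tp / n_pos,
        "specificity": 100.0 * tn / n_neg,
    }


# Hypothetical toy data: 5 toxic (1) and 95 non-toxic (0) compounds.
labels = [1] * 5 + [0] * 95
subsets = balanced_subsets(labels, n_models=3)

# A model that always predicts "non-toxic" scores a high percentage correct
# on this data while identifying none of the toxic compounds.
naive = class_rates(labels, [0] * 100)
```

On data this skewed, the overall percentage correct alone can hide a model that never flags a toxic compound, which is why per-class rates are worth reporting next to it.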












Acknowledgements
RG would like to acknowledge funding from NIH Grant No. P20 HG003894-01. SCS acknowledges the support of the National Institutes of Health Molecular Library Screening Center Network (Grant No. U54 MH074404-01, Prof. Hugh Rosen, Principal Investigator).
Cite this article
Guha, R., Schürer, S.C. Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays. J Comput Aided Mol Des 22, 367–384 (2008). https://doi.org/10.1007/s10822-008-9192-9
DOI: https://doi.org/10.1007/s10822-008-9192-9