Abstract
In a typical high-throughput screening (HTS) campaign, less than 1 % of the small-molecule library is characterized by confirmatory experiments. As much as 99 % of the library's molecules are set aside and excluded from downstream analysis, even though some of them would prove active if they were sent for confirmatory testing. Because these measurements are missing, screeners cannot identify those active molecules. In this study, we propose managing missing measurements with imputation, a powerful technique from the machine learning community that fills in accurate estimates where measurements are missing. We then use these imputed measurements to construct an imputed visualization of HTS results, based on the scaffold tree visualization described in the literature. This imputed visualization identifies almost all groups of active molecules from an HTS, including those that would otherwise be missed. We validate our methodology by simulating HTS experiments with data from eight quantitative HTS campaigns, and we discuss the implications for drug discovery. In particular, this method can rapidly and economically identify novel active molecules, each of which could have novel binding or selectivity and could represent new intellectual property.
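To make the two steps concrete, the sketch below fills in missing confirmatory values and then ranks scaffolds by their mean activity. It makes several assumptions that do not come from this abstract. Molecules are represented as fingerprint bit sets. Imputation uses a similarity-weighted k-nearest-neighbour estimate based on Tanimoto similarity, which stands in for the authors' own model and is not a description of it. Scaffold labels are assumed to be precomputed. Scaffolds are scored with a flat per-scaffold average rather than the hierarchical scaffold tree. The helper names `impute_activity` and `rank_scaffolds`, along with the toy data, are hypothetical.

```python
from collections import defaultdict


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def impute_activity(fps: dict, measured: dict, k: int = 3) -> dict:
    """Fill in missing confirmatory measurements (hypothetical kNN stand-in).

    fps: molecule id -> fingerprint (frozenset of on-bit indices)
    measured: molecule id -> confirmatory activity, for tested molecules only
    Measured values are kept as-is; each untested molecule receives the
    similarity-weighted mean activity of its k most similar tested molecules.
    """
    imputed = dict(measured)
    for mol, fp in fps.items():
        if mol in measured:
            continue
        neighbours = sorted(
            ((tanimoto(fp, fps[m]), v) for m, v in measured.items()),
            reverse=True,
        )[:k]
        total = sum(sim for sim, _ in neighbours)
        # With no structural overlap at all, fall back to "inactive".
        imputed[mol] = sum(sim * v for sim, v in neighbours) / total if total > 0 else 0.0
    return imputed


def rank_scaffolds(scaffold_of: dict, activity: dict) -> list:
    """Mean (measured or imputed) activity per scaffold, highest first."""
    groups = defaultdict(list)
    for mol, scaffold in scaffold_of.items():
        groups[scaffold].append(activity[mol])
    means = {s: sum(vals) / len(vals) for s, vals in groups.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: only A and C were sent for confirmatory testing.
fps = {
    "A": frozenset({1, 2, 3}),
    "B": frozenset({1, 2, 4}),
    "C": frozenset({7, 8, 9}),
    "D": frozenset({1, 2, 3, 4}),
    "E": frozenset({7, 8}),
}
measured = {"A": 1.0, "C": 0.0}
scaffold_of = {"A": "s1", "B": "s1", "D": "s1", "C": "s2", "E": "s2"}

imputed = impute_activity(fps, measured)
ranking = rank_scaffolds(scaffold_of, imputed)
```

In this toy run, B and D inherit A's activity because they share most of its bits, and E inherits C's inactivity, so scaffold `s1` ranks first. A real pipeline would use measured primary-screen data and a validated model instead of raw structural similarity.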
Acknowledgments
MRB collaborated with SJS to write the initial manuscript. MRB implemented the imputed tree based on an idea by SJS and ran most of the experiments. BTC prepared the imputed data downloaded from PubChem. Edward Holson provided helpful comments and edits to the manuscript. The Department of Pathology and Immunology at Washington University in St. Louis supports BTC, MRB, and SJS. Marvin was used to generate the chemical structures in Fig. 4; Marvin 5.3.5, 2010, ChemAxon (http://www.chemaxon.com).
Conflict of interest
The authors declare that they have no conflicts of interest to disclose.
Cite this article
Browning, M.R., Calhoun, B.T. & Swamidass, S.J. Managing missing measurements in small-molecule screens. J Comput Aided Mol Des 27, 469–478 (2013). https://doi.org/10.1007/s10822-013-9642-x