Abstract
Compounds known to be potent against a specific protein target may share a signature profile of common substructures that is highly correlated with their potency. Such substructure profiles may be useful for enriching compound libraries or for prioritizing compounds against a specific protein target. With this objective in mind, a set of compounds with known potency against six selected kinases (two each from three kinase families) was used to generate binary molecular fingerprints. Each fingerprint key represents a substructure found within a compound, and the frequency with which each key occurs was tabulated. A frequent pattern mining technique was then applied with the aim of uncovering substructures that are well represented among known potent inhibitors but under-represented among known inactive compounds, and vice versa. Substructure profiles representative of potent inhibitors against each of the three kinase families were thus extracted. In our validation, these substructure profiles showed significant enrichment for highly potent compounds against their respective kinase targets. The advantages of this approach over conventional methods for analyzing such datasets, and its application to mining substructures for enriching compound libraries, are presented.
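The workflow outlined above (tabulate fingerprint key frequencies, keep keys that are frequent among potent compounds and rare among inactive ones, then check enrichment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the support thresholds, and the toy fingerprints are all hypothetical, and only single keys are scored, whereas a frequent pattern mining method would normally also consider combinations of keys.

```python
from typing import List, Sequence

Fingerprint = Sequence[int]  # one 0/1 entry per substructure key

# Toy data (hypothetical): 4 potent and 4 inactive compounds, 4 keys each.
actives = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 0]]
inactives = [[0, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 0], [1, 0, 1, 0]]


def key_support(fingerprints: List[Fingerprint]) -> List[float]:
    """Fraction of compounds in which each fingerprint key is set."""
    n = len(fingerprints)
    n_keys = len(fingerprints[0])
    return [sum(fp[k] for fp in fingerprints) / n for k in range(n_keys)]


def discriminating_keys(
    group_a: List[Fingerprint],
    group_b: List[Fingerprint],
    min_support_a: float,
    max_support_b: float,
) -> List[int]:
    """Keys frequent in group_a but rare in group_b (thresholds are assumed)."""
    sup_a = key_support(group_a)
    sup_b = key_support(group_b)
    return [
        k
        for k in range(len(sup_a))
        if sup_a[k] >= min_support_a and sup_b[k] <= max_support_b
    ]


def enrichment_factor(
    profile: List[int],
    potent: List[Fingerprint],
    inactive: List[Fingerprint],
) -> float:
    """Potent fraction among compounds matching every key in the profile,
    divided by the potent fraction in the whole set."""
    def matches(fp: Fingerprint) -> bool:
        return all(fp[k] for k in profile)

    hit_potent = sum(matches(fp) for fp in potent)
    hit_inactive = sum(matches(fp) for fp in inactive)
    hits = hit_potent + hit_inactive
    if hits == 0:
        return 0.0
    baseline = len(potent) / (len(potent) + len(inactive))
    return (hit_potent / hits) / baseline


# Keys typical of potent compounds, and (the "vice versa" case) of inactives.
potent_profile = discriminating_keys(actives, inactives, 0.6, 0.25)
inactive_profile = discriminating_keys(inactives, actives, 0.6, 0.25)
ef = enrichment_factor(potent_profile, actives, inactives)
```

On this toy set, key 0 is the only key that passes the assumed thresholds for the potent group, and selecting compounds that carry it raises the potent fraction above the baseline of the full set. In practice the thresholds would need tuning, and enrichment should be measured on compounds held out from the mining step.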

Acknowledgments
The PhD scholarship awarded to WKY from the Novartis Institute for Tropical Diseases is gratefully acknowledged.
Cite this article
Yeo, W.K., Go, M.L. & Nilar, S. Extraction and validation of substructure profiles for enriching compound libraries. J Comput Aided Mol Des 26, 1127–1141 (2012). https://doi.org/10.1007/s10822-012-9604-8
DOI: https://doi.org/10.1007/s10822-012-9604-8