Abstract
Cluster-based compound selection is used in the lead identification stage of drug discovery and design. Many clustering methods have been applied to chemical databases, but no single method gives the best results under all circumstances. Nevertheless, little attention has been paid to combining multiple clusterings of chemical structures, an approach known as consensus clustering. Consensus clustering has recently been used in many areas, including bioinformatics, machine learning and information theory, and it can improve the robustness, stability, consistency and novelty of clustering. For chemical databases, several consensus clustering methods have been used, including co-association matrix-based, graph-based, hypergraph-based and voting-based methods. In this paper, a weighted cumulative voting-based aggregation algorithm (W-CVAA) was developed. The MDL Drug Data Report (MDDR) benchmark chemical dataset was used in the experiments and was represented by the AlogP and ECFP_4 descriptors. The clusterings were evaluated, using several criteria, by how well they separated biologically active molecules from inactive ones in each cluster, and the effectiveness of the consensus clustering was compared with that of Ward’s method, the current standard clustering method in chemoinformatics. The results indicate that weighted voting-based consensus clustering can overcome the limitations of existing voting-based methods and improve the effectiveness of combining multiple clusterings of chemical structures.
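To make the idea of weighted voting over several clusterings concrete, the following is a minimal sketch in Python. It is not the W-CVAA algorithm developed in the paper: it uses plain weighted plurality voting with hard labels, whereas cumulative voting works with soft votes. The function names `align_labels` and `weighted_vote_consensus`, the toy partitions and the weights are all assumptions made for illustration. Labels are aligned to the first partition by exhaustive search over label permutations, which is only practical for a small number of clusters.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, partition, k):
    """Relabel `partition` so it agrees with `reference` on as many items as possible.

    Tries every permutation of the k labels (feasible only for small k).
    """
    best_map, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, p in zip(reference, partition) if perm[p] == r)
        if agree > best_agree:
            best_map, best_agree = perm, agree
    return [best_map[p] for p in partition]


def weighted_vote_consensus(partitions, weights, k):
    """Combine hard partitions by weighted plurality voting (illustrative only).

    partitions: list of label lists, each using labels 0..k-1
    weights:    one non-negative weight per partition (hypothetical values)
    """
    reference = partitions[0]
    aligned = [reference] + [align_labels(reference, p, k) for p in partitions[1:]]
    consensus = []
    for i in range(len(reference)):
        votes = Counter()
        for labels, w in zip(aligned, weights):
            votes[labels[i]] += w
        # Ties are broken in favour of the smallest label.
        consensus.append(max(sorted(votes), key=lambda c: votes[c]))
    return consensus


# Toy example: three clusterings of six molecules into k = 2 clusters.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, with labels swapped
    [0, 0, 1, 1, 1, 1],  # disagrees on the third molecule
]
weights = [1.0, 1.0, 0.5]
print(weighted_vote_consensus(partitions, weights, k=2))
```

In this sketch the weights decide which clustering prevails when the partitions disagree, so the choice of weighting scheme (not specified here) determines the consensus.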






Acknowledgments
Faisal Saeed is a researcher at Universiti Teknologi Malaysia under the Post-Doctoral Fellowship Scheme for the project “Consensus Clustering Methods for Chemical Structure Databases”. This work was supported by the Research Management Centre (RMC) at Universiti Teknologi Malaysia under the Research University Grant Category (VOT Q.J130000.2528.07H89).
Cite this article
Saeed, F., Ahmed, A., Shamsir, M.S. et al. Weighted voting-based consensus clustering for chemical structure databases. J Comput Aided Mol Des 28, 675–684 (2014). https://doi.org/10.1007/s10822-014-9750-2
DOI: https://doi.org/10.1007/s10822-014-9750-2