Weighted voting-based consensus clustering for chemical structure databases

Journal of Computer-Aided Molecular Design

Abstract

Cluster-based compound selection is used in the lead identification stage of drug discovery and design. Many clustering methods have been applied to chemical databases, but no single method obtains the best results under all circumstances. Despite this, little attention has been paid to combining multiple clusterings of chemical structures, an approach known as consensus clustering. Consensus clustering has recently been used in many areas, including bioinformatics, machine learning and information theory, and it can improve the robustness, stability, consistency and novelty of clustering. For chemical databases, several consensus clustering methods have been used, including co-association matrix-based, graph-based, hypergraph-based and voting-based methods. In this paper, a weighted cumulative voting-based aggregation algorithm (W-CVAA) was developed. The MDL Drug Data Report (MDDR) benchmark chemical dataset was used in the experiments and was represented by the AlogP and ECFP_4 descriptors. The clusterings were evaluated, using different criteria, by their ability to separate biologically active molecules from inactive ones in each cluster, and the effectiveness of the consensus clustering was compared to that of Ward’s method, the current standard clustering method in chemoinformatics. The study indicated that weighted voting-based consensus clustering can overcome the limitations of the existing voting-based methods and improve the effectiveness of combining multiple clusterings of chemical structures.
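To make the idea of combining multiple clusterings concrete, the following is a minimal sketch of the co-association matrix approach, one of the consensus families named in the abstract. It is not W-CVAA and is not taken from the paper: the function names, the optional per-partition `weights`, the 0.5 threshold and the toy partitions are all assumptions chosen for illustration.

```python
def co_association(partitions, weights=None):
    """Weighted fraction of partitions in which each pair of items shares a cluster.

    partitions: list of label lists, one per base clustering (hypothetical input).
    weights:    optional per-partition weights (an illustrative assumption).
    """
    n = len(partitions[0])
    if weights is None:
        weights = [1.0] * len(partitions)
    total = float(sum(weights))
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


def consensus_clusters(matrix, threshold=0.5):
    """Label items by connected components of the graph whose edges are
    pairs with co-association strictly above `threshold`."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three toy clusterings of five molecules (made-up labels, not MDDR data).
partitions = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
unweighted = consensus_clusters(co_association(partitions))
weighted = consensus_clusters(co_association(partitions, weights=[1, 1, 4]))
```

With equal weights the consensus groups the first two items together and the last three together. Giving the third clustering a much larger weight pulls the consensus toward that clustering, which is the general intuition behind weighting the contributions of base clusterings.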


Acknowledgments

Faisal Saeed is a researcher at Universiti Teknologi Malaysia under the Post-Doctoral Fellowship Scheme for the project “Consensus Clustering Methods for Chemical Structure Databases”. This work is supported by the Research Management Centre (RMC) at Universiti Teknologi Malaysia under the Research University Grant Category (VOT Q.J130000.2528.07H89).

Author information

Corresponding author

Correspondence to Faisal Saeed.

About this article

Cite this article

Saeed, F., Ahmed, A., Shamsir, M.S. et al. Weighted voting-based consensus clustering for chemical structure databases. J Comput Aided Mol Des 28, 675–684 (2014). https://doi.org/10.1007/s10822-014-9750-2

  • DOI: https://doi.org/10.1007/s10822-014-9750-2