Combining Multiple Clusterings of Chemical Structures Using Cumulative Voting-Based Aggregation Algorithm

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7803)

Abstract

The use of consensus clustering methods in chemoinformatics is motivated by the success of consensus scoring (data fusion) in virtual screening and by the ability of consensus clustering to improve the robustness, novelty, consistency and stability of individual clusterings in other areas. In this paper, the Cumulative Voting-based Aggregation Algorithm (CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of the clusterings was evaluated by the extent to which they grouped compounds belonging to the same activity class together. The results were then compared with those of other consensus clustering methods and of Ward’s method. The MDL Drug Data Report (MDDR) database was used for the experiments, and the results were obtained by combining multiple clusterings generated using different distance measures. The experiments show that the voting-based consensus method can efficiently improve the effectiveness of chemical structure clusterings.
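As a rough illustration of the idea of combining clusterings by voting, the sketch below aggregates several hard partitions of the same objects relative to a reference partition. It is a simplified, assumed reading of cumulative voting, not the paper's implementation: the function name `cumulative_vote`, the choice of the first partition as the reference, and the toy labels are all hypothetical. Each cluster of a non-reference partition spreads one vote across the reference clusters in proportion to how its members overlap with them, and the votes are averaged per object.

```python
from collections import Counter


def cumulative_vote(labelings, reference_index=0):
    """Aggregate hard clusterings by voting relative to a reference partition.

    labelings: list of label lists, one per clustering, all over the same objects.
    Returns (soft, hard): per-object vote shares over the reference clusters,
    and the reference cluster with the largest share for each object.
    """
    reference = labelings[reference_index]
    ref_clusters = sorted(set(reference))
    n = len(reference)

    # The reference partition votes one-hot for its own clusters.
    totals = [
        {c: (1.0 if reference[i] == c else 0.0) for c in ref_clusters}
        for i in range(n)
    ]

    for idx, labels in enumerate(labelings):
        if idx == reference_index:
            continue
        sizes = Counter(labels)                  # size of each cluster
        overlap = Counter(zip(labels, reference))  # co-membership counts
        for i, lab in enumerate(labels):
            for c in ref_clusters:
                # Fraction of this cluster's members lying in reference cluster c.
                totals[i][c] += overlap[(lab, c)] / sizes[lab]

    m = len(labelings)
    soft = [{c: v / m for c, v in row.items()} for row in totals]
    hard = [max(ref_clusters, key=lambda c: row[c]) for row in soft]
    return soft, hard


# Toy example: three clusterings of four compounds (labels are arbitrary).
toy = [[0, 0, 1, 1], [5, 5, 7, 7], [1, 1, 1, 0]]
soft, hard = cumulative_vote(toy)
print(hard)
```

Because the cluster labels of different partitions never have to match, this kind of voting sidesteps the label-correspondence problem; the outcome does depend on which partition is taken as the reference, which is a design choice of this sketch.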

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Saeed, F., Salim, N., Abdo, A., Hentabli, H. (2013). Combining Multiple Clusterings of Chemical Structures Using Cumulative Voting-Based Aggregation Algorithm. In: Selamat, A., Nguyen, N.T., Haron, H. (eds) Intelligent Information and Database Systems. ACIIDS 2013. Lecture Notes in Computer Science, vol 7803. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36543-0_19

  • DOI: https://doi.org/10.1007/978-3-642-36543-0_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36542-3

  • Online ISBN: 978-3-642-36543-0

  • eBook Packages: Computer Science, Computer Science (R0)
