Abstract
The use of consensus clustering methods in chemoinformatics is motivated because of the success of consensus scoring (data fusion) in virtual screening and also because of the ability of consensus clustering to improve the robustness, novelty, consistency and stability of individual clusterings in other areas. In this paper, Cumulative Voting-based Aggregation Algorithm (CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of clusterings was evaluated based on the extent to which they clustered compounds, which belong to the same activity class, together. Then, the results were compared to other consensus clustering and Ward’s methods. The MDL Drug Data Report (MDDR) database was used for experiments and the results were obtained by combining multiple clusterings that were applied using different distance measures. The experiments show that the voting-based consensus method can efficiently improve the effectiveness of chemical structures clusterings.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Adamson, G.W., Bush, J.A.: A method for the automatic classification of chemical structures. Information Storage and Retrieval 9, 561–568 (1973)
Downs, G.M., Barnard, J.M.: Clustering of Chemical Structures on the Basis of Two-Dimensional Similarity Measures. Journal of Chemical Information and Computer Science 32, 644–649 (1992)
Willett, P.: Similarity and Clustering in Chemical Information Systems. Research Studies Press, Letchworth (1987)
Downs, G.M., Willett, P., Fisanick, W.: Similarity searching and clustering of chemical-structure databases using molecular property data. J. Chem. Inf. Comput. Sci. 34, 1094–1102 (1994)
Brown, R.D., Martin, Y.C.: The information content of 2D and 3D structural descriptors relevant to ligand–receptor binding. J. Chem. Inf. Comput. Sci. 37, 1–9 (1997)
Downs, G.M., Barnard, J.M.: Clustering of chemical structures on the basis of two-dimensional similarity measures. J. Chem. Inf. Comput. Sci. 32, 644–649 (1992)
Holliday, J.D., Rodgers, S.L., Willet, P.: Clustering Files of chemical Structures Using the Fuzzy k-means Clustering Method. Journal of Chemical Information and Computer Science 44, 894–902 (2004)
Varin, T., Bureau, R., Mueller, C., Willett, P.: Clustering files of chemical structures using the Székely–Rizzo generalization of Ward’s method. Journal of Molecular Graphics and Modeling 28(2), 187–195 (2009)
Brown, R.D., Martin, Y.C.: Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection. J. Chem. Inf. Compute. Sci. 36, 572–584 (1996)
Salim, N.: Analysis and Comparison of Molecular Similarity Measures. University of Sheffield. PhD Thesis (2003)
Vega-Pons, S., Ruiz-Schulcloper, J.: A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence 25(3), 337–372 (2011)
Fischer, B., Buhmann, J.M.: Bagging for path-based clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(11), 1411–1415 (2003)
Dudoit, S., Fridlyand, J.: Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19(9), 1090–1099 (2003)
Evgenia, D., Andreas, W., Kurt, H.: A combination scheme for fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence 16(7), 901–912 (2002)
Gordon, A.D., Vichi, M.: Fuzzy partition models for fitting a set of partitions. Psychometrika 66(2), 229–248 (2001)
Topchy, A., Law, M., Jain, A.K., Fred, A.: Analysis of consensus partition in clustering ensemble. In: Proceedings of the IEEE Intl. Conf. on Data Mining 2004, Brighton, UK, pp. 225–232 (2004)
Ayad, H.G., Kamel, M.S.: Cumulative voting consensus method for partitions with a variable number of clusters. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(1), 160–173 (2008)
Ayad, H.G., Kamel, M.S.: On voting-based consensus of cluster ensembles. Patt. Recogn. 43, 1943–1953 (2010)
Chu, C.-W., Holliday, J., Willett, P.: Combining multiple classifications of chemical structures using consensus clustering. Bioorgan. Med. Chem. 20(18), 5366–5371 (2012)
Saeed, F., Salim, N., Abdo, A., Hentabli, H.: Combining Multiple Individual Clusterings of Chemical Structures Using Cluster-Based Similarity Partitioning Algorithm. In: Hassanien, A.E., Salem, A.-B.M., Ramadan, R., Kim, T.-h. (eds.) AMLTA 2012. CCIS, vol. 322, pp. 276–284. Springer, Heidelberg (2012)
Strehl, A., Ghosh, J.: Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions. J. Machine Learning Research 3, 583–617 (2002)
Sci Tegic Accelrys Inc., the MDL Drug Data Report (MDDR) database is available from at http://www.accelrys.com/ (accessed November 1, 2012)
Abdo, A., Chen, B., Mueller, C., Salim, N., Willett, P.: Ligand-Based Virtual Screening Using Bayesian Networks. J. Chem. Inf. Model. 50, 1012–1020 (2010)
Abdo, A., Salim, N.: New Fragment Weighting Scheme for the Bayesian Inference Network in Ligand-Based Virtual Screening. J. Chem. Inf. Model. 51, 25–32 (2011)
Abdo, A., Saeed, F., Hentabli, H., Ali, A., Salim, N., Ahmed, A.: Ligand expansion in ligand-based virtual screening using relevance feedback. Journal of Computer-Aided Molecular Design 26, 279–287 (2012)
Pipeline Pilot, Accelrys Software Inc., San Diego (2008)
Varin, T., Saettel, N., Villain, J., Lesnard, A., Dauphin, F., Bureau, R., Rault, S.J.: 3D Pharmacophore, hierarchical methods, and 5-HT4 receptor binding data. Enzyme Inhib.Med. Chem. 23, 593–603 (2008)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saeed, F., Salim, N., Abdo, A., Hentabli, H. (2013). Combining Multiple Clusterings of Chemical Structures Using Cumulative Voting-Based Aggregation Algorithm. In: Selamat, A., Nguyen, N.T., Haron, H. (eds) Intelligent Information and Database Systems. ACIIDS 2013. Lecture Notes in Computer Science(), vol 7803. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36543-0_19
Download citation
DOI: https://doi.org/10.1007/978-3-642-36543-0_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36542-3
Online ISBN: 978-3-642-36543-0
eBook Packages: Computer ScienceComputer Science (R0)