Abstract
Many consensus clustering methods have been applied to combine multiple clusterings of chemical structures, including co-association matrix-based, graph-based, hypergraph-based and voting-based methods. Among these, the voting-based consensus methods have shown the best performance. In this paper, a Weighted Cumulative Voting-based Aggregation Algorithm (W-CVAA) was developed to enhance the effectiveness of combining multiple clusterings of chemical structures. The effectiveness of each clustering was evaluated by its ability to separate active from inactive molecules in each cluster, and the results were compared with Ward’s method, which is the standard clustering method for chemoinformatics applications. The experiments used the MDL Drug Data Report (MDDR) chemical dataset. The results suggest that the weighted cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures.
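The abstract does not give the formula used to score how well a clustering separates active from inactive molecules. As a rough illustration only, the sketch below computes one plausible measure of that kind: the proportion of actives among all molecules that fall in clusters containing at least one active. The function name `active_cluster_purity`, the input layout, and the choice of measure are assumptions made for this example, not the paper's definition.

```python
from collections import defaultdict


def active_cluster_purity(labels, is_active):
    """Proportion of actives among molecules in clusters with >= 1 active.

    labels    -- cluster label for each molecule (hypothetical input layout)
    is_active -- True/False activity flag for each molecule
    This is an illustrative measure, not necessarily the one used in the paper.
    """
    if len(labels) != len(is_active):
        raise ValueError("labels and is_active must have the same length")

    cluster_sizes = defaultdict(int)   # molecules per cluster
    active_counts = defaultdict(int)   # actives per cluster (active clusters only)
    for cluster, active in zip(labels, is_active):
        cluster_sizes[cluster] += 1
        if active:
            active_counts[cluster] += 1

    # Only clusters that hold at least one active appear in active_counts.
    molecules_in_active_clusters = sum(cluster_sizes[c] for c in active_counts)
    if molecules_in_active_clusters == 0:
        return 0.0
    return sum(active_counts.values()) / molecules_in_active_clusters


# Toy data: 8 molecules, 3 of them active.
activity = [True, True, False, False, False, True, False, False]
mixed = [0, 0, 0, 1, 1, 2, 2, 2]      # actives share clusters with inactives
separated = [0, 0, 1, 1, 1, 2, 3, 3]  # actives isolated from inactives

print(active_cluster_purity(mixed, activity))      # 0.5
print(active_cluster_purity(separated, activity))  # 1.0
```

Under this assumed measure, a higher value means the actives are concentrated in fewer, purer clusters, which is the sense in which a consensus partition could be compared against a single Ward's clustering.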
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saeed, F., Salim, N. (2013). Weighted Cumulative Voting-Based Aggregation Algorithm for Combining Multiple Clusterings of Chemical Structures. In: Banchs, R.E., Silvestri, F., Liu, TY., Zhang, M., Gao, S., Lang, J. (eds) Information Retrieval Technology. AIRS 2013. Lecture Notes in Computer Science, vol 8281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45068-6_16
DOI: https://doi.org/10.1007/978-3-642-45068-6_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45067-9
Online ISBN: 978-3-642-45068-6
eBook Packages: Computer Science, Computer Science (R0)