Abstract
Consensus clustering methods have been used in many areas to improve on the quality of individual clusterings. In this paper, a graph-based consensus clustering method, the Cluster-based Similarity Partitioning Algorithm (CSPA), was used to improve the clustering of chemical structures, both by enhancing the separation of active from inactive molecules in each cluster and by improving the robustness and stability of the individual clusterings. The clusterings were evaluated using the Quality Partition Index (QPI), and the results were compared with those of Ward’s clustering method. Experiments were carried out on the MDL Drug Data Report (MDDR) database. The results obtained by combining multiple K-means clusterings showed that CSPA can improve the quality of individual chemical structure clusterings.
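As a rough illustration of the idea behind CSPA, the sketch below builds the pairwise co-association (similarity) matrix from several base partitions: each entry is the fraction of partitions in which two objects share a cluster. CSPA is usually described as partitioning the graph induced by this matrix with METIS; here a simple threshold-and-connected-components step stands in for that partitioner, so this is not the paper's implementation. The function names, the toy label vectors, and the 0.5 threshold are all assumptions made for the example.

```python
def coassociation_matrix(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    base partitions that place objects i and j in the same cluster."""
    n = len(partitions[0])
    sim = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same objects")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    sim[i][j] += 1.0
    m = len(partitions)
    return [[value / m for value in row] for row in sim]


def threshold_consensus(sim, threshold=0.5):
    """Hypothetical stand-in for the METIS partitioning step: group objects
    into connected components of the graph whose edges have sim > threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy base partitions of six objects (e.g. from three K-means runs).
# Cluster labels are arbitrary, so the second run uses permuted labels.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
sim = coassociation_matrix(partitions)
consensus = threshold_consensus(sim)
```

Because the matrix only records whether two objects co-occur, it is insensitive to how each base clustering happens to number its clusters, which is what makes the base partitions comparable in the first place.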
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saeed, F., Salim, N., Abdo, A., Hentabli, H. (2012). Combining Multiple K-Means Clusterings of Chemical Structures Using Cluster-Based Similarity Partitioning Algorithm. In: Hassanien, A.E., Salem, AB.M., Ramadan, R., Kim, Th. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2012. Communications in Computer and Information Science, vol 322. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35326-0_31
DOI: https://doi.org/10.1007/978-3-642-35326-0_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35325-3
Online ISBN: 978-3-642-35326-0
eBook Packages: Computer Science, Computer Science (R0)