Abstract
Most of the clustering methods used for chemical structures, such as Ward’s, group average, k-means and Jarvis-Patrick, are hard (crisp) methods: they partition a dataset into strictly disjoint subsets and are therefore unsuitable for clustering chemical structures that exhibit more than one activity. Fuzzy clustering algorithms such as fuzzy c-means provide an inherent mechanism for clustering overlapping structures (objects), but this potential, which comes from their fuzzy membership functions, has not been utilized effectively. In this work a fuzzy hierarchical algorithm is developed that benefits not only from the fuzzy clustering process but also from the multiple memberships that fuzzy clustering provides. The algorithm divides every cluster whose size exceeds a predetermined threshold into two sub-clusters based on the membership values of each structure. A structure is assigned to one cluster if its membership value for that cluster is very high, or to both clusters if its two membership values are very similar. The performance of the algorithm is evaluated on two benchmark datasets and a large dataset of compound structures derived from MDL’s MDDR database. The results show a significant improvement over a similar implementation of the hard c-means algorithm.
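The splitting rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes a standard fuzzy c-means update with two centers, a simple deterministic initialisation, and a hypothetical `overlap` parameter that decides when two membership values count as "very similar". The names `fuzzy_two_means`, `soft_bisect`, `max_size` and `overlap` are illustrative only.

```python
import math


def fuzzy_two_means(points, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means with c = 2. Returns one [u0, u1] membership pair per point."""
    dim = len(points[0])
    # Assumed deterministic initialisation (not from the paper): the point
    # farthest from the centroid, then the point farthest from that one.
    centroid = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    first = max(points, key=lambda p: math.dist(p, centroid))
    second = max(points, key=lambda p: math.dist(p, first))
    centers = [tuple(first), tuple(second)]
    memberships = []
    for _ in range(max_iter):
        memberships = []
        for p in points:
            # Clamp distances so a point sitting on a center does not divide by zero.
            dist = [max(math.dist(p, c), 1e-12) for c in centers]
            # Standard FCM update: u_k = 1 / sum_j (d_k / d_j)^(2 / (m - 1)).
            memberships.append([
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(2))
                for k in range(2)
            ])
        new_centers = []
        for k in range(2):
            weights = [u[k] ** m for u in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships


def soft_bisect(points, max_size, overlap=0.2):
    """Return leaf clusters as lists of point indices; clusters may share indices."""
    def split(indices):
        # Only clusters larger than the size threshold are divided.
        if len(indices) <= max_size or len(indices) < 2:
            return [indices]
        u = fuzzy_two_means([points[i] for i in indices])
        # A point joins both children when its two memberships are within
        # `overlap` of each other (assumed rule), otherwise only the child it favours.
        left = [i for i, (a, b) in zip(indices, u) if a >= b or abs(a - b) <= overlap]
        right = [i for i, (a, b) in zip(indices, u) if b > a or abs(a - b) <= overlap]
        if len(left) == len(indices) or len(right) == len(indices):
            return [indices]  # no progress: stop instead of recursing forever
        return split(left) + split(right)

    return split(list(range(len(points))))


# Two well-separated groups plus one point midway between them.
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (9, 0), (5, 0)]
clusters = soft_bisect(points, max_size=4)
print(clusters)
```

In this toy dataset the point (5, 0) lies midway between the two groups, so its two membership values are nearly equal and it is expected to appear in both leaf clusters, while every other point belongs to exactly one. A real application would replace the 2-D tuples with molecular descriptors and tune both thresholds on the data.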
© 2007 Springer Berlin Heidelberg
Cite this paper
Shah, J.Z., Salim, N.b. (2007). A Soft Hierarchical Algorithm for the Clustering of Multiple Bioactive Chemical Compounds. In: Hochreiter, S., Wagner, R. (eds) Bioinformatics Research and Development. BIRD 2007. Lecture Notes in Computer Science, vol 4414. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71233-6_12
DOI: https://doi.org/10.1007/978-3-540-71233-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71232-9
Online ISBN: 978-3-540-71233-6
eBook Packages: Computer Science, Computer Science (R0)