A Soft Hierarchical Algorithm for the Clustering of Multiple Bioactive Chemical Compounds

Shah, Jehan Zeb; Salim, Naomie bt

doi:10.1007/978-3-540-71233-6_12

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4414)

Included in the following conference series:

International Conference on Bioinformatics Research and Development

Abstract

Most methods used to cluster chemical structures, such as Ward’s, Group Average, K-means and Jarvis-Patrick, are hard (crisp) methods: they partition a dataset into strictly disjoint subsets and are therefore unsuitable for clustering chemical structures that exhibit more than one activity. Fuzzy clustering algorithms such as fuzzy c-means provide an inherent mechanism for clustering overlapping structures (objects), but this potential, which comes from their fuzzy membership functions, has not been used effectively. In this work a fuzzy hierarchical algorithm is developed that benefits from the fuzzy clustering process and also takes advantage of the multiple memberships that fuzzy clustering produces. The algorithm divides every cluster whose size exceeds a predetermined threshold into two sub-clusters based on the membership values of each structure. A structure is assigned to a single sub-cluster if its membership value for that sub-cluster is very high, and to both sub-clusters if its two membership values are very similar. The performance of the algorithm is evaluated on two benchmark datasets and a large dataset of compound structures derived from MDL’s MDDR database. The results show a significant improvement over a similar implementation of the hard c-means algorithm.
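
The splitting rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes standard fuzzy c-means with two clusters, a Euclidean distance, a deterministic farthest-point initialisation, and a single hypothetical parameter `similar` that decides when two membership values count as "very similar". The names `fuzzy_two_means`, `soft_split`, `soft_hierarchical`, `max_size` and `similar` are all illustrative and do not come from the paper.

```python
import math


def fuzzy_two_means(points, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy c-means with c = 2; returns one (u0, u1) pair per point."""
    # Assumed initialisation: the first point and the point farthest from it.
    first = points[0]
    far = max(points, key=lambda p: math.dist(first, p))
    centers = [list(first), list(far)]
    memberships = []
    for _ in range(iters):
        memberships = []
        for p in points:
            # Clamp distances so a point sitting on a center does not divide by zero.
            d = [max(math.dist(p, c), 1e-12) for c in centers]
            w = [dk ** (-2.0 / (m - 1.0)) for dk in d]
            total = w[0] + w[1]
            memberships.append((w[0] / total, w[1] / total))
        new_centers = []
        for k in range(2):
            weights = [u[k] ** m for u in memberships]
            s = sum(weights)
            new_centers.append([
                sum(wi * p[j] for wi, p in zip(weights, points)) / s
                for j in range(len(points[0]))
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships


def soft_split(points, ids, similar=0.1):
    """Split one cluster in two; near-equal memberships go to both halves."""
    left, right = [], []
    for i, (u0, u1) in zip(ids, fuzzy_two_means(points)):
        if abs(u0 - u1) <= similar:  # very similar: keep in both sub-clusters
            left.append(i)
            right.append(i)
        elif u0 > u1:                # clearly higher membership in the first
            left.append(i)
        else:                        # clearly higher membership in the second
            right.append(i)
    return left, right


def soft_hierarchical(data, max_size=4, similar=0.1):
    """Bisect every cluster larger than max_size; returns leaf index lists."""
    leaves = []
    stack = [list(range(len(data)))]
    while stack:
        ids = stack.pop()
        if len(ids) <= max_size:
            leaves.append(sorted(ids))
            continue
        left, right = soft_split([data[i] for i in ids], ids, similar)
        # Guard (an assumption of this sketch): stop when a split makes no progress.
        if max(len(left), len(right)) == len(ids):
            leaves.append(sorted(ids))
            continue
        stack.extend([left, right])
    return leaves


if __name__ == "__main__":
    # Two tight groups plus one point halfway between them (index 6).
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10), (5, 5)]
    for leaf in soft_hierarchical(data, max_size=4, similar=0.1):
        print(leaf)
```

In this toy dataset the midpoint (index 6) is equidistant from both groups, so it is expected to appear in both leaves, while the other points each fall into exactly one. A hard c-means split would have to place the midpoint in only one of the two sub-clusters.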

Author information

Authors and Affiliations

Faculty of Computer Science & Information Systems, Universiti Teknologi Malaysia, 81310 Skudai, Johor Darul Ta’zim, Malaysia
Jehan Zeb Shah & Naomie bt Salim

Editor information

Sepp Hochreiter, Roland Wagner

Cite this paper

Shah, J.Z., Salim, N.b. (2007). A Soft Hierarchical Algorithm for the Clustering of Multiple Bioactive Chemical Compounds. In: Hochreiter, S., Wagner, R. (eds) Bioinformatics Research and Development. BIRD 2007. Lecture Notes in Computer Science, vol 4414. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71233-6_12

DOI: https://doi.org/10.1007/978-3-540-71233-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71232-9
Online ISBN: 978-3-540-71233-6
eBook Packages: Computer Science, Computer Science (R0)
