
A Soft Hierarchical Algorithm for the Clustering of Multiple Bioactive Chemical Compounds

  • Conference paper
Bioinformatics Research and Development (BIRD 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4414)



Abstract

Most of the methods used to cluster chemical structures, such as Ward’s, Group Average, k-means and Jarvis-Patrick, are known as hard or crisp methods because they partition a dataset into strictly disjoint subsets; they are therefore unsuitable for clustering chemical structures that exhibit more than one activity. Fuzzy clustering algorithms such as fuzzy c-means do provide an inherent mechanism for clustering overlapping structures (objects), but this potential, which comes from their fuzzy membership functions, has not been used effectively. In this work a fuzzy hierarchical algorithm is developed that not only benefits from the fuzzy clustering process but also takes advantage of the multiple memberships that fuzzy clustering produces. The algorithm divides every cluster whose size exceeds a predetermined threshold into two subclusters based on the membership values of each structure. A structure is assigned to one subcluster if its membership value for that subcluster is very high, and to both subclusters if its two membership values are very similar. The performance of the algorithm is evaluated on two benchmark datasets and on a large dataset of compound structures derived from MDL’s MDDR database. The results show a significant improvement over a similar implementation of the hard c-means algorithm.
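The divisive soft-splitting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean distance on numeric descriptor vectors, standard fuzzy c-means with c = 2 and fuzziness m = 2, and a hypothetical tolerance `tie` that decides when two membership values count as "very similar". Structures that are not ties go to the subcluster with the higher membership. The names `fcm_two`, `soft_hierarchical`, `max_size` (the size threshold) and `tie` are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two descriptor vectors (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fcm_two(points: List[Point], m: float = 2.0, max_iter: int = 300,
            eps: float = 1e-6, seed: int = 0) -> List[Tuple[float, float]]:
    """Plain fuzzy c-means with c = 2; returns (u0, u1) for every point."""
    rng = random.Random(seed)
    u = []
    for _ in points:
        r = rng.random()
        u.append([r, 1.0 - r])  # random initial memberships summing to 1
    dim = len(points[0])
    for _ in range(max_iter):
        # Centres are means weighted by u^m.
        centers = []
        for k in range(2):
            w = [ui[k] ** m for ui in u]
            total = sum(w)
            if total == 0.0:  # degenerate case: fall back to the plain mean
                w, total = [1.0] * len(points), float(len(points))
            centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / total
                            for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        new_u = []
        for p in points:
            d = [_dist(p, c) for c in centers]
            if min(d) == 0.0:
                # Point lies on a centre: full membership in that cluster.
                k0 = d.index(0.0)
                new_u.append([1.0 if j == k0 else 0.0 for j in range(2)])
            else:
                new_u.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                        for j in range(2)) for k in range(2)])
        shift = max(abs(a - b) for ui, vi in zip(u, new_u) for a, b in zip(ui, vi))
        u = new_u
        if shift < eps:
            break
    return [(ui[0], ui[1]) for ui in u]


def soft_hierarchical(points: List[Point], max_size: int,
                      tie: float = 0.2, seed: int = 0) -> List[List[int]]:
    """Split every cluster larger than max_size in two; ties go to both."""
    leaves: List[List[int]] = []
    stack = [list(range(len(points)))]
    while stack:
        members = stack.pop()
        if len(members) <= max_size:
            leaves.append(members)
            continue
        u = fcm_two([points[i] for i in members], seed=seed)
        left, right = [], []
        for i, (u0, u1) in zip(members, u):
            if abs(u0 - u1) <= tie:   # very similar memberships: both children
                left.append(i)
                right.append(i)
            elif u0 > u1:             # clearly higher membership in child 0
                left.append(i)
            else:
                right.append(i)
        # If a child is as large as its parent, the split made no progress;
        # keep the parent as a leaf so the loop always terminates.
        if max(len(left), len(right)) >= len(members):
            leaves.append(members)
        else:
            stack.extend([left, right])
    return leaves
```

For example, with three points near the origin, three near (11, 11) and one point exactly halfway between them, `soft_hierarchical(points, max_size=4)` splits the seven points once and the midpoint appears in both resulting leaves. A structure with several activities would be represented in the same way, by appearing in more than one cluster.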



Editor information

Sepp Hochreiter Roland Wagner


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Shah, J.Z., Salim, N.b. (2007). A Soft Hierarchical Algorithm for the Clustering of Multiple Bioactive Chemical Compounds. In: Hochreiter, S., Wagner, R. (eds) Bioinformatics Research and Development. BIRD 2007. Lecture Notes in Computer Science, vol 4414. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71233-6_12


  • DOI: https://doi.org/10.1007/978-3-540-71233-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71232-9

  • Online ISBN: 978-3-540-71233-6

  • eBook Packages: Computer Science, Computer Science (R0)
