Identification of Lead Compounds in Pharmaceutical Data Using Data Mining Techniques

  • Conference paper
Advances in Informatics (PCI 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2563)

Abstract

As the use of High-Throughput Screening (HTS) systems becomes more routine in the drug discovery process, there is an increasing need for fast and reliable analysis of the massive amounts of resulting biological data. At the forefront of the methods used for analyzing HTS data is cluster analysis. It is used in this context to find natural groups in the data, thereby revealing families of compounds that exhibit increased activity towards a specific biological target. Scientists in this area have traditionally used a number of clustering algorithms, distance (similarity) measures, and compound representation methods. We first discuss the nature of chemical and biological data and how it adversely impacts the current analysis methodology. We emphasize the inability of widely used methods to discover the chemical families in a pharmaceutical dataset and point out specific problems that occur when one attempts to apply these common clustering and other statistical methods to chemical data. We then introduce a new data-mining algorithm that employs a newly proposed clustering method and expert knowledge to accommodate user requests and produce chemically sensible results. This new, chemically aware algorithm employs molecular structure to find true chemical structural families of compounds in pharmaceutical data, while at the same time accommodating the multi-domain nature of chemical compounds.
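
The abstract does not spell out any of the algorithms it mentions. As a minimal sketch of the conventional workflow it refers to (a compound representation, a similarity measure, and a clustering step), the Python below assumes compounds are represented as binary fingerprints stored as sets of on-bit indices, uses the Tanimoto coefficient as the similarity measure, and groups compounds by linking every pair whose similarity reaches a cutoff. The fingerprints, the compound names, the `threshold_groups` helper, and the cutoff value are all hypothetical, and this is not the new algorithm introduced in the paper.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def threshold_groups(fingerprints: dict, cutoff: float) -> list:
    """Link compounds with similarity >= cutoff and return connected groups."""
    parent = {name: name for name in fingerprints}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= cutoff:
            parent[find(u)] = find(v)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy fingerprints (on-bit indices), not real compound data.
toy = {
    "cpd_a": frozenset({1, 2, 3, 4}),
    "cpd_b": frozenset({1, 2, 3, 5}),
    "cpd_c": frozenset({10, 11, 12}),
    "cpd_d": frozenset({10, 11, 13}),
}
groups = threshold_groups(toy, cutoff=0.5)
print(groups)
```

Each compound ends up in exactly one group here, and the result depends entirely on the chosen fingerprint and cutoff. That is an illustrative simplification of the widely used methods the abstract discusses, not a reproduction of any specific one.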

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nicolaou, C.A. (2003). Identification of Lead Compounds in Pharmaceutical Data Using Data Mining Techniques. In: Manolopoulos, Y., Evripidou, S., Kakas, A.C. (eds) Advances in Informatics. PCI 2001. Lecture Notes in Computer Science, vol 2563. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-38076-0_9

  • DOI: https://doi.org/10.1007/3-540-38076-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-07544-8

  • Online ISBN: 978-3-540-38076-4

  • eBook Packages: Springer Book Archive
