Identification of Lead Compounds in Pharmaceutical Data Using Data Mining Techniques

Nicolaou, Christodoulos A.

doi:10.1007/3-540-38076-0_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2563)

Included in the following conference series:

Panhellenic Conference on Informatics

Abstract

As the use of High-Throughput Screening (HTS) systems becomes more routine in the drug discovery process, there is an increasing need for fast and reliable analysis of the massive amounts of resulting biological data. At the forefront of the methods used for analyzing HTS data is cluster analysis. It is used in this context to find natural groups in the data, thereby revealing families of compounds that exhibit increased activity towards a specific biological target. Scientists in this area have traditionally used a number of clustering algorithms, distance (similarity) measures, and compound representation methods. We first discuss the nature of chemical and biological data and how it adversely affects current analysis methodology. We emphasize the inability of widely used methods to discover the chemical families in a pharmaceutical dataset and point out specific problems that occur when these common clustering and other statistical methods are applied to chemical data. We then introduce a new data-mining algorithm that employs a newly proposed clustering method and expert knowledge to accommodate user requests and produce chemically sensible results. This chemically aware algorithm uses molecular structure to find true chemical structural families of compounds in pharmaceutical data while accommodating the multi-domain nature of chemical compounds.
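To make the abstract's conventional workflow concrete (a compound representation, a similarity measure, and a clustering step), here is a minimal sketch. It is not the chapter's algorithm. It assumes compounds are represented as sets of "on" bit positions from a binary fingerprint, uses the Tanimoto coefficient as the similarity measure, and groups compounds by simple threshold-based single linkage. The compound names, fingerprints, threshold, and function names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def single_linkage_groups(fingerprints: dict, threshold: float) -> list:
    """Group compounds linked by a chain of pairwise similarities >= threshold."""
    parent = {name: name for name in fingerprints}

    def find(x):
        # Follow parent pointers to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair that is similar enough.
    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), set()).add(name)
    # Return sorted member lists, ordered by their first member, for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical toy fingerprints (sets of set-bit positions), not real compounds.
fps = {
    "cmpd_A": frozenset({1, 2, 3, 4}),
    "cmpd_B": frozenset({1, 2, 3, 5}),
    "cmpd_C": frozenset({10, 11, 12}),
    "cmpd_D": frozenset({10, 11, 13}),
    "cmpd_E": frozenset({20, 21}),
}

groups = single_linkage_groups(fps, threshold=0.5)
for members in groups:
    print(members)
```

In this toy data, cmpd_C and cmpd_D have a similarity of exactly 0.5, so they are grouped only because the cutoff is inclusive. This illustrates how sensitive such conventional groupings can be to the chosen threshold.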

Author information

Authors and Affiliations

Bioreason, Inc., 150 Washington Ave., Suite 220, 87501, Santa Fe, NM, USA
Christodoulos A. Nicolaou

Editor information

Editors and Affiliations

Dept. of Informatics, Aristotle University, 54006, Thessaloniki, Greece
Yannis Manolopoulos
Dept. of Computer Science, University of Cyprus, P.O. Box 20537, 1678, Nicosia, Cyprus
Skevos Evripidou & Antonis C. Kakas

About this paper

Cite this paper

Nicolaou, C.A. (2003). Identification of Lead Compounds in Pharmaceutical Data Using Data Mining Techniques. In: Manolopoulos, Y., Evripidou, S., Kakas, A.C. (eds) Advances in Informatics. PCI 2001. Lecture Notes in Computer Science, vol 2563. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-38076-0_9

DOI: https://doi.org/10.1007/3-540-38076-0_9
Published: 25 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-07544-8
Online ISBN: 978-3-540-38076-4
eBook Packages: Springer Book Archive
