Abstract
In chemistry, information gathering focuses heavily on chemical entities. Because of synonyms and differing entity representations, however, indexing chemical documents is challenging. In drug design the task is more complex still: domain experts in this field are usually not interested in a particular chemical entity as such, but in representatives of a chemical class that show a specific reaction behavior. The functional groups of a chemical entity are the parts most relevant for describing that behavior. The delimitation of each chemical class is also related to the entities' reaction behavior, but additionally rests on the chemist's implicit knowledge. In this paper we present an approach that captures this implicit knowledge by clustering chemical entities based on their functional groups. Since such clusters are generally too unspecific and contain entities from different chemical classes, we further divide them into sub-clusters using fingerprint-based similarity measures. We analyze several uncorrelated combinations of fingerprint and similarity measure and show that the entities most similar to a query entity can be found in the respective sub-cluster. Furthermore, we apply our approach to document retrieval, introducing a new similarity measure based on Wikipedia categories. Our evaluation shows that the sub-clustering yields suitable results and enables sophisticated document retrieval in chemical digital libraries.
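The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Tanimoto coefficient, the greedy leader-style sub-clustering, the threshold of 0.5, and the toy entities with their bit indices are all assumptions chosen for illustration. The abstract does not state which fingerprints, similarity measures, or clustering algorithm the paper uses.

```python
from collections import defaultdict


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention assumed here: two empty fingerprints are identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def cluster_by_functional_groups(entities):
    """Stage 1: put entities with the same set of functional groups in one cluster."""
    clusters = defaultdict(list)
    for name, (groups, _fingerprint) in entities.items():
        clusters[frozenset(groups)].append(name)
    return dict(clusters)


def sub_cluster(names, fingerprints, threshold):
    """Stage 2 (assumed algorithm): greedy leader clustering.

    An entity joins the first sub-cluster whose first member (the leader)
    is at least `threshold` similar to it; otherwise it starts a new one.
    """
    subs = []
    for name in names:
        for sub in subs:
            if tanimoto(fingerprints[name], fingerprints[sub[0]]) >= threshold:
                sub.append(name)
                break
        else:
            subs.append([name])
    return subs


def two_stage(entities, threshold=0.5):
    """Run both stages; returns {functional-group set: list of sub-clusters}."""
    fingerprints = {name: fp for name, (_groups, fp) in entities.items()}
    return {
        groups: sub_cluster(names, fingerprints, threshold)
        for groups, names in cluster_by_functional_groups(entities).items()
    }


# Hypothetical toy data: name -> (functional groups, fingerprint on-bits).
# The names and bit indices are made up and carry no chemical meaning.
ENTITIES = {
    "entity_a": ({"hydroxyl"}, {1, 2, 3, 4}),
    "entity_b": ({"hydroxyl"}, {1, 2, 3, 5}),
    "entity_c": ({"hydroxyl"}, {10, 11, 12}),
    "entity_d": ({"carboxyl"}, {1, 2, 3, 4}),
}
```

With this toy data, `two_stage(ENTITIES)` keeps `entity_a` and `entity_b` together, splits `entity_c` off into its own sub-cluster, and places `entity_d` in a separate cluster even though its fingerprint equals that of `entity_a`, because the functional-group stage runs first. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written bit sets.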
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Köhncke, B., Tönnies, S., Balke, WT. (2012). Catching the Drift – Indexing Implicit Knowledge in Chemical Digital Libraries. In: Zaphiris, P., Buchanan, G., Rasmussen, E., Loizides, F. (eds) Theory and Practice of Digital Libraries. TPDL 2012. Lecture Notes in Computer Science, vol 7489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33290-6_41
DOI: https://doi.org/10.1007/978-3-642-33290-6_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33289-0
Online ISBN: 978-3-642-33290-6
eBook Packages: Computer Science, Computer Science (R0)