DOI: 10.1145/1816123.1816159
research-article

Exposing the hidden web for chemical digital libraries

Published: 21 June 2010

Abstract

In recent years, the vast amount of digitally available content has led to the creation of many topic-centered digital libraries. In chemistry, too, more and more digital collections are becoming available, but complex query formulation still hampers their intuitive adoption. This is because information seeking in chemical documents centers on chemical entities, for which current standard search relies on complex structures that are hard to extract from documents. Moreover, although simple keyword searches would often be sufficient, current collections simply cannot be indexed by Web search providers because chemical substance names are ambiguous. In this paper we present a framework for automatically generating metadata-enriched index pages for all documents in a given chemical collection. All information is linked to the respective documents, which yields an easy-to-crawl metadata repository that promises to open up chemical digital libraries. Our experiments, indexing an open access journal, show not only that the documents can be found with a simple Google search via the automatically created index pages, but also that search via these pages is much better than full-text indexing in terms of both precision/recall and performance. Finally, we compare our indexing against a classical structure search and find that keyword-based search can indeed solve at least some of the daily tasks in chemical workflows. Using our framework thus promises to expose a large part of the currently still hidden chemical Web, making the techniques employed interesting for chemical information providers such as digital libraries and open access journals.
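The abstract does not specify the format of the generated index pages, so the following is only a minimal sketch under stated assumptions. It assumes each document has already been run through some chemical entity extractor that yields, per substance, a name, its synonyms, and optionally an InChI identifier. The classes `Document` and `ChemicalEntity` and the function `render_index_page` are hypothetical names chosen for illustration, not the authors' implementation. The sketch renders one static HTML page per document that lists the entities as plain, crawlable text and links back to the source document.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class ChemicalEntity:
    # Hypothetical record for one substance recognized in a document
    # (the extraction step itself is assumed, not shown).
    name: str
    synonyms: list[str] = field(default_factory=list)
    inchi: str = ""  # IUPAC International Chemical Identifier, if known


@dataclass
class Document:
    # Hypothetical record for one document in the collection.
    url: str
    title: str
    entities: list[ChemicalEntity]


def render_index_page(doc: Document) -> str:
    """Render a static HTML index page listing every chemical entity
    found in `doc`, with a link back to the document itself."""
    items = []
    for ent in doc.entities:
        # Put the name and all synonyms on the page so that a keyword
        # query for any of them can match this page.
        names = ", ".join(escape(n) for n in [ent.name, *ent.synonyms])
        inchi = f" <code>{escape(ent.inchi)}</code>" if ent.inchi else ""
        items.append(f"    <li>{names}{inchi}</li>")
    title = escape(doc.title)
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        f"<head><title>Index: {title}</title></head>",
        "<body>",
        # The link back to the document lets a search hit on this page
        # lead the user to the actual article.
        f'  <h1><a href="{escape(doc.url, quote=True)}">{title}</a></h1>',
        "  <ul>",
        *items,
        "  </ul>",
        "</body>",
        "</html>",
    ])


if __name__ == "__main__":
    # Illustrative input only; the URL is a placeholder.
    doc = Document(
        url="https://example.org/articles/42",
        title="Synthesis of acetylsalicylic acid",
        entities=[
            ChemicalEntity(
                "aspirin",
                ["acetylsalicylic acid", "ASA"],
                "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)",
            )
        ],
    )
    print(render_index_page(doc))
```

In this sketch, listing every synonym next to a structure identifier is what lets a plain keyword query (for example "ASA" or "acetylsalicylic acid") reach the page despite the naming ambiguity the abstract describes; a real system would also need to publish these pages where crawlers can discover them.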




Published In

JCDL '10: Proceedings of the 10th annual joint conference on Digital libraries
June 2010
424 pages
ISBN:9781450300858
DOI:10.1145/1816123
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. chemical digital collections
  2. digital libraries
  3. hidden web
  4. information extraction
  5. information retrieval
  6. web search

Qualifiers

  • Research-article

Conference

JCDL '10: Joint Conference on Digital Libraries
June 21-25, 2010
Gold Coast, Queensland, Australia

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Bibliometrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 25 Feb 2025

Cited By

  • (2015) Demystifying the Semantics of Relevant Objects in Scholarly Collections. Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 157-164. DOI: 10.1145/2756406.2756923. Online publication date: 21-Jun-2015
  • (2013) Context-Sensitive Ranking Using Cross-Domain Knowledge for Chemical Digital Libraries. Research and Advanced Technology for Digital Libraries, pp. 285-296. DOI: 10.1007/978-3-642-40501-3_29. Online publication date: 2013
  • (2012) Catching the drift --- indexing implicit knowledge in chemical digital libraries. Proceedings of the Second International Conference on Theory and Practice of Digital Libraries, pp. 383-395. DOI: 10.1007/978-3-642-33290-6_41. Online publication date: 23-Sep-2012
  • (2011) Taking chemistry to the task. Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, pp. 325-334. DOI: 10.1145/1998076.1998137. Online publication date: 13-Jun-2011
  • (2010) Using Wikipedia categories for compact representations of chemical documents. Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1809-1812. DOI: 10.1145/1871437.1871735. Online publication date: 26-Oct-2010
