Abstract
OSCAR3 is an open, extensible system for the automated annotation of chemistry in scientific articles that can process thousands of articles per hour. Its XML annotation supports applications such as interactive browsing and chemically aware searching, and it has been designed for integration with larger text-analysis systems. We report its application to the high-throughput analysis of the small-molecule chemistry content of life-science texts such as PubMed abstracts.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Corbett, P., Murray-Rust, P. (2006). High-Throughput Identification of Chemistry in Life Science Texts. In: Berthold, M.R., Glen, R.C., Fischer, I. (eds) Computational Life Sciences II. CompLife 2006. Lecture Notes in Computer Science, vol 4216. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875741_11
DOI: https://doi.org/10.1007/11875741_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45767-1
Online ISBN: 978-3-540-45768-8
eBook Packages: Computer Science, Computer Science (R0)