DOI: 10.1145/1815330.1815389
Research article

Ground-truthed dataset of chemical structure images in Japanese published patent applications

Published: 09 June 2010

Abstract

This paper presents a ground-truthed dataset of chemical structure images in Japanese published patent applications. The dataset, comprising 5,576 chemical structure images, was built in two stages. First, the graphical structure of each image was recognized by a chemical structure recognition program developed by our group, and the resulting graphical structure data were corrected manually with a newly developed correction GUI. Second, the graphical structure data were converted into a dataset representing the chemical structure, which was in turn corrected manually with a second newly developed correction GUI. This paper also presents a statistical analysis of the chemical structure images contained in Japanese published patent applications. Although the images are drawn only from Japanese published patent applications, the dataset covers a wide range of chemical structure images.
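The two-stage construction described above (a graphical structure first, then a chemical structure derived from it) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption for illustration, not the authors' data format or software: the class and function names (`GraphicalStructure`, `ChemicalStructure`, `to_chemical`, `element_counts`) are invented here, vertices of the graphical structure carry an optional character label, line segments carry a count of parallel strokes, and conversion follows the usual skeletal-formula convention that an unlabeled vertex is an implicit carbon atom. The element tally only hints at how dataset-level statistics could be gathered.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class GraphicalStructure:
    # Hypothetical (manually corrected) recognizer output:
    # one optional text label per vertex, and line segments given as
    # (vertex_a, vertex_b, number_of_parallel_strokes).
    labels: List[Optional[str]]
    segments: List[Tuple[int, int, int]]


@dataclass
class ChemicalStructure:
    # Hypothetical chemical-level record: element symbols per atom and
    # bonds given as (atom_a, atom_b, bond_order).
    atoms: List[str]
    bonds: List[Tuple[int, int, int]]


def to_chemical(g: GraphicalStructure) -> ChemicalStructure:
    """Convert a graphical structure into a chemical structure (sketch)."""
    # Skeletal-formula convention: an unlabeled vertex is an implicit carbon.
    atoms = [label if label else "C" for label in g.labels]
    bonds = []
    for a, b, strokes in g.segments:
        if not (0 <= a < len(atoms) and 0 <= b < len(atoms)):
            raise ValueError(f"segment ({a}, {b}) refers to a missing vertex")
        # One, two, or three parallel strokes map to single, double, triple bonds.
        if strokes not in (1, 2, 3):
            raise ValueError(f"unsupported stroke count {strokes} on segment ({a}, {b})")
        bonds.append((a, b, strokes))
    return ChemicalStructure(atoms, bonds)


def element_counts(structures: Iterable[ChemicalStructure]) -> Counter:
    """Tally element symbols over a collection of chemical structures."""
    counts: Counter = Counter()
    for s in structures:
        counts.update(s.atoms)
    return counts


# Usage: ethanol (C-C-O) and acetone (C-C(=O)-C) drawn as skeletal formulas.
ethanol = to_chemical(GraphicalStructure([None, None, "O"], [(0, 1, 1), (1, 2, 1)]))
acetone = to_chemical(
    GraphicalStructure([None, None, None, "O"], [(0, 1, 1), (1, 2, 1), (1, 3, 2)])
)
print(ethanol.atoms)                       # ['C', 'C', 'O']
print(element_counts([ethanol, acetone]))  # Counter({'C': 5, 'O': 2})
```

Keeping the two levels as separate records mirrors the separate correction passes: errors in line and label recognition can be fixed before any chemical interpretation is applied. The sketch deliberately ignores wedge and hashed bonds, charges, and multi-atom labels such as "OH" or "Ph" (which it would keep as a single unexpanded label); a fuller converter would need to handle them.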

Information

Published In

ACM Other conferences
DAS '10: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
June 2010
490 pages
ISBN: 9781605587738
DOI: 10.1145/1815330
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Japanese published patent applications
  2. ground-truthed dataset
  3. optical chemical structure recognition

Conference

DAS '10

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 02 Mar 2025

Cited By

  • (2022) CEDe. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 27114-27126. DOI: 10.5555/3600270.3602236. Online publication date: 28-Nov-2022
  • (2017) DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images. Journal of Imaging, 3(4):62. DOI: 10.3390/jimaging3040062. Online publication date: 11-Dec-2017
  • (2015) CVC-FP and SGT. International Journal on Document Analysis and Recognition, 18(1):15-30. DOI: 10.1007/s10032-014-0236-5. Online publication date: 1-Mar-2015
  • (2014) Markov Logic Networks for Optical Chemical Structure Recognition. Journal of Chemical Information and Modeling, 54(8):2380-2390. DOI: 10.1021/ci5002197. Online publication date: 6-Aug-2014
  • (2012) Displaying chemical structural formulae in ePub format. Proceedings of the 2012 ACM Symposium on Document Engineering, pp. 125-128. DOI: 10.1145/2361354.2361382. Online publication date: 4-Sep-2012