Generating Ground Truthed Dataset of Chart Images: Automatic or Semi-automatic?

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5046)

Abstract

Ground truthing tools mainly fall into two categories: automatic and semi-automatic. In this paper, we first discuss the pros and cons of the two approaches. We then report our own work on designing and implementing systems for generating a chart image dataset and multi-level ground truth data. Both semi-automatic and automatic approaches were adopted, resulting in two independent systems. The dataset and the ground truth data are publicly available so that other researchers can use them to evaluate and compare the performance of different systems.

Editor information

Wenyin Liu, Josep Lladós, Jean-Marc Ogier

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, W., Tan, C.L., Zhao, J. (2008). Generating Ground Truthed Dataset of Chart Images: Automatic or Semi-automatic? In: Liu, W., Lladós, J., Ogier, J.-M. (eds) Graphics Recognition. Recent Advances and New Opportunities. GREC 2007. Lecture Notes in Computer Science, vol 5046. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88188-9_25

  • DOI: https://doi.org/10.1007/978-3-540-88188-9_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88184-1

  • Online ISBN: 978-3-540-88188-9

  • eBook Packages: Computer Science, Computer Science (R0)
