
Chart-Type Classification Using Convolutional Neural Network for Scholarly Figures

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12047)

Abstract

Text-to-speech conversion by smart speakers is expected to help visually impaired people with near-total blindness read documents. This research assumes a situation in which such text-to-speech conversion is applied to scholarly documents. A page of a scholarly document usually consists of multiple regions, i.e. ordinary text, mathematical expressions, tables, and figures. In this paper, we propose a method that classifies the chart type of scholarly figures using a convolutional neural network. The method classifies an input figure image as a line chart or another figure type. We evaluated the accuracy of the method on a dataset of scholarly figures collected from actual academic papers, on which the proposed method achieved a classification accuracy of 97%. We also compared the performance of the proposed method with that of hand-crafted features combined with a support vector machine. The results suggest that the proposed CNN classifier outperforms the conventional approach.
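The abstract describes a CNN that labels a figure image as either a line chart or some other figure type, but it does not state the architecture or preprocessing used. The snippet below is a minimal, hypothetical PyTorch sketch of such a binary chart-type classifier; the layer sizes, the 128x128 grayscale input, the class encoding, and the class name ChartTypeCNN are all illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: the paper does not publish its architecture or
# hyperparameters, so every layer size, input resolution, and setting below
# is a hypothetical stand-in for a binary chart-type classifier
# (line chart vs. other scholarly figure).
import torch
import torch.nn as nn


class ChartTypeCNN(nn.Module):
    """Small CNN mapping a grayscale figure image to two classes:
    0 = line chart, 1 = other figure type."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1-channel (grayscale) input
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 128 -> 64
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 64 -> 32
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32 -> 16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


if __name__ == "__main__":
    # Assumed preprocessing: figures resized to 128x128 grayscale images.
    model = ChartTypeCNN()
    dummy_batch = torch.randn(8, 1, 128, 128)   # 8 random stand-in figure images
    logits = model(dummy_batch)                 # shape: (8, 2)
    predicted = logits.argmax(dim=1)            # 0 = line chart, 1 = other
    print(predicted)
```

In practice such a model would be trained with a standard cross-entropy loss on the labeled figure dataset; the baseline mentioned in the abstract would instead feed hand-crafted features into an SVM.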




Author information

Correspondence to Tetsushi Wakabayashi.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Ishihara, T., Morita, K., Shirai, N.C., Wakabayashi, T., Ohyama, W. (2020). Chart-Type Classification Using Convolutional Neural Network for Scholarly Figures. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol 12047. Springer, Cham. https://doi.org/10.1007/978-3-030-41299-9_20


  • DOI: https://doi.org/10.1007/978-3-030-41299-9_20

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41298-2

  • Online ISBN: 978-3-030-41299-9

  • eBook Packages: Computer Science (R0)
