Abstract
Text-to-speech conversion by smart speakers is expected to help people with visual impairments approaching total blindness to read documents. This research considers a situation in which such text-to-speech conversion is applied to scholarly documents. A page of a scholarly document usually consists of multiple region types, i.e. ordinary text, mathematical expressions, tables, and figures. In this paper, we propose a method that classifies the chart type of scholarly figures using a convolutional neural network. The method classifies an input figure image as either a line chart or another chart type. We evaluated the accuracy of the method on a dataset of scholarly figures collected from actual academic papers. The proposed method achieved a classification accuracy of 97%. We also compared its performance with that of hand-crafted features combined with a support vector machine. The results suggest that the proposed CNN classification outperforms the conventional approach.
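The abstract does not specify the network architecture, so the following is only a minimal sketch of the kind of binary chart-type classifier described: a small convolutional network (hypothetical layer sizes, written in PyTorch) that maps a grayscale figure image to two logits, "line chart" vs. "other".

```python
import torch
import torch.nn as nn

class ChartTypeCNN(nn.Module):
    """Illustrative CNN for binary chart-type classification
    (line chart vs. other). Layer sizes are assumptions, not
    the architecture from the paper."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # grayscale figure image
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64),
            nn.ReLU(),
            nn.Linear(64, 2),  # two classes: line chart / other
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ChartTypeCNN()
logits = model(torch.randn(4, 1, 64, 64))  # batch of 4 64x64 figure crops
probs = logits.softmax(dim=1)              # per-class probabilities
```

In practice such a model would be trained with cross-entropy loss on labeled figure images; the 64x64 input resolution here is an arbitrary choice for the sketch.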
© 2020 Springer Nature Switzerland AG
Ishihara, T., Morita, K., Shirai, N.C., Wakabayashi, T., Ohyama, W. (2020). Chart-Type Classification Using Convolutional Neural Network for Scholarly Figures. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science(), vol 12047. Springer, Cham. https://doi.org/10.1007/978-3-030-41299-9_20
Print ISBN: 978-3-030-41298-2
Online ISBN: 978-3-030-41299-9