Abstract
Text-to-speech conversion by smart speakers is expected to help people with visual impairments approaching total blindness to read documents. This research considers a situation in which such text-to-speech conversion is applied to scholarly documents. A page of a scholarly document usually consists of multiple region types, i.e. ordinary text, mathematical expressions, tables, and figures. In this paper, we propose a method that classifies the chart type of scholarly figures using a convolutional neural network. The method classifies an input figure image as either a line chart or another chart type. We evaluated the accuracy of the method on a dataset of scholarly figures collected from actual academic papers. The proposed method achieved a classification accuracy of 97%. We also compared its performance with that of hand-crafted features combined with a support vector machine. The results suggest that the proposed CNN classification outperforms the conventional approach.
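The abstract does not specify the network architecture, so the following is only a minimal sketch of the kind of binary chart-type classifier described: a small convolutional network (hypothetical layer sizes, written in PyTorch) that maps a grayscale figure image to two logits, "line chart" vs. "other".

```python
import torch
import torch.nn as nn

class ChartTypeCNN(nn.Module):
    """Illustrative CNN for binary chart-type classification
    (line chart vs. other). Layer sizes are assumptions, not
    the architecture from the paper."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # grayscale figure image
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64),
            nn.ReLU(),
            nn.Linear(64, 2),  # two classes: line chart / other
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ChartTypeCNN()
logits = model(torch.randn(4, 1, 64, 64))  # batch of 4 64x64 figure crops
probs = logits.softmax(dim=1)              # per-class probabilities
```

In practice such a model would be trained with cross-entropy loss on labeled figure images; the 64x64 input resolution here is an arbitrary choice for the sketch.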
© 2020 Springer Nature Switzerland AG
Ishihara, T., Morita, K., Shirai, N.C., Wakabayashi, T., Ohyama, W. (2020). Chart-Type Classification Using Convolutional Neural Network for Scholarly Figures. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science(), vol 12047. Springer, Cham. https://doi.org/10.1007/978-3-030-41299-9_20
Print ISBN: 978-3-030-41298-2
Online ISBN: 978-3-030-41299-9