Abstract
Structural image content recognition (SICR) aims to transcribe a two-dimensional structural image (e.g., a mathematical expression, chemical formula, or music score) into a token sequence. Existing methods are mainly encoder-decoder based and overlook the importance of feature selection and spatial relation extraction in the feature map. In this paper, we propose DEAL (short for Dynamic fEAture seLection) for SICR, which contains a dynamic feature selector and a spatial relation extractor as its two cornerstone modules. Specifically, we propose a novel loss function and a random exploration strategy to dynamically select useful image cells for target sequence generation. Further, we consider the positional and surrounding information of cells in the feature map to extract spatial relations. We conduct extensive experiments to evaluate the performance of DEAL. Experimental results show that DEAL significantly outperforms other state-of-the-art methods.
Notes
1. We use character and symbol interchangeably in this paper.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fu, Y., Zheng, S., Cai, W., Gao, M., Jin, C., Zhou, A. (2023). Dynamic Feature Selection for Structural Image Content Recognition. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13834. Springer, Cham. https://doi.org/10.1007/978-3-031-27818-1_28
DOI: https://doi.org/10.1007/978-3-031-27818-1_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27817-4
Online ISBN: 978-3-031-27818-1
eBook Packages: Computer Science, Computer Science (R0)