Abstract
Mathematical expression recognition is a research field that aims to develop algorithms and systems capable of interpreting mathematical content. The recognition of mathematical expressions (MEs) requires handling two-dimensional symbol relationships such as sub/superscripts, matrices and nested fractions, among others. The prevalent technology for addressing these challenges is based on encoder-decoder architectures with attention models. In this paper we propose the Py4MER system, based on Convolutional Recurrent Neural Network (CRNN) models, for the recognition and transcription of MEs into LaTeX mark-up sequences. This model is proposed as an alternative to encoder-decoder approaches, as CRNN models trained through Connectionist Temporal Classification (CTC) implicitly model the dependencies between symbols and do not suffer from under/over parsing of the input image, generating more consistent mark-up.
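To illustrate how a CTC-trained model turns per-frame predictions into a mark-up sequence, the following is a minimal sketch of the standard CTC best-path collapsing rule (merge consecutive repeats, then drop blanks). It is not the Py4MER implementation: the blank label name, the function name and the example frame sequence are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical name for the CTC blank label


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence into an output token sequence.

    Standard CTC collapsing rule: merge consecutive repeated labels,
    then remove every blank label.
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


# Made-up per-frame argmax labels for an image of "x^{2}".
frames = [
    "<blank>", "x", "x", "<blank>", "^", "^", "{",
    "<blank>", "2", "2", "}", "<blank>",
]
tokens = ctc_greedy_decode(frames)
print(" ".join(tokens))  # x ^ { 2 }
```

Because every output token is tied to at least one input frame, a decoder of this kind emits tokens in frame order and cannot skip or revisit regions of the image the way an attention mechanism can.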
The proposed model is evaluated on the Im2Latex-100k data set using both textual and image-level metrics, showing a remarkable improvement over other CTC-based approaches. Recognition results are analyzed for different ME lengths and ME structures. Furthermore, a study based on the edit distance is performed, showing a considerable improvement in precision when up to 5 edit operations are considered. Finally, we show that CTC-based CRNN models can adapt to non-left-to-right ordering of ME elements, warranting more research for this approach.
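One possible reading of the edit-distance study is sketched below: a hypothesis counts as correct if it lies within a given number of edit operations of its reference. The abstract does not define the metric, so this definition, the token-level granularity, the function names and the toy data are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def precision_within(refs, hyps, max_edits):
    """Fraction of hypotheses within max_edits operations of their reference."""
    hits = sum(edit_distance(r, h) <= max_edits for r, h in zip(refs, hyps))
    return hits / len(refs)


# Toy data (not from the paper): one exact match, one hypothesis two edits away.
refs = ["x ^ { 2 }".split(), r"\frac { a } { b }".split()]
hyps = ["x ^ { 2 }".split(), r"\frac { a } { c } +".split()]

print(precision_within(refs, hyps, 0))  # 0.5
print(precision_within(refs, hyps, 2))  # 1.0
```

With `max_edits=0` this reduces to exact-match accuracy; raising the threshold shows how many outputs are only a few corrections away from the reference.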
Acknowledgment
This work has been partially supported by MCIN/AEI/10.13039/501100011033 under the grant PID2020-116813RB-I00 (SimancasSearch); the Generalitat Valenciana under the FPI grant CIACIF/2021/313; and by the support of valgrAI - Valencian Graduate School and Research Network of Artificial Intelligence and the Generalitat Valenciana, and co-funded by the European Union.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Anitei, D., Sánchez, J.A., Benedí, J.M. (2023). Py4MER: A CTC-Based Mathematical Expression Recognition System. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_34
DOI: https://doi.org/10.1007/978-3-031-36616-1_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36615-4
Online ISBN: 978-3-031-36616-1
eBook Packages: Computer Science, Computer Science (R0)