Py4MER: A CTC-Based Mathematical Expression Recognition System

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2023)

Abstract

Mathematical expression (ME) recognition is a research field that aims to develop algorithms and systems capable of interpreting mathematical content. Recognizing MEs requires handling two-dimensional symbol relationships such as sub/superscripts, matrices and nested fractions, among others. The prevalent technology for addressing these challenges is based on encoder-decoder architectures with attention models. In this paper we propose the Py4MER system, based on Convolutional Recurrent Neural Network (CRNN) models, for the recognition and transcription of MEs into LaTeX mark-up sequences. This model is proposed as an alternative to encoder-decoder approaches, as CRNN models trained through Connectionist Temporal Classification (CTC) implicitly model the dependencies between symbols and do not suffer from under/over parsing of the input image, generating more consistent mark-up.
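As an illustration of how a CTC-trained model turns per-frame predictions into a mark-up sequence, the following minimal sketch applies the standard CTC collapse rule (merge consecutive repeats, then drop blanks) to a hand-written frame sequence. It is not taken from Py4MER: the `BLANK` label name, the function name and the example frames are assumptions made for illustration only.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical name for the CTC blank label


def ctc_collapse(frame_labels, blank=BLANK):
    """Apply the CTC collapse rule to a sequence of per-frame labels.

    Runs of identical consecutive labels are merged into one label,
    and blank labels are then removed.
    """
    return [label for label, _ in groupby(frame_labels) if label != blank]


# Illustrative best-path labels for an image of x^{2} (made-up frames).
frames = ["x", "x", BLANK, "^", "{", "{", BLANK, "2", "2", "}", BLANK]
tokens = ctc_collapse(frames)
latex = " ".join(tokens)

# A blank between two identical labels keeps both of them.
double_brace = ctc_collapse(["}", BLANK, "}"])
```

Because every output token must be emitted at some frame, this decoding scheme reads the whole input once from start to end, which is the property the abstract contrasts with attention-based decoders.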

The proposed model is evaluated on the Im2Latex-100k data set using both textual and image-level metrics, showing a remarkable improvement over other CTC-based approaches. Recognition results are analyzed for different ME lengths and ME structures. Furthermore, a study based on the edit distance is performed, showing a considerable improvement in precision when up to 5 edit operations are considered. Finally, we show that CTC-based CRNN models can adapt to non-left-to-right ordering of ME elements, warranting more research into this approach.
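The edit-distance study can be pictured with a short sketch: compute the Levenshtein distance between reference and hypothesis token sequences, then count a hypothesis as correct when it is within a given number of edit operations. This is an assumed formulation, not the paper's evaluation code; the whitespace tokenization, the function names and the toy data are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def precision_within(refs, hyps, max_edits):
    """Fraction of hypotheses within max_edits token edits of their reference.

    Assumes whitespace-separated LaTeX tokens (an illustrative choice).
    """
    if not refs or len(refs) != len(hyps):
        raise ValueError("refs and hyps must be non-empty and of equal length")
    hits = sum(
        edit_distance(r.split(), h.split()) <= max_edits
        for r, h in zip(refs, hyps)
    )
    return hits / len(refs)


# Toy data (made up): one exact match and one single-token substitution.
refs = ["x ^ { 2 }", "a + b"]
hyps = ["x ^ { 3 }", "a + b"]
exact = precision_within(refs, hyps, max_edits=0)
relaxed = precision_within(refs, hyps, max_edits=1)
```

Raising `max_edits` from 0 upwards shows how many errors are near misses rather than complete failures, which is the kind of reading the abstract reports for up to 5 edit operations.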

Notes

  1. https://github.com/jpuigcerver/PyLaia

Acknowledgment

This work has been partially supported by MCIN/AEI/10.13039/501100011033 under the grant PID2020-116813RB-I00 (SimancasSearch); the Generalitat Valenciana under the FPI grant CIACIF/2021/313; and by the support of valgrAI - Valencian Graduate School and Research Network of Artificial Intelligence and the Generalitat Valenciana, and co-funded by the European Union.

Corresponding author

Correspondence to Dan Anitei.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Anitei, D., Sánchez, J.A., Benedí, J.M. (2023). Py4MER: A CTC-Based Mathematical Expression Recognition System. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_34

  • DOI: https://doi.org/10.1007/978-3-031-36616-1_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36615-4

  • Online ISBN: 978-3-031-36616-1

  • eBook Packages: Computer Science, Computer Science (R0)
