An Encoder-Decoder Method with Position-Aware for Printed Mathematical Expression Recognition

Long, Jun; Hong, Quan; Yang, Liu

doi:10.1007/978-3-031-41676-7_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14187)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

Printed mathematical expression recognition aims to transform an image of a printed mathematical formula into a LaTeX sequence. Many deep learning methods have recently been proposed for this task. However, the positional relationships between mathematical symbols are often ignored or insufficiently represented, which leads to the loss of structural features of the formulas. To overcome this challenge, we propose a position-aware encoder-decoder model for printed mathematical expression recognition. We design a two-dimensional position encoding algorithm based on sine and cosine functions to capture the positional relationships between mathematical symbols, and we adopt a more advanced image feature extraction network. In the decoder, we use a Bi-GRU as the translator and add an attention mechanism so that the decoder focuses on important local information. We conduct experiments on the public IM2LaTeX-100K dataset, and the results show that the proposed approach outperforms the majority of advanced methods.
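The abstract does not give the exact form of the two-dimensional position encoding, so the following is only a minimal sketch of one common way to build such an encoding from sine and cosine functions. It assumes the usual 1/10000^(2i/d) frequency schedule from one-dimensional sinusoidal encodings and assumes that the first half of the channels encodes the row index and the second half the column index. The function names `sincos_1d` and `positional_encoding_2d` and the feature-map sizes are hypothetical and are not taken from the paper.

```python
import math


def sincos_1d(position: int, dim: int) -> list[float]:
    """Sinusoidal code of length `dim` for a single integer position.

    Assumption: interleaved sin/cos pairs with the 1 / 10000^(i/dim)
    frequency schedule used by one-dimensional sinusoidal encodings.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    code = []
    for i in range(0, dim, 2):
        angle = position / (10000.0 ** (i / dim))
        code.append(math.sin(angle))
        code.append(math.cos(angle))
    return code


def positional_encoding_2d(height: int, width: int, channels: int) -> list[list[list[float]]]:
    """Build a height x width x channels position encoding as nested lists.

    Assumption (not from the paper): the first channels/2 values depend only
    on the row index and the last channels/2 values only on the column index.
    """
    if channels % 4 != 0:
        raise ValueError("channels must be divisible by 4")
    half = channels // 2
    row_codes = [sincos_1d(r, half) for r in range(height)]
    col_codes = [sincos_1d(c, half) for c in range(width)]
    # Concatenate the row code and the column code at every grid cell.
    return [[row_codes[r] + col_codes[c] for c in range(width)]
            for r in range(height)]


# Hypothetical 4 x 16 feature map with 8 channels.
pe = positional_encoding_2d(4, 16, 8)
print(len(pe), len(pe[0]), len(pe[0][0]))  # 4 16 8
```

In an encoder of this kind, such an encoding would typically be added element-wise to the convolutional feature map before decoding, so that each feature vector carries both its vertical and horizontal location. Whether the authors add or concatenate the encoding is not stated in the abstract.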

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants No. U2003208 and No. 62172451, and by the Open Research Projects of Zhejiang Lab under Grant No. 2022KG0AB01.

Author information

Authors and Affiliations

The School of Computer Science and Engineering, Central South University, Changsha, 410075, Hunan, China
Quan Hong & Liu Yang
Big Data Institute, Central South University, Changsha, 410083, Hunan, China
Jun Long

Corresponding authors

Correspondence to Quan Hong or Liu Yang.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MD, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

Cite this paper

Long, J., Hong, Q., Yang, L. (2023). An Encoder-Decoder Method with Position-Aware for Printed Mathematical Expression Recognition. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14187. Springer, Cham. https://doi.org/10.1007/978-3-031-41676-7_10

DOI: https://doi.org/10.1007/978-3-031-41676-7_10
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41675-0
Online ISBN: 978-3-031-41676-7
eBook Packages: Computer Science, Computer Science (R0)
