End-to-End Detection and Recognition of Arithmetic Expressions

Wan, Jiangpeng; Zhao, Mengbiao; Yin, Fei; Zhang, Xu-Yao; Huang, LinLin

doi:10.1007/978-3-030-88004-0_41


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13019)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)


Abstract

The detection and recognition of handwritten arithmetic expressions (AEs) play an important role in document retrieval [21] and analysis. Both tasks are difficult because of the structural complexity of AEs and the variability of their appearance. In this paper, we propose a novel framework to detect and recognize AEs in an end-to-end manner. First, an AE detector based on EfficientNet-B1 [17] is designed to locate all AE instances efficiently. Once the AEs are located, the RoI Rotate module [11] is adopted to transform the visual features of the AE proposals. The transformed features are then fed into an attention-based recognizer for AE recognition. The whole network for detection and recognition is trained end-to-end on document images annotated with AE locations and transcripts. Since datasets in this field are scarce, we also construct a dataset named HAED, which contains 1069 images (855 for training and 214 for testing). Extensive experiments on two datasets (HAED and TFD-ICDAR 2019) show that the proposed method achieves competitive performance on both.
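The RoI Rotate step [11] maps each (possibly rotated) AE proposal onto an axis-aligned feature map that a sequence recognizer can read left to right. The following is a minimal NumPy sketch of that general technique (an affine transform of a rotated box followed by bilinear sampling), not the authors' implementation. The box parameterization `(cx, cy, w, h, theta)`, the hypothetical helper names `roi_rotate` and `bilinear_sample`, and the default output height are our own assumptions.

```python
import numpy as np


def bilinear_sample(feat, x, y):
    """Bilinearly sample feat (C, H, W) at float coordinates x, y (same shape).

    feat[:, r, c] is taken to sit at the point (x=c, y=r); samples that fall
    outside the map read zeros (zero padding).
    """
    C, H, W = feat.shape
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0  # fractional offsets used as interpolation weights

    def gather(yy, xx):
        # Read feat at integer positions, leaving zeros where out of bounds.
        valid = (xx >= 0) & (xx < W) & (yy >= 0) & (yy < H)
        out = np.zeros((C,) + x.shape, dtype=feat.dtype)
        out[:, valid] = feat[:, yy[valid], xx[valid]]
        return out

    return (gather(y0, x0) * (1 - wx) * (1 - wy)
            + gather(y0, x1) * wx * (1 - wy)
            + gather(y1, x0) * (1 - wx) * wy
            + gather(y1, x1) * wx * wy)


def roi_rotate(feat, box, out_h=8):
    """Crop a rotated box from feat (C, H, W) into a (C, out_h, out_w) map.

    box = (cx, cy, w, h, theta): centre, size and angle (radians) of the
    proposal in feature-map coordinates (a hypothetical parameterization).
    The output height is fixed; the width follows the box aspect ratio.
    """
    cx, cy, w, h, theta = box
    out_w = max(1, int(round(out_h * w / h)))
    # Box-local coordinates of each output pixel centre.
    u = (np.arange(out_w) + 0.5) / out_w * w - w / 2
    v = (np.arange(out_h) + 0.5) / out_h * h - h / 2
    uu, vv = np.meshgrid(u, v)  # both have shape (out_h, out_w)
    # Affine transform: rotate by theta, then translate to the box centre.
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    xs = cx + uu * cos_t - vv * sin_t
    ys = cy + uu * sin_t + vv * cos_t
    return bilinear_sample(feat, xs, ys)


if __name__ == "__main__":
    fmap = np.arange(12, dtype=float).reshape(1, 3, 4)
    # An axis-aligned box covering the whole map reproduces it unchanged.
    print(roi_rotate(fmap, (1.5, 1.0, 4.0, 3.0, 0.0), out_h=3).shape)
```

Fixing the output height while letting the width follow the box's aspect ratio follows the FOTS design [11], so long expressions are not squashed into a fixed-width window. Crops of different widths would still need padding before batching; how the paper handles this is not stated in the abstract.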


References

[1] Deng, Y., Kanervisto, A., Ling, J., Rush, A.M.: Image-to-markup generation with coarse-to-fine attention. In: International Conference on Machine Learning, pp. 980–989. PMLR (2017)
[2] Drake, D.M., Baird, H.S.: Distinguishing mathematics notation from English text using computational geometry. In: Eighth International Conference on Document Analysis and Recognition (ICDAR 2005), pp. 1270–1274. IEEE (2005)
[3] He, W., Zhang, X.Y., Yin, F., Liu, C.L.: Deep direct regression for multi-oriented scene text detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 745–753 (2017)
[4] Hu, Y., Zheng, Y., Liu, H., Jiang, D., Ren, B.: Accurate structured-text spotting for arithmetical exercise correction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 1, pp. 686–693 (2020)
[5] Kacem, A., Belaïd, A., Ahmed, M.B.: Automatic extraction of printed mathematical formulas using fuzzy logic and propagation of context. Int. J. Doc. Anal. Recogn. 4(2), 97–108 (2001)
[6] Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Valveny, E.: ICDAR 2015 competition on robust reading. In: International Conference on Document Analysis and Recognition (2015)
[7] Le, A.D., Nakagawa, M.: Training an end-to-end system for handwritten mathematical expression recognition by generated patterns. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1056–1061. IEEE (2017)
[8] Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
[9] Lin, X., Gao, L., Tang, Z., Baker, J., Sorge, V.: Mathematical formula identification and performance evaluation in PDF documents. Int. J. Doc. Anal. Recogn. (IJDAR) 17(3), 239–255 (2014)
[10] Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
[11] Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: fast oriented text spotting with a unified network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5676–5685 (2018)
[12] Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)
[13] Mahdavi, M., Zanibbi, R., Mouchere, H., Viard-Gaudin, C., Garain, U.: ICDAR 2019 CROHME+TFD: competition on recognition of handwritten mathematical expressions and typeset formula detection. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1533–1538. IEEE (2019)
[14] Mali, P., Kukkadapu, P., Mahdavi, M., Zanibbi, R.: ScanSSD: scanning single shot detector for mathematical formulas in PDF document images. arXiv preprint arXiv:2003.08005 (2020)
[15] Ohyama, W., Suzuki, M., Uchida, S.: Detecting mathematical expressions in scientific document images using a U-Net trained on a diverse dataset. IEEE Access 7, 144030–144042 (2019)
[16] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497 (2015)
[17] Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
[18] Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)
[19] Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)
[20] Zhang, J., et al.: Watch, attend and parse: an end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recogn. 71, 196–206 (2017)
[21] Zhang, L., He, Z., Yang, Y., Wang, L., Gao, X.B.: Tasks integrated networks: joint detection and retrieval for image search. IEEE Trans. Pattern Anal. Mach. Intell. 1 (2020). https://doi.org/10.1109/TPAMI.2020.3009758
[22] Zhou, X., et al.: EAST: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551–5560 (2017)


Acknowledgements

This work has been supported by the National Key Research and Development Program under Grant No. 2020AAA0109702.

Author information

Authors and Affiliations

Beijing Jiaotong University, Beijing, 100044, China
Jiangpeng Wan & LinLin Huang
National Laboratory of Pattern Recognition, Institute of Automation of Chinese Academy of Sciences, Beijing, 100190, China
Mengbiao Zhao, Fei Yin & Xu-Yao Zhang
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
Mengbiao Zhao, Fei Yin & Xu-Yao Zhang


Corresponding author

Correspondence to Jiangpeng Wan.

Editor information

Editors and Affiliations

University of Science and Technology Beijing, Beijing, China
Huimin Ma
Chinese Academy of Sciences, Beijing, China
Liang Wang
Tsinghua University, Beijing, China
Changshui Zhang
Zhejiang University, Hangzhou, China
Fei Wu
Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Hunan University, Changsha, China
Yaonan Wang
Sun Yat-Sen University, Guangzhou, Guangdong, China
Jianhuang Lai
Beijing Jiaotong University, Beijing, China
Yao Zhao


About this paper

Cite this paper

Wan, J., Zhao, M., Yin, F., Zhang, XY., Huang, L. (2021). End-to-End Detection and Recognition of Arithmetic Expressions. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_41


DOI: https://doi.org/10.1007/978-3-030-88004-0_41
Published: 22 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88003-3
Online ISBN: 978-3-030-88004-0
eBook Packages: Computer Science, Computer Science (R0)
