Abstract
The detection and recognition of handwritten arithmetic expressions (AEs) play an important role in document retrieval [21] and analysis. Both tasks are difficult because AEs are structurally complex and vary widely in appearance. In this paper, we propose a novel framework that detects and recognizes AEs in an end-to-end manner. First, an AE detector based on EfficientNet-B1 [17] is designed to locate all AE instances efficiently. Given the AE locations, the RoI Rotate module [11] is adopted to transform the visual features of each AE proposal. The transformed features are then fed into an attention-based recognizer for AE recognition. The whole network for detection and recognition is trained end-to-end on document images annotated with AE locations and transcripts. Since datasets in this field are rare, we also construct a dataset named HAED, which contains 1069 images (855 for training and 214 for testing). Extensive experiments on two datasets (HAED and TFD-ICDAR 2019) show that the proposed method achieves competitive performance on both.
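The feature transformation step mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it only shows, under stated assumptions, how a rotated box on a feature map can be resampled into an axis-aligned patch of fixed height with bilinear interpolation, which is the general idea behind RoI Rotate [11]. The function name `roi_rotate`, the box parameterization (`center`, `size`, `angle`) and the default `out_h` are hypothetical choices made for illustration.

```python
import numpy as np


def roi_rotate(features, center, size, angle, out_h=8):
    """Resample a rotated box from a (C, H, W) feature map.

    Hypothetical parameterization (an assumption, not from the paper):
      center = (cx, cy) in feature-map pixels,
      size   = (w, h) of the box,
      angle  = rotation in radians, with x to the right and y downward.
    Returns an axis-aligned patch of shape (C, out_h, out_w), where out_w
    preserves the aspect ratio of the box.
    """
    _, H, W = features.shape
    cx, cy = center
    w, h = size
    out_w = max(1, int(round(out_h * w / h)))

    # Box-local coordinates of each output pixel centre, origin at box centre.
    u = ((np.arange(out_w) + 0.5) / out_w - 0.5) * w
    v = ((np.arange(out_h) + 0.5) / out_h - 0.5) * h
    uu, vv = np.meshgrid(u, v)  # both of shape (out_h, out_w)

    # Rotate and translate into feature-map coordinates, clamped to the map.
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    x = np.clip(cx + uu * cos_a - vv * sin_a, 0, W - 1)
    y = np.clip(cy + uu * sin_a + vv * cos_a, 0, H - 1)

    # Bilinear interpolation between the four neighbouring cells.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = x - x0
    wy = y - y0
    top = features[:, y0, x0] * (1 - wx) + features[:, y0, x1] * wx
    bottom = features[:, y1, x0] * (1 - wx) + features[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Toy feature map: channel 0 stores the x index, channel 1 the y index.
ramp = np.stack(np.meshgrid(np.arange(16.0), np.arange(16.0)))
patch = roi_rotate(ramp, center=(8, 8), size=(8, 4), angle=0.0, out_h=4)
rotated = roi_rotate(ramp, center=(8, 8), size=(8, 4), angle=np.pi / 2, out_h=4)
```

In this sketch every proposal is mapped to the same height, so patches from boxes of different sizes and orientations could be batched (after width padding) before being passed to a sequence recognizer. How the paper itself batches proposals is not stated in the abstract.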
References
Deng, Y., Kanervisto, A., Ling, J., Rush, A.M.: Image-to-markup generation with coarse-to-fine attention. In: International Conference on Machine Learning, pp. 980–989. PMLR (2017)
Drake, D.M., Baird, H.S.: Distinguishing mathematics notation from English text using computational geometry. In: Eighth International Conference on Document Analysis and Recognition (ICDAR 2005), pp. 1270–1274. IEEE (2005)
He, W., Zhang, X.Y., Yin, F., Liu, C.L.: Deep direct regression for multi-oriented scene text detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 745–753 (2017)
Hu, Y., Zheng, Y., Liu, H., Jiang, D., Ren, B.: Accurate structured-text spotting for arithmetical exercise correction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 1, pp. 686–693 (2020)
Kacem, A., Belaïd, A., Ahmed, M.B.: Automatic extraction of printed mathematical formulas using fuzzy logic and propagation of context. Int. J. Doc. Anal. Recogn. 4(2), 97–108 (2001)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Valveny, E.: ICDAR 2015 competition on robust reading. In: International Conference on Document Analysis and Recognition (2015)
Le, A.D., Nakagawa, M.: Training an end-to-end system for handwritten mathematical expression recognition by generated patterns. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1056–1061. IEEE (2017)
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Lin, X., Gao, L., Tang, Z., Baker, J., Sorge, V.: Mathematical formula identification and performance evaluation in pdf documents. Int. J. Doc. Anal. Recogn. (IJDAR) 17(3), 239–255 (2014)
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: fast oriented text spotting with a unified network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5676–5685 (2018)
Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)
Mahdavi, M., Zanibbi, R., Mouchere, H., Viard-Gaudin, C., Garain, U.: ICDAR 2019 CROHME+ TFD: competition on recognition of handwritten mathematical expressions and typeset formula detection. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1533–1538. IEEE (2019)
Mali, P., Kukkadapu, P., Mahdavi, M., Zanibbi, R.: ScanSSD: scanning single shot detector for mathematical formulas in pdf document images. arXiv preprint arXiv:2003.08005 (2020)
Ohyama, W., Suzuki, M., Uchida, S.: Detecting mathematical expressions in scientific document images using a U-Net trained on a diverse dataset. IEEE Access 7, 144030–144042 (2019)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497 (2015)
Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)
Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)
Zhang, J., et al.: Watch, attend and parse: an end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recogn. 71, 196–206 (2017)
Zhang, L., He, Z., Yang, Y., Wang, L., Gao, X.B.: Tasks integrated networks: joint detection and retrieval for image search. IEEE Trans. Pattern Anal. Mach. Intell. 1 (2020). https://doi.org/10.1109/TPAMI.2020.3009758
Zhou, X., et al.: EAST: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551–5560 (2017)
Acknowledgements
This work has been supported by the National Key Research and Development Program under Grant No. 2020AAA0109702.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wan, J., Zhao, M., Yin, F., Zhang, X.Y., Huang, L. (2021). End-to-End Detection and Recognition of Arithmetic Expressions. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_41
DOI: https://doi.org/10.1007/978-3-030-88004-0_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88003-3
Online ISBN: 978-3-030-88004-0
eBook Packages: Computer Science, Computer Science (R0)