Abstract
As the number of primary school students rapidly rises, the highly repetitive task of correcting arithmetical exercises takes up much of teachers' time and keeps them from concentrating on students' development. To reduce this workload, arithmetical exercise correction (AEC) aims to automatically detect, recognize, and correct the various arithmetical exercises in a workbook. Because research in this field is still immature, two crucial issues remain: accurately detecting arithmetical exercises with various structures, and effectively recognizing long exercises. In this paper, we propose FATE, a three-stage method that corrects arithmetical exercises in an end-to-end manner. First, we apply an anchor-free detector with a feature pyramid network and a center-ness constraint to avoid redundant bounding boxes. Second, we employ a transformer-based framework with contrastive learning to extract global symbol information and generate the corresponding symbol sequences. Finally, we design a series of rule-based templates, each tailored to the distinctive features of one type of arithmetical exercise, to correct the generated sequences. Extensive experiments on the public dataset show that our method achieves a detection average precision of 96.8%, a recognition accuracy of 92.3%, and an \(\mathrm{F}_1\) score of 91.2% in the spotting experiment, outperforming the state-of-the-art method.
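The abstract names the components of the first and third stages without detailing them. As a rough illustration only, the Python sketch below shows (a) the center-ness target as defined for the FCOS anchor-free detector, which is assumed here to be the kind of constraint meant, and (b) one hypothetical rule-based template that checks a recognized horizontal equation of the form `a op b = c`. The function names `centerness` and `check_horizontal`, the regular expression, and the symbol set (`+`, `-`, `×`, `÷`) are illustrative assumptions, not the authors' implementation.

```python
import math
import re
from fractions import Fraction


def centerness(l: float, t: float, r: float, b: float) -> float:
    """FCOS-style center-ness target for a location inside a box.

    l, t, r, b are the (positive) distances from the location to the
    left, top, right and bottom edges of its ground-truth box. The value
    is 1.0 at the box centre and decays towards 0 near the edges, which
    lets low-quality boxes predicted far from the centre be down-weighted.
    """
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


# Hypothetical template for a single exercise type: a horizontal
# equation "a op b = c" over non-negative integers.
_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "×": lambda a, b: a * b,
    "÷": lambda a, b: a / b,
}
_HORIZONTAL = re.compile(r"^(\d+)([+\-×÷])(\d+)=(\d+)$")


def check_horizontal(sequence: str):
    """Return True/False for a sequence matching 'a op b = c', or None
    if it does not fit this template (so another template can try it)."""
    match = _HORIZONTAL.match(sequence.replace(" ", ""))
    if match is None:
        return None
    a, op, b, c = match.groups()
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if op == "÷" and b == 0:
        return False  # division by zero can never be a correct answer
    return _OPS[op](a, b) == c
```

For example, `check_horizontal("12+7=19")` yields `True`, `check_horizontal("12+7=18")` yields `False`, and a bracketed expression such as `"(1+2)×3=9"` yields `None`. Exact rational arithmetic via `fractions.Fraction` avoids floating-point mismatches for division, and returning `None` lets an unmatched sequence fall through to a template for a different exercise type.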
Data Availability
The AEC-5k dataset analyzed during the current study is available in the TencentYoutuResearch repository, with the link: https://github.com/TencentYoutuResearch/OCR-AEC5k.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62076062) and the Social Development Science and Technology Project of Jiangsu Province (No. BE2022811). The work was also supported by the Collaborative Innovation Center of Wireless Communications Technology and the Big Data Computing Center of Southeast University.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhu, Q., Luo, Z., Zhu, S. et al. FATE: a three-stage method for arithmetical exercise correction. Neural Comput & Applic 35, 23491–23506 (2023). https://doi.org/10.1007/s00521-023-08890-6
DOI: https://doi.org/10.1007/s00521-023-08890-6