FATE: a three-stage method for arithmetical exercise correction

  • Original Article
  • Neural Computing and Applications

Abstract

As the number of primary school students rises rapidly, the highly repetitive task of correcting arithmetical exercises consumes much of teachers' time and keeps them from focusing on students' growth. To reduce this workload, arithmetical exercise correction (AEC) has been proposed to automatically detect, recognize and correct the various arithmetical exercises in a workbook. However, research in this field is still immature, and two crucial issues remain: accurately detecting arithmetical exercises with varied structures, and effectively recognizing long exercises. In this paper, we propose a three-stage method, dubbed FATE, that corrects arithmetical exercises in an end-to-end manner. First, for detection, we apply an anchor-free model with a feature pyramid network and a center-ness constraint to avoid redundant bounding boxes. Second, for recognition, we employ a transformer-based framework with contrastive learning to extract global symbol information and generate the corresponding sequences. Finally, we design a series of rule-based templates that correct the generated sequences according to the unique features of each type of arithmetical exercise. Extensive experiments on the public dataset show that our method achieves a detection average precision of 96.8%, a recognition accuracy of 92.3% and an \(\mathrm{F}_{1}\) score of 91.2% in the spotting experiment, outperforming the state-of-the-art method.
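To make the center-ness constraint mentioned in the abstract more concrete, the following is a minimal sketch of the center-ness score as defined for the anchor-free FCOS detector (Tian et al. 2019). Whether FATE uses exactly this form is an assumption, since the abstract does not give the formula. The helper names `centerness` and `distances` and the example box are hypothetical and chosen only for illustration.

```python
import math


def distances(x, y, box):
    """Distances from point (x, y) to the left, top, right and bottom
    sides of an axis-aligned box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x - x0, y - y0, x1 - x, y1 - y


def centerness(l, t, r, b):
    """FCOS-style center-ness: 1.0 at the box center, decaying toward 0
    near the sides. Assumed here, not confirmed as FATE's exact form."""
    if min(l, t, r, b) <= 0:
        return 0.0  # point lies on or outside the box
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


# Hypothetical wide, short box, roughly the shape of a one-line exercise.
box = (0.0, 0.0, 100.0, 20.0)
center = centerness(*distances(50.0, 10.0, box))
edge = centerness(*distances(10.0, 10.0, box))
print(center, edge)
```

In FCOS, this score is multiplied with the classification score at inference time, so boxes predicted from locations far from an object's center rank lower and are more likely to be discarded by non-maximum suppression.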

Data Availability

The AEC-5k dataset analyzed during the current study is available in the TencentYoutuResearch repository at https://github.com/TencentYoutuResearch/OCR-AEC5k.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 62076062) and the Social Development Science and Technology Project of Jiangsu Province (No. BE2022811). It was also supported by the Collaborative Innovation Center of Wireless Communications Technology and the Big Data Computing Center of Southeast University.

Author information

Corresponding author

Correspondence to Hui Xue.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Zhu, Q., Luo, Z., Zhu, S. et al. FATE: a three-stage method for arithmetical exercise correction. Neural Comput & Applic 35, 23491–23506 (2023). https://doi.org/10.1007/s00521-023-08890-6

  • DOI: https://doi.org/10.1007/s00521-023-08890-6
