Multimodal Dependence Attention and Large-Scale Data Based Offline Handwritten Formula Recognition

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Offline handwritten formula recognition is a challenging task because of the variety of handwritten symbols and the two-dimensional structure of formulas. Recently, deep neural network recognizers based on the encoder-decoder framework have achieved great improvements on this task. However, existing work still performs unsatisfactorily on formulas with long LaTeX strings, and the lack of sufficient training data further limits the capability of these recognizers. In this paper, we design a multimodal dependence attention (MDA) module that helps the model learn visual and semantic dependencies among symbols in the same formula, improving recognition of formulas with long LaTeX strings. To alleviate overfitting and further improve recognition performance, we also propose a new dataset, the Handwritten Formula Image Dataset (HFID), which contains 25 620 handwritten formula images collected from real life. Extensive experiments demonstrate the effectiveness of the proposed MDA module and the HFID dataset, and we achieve state-of-the-art expression accuracy of 63.79% and 65.24% on CROHME 2014 and CROHME 2016, respectively.
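
For readers unfamiliar with the metric, expression accuracy is commonly understood as the percentage of test formulas whose predicted LaTeX string matches the ground truth exactly. The following is a minimal sketch under that assumption, not the authors' evaluation code; the function name `expression_accuracy` and the whitespace-based tokenization are hypothetical choices made for illustration.

```python
from typing import Sequence


def expression_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of formulas predicted exactly right.

    Assumption: a formula counts as correct only if its whole LaTeX
    token sequence equals the reference token sequence.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one formula is required")
    # Compare whitespace-separated tokens so that differences in spacing
    # alone are not counted as recognition errors.
    correct = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    preds = ["a + b", "x ^ 2"]
    refs = ["a  +  b", "x ^ 3"]
    print(f"{expression_accuracy(preds, refs):.2f}%")
```

Because a single wrong token makes the whole formula count as incorrect, the metric is strict for long LaTeX strings, which is consistent with the abstract's focus on long formulas.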

Author information

Corresponding author

Correspondence to Xin-Ming Zhang  (张信明).

Ethics declarations

Conflict of Interest: The authors declare that they have no conflict of interest.

Additional information

This work is supported by the National Key Research and Development Program of China under Grant No. 2020YFB1313602.

Han-Chao Liu is now a Ph.D. candidate in the School of Computer Science and Technology, University of Science and Technology of China, Hefei. He received his B.E. degree in computer science from Northwest Agriculture and Forestry University, Yangling, in 2015. His research interests include image analysis and pattern recognition.

Lan-Fang Dong received her B.E. degree in computer science from Lanzhou University, Lanzhou, in 1991, and her M.S. degree in computer application from the University of Science and Technology of China, Hefei, in 1994. She is currently an associate professor with the School of Computer Science and Technology, University of Science and Technology of China, Hefei. Her research interests include computing and visualization, intelligent image analysis, and computer animation.

Xin-Ming Zhang received his B.E. and M.E. degrees in electrical engineering from China University of Mining and Technology, Xuzhou, in 1985 and 1988, respectively, and his Ph.D. degree in computer science and technology from the University of Science and Technology of China, Hefei, in 2001. Since 2002, he has been with the faculty of the University of Science and Technology of China, Hefei, where he is currently a professor with the School of Computer Science and Technology.

About this article

Cite this article

Liu, HC., Dong, LF. & Zhang, XM. Multimodal Dependence Attention and Large-Scale Data Based Offline Handwritten Formula Recognition. J. Comput. Sci. Technol. 39, 654–670 (2024). https://doi.org/10.1007/s11390-022-1987-y

  • DOI: https://doi.org/10.1007/s11390-022-1987-y
