Handwritten Mathematical Expression Recognition via Paired Adversarial Learning

Wu, Jin-Wen; Yin, Fei; Zhang, Yan-Ming; Zhang, Xu-Yao; Liu, Cheng-Lin

doi:10.1007/s11263-020-01291-5

Published: 21 January 2020

International Journal of Computer Vision, Volume 128, pages 2386–2401 (2020)

Jin-Wen Wu^1,2 (ORCID: orcid.org/0000-0003-1595-597X),
Fei Yin^1,
Yan-Ming Zhang^1,
Xu-Yao Zhang^1,2 &
Cheng-Lin Liu^1,2,3

3665 Accesses
54 Citations
1 Altmetric

Abstract

Recognition of handwritten mathematical expressions (MEs) is an important problem with wide practical applications. Handwritten ME recognition is challenging because of the variety of writing styles and ME formats. As a result, recognizers trained by optimizing the traditional supervision loss do not perform satisfactorily. To improve the robustness of the recognizer to writing styles, we propose a novel paired adversarial learning method that learns semantic-invariant features. Specifically, the proposed model, named PAL-v2, consists of an attention-based recognizer and a discriminator. During training, handwritten MEs and their printed templates are fed into PAL-v2 simultaneously, and the attention-based recognizer is trained to learn semantic-invariant features under the guidance of the discriminator. Moreover, we adopt a convolutional decoder to alleviate the vanishing and exploding gradient problems of RNN-based decoders, and we further improve the coverage of decoding with a novel attention method. Extensive experiments on the CROHME dataset demonstrate the effectiveness of each part of the method, and the model achieves state-of-the-art performance.
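
The abstract does not state the loss functions, so the following is only a generic sketch of how a paired adversarial objective of this kind is commonly set up, not the PAL-v2 formulation. It assumes a discriminator that outputs, for each feature, the probability that it came from a printed template, and a recognizer that is trained on a recognition cross-entropy plus a weighted term for fooling the discriminator. The function names, the weight `lam`, and the label convention (printed = 1, handwritten = 0) are all hypothetical.

```python
import math


def bce(p, label):
    """Binary cross-entropy for a single probability p (clipped for stability)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(p_printed, p_handwritten):
    """Assumed discriminator objective: score printed features as 1 and
    handwritten features as 0. The two lists are paired: index i in each
    list refers to the same expression."""
    losses = [bce(p, 1) for p in p_printed] + [bce(p, 0) for p in p_handwritten]
    return sum(losses) / len(losses)


def recognizer_loss(token_probs, p_handwritten, lam=0.5):
    """Assumed recognizer objective: cross-entropy over the probabilities the
    recognizer assigned to the correct tokens, plus lam times a term that is
    small when handwritten features are mistaken for printed ones."""
    recognition = -sum(math.log(p) for p in token_probs) / len(token_probs)
    adversarial = sum(bce(p, 1) for p in p_handwritten) / len(p_handwritten)
    return recognition + lam * adversarial
```

In this sketch the two losses pull in opposite directions on the handwritten features: the discriminator is rewarded for separating them from the printed ones, while the recognizer is rewarded when they become indistinguishable, which is one way to encourage features that depend on the expression's content rather than the writing style.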

Acknowledgements

This work has been supported by the National Key Research and Development Program Grant 2018YFB1005000, the National Natural Science Foundation of China (NSFC) Grants 61721004, 61733007, 61773376, 61633021, 61836014, and the Beijing Science and Technology Program Grant Z181100008918010.

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation of Chinese Academy of Sciences, Beijing, 100190, People’s Republic of China
Jin-Wen Wu, Fei Yin, Yan-Ming Zhang, Xu-Yao Zhang & Cheng-Lin Liu
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, People’s Republic of China
Jin-Wen Wu, Xu-Yao Zhang & Cheng-Lin Liu
CAS Center for Excellence of Brain Science and Intelligence Technology, Beijing, 100190, People’s Republic of China
Cheng-Lin Liu

Corresponding author

Correspondence to Jin-Wen Wu.

Additional information

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, JW., Yin, F., Zhang, YM. et al. Handwritten Mathematical Expression Recognition via Paired Adversarial Learning. Int J Comput Vis 128, 2386–2401 (2020). https://doi.org/10.1007/s11263-020-01291-5

Received: 29 March 2019
Accepted: 02 January 2020
Published: 21 January 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s11263-020-01291-5
