Handwritten Mathematical Expression Recognition via Paired Adversarial Learning

International Journal of Computer Vision

Abstract

Recognition of handwritten mathematical expressions (MEs) is an important problem with wide practical applications. Handwritten ME recognition is challenging due to the variety of writing styles and ME formats. As a result, recognizers trained by optimizing the traditional supervision loss do not perform satisfactorily. To improve the robustness of the recognizer with respect to writing styles, we propose a novel paired adversarial learning method to learn semantic-invariant features. Specifically, our proposed model, named PAL-v2, consists of an attention-based recognizer and a discriminator. During training, handwritten MEs and their printed templates are fed into PAL-v2 simultaneously, and the attention-based recognizer is trained to learn semantic-invariant features under the guidance of the discriminator. Moreover, we adopt a convolutional decoder to alleviate the vanishing and exploding gradient problems of RNN-based decoders, and further improve the coverage of decoding with a novel attention method. Extensive experiments on the CROHME dataset demonstrate the effectiveness of each part of the method, and the model achieves state-of-the-art performance.
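The abstract does not spell out PAL-v2's loss functions, so the following is only a minimal sketch of how a paired adversarial objective of this kind can be computed, under stated assumptions. It assumes each handwritten ME and its printed template yield paired feature vectors, uses a hypothetical linear discriminator `discriminator` that outputs the probability that a feature came from a printed template, and swaps in the non-saturating generator loss from standard GANs for the recognizer's adversarial term. The names `paired_adversarial_losses`, `lam`, `ce_hand`, and `ce_print` are illustrative, not taken from the paper, and the authors' exact formulation may differ.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discriminator(feature: Sequence[float], w: Sequence[float], b: float) -> float:
    """Hypothetical linear discriminator: P(feature comes from a printed template)."""
    return sigmoid(sum(fi * wi for fi, wi in zip(feature, w)) + b)


def bce(p: float, target: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy of a single probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def paired_adversarial_losses(
    hand_feats: Sequence[Sequence[float]],
    print_feats: Sequence[Sequence[float]],
    w: Sequence[float],
    b: float,
    ce_hand: float,
    ce_print: float,
    lam: float = 0.5,
) -> Tuple[float, float]:
    """Return (discriminator loss, recognizer loss) for paired features.

    Assumption: ce_hand and ce_print are the recognizer's supervised
    cross-entropy losses on the handwritten and printed inputs, and lam
    weights the adversarial term (hypothetical hyperparameter).
    """
    assert len(hand_feats) == len(print_feats), "features must be paired"
    n = len(hand_feats)
    # Discriminator objective: label printed features 1, handwritten features 0.
    d_loss = sum(
        bce(discriminator(p, w, b), 1) + bce(discriminator(h, w, b), 0)
        for h, p in zip(hand_feats, print_feats)
    ) / n
    # Recognizer objective (non-saturating GAN form): make handwritten
    # features indistinguishable from printed ones.
    adv_loss = sum(bce(discriminator(h, w, b), 1) for h in hand_feats) / n
    rec_loss = ce_hand + ce_print + lam * adv_loss
    return d_loss, rec_loss
```

In practice the two losses would be minimized alternately, updating the discriminator on `d_loss` and the recognizer on `rec_loss`. When the discriminator can no longer tell handwritten features from their printed counterparts, the adversarial term stops pushing the recognizer, which is the sense in which the learned features become invariant to writing style.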

Acknowledgements

This work has been supported by the National Key Research and Development Program Grant 2018YFB1005000, the National Natural Science Foundation of China (NSFC) Grants 61721004, 61733007, 61773376, 61633021, 61836014, and the Beijing Science and Technology Program Grant Z181100008918010.

Author information

Correspondence to Jin-Wen Wu.

Additional information

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, JW., Yin, F., Zhang, YM. et al. Handwritten Mathematical Expression Recognition via Paired Adversarial Learning. Int J Comput Vis 128, 2386–2401 (2020). https://doi.org/10.1007/s11263-020-01291-5

  • DOI: https://doi.org/10.1007/s11263-020-01291-5
