DOI: 10.1145/3581783.3612499

Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition

Published: 27 October 2023 Publication History

Abstract

Handwritten Mathematical Expression Recognition (HMER) plays a critical role in applications such as digitized education and scientific research. Although existing methods have achieved promising performance on publicly available datasets, they still struggle to recognize multi-line mathematical expressions (MEs), which have complex structures and are slow to decode. To address these issues, we propose a Line-Aware Semi-autoregressive Transformer (LAST) that treats multi-line mathematical expression sequences as two-dimensional dual-end structures. LAST uses a line-wise dual-end decoding strategy that decodes the lines of a multi-line expression in parallel and performs dual-end decoding within each line. Specifically, we introduce a line-aware positional encoding module and a line-partitioned dual-end mask to endow LAST with line-order awareness and directionality. Additionally, we adopt a shared-task optimization strategy to train LAST on both autoregressive and semi-autoregressive tasks. To evaluate the effectiveness of our approach in real-world scenarios, we have built a new Multi-line Mathematical Expression dataset (M2E), which, to the best of our knowledge, is the first of its kind and has the largest number of character categories, the largest number of character samples, and the longest average sequence length among existing ME datasets. Experimental results on both the M2E dataset and publicly available datasets demonstrate the effectiveness of our proposed method. Notably, our semi-autoregressive decoding approach decodes significantly faster while still achieving state-of-the-art performance compared with existing methods.
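
To give a rough sense of why line-wise dual-end decoding can reduce the number of decoding steps, the sketch below counts steps for a toy token sequence. It is an illustration under stated assumptions, not the LAST implementation: the function names, the use of `\\` as the line-break token, and the idealized schedule (every line decoded in parallel, two tokens per line per step, start/end tokens and line-count prediction ignored) are all hypothetical simplifications.

```python
from math import ceil

# Assumption: lines are separated by the LaTeX line-break token "\\".
LINE_BREAK = "\\\\"


def split_lines(tokens, sep=LINE_BREAK):
    """Split a flat token sequence into lines at each line-break token."""
    lines, current = [], []
    for tok in tokens:
        if tok == sep:
            lines.append(current)
            current = []
        else:
            current.append(tok)
    lines.append(current)
    return lines


def dual_end_schedule(line):
    """Step index (0-based) at which each position of one line is emitted
    when the line is decoded from its left and right ends simultaneously."""
    n = len(line)
    return [min(i, n - 1 - i) for i in range(n)]


def decoding_steps(tokens, sep=LINE_BREAK):
    """Return (autoregressive_steps, line_wise_dual_end_steps).

    Autoregressive decoding emits one token per step, separators included.
    The idealized semi-autoregressive schedule decodes all lines in parallel,
    so its step count is set by the longest line, halved by dual-end decoding.
    """
    lines = split_lines(tokens, sep)
    autoregressive = len(tokens)
    semi_autoregressive = max(ceil(len(line) / 2) for line in lines)
    return autoregressive, semi_autoregressive


tokens = ["x", "=", "1", LINE_BREAK, "y", "=", "x", "+", "2"]
print(split_lines(tokens))
print(dual_end_schedule(["y", "=", "x", "+", "2"]))
print(decoding_steps(tokens))
```

In this toy schedule the step count depends on the longest line rather than on the total sequence length, which is the intuition behind the speed-up described in the abstract. The actual gains reported by the authors depend on model and implementation details not covered here.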

Cited By

  • (2024) ICPR 2024 Competition on Multi-line Mathematical Expressions Recognition. Pattern Recognition. Competitions, 17-29. DOI: 10.1007/978-3-031-80139-6_2. Online publication date: 30-Nov-2024
  • (2024) PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer. Computer Vision – ECCV 2024, 130-147. DOI: 10.1007/978-3-031-72670-5_8. Online publication date: 30-Sep-2024

      Published In

      MM '23: Proceedings of the 31st ACM International Conference on Multimedia
      October 2023
      9913 pages
      ISBN: 9798400701085
      DOI: 10.1145/3581783
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. line-aware positional encoding
      2. multi-line mathematical expression recognition
      3. semi-autoregressive transformer

      Qualifiers

      • Research-article

      Funding Sources

      • NSFC, Zhuhai Industry Core and Key Technology Research Project
      • Science and Technology Foundation of Guangzhou Huangpu Development District

      Conference

      MM '23
      MM '23: The 31st ACM International Conference on Multimedia
      October 29 - November 3, 2023
      Ottawa ON, Canada

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
