ABSTRACT
Most recent handwritten mathematical expression recognition (HMER) methods adopt the attention-based encoder-decoder framework, which generates LaTeX sequences from input images. However, the accuracy of the attention mechanism limits the performance of HMER models, and the lack of global context information during decoding is a further challenge. Some methods adopt symbol-level counting to localize symbols and improve model performance, but these methods do not work well. In this paper, we propose a Symbol Location-Aware Network (SLAN) to address the HMER problem. Specifically, we propose an advanced relation-level counting method to detect symbols in the image. We address the lack of global context with a new global context-aware decoder. To improve the accuracy of attention, we design a novel attention alignment loss function based on a dynamic programming algorithm, which learns attention alignment directly without pixel-level labels. We conducted extensive experiments on the CROHME dataset to demonstrate the effectiveness of each part of SLAN, and SLAN achieved state-of-the-art performance.
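To make the notion of symbol-level counting mentioned above concrete, the following minimal sketch shows how per-symbol count targets could be derived from a tokenized LaTeX label without any position annotations. It illustrates only the general idea of counting as weak supervision and is not the relation-level counting method of SLAN; the function name `symbol_counts` and the set `STRUCTURAL_TOKENS` are assumptions introduced here for illustration.

```python
from collections import Counter

# Assumption: tokens that encode structure rather than a visible symbol.
# This set is hypothetical and chosen only for the example.
STRUCTURAL_TOKENS = {"{", "}", "^", "_"}


def symbol_counts(latex_tokens):
    """Count how often each visible symbol occurs in a tokenized LaTeX label."""
    return Counter(tok for tok in latex_tokens if tok not in STRUCTURAL_TOKENS)


# Example label for the expression x^2 + x + 1, split on whitespace.
tokens = "x ^ { 2 } + x + 1".split()
counts = symbol_counts(tokens)
print(counts)
```

Count targets of this kind require only the LaTeX label of each image, which is why counting can act as an auxiliary training signal when symbol positions are not annotated.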
Index Terms
- Symbol Location-Aware Network for Improving Handwritten Mathematical Expression Recognition