A Symbolic Characters Aware Model for Solving Geometry Problems

Authors:

Xiaowei Huang

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 7767 - 7775

https://doi.org/10.1145/3581783.3612570

Published: 27 October 2023

Abstract

AI has made significant progress in solving math problems, but geometry problems remain challenging because they rely on both text and diagrams. In the text description, symbolic characters such as "ABC" often serve as a bridge to the corresponding diagram. However, by simply tokenizing symbolic characters into individual letters (e.g., 'A', 'B' and 'C'), existing works fail to model them explicitly and thus lose their semantic relationship with the diagram. In this paper, we develop a symbolic character-aware model that fully explores the role of these characters in both text and diagram understanding, and we optimize the model under a multi-modal reasoning framework. In the text encoder, we propose merging individual symbolic characters into one semantic unit, together with geometric information from the corresponding diagram. The diagram encoder is pre-trained under a multi-label classification framework with the symbolic characters as labels. In addition, we enhance geometry diagram understanding through self-supervised learning with masked image modeling as an auxiliary task. By integrating the proposed model into a general encoder-decoder pipeline for solving geometry problems, we demonstrate its superiority on two benchmark datasets, GeoQA and Geometry3K, with extensive experiments. Specifically, on GeoQA, the question-solving accuracy increases from 60.0% to 64.1%, a new state-of-the-art accuracy; on Geometry3K, we reduce the average number of solving steps per question from 6.9 to 6.0 with marginally higher solving accuracy.
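The idea of treating symbolic characters as one semantic unit, and of using them as multi-label targets for the diagram, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization, and the 26-letter label alphabet are all assumptions made only for illustration.

```python
import re
from typing import List

# Hypothetical pattern: a run of uppercase letters such as "ABC" or "AB".
SYMBOL_RUN = re.compile(r"[A-Z]+")

def char_level_tokens(text: str) -> List[str]:
    """Baseline described in the abstract: split each symbolic run into letters."""
    tokens: List[str] = []
    for word in text.split():
        if SYMBOL_RUN.fullmatch(word):
            tokens.extend(word)  # "ABC" -> "A", "B", "C"
        else:
            tokens.append(word)
    return tokens

def merge_symbolic_tokens(tokens: List[str]) -> List[str]:
    """Merge adjacent single uppercase letters back into one unit.

    Limitation of this sketch: two runs separated only by whitespace would be
    fused, and the English article "A" would be treated as a symbol.
    """
    merged: List[str] = []
    run: List[str] = []
    for tok in tokens:
        if len(tok) == 1 and tok.isupper():
            run.append(tok)
            continue
        if run:
            merged.append("".join(run))
            run = []
        merged.append(tok)
    if run:
        merged.append("".join(run))
    return merged

def multi_hot_labels(tokens: List[str],
                     alphabet: str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ") -> List[int]:
    """Multi-hot vector marking which point letters the problem mentions
    (an assumed label format for multi-label diagram pre-training)."""
    present = {ch for tok in tokens if SYMBOL_RUN.fullmatch(tok) for ch in tok}
    return [1 if ch in present else 0 for ch in alphabet]

tokens = char_level_tokens("In triangle ABC , AB = 5")
merged = merge_symbolic_tokens(tokens)
labels = multi_hot_labels(merged)
```

Here `merged` keeps "ABC" and "AB" as single tokens, while `labels` marks only the letters A, B and C as present. A real system would attach learned embeddings and diagram features to each merged unit, which this sketch does not attempt.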

Index Terms

A Symbolic Characters Aware Model for Solving Geometry Problems
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE)
Tao Mei (HiDream.ai, China)
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy)
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia)
Pradeep K. Atrey (University at Albany, State University of New York, USA)
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
