Abstract
In this paper, we apply deep reinforcement learning (DRL) to geometry reasoning and develop Dragon to facilitate online tutoring. Its success depends on a flexible data model that captures diverse concepts and heterogeneous relations, and on an effective DRL agent that generates near-optimal, human-readable solutions. We use proximal policy optimization (PPO) as the backbone DRL architecture, customized with an effective state representation and combined with several optimization techniques, including an attention mechanism, action masking, data augmentation, and curriculum learning. In our experimental study, we construct the largest dataset of geometry problems to date, together with a knowledge base of 46 theorems. We implement various heuristic algorithms and DRL models as baselines for performance comparison. The results show that our agent achieves near-optimal solutions and outperforms multiple competitive baselines. To benefit the community, we open-source the dataset and implementation at https://github.com/AIEdu-xzy/geometry-solver.
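As a rough illustration of two ingredients named in the abstract, the sketch below shows how an action mask can restrict a policy to applicable theorems and how the PPO clipped surrogate objective is computed for a single sample. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `masked_softmax` and `ppo_clip_objective`, the four-action example, the logits, and the clipping range `eps=0.2` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def masked_softmax(logits, mask):
    """Softmax over valid actions only; masked-out actions get probability 0."""
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be valid")
    # Invalid actions are pushed to -inf so that exp() maps them to exactly 0.
    masked = np.where(mask, logits, -np.inf)
    # Subtract the largest valid logit for numerical stability.
    shifted = masked - masked[mask].max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def ppo_clip_objective(new_prob, old_prob, advantage, eps=0.2):
    """PPO clipped surrogate objective for one (state, action) sample."""
    ratio = new_prob / old_prob
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Taking the minimum removes the incentive to move the ratio outside
    # the interval [1 - eps, 1 + eps].
    return float(min(ratio * advantage, clipped * advantage))


# Hypothetical state: four candidate theorems, of which only the first and
# third are applicable to the current problem (assumed values).
logits = [2.0, 1.0, 0.5, 3.0]
mask = [True, False, True, False]
probs = masked_softmax(logits, mask)
```

With this mask, the second and fourth actions receive zero probability even though the fourth has the largest logit, so the agent can never select a theorem that does not apply in the current state.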
Acknowledgements
The work is supported by the National Key Research and Development Project of China (2022YFF0902000).
Funding
No funding was received for conducting this study.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Xiao, Z., Zhang, D. A deep reinforcement learning agent for geometry online tutoring. Knowl Inf Syst 65, 1611–1625 (2023). https://doi.org/10.1007/s10115-022-01804-3
DOI: https://doi.org/10.1007/s10115-022-01804-3