A deep reinforcement learning agent for geometry online tutoring

  • Regular Paper
  • Knowledge and Information Systems

Abstract

In this paper, we apply deep reinforcement learning (DRL) to geometry reasoning and develop Dragon to facilitate online tutoring. Its success depends on a flexible data model that captures diverse concepts and heterogeneous relations, and on an effective DRL agent that generates near-optimal, human-readable solutions. We use proximal policy optimization (PPO) as the backbone DRL architecture, customized with an effective state representation and combined with several optimization techniques: an attention mechanism, action masking, data augmentation and curriculum learning. In our experimental study, we construct the largest-scale dataset of geometry problems to date and a knowledge base with 46 theorems. We implement various heuristic algorithms and DRL models as baselines for performance comparison. The results show that our agent achieves near-optimal solutions and outperforms multiple competitive baselines. To benefit the community, we open-source the dataset and implementation at https://github.com/AIEdu-xzy/geometry-solver.
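
As a rough illustration of two ingredients named in the abstract, the sketch below shows how action masking and PPO's standard clipped surrogate objective can be written in plain Python. It is not taken from the Dragon implementation: the function names `masked_softmax` and `ppo_clipped_objective`, the four-theorem example, and the clipping range of 0.2 are all assumptions chosen for illustration.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits; actions with mask[i] == False get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def ppo_clipped_objective(new_prob, old_prob, advantage, epsilon=0.2):
    """Per-sample clipped surrogate objective used by PPO."""
    ratio = new_prob / old_prob
    clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped * advantage)


# Hypothetical state: four candidate theorems, the second one is not
# applicable to the current problem state and is therefore masked out.
logits = [1.0, 3.0, 0.5, 1.0]
mask = [True, False, True, True]
probs = masked_softmax(logits, mask)
```

Masking before normalization means an inapplicable theorem can never be sampled, no matter how large its logit is, which is one plausible reason such a mask helps when only a few of the 46 theorems apply in a given state.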

Acknowledgements

The work is supported by the National Key Research and Development Project of China (2022YFF0902000).

Funding

No funding was received for conducting this study.

Author information

Corresponding author

Correspondence to Dongxiang Zhang.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Fig. 7.

Fig. 7: Examples of solutions generated by Dragon, MCTS and MCTS-RL

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiao, Z., Zhang, D. A deep reinforcement learning agent for geometry online tutoring. Knowl Inf Syst 65, 1611–1625 (2023). https://doi.org/10.1007/s10115-022-01804-3

  • DOI: https://doi.org/10.1007/s10115-022-01804-3
