Skip to main content

Automating Formalization by Statistical and Semantic Parsing of Mathematics

  • Conference paper
Interactive Theorem Proving (ITP 2017)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10499))

Included in the following conference series:

Abstract

We discuss the progress in our project which aims to automate formalization by combining natural language processing with deep semantic understanding of mathematical expressions. We introduce the overall motivation and ideas behind this project, and then propose a context-based parsing approach that combines efficient statistical learning of deep parse trees with their semantic pruning by type checking and large-theory automated theorem proving. We show that our learning method allows efficient use of large amount of contextual information, which in turn significantly boosts the precision of the statistical parsing and also makes it more efficient. This leads to a large improvement of our first results in parsing theorems from the Flyspeck corpus.

C. Kaliszyk—Supported by the ERC Starting grant no. 714034 SMART.

J. Urban and J. Vyskočil—Supported by the ERC Consolidator grant no. 649043 AI4REASON. This work was supported by the European Regional Development Fund under the project AI&Reasoning (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000466).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    In general, a grammar could pick only some subtree depths instead of their contiguous intervals, but we do not use such grammars now.

  2. 2.

    About 1% of the longest Flyspeck formulas were removed from the evaluation to keep the parsing times manageable.

  3. 3.

    If the context-free version parsed only a few terms, but with the best rank, its average rank would be 1, but the method would still be much worse in terms of the overall number of correctly parsed terms.

References

  1. Bancerek, G., Rudnicki, P.: A compendium of continuous lattices in MIZAR. J. Autom. Reason. 29(3–4), 189–224 (2002)

    Article  Google Scholar 

  2. Blanchette, J.C., Kaliszyk, C., Paulson, L.C., Urban, J.: Hammering towards QED. J. Formaliz. Reason. 9(1), 101–148 (2016)

    MathSciNet  MATH  Google Scholar 

  3. Collins, M.: Three generative, lexicalised models for statistical parsing. In: Cohen, P.R., Wahlster, W. (eds.) Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, pp. 16–23. Morgan Kaufmann Publishers/ACL (1997)

    Google Scholar 

  4. The Coq Proof Assistant. http://coq.inria.fr

  5. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. JASIS 41(6), 391–407 (1990)

    Article  Google Scholar 

  6. Dudani, S.A.: The distance-weighted K-nearest-neighbor rule. IEEE Trans. Syst. Man Cybern. 6(4), 325–327 (1976)

    Article  Google Scholar 

  7. Garillot, F., Gonthier, G., Mahboubi, A., Rideau, L.: Packaging mathematical structures. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 327–342. Springer, Heidelberg (2009). doi:10.1007/978-3-642-03359-9_23

    Chapter  Google Scholar 

  8. Gauthier, T., Kaliszyk, C.: Matching concepts across HOL libraries. In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds.) CICM 2014. LNCS, vol. 8543, pp. 267–281. Springer, Cham (2014). doi:10.1007/978-3-319-08434-3_20

    Chapter  Google Scholar 

  9. Gonthier, G., et al.: A machine-checked proof of the odd order theorem. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 163–179. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39634-2_14

    Chapter  Google Scholar 

  10. Gonthier, G., Tassi, E.: A language of patterns for subterm selection. In: Beringer, L., Felty, A. (eds.) ITP 2012. LNCS, vol. 7406, pp. 361–376. Springer, Heidelberg (2012). doi:10.1007/978-3-642-32347-8_25

    Chapter  Google Scholar 

  11. Grabowski, A., Korniłowicz, A., Naumowicz, A.: Mizar in a nutshell. J. Formaliz. Reason. 3(2), 153–245 (2010)

    MathSciNet  MATH  Google Scholar 

  12. Greenbaum, S.: Input transformations and resolution implementation techniques for theorem-proving in first-order logic. Ph.D. thesis, University of Illinois at Urbana-Champaign (1986)

    Google Scholar 

  13. Haftmann, F., Wenzel, M.: Constructive type classes in isabelle. In: Altenkirch, T., McBride, C. (eds.) TYPES 2006. LNCS, vol. 4502, pp. 160–174. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74464-1_11

    Chapter  Google Scholar 

  14. Hales, T.: Dense Sphere Packings a Blueprint for Formal Proofs, London Mathematical Society Lecture Note Series, vol. 400. Cambridge University Press, Cambridge (2012)

    Book  Google Scholar 

  15. Hales, T.C., Adams, M., Bauer, G., Dang, D.T., Harrison, J., Hoang, T.L., Kaliszyk, C., Magron, V., McLaughlin, S., Nguyen, T.T., Nguyen, T.Q., Nipkow, T., Obua, S., Pleso, J., Rute, J., Solovyev, A., Ta, A.H.T., Tran, T.N., Trieu, D.T., Urban, J., Vu, K.K., Zumkeller, R.: A formal proof of the Kepler conjecture. CoRR, abs/1501.02155, 2015

    Google Scholar 

  16. Harrison, J.: HOL Light: a tutorial introduction. In: Srivas, M., Camilleri, A. (eds.) FMCAD 1996. LNCS, vol. 1166, pp. 265–269. Springer, Heidelberg (1996). doi:10.1007/BFb0031814

    Chapter  Google Scholar 

  17. Harrison, J., Urban, J., Wiedijk, F.: History of interactive theorem proving. In: Siekmann, J.H. (ed.) Computational Logic. Handbook of the History of Logic, vol. 9. Elsevier, Amsterdam (2014)

    Chapter  Google Scholar 

  18. Kaliszyk, C., Urban, J.: Learning-assisted automated reasoning with Flyspeck. J. Autom. Reason. 53(2), 173–213 (2014)

    Article  MathSciNet  Google Scholar 

  19. Kaliszyk, C., Urban, J., Vyskocil, J.: System description: statistical parsing of informalized Mizar formulas. http://grid01.ciirc.cvut.cz/mptp/synasc17sd.pdf

  20. Kaliszyk, C., Urban, J., Vyskočil, J.: Efficient semantic features for automated reasoning over large theories. In: Yang, Q., Wooldridge, M. (eds.) IJCAI 2015, pp. 3084–3090. AAAI Press, Menlo Park (2015)

    Google Scholar 

  21. Kaliszyk, C., Urban, J., Vyskočil, J.: Learning to parse on aligned corpora (rough diamond). In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 227–233. Springer, Cham (2015). doi:10.1007/978-3-319-22102-1_15

    Chapter  Google Scholar 

  22. Kaliszyk, C., Urban, J., Vyskočil, J., Geuvers, H.: Developing corpus-based translation methods between informal and formal mathematics: project description. In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds.) CICM 2014. LNCS, vol. 8543, pp. 435–439. Springer, Cham (2014). doi:10.1007/978-3-319-08434-3_34

    Chapter  MATH  Google Scholar 

  23. Klein, G., Andronick, J., Elphinstone, K., Heiser, G., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., Winwood, S.: seL4: formal verification of an operating-system kernel. Commun. ACM 53(6), 107–115 (2010)

    Article  Google Scholar 

  24. Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 1–35. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39799-8_1

    Chapter  Google Scholar 

  25. Lange, M., Leiß, H.: To CNF or not to CNF? an efficient yet presentable version of the CYK algorithm. Inform. Didact. 8, 1–21 (2009). https://www.informaticadidactica.de/uploads/Artikel/LangeLeiss2009/LangeLeiss2009.pdf

    Google Scholar 

  26. Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)

    Article  Google Scholar 

  27. Robinson, J.A., Voronkov, A. (eds.): Handbook of Automated Reasoning (in 2 Volumes). Elsevier and MIT Press, Cambridge (2001)

    MATH  Google Scholar 

  28. Rudnicki, P., Schwarzweller, C., Trybulec, A.: Commutative algebra in the Mizar system. J. Symb. Comput. 32(1/2), 143–169 (2001)

    Article  MathSciNet  Google Scholar 

  29. Tankink, C., Kaliszyk, C., Urban, J., Geuvers, H.: Formal mathematics on display: a wiki for Flyspeck. In: Carette, J., Aspinall, D., Lange, C., Sojka, P., Windsteiger, W. (eds.) CICM 2013. LNCS, vol. 7961, pp. 152–167. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39320-4_10

    Chapter  MATH  Google Scholar 

  30. Urban, J., Vyskočil, J.: Theorem proving in large formal mathematics as an emerging AI field. In: Bonacina, M.P., Stickel, M.E. (eds.) Automated Reasoning and Mathematics. LNCS, vol. 7788, pp. 240–257. Springer, Heidelberg (2013). doi:10.1007/978-3-642-36675-8_13

    Chapter  MATH  Google Scholar 

  31. Wenzel, M., Paulson, L.C., Nipkow, T.: The Isabelle framework. In: Mohamed, O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 33–38. Springer, Heidelberg (2008). doi:10.1007/978-3-540-71067-7_7

    Chapter  Google Scholar 

  32. Younger, D.H.: Recognition and parsing of context-free languages in time \(n^{3}\). Inf. Control 10(2), 189–208 (1967)

    Article  Google Scholar 

  33. Zinn, C.: Understanding informal mathematical discourse. Ph.D. thesis, University of Erlangen-Nuremberg (2004)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Cezary Kaliszyk .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kaliszyk, C., Urban, J., Vyskočil, J. (2017). Automating Formalization by Statistical and Semantic Parsing of Mathematics. In: Ayala-Rincón, M., Muñoz, C.A. (eds) Interactive Theorem Proving. ITP 2017. Lecture Notes in Computer Science(), vol 10499. Springer, Cham. https://doi.org/10.1007/978-3-319-66107-0_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-66107-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66106-3

  • Online ISBN: 978-3-319-66107-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics