Automating Formalization by Statistical and Semantic Parsing of Mathematics

Kaliszyk, Cezary; Urban, Josef; Vyskočil, Jiří

doi:10.1007/978-3-319-66107-0_2

Cezary Kaliszyk¹⁵,
Josef Urban¹⁶ &
Jiří Vyskočil¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10499))

Included in the following conference series:

International Conference on Interactive Theorem Proving

1232 Accesses
9 Citations

Abstract

We discuss the progress in our project which aims to automate formalization by combining natural language processing with deep semantic understanding of mathematical expressions. We introduce the overall motivation and ideas behind this project, and then propose a context-based parsing approach that combines efficient statistical learning of deep parse trees with their semantic pruning by type checking and large-theory automated theorem proving. We show that our learning method allows efficient use of large amount of contextual information, which in turn significantly boosts the precision of the statistical parsing and also makes it more efficient. This leads to a large improvement of our first results in parsing theorems from the Flyspeck corpus.

C. Kaliszyk—Supported by the ERC Starting grant no. 714034 SMART.

J. Urban and J. Vyskočil—Supported by the ERC Consolidator grant no. 649043 AI4REASON. This work was supported by the European Regional Development Fund under the project AI&Reasoning (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000466).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
In general, a grammar could pick only some subtree depths instead of their contiguous intervals, but we do not use such grammars now.
2.
About 1% of the longest Flyspeck formulas were removed from the evaluation to keep the parsing times manageable.
3.
If the context-free version parsed only a few terms, but with the best rank, its average rank would be 1, but the method would still be much worse in terms of the overall number of correctly parsed terms.

References

Bancerek, G., Rudnicki, P.: A compendium of continuous lattices in MIZAR. J. Autom. Reason. 29(3–4), 189–224 (2002)
Article Google Scholar
Blanchette, J.C., Kaliszyk, C., Paulson, L.C., Urban, J.: Hammering towards QED. J. Formaliz. Reason. 9(1), 101–148 (2016)
MathSciNet MATH Google Scholar
Collins, M.: Three generative, lexicalised models for statistical parsing. In: Cohen, P.R., Wahlster, W. (eds.) Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, pp. 16–23. Morgan Kaufmann Publishers/ACL (1997)
Google Scholar
The Coq Proof Assistant. http://coq.inria.fr
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. JASIS 41(6), 391–407 (1990)
Article Google Scholar
Dudani, S.A.: The distance-weighted K-nearest-neighbor rule. IEEE Trans. Syst. Man Cybern. 6(4), 325–327 (1976)
Article Google Scholar
Garillot, F., Gonthier, G., Mahboubi, A., Rideau, L.: Packaging mathematical structures. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 327–342. Springer, Heidelberg (2009). doi:10.1007/978-3-642-03359-9_23
Chapter Google Scholar
Gauthier, T., Kaliszyk, C.: Matching concepts across HOL libraries. In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds.) CICM 2014. LNCS, vol. 8543, pp. 267–281. Springer, Cham (2014). doi:10.1007/978-3-319-08434-3_20
Chapter Google Scholar
Gonthier, G., et al.: A machine-checked proof of the odd order theorem. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 163–179. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39634-2_14
Chapter Google Scholar
Gonthier, G., Tassi, E.: A language of patterns for subterm selection. In: Beringer, L., Felty, A. (eds.) ITP 2012. LNCS, vol. 7406, pp. 361–376. Springer, Heidelberg (2012). doi:10.1007/978-3-642-32347-8_25
Chapter Google Scholar
Grabowski, A., Korniłowicz, A., Naumowicz, A.: Mizar in a nutshell. J. Formaliz. Reason. 3(2), 153–245 (2010)
MathSciNet MATH Google Scholar
Greenbaum, S.: Input transformations and resolution implementation techniques for theorem-proving in first-order logic. Ph.D. thesis, University of Illinois at Urbana-Champaign (1986)
Google Scholar
Haftmann, F., Wenzel, M.: Constructive type classes in isabelle. In: Altenkirch, T., McBride, C. (eds.) TYPES 2006. LNCS, vol. 4502, pp. 160–174. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74464-1_11
Chapter Google Scholar
Hales, T.: Dense Sphere Packings a Blueprint for Formal Proofs, London Mathematical Society Lecture Note Series, vol. 400. Cambridge University Press, Cambridge (2012)
Book Google Scholar
Hales, T.C., Adams, M., Bauer, G., Dang, D.T., Harrison, J., Hoang, T.L., Kaliszyk, C., Magron, V., McLaughlin, S., Nguyen, T.T., Nguyen, T.Q., Nipkow, T., Obua, S., Pleso, J., Rute, J., Solovyev, A., Ta, A.H.T., Tran, T.N., Trieu, D.T., Urban, J., Vu, K.K., Zumkeller, R.: A formal proof of the Kepler conjecture. CoRR, abs/1501.02155, 2015
Google Scholar
Harrison, J.: HOL Light: a tutorial introduction. In: Srivas, M., Camilleri, A. (eds.) FMCAD 1996. LNCS, vol. 1166, pp. 265–269. Springer, Heidelberg (1996). doi:10.1007/BFb0031814
Chapter Google Scholar
Harrison, J., Urban, J., Wiedijk, F.: History of interactive theorem proving. In: Siekmann, J.H. (ed.) Computational Logic. Handbook of the History of Logic, vol. 9. Elsevier, Amsterdam (2014)
Chapter Google Scholar
Kaliszyk, C., Urban, J.: Learning-assisted automated reasoning with Flyspeck. J. Autom. Reason. 53(2), 173–213 (2014)
Article MathSciNet Google Scholar
Kaliszyk, C., Urban, J., Vyskocil, J.: System description: statistical parsing of informalized Mizar formulas. http://grid01.ciirc.cvut.cz/mptp/synasc17sd.pdf
Kaliszyk, C., Urban, J., Vyskočil, J.: Efficient semantic features for automated reasoning over large theories. In: Yang, Q., Wooldridge, M. (eds.) IJCAI 2015, pp. 3084–3090. AAAI Press, Menlo Park (2015)
Google Scholar
Kaliszyk, C., Urban, J., Vyskočil, J.: Learning to parse on aligned corpora (rough diamond). In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 227–233. Springer, Cham (2015). doi:10.1007/978-3-319-22102-1_15
Chapter Google Scholar
Kaliszyk, C., Urban, J., Vyskočil, J., Geuvers, H.: Developing corpus-based translation methods between informal and formal mathematics: project description. In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds.) CICM 2014. LNCS, vol. 8543, pp. 435–439. Springer, Cham (2014). doi:10.1007/978-3-319-08434-3_34
Chapter MATH Google Scholar
Klein, G., Andronick, J., Elphinstone, K., Heiser, G., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., Winwood, S.: seL4: formal verification of an operating-system kernel. Commun. ACM 53(6), 107–115 (2010)
Article Google Scholar
Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 1–35. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39799-8_1
Chapter Google Scholar
Lange, M., Leiß, H.: To CNF or not to CNF? an efficient yet presentable version of the CYK algorithm. Inform. Didact. 8, 1–21 (2009). https://www.informaticadidactica.de/uploads/Artikel/LangeLeiss2009/LangeLeiss2009.pdf
Google Scholar
Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)
Article Google Scholar
Robinson, J.A., Voronkov, A. (eds.): Handbook of Automated Reasoning (in 2 Volumes). Elsevier and MIT Press, Cambridge (2001)
MATH Google Scholar
Rudnicki, P., Schwarzweller, C., Trybulec, A.: Commutative algebra in the Mizar system. J. Symb. Comput. 32(1/2), 143–169 (2001)
Article MathSciNet Google Scholar
Tankink, C., Kaliszyk, C., Urban, J., Geuvers, H.: Formal mathematics on display: a wiki for Flyspeck. In: Carette, J., Aspinall, D., Lange, C., Sojka, P., Windsteiger, W. (eds.) CICM 2013. LNCS, vol. 7961, pp. 152–167. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39320-4_10
Chapter MATH Google Scholar
Urban, J., Vyskočil, J.: Theorem proving in large formal mathematics as an emerging AI field. In: Bonacina, M.P., Stickel, M.E. (eds.) Automated Reasoning and Mathematics. LNCS, vol. 7788, pp. 240–257. Springer, Heidelberg (2013). doi:10.1007/978-3-642-36675-8_13
Chapter MATH Google Scholar
Wenzel, M., Paulson, L.C., Nipkow, T.: The Isabelle framework. In: Mohamed, O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 33–38. Springer, Heidelberg (2008). doi:10.1007/978-3-540-71067-7_7
Chapter Google Scholar
Younger, D.H.: Recognition and parsing of context-free languages in time \(n^{3}\). Inf. Control 10(2), 189–208 (1967)
Article Google Scholar
Zinn, C.: Understanding informal mathematical discourse. Ph.D. thesis, University of Erlangen-Nuremberg (2004)
Google Scholar

Download references

Author information

Authors and Affiliations

University of Innsbruck, Innsbruck, Austria
Cezary Kaliszyk
Czech Technical University in Prague, Prague, Czech Republic
Josef Urban & Jiří Vyskočil

Authors

Cezary Kaliszyk
View author publications
You can also search for this author in PubMed Google Scholar
Josef Urban
View author publications
You can also search for this author in PubMed Google Scholar
Jiří Vyskočil
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Cezary Kaliszyk .

Editor information

Editors and Affiliations

University of Brasília, Brasília D.F., Brazil
Mauricio Ayala-Rincón
NASA, Hampton, VA, USA
César A. Muñoz

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kaliszyk, C., Urban, J., Vyskočil, J. (2017). Automating Formalization by Statistical and Semantic Parsing of Mathematics. In: Ayala-Rincón, M., Muñoz, C.A. (eds) Interactive Theorem Proving. ITP 2017. Lecture Notes in Computer Science(), vol 10499. Springer, Cham. https://doi.org/10.1007/978-3-319-66107-0_2

Download citation

DOI: https://doi.org/10.1007/978-3-319-66107-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66106-3
Online ISBN: 978-3-319-66107-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics