Abstract
The choice of data structures for the internal representation of terms in logical frameworks and higher-order theorem provers is a crucial low-level factor for their performance. We propose a representation of terms based on a polymorphically typed nameless spine data structure in conjunction with perfect term sharing and explicit substitutions.
In related systems the choice of a \(\beta \)-normalization method is usually statically fixed and cannot be adjusted to the input problem at runtime. The predominant strategies are implementation-specific adaptations of leftmost-outermost normalization. We introduce several different \(\beta \)-normalization strategies and empirically evaluate their performance by measuring reduction steps on about 7000 heterogeneous problems from different (TPTP) domains.
Our study shows that there is no generally best \(\beta \)-normalization strategy and that for different problem domains, different best strategies can be identified. The evaluation results suggest a problem-dependent choice of a preferred \(\beta \)-normalization strategy for higher-order reasoning systems.
This work has been supported by the German National Research Foundation (DFG) under grant BE 2501/11-1 (Leo-III).
1 Introduction
Higher-order (HO) automated theorem proving (ATP) is, in many ways, more complex and involved than ATP in first-order or propositional logic. This additional complexity manifests at the level of proof search as well as at the level of terms and formulas. One advantage, however, is that the increased practical expressiveness of higher-order logic often enables more intuitive and concise problem representations and solutions. Many interactive and automated theorem provers for higher-order logic are based on Church's simple type theory [7] – also called classical higher-order logic (HOL) – or extensions of it.
In automated reasoning systems, terms are the most general and common pieces of information that are accessed, manipulated and created by most routines of the reasoning system. It is therefore not surprising that the internal representation of terms is a crucial detail with direct consequences for the efficiency of the whole system.
We present a combination of term representation techniques for HO ATP systems that is based on locally nameless spine terms [4] and explicit treatment of substitutions [1]. These base choices are appropriately adjusted to meet the requirements of HO ATP systems. In particular, our representation natively admits an expressive typing system, efficient term operations and reasonable memory consumption through term sharing in a combination that is novel to HO reasoners.
The supported term operations not only cover those adopted from the first-order setting, but also the essential operation of \(\beta \)-normalization. Here we differ from other prominent reasoning systems in proposing several new (modified) \(\beta \)-normalization strategies that allow problem-dependent handling of \(\beta \)-reduction. Thus, we do not hard-wire a single, preferred \(\beta \)-normalization strategy that we anticipate to perform best over all possible problem inputs. We believe that this approach can in fact increase the overall performance of HO ATP systems in which \(\beta \)-(re-)normalization has to be carried out repeatedly during proof search.
This research is motivated by previous observations [18] suggesting that there is no single best normalization strategy. The strategies proposed here have been empirically evaluated using a representative set of benchmark problems for theorem proving. This evaluation confirms that there are problem classes on which the de-facto standard leftmost-outermost strategy is outperformed by our rather simple alternative strategies. The evaluation has been conducted within the LeoPARD [21] system platform for HOL reasoners (see Footnote 1).
2 HOL Term Representation
HOL is an elegant and expressive formal system that extends first-order logic with quantification over arbitrary sets and functions. We consider Alonzo Church’s simple type theory [7] which is a formulation of HOL that is built on top of the simply typed \(\lambda \)-calculus [5, 6].
The simply typed \(\lambda \)-calculus, denoted \(\lambda _\rightarrow \), augments the untyped \(\lambda \)-calculus with simple types, which are freely generated from a set of base types and the function type constructor \(\rightarrow \). In HOL, the set of base types is usually taken as a superset of \(\{ \iota , o\}\) with \(\iota \) and o for individuals and truth values, respectively.
The work presented here focuses on an extended variant of \(\lambda _\rightarrow \) that natively supports parametric polymorphism and incorporates a locally nameless representation using de Bruijn indices for bound variables [3]. The notion of de Bruijn indices is extended to nameless type variables in order to preserve the syntactical uniqueness of \(\alpha \)-equivalent terms. Types (denoted by \(\tau \) or \(\nu \)) are thus given by

$$\tau , \nu \;{:}{:}{=}\; t \in T \;\mid \; \tau \rightarrow \nu \;\mid \; \forall .\; \tau \;\mid \; \underline{i}$$

where T is a non-empty set of base type symbols and \(\underline{i}\) is a nameless type variable.
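For illustration, this polymorphic type language can be sketched as a small algebraic datatype. The following Python sketch is our own rendering, not LeoPARD's actual implementation; all names (BaseType, FunType, BoundTypeVar, ForallType) are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class BaseType:
    """A base type symbol t from the set T, e.g. "i" or "o"."""
    name: str

@dataclass(frozen=True)
class FunType:
    """The function type tau -> nu."""
    dom: "Type"
    codom: "Type"

@dataclass(frozen=True)
class BoundTypeVar:
    """A nameless (de Bruijn) type variable, written underline-i in the text."""
    index: int

@dataclass(frozen=True)
class ForallType:
    """Universal type quantification, supporting parametric polymorphism."""
    body: "Type"

Type = Union[BaseType, FunType, BoundTypeVar, ForallType]

iota, o = BaseType("i"), BaseType("o")

# The type of a polymorphic identity function: forall a. a -> a,
# with the nameless type variable 1 referring to the enclosing quantifier.
poly_id_type = ForallType(FunType(BoundTypeVar(1), BoundTypeVar(1)))

# Frozen dataclasses give structural equality, so alpha-equivalent types
# (which are syntactically identical in nameless representation) compare equal.
assert poly_id_type == ForallType(FunType(BoundTypeVar(1), BoundTypeVar(1)))
```

The nameless representation makes \(\alpha \)-equivalence coincide with syntactic equality, which is what enables perfect sharing later on.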
The term data structure presented next adopts, combines and extends techniques that are employed in state-of-the-art HO reasoning systems, such as Teyjus \(\lambda \)Prolog [12] (which is based on the explicit substitutions of the Suspension Calculus [11]), the logical frameworks Twelf [13] and Beluga [14], and the interactive Abella prover [8]. In particular, the combination of techniques for term data structures presented here is, to the best of our knowledge, novel in the context of HO ATP and not employed in any modern system.
On the basis of nameless terms, spine notation [4] in conjunction with explicit substitutions [1] is employed. The former technique allows quick head access and a left-to-right traversal that is more efficient than in the classical curried representation. The latter's explicit treatment of substitutions enables the merging of substitution runs, which in turn permits a more efficient \(\beta \)-normalization procedure.
More specifically, the internal representation of polymorphic HOL syntax is given by (types are partially omitted for simplicity):

$$\begin{aligned} s, t \;&{:}{:}{=}\; h \cdot S \;\mid \; (\lambda .\; s) \cdot S \;\mid \; \lambda .\; s \;\mid \; \varLambda .\; s \;\mid \; s[\sigma ] \\ h \;&{:}{:}{=}\; i \;\mid \; c_\tau \;\mid \; s[\sigma ] \\ S \;&{:}{:}{=}\; \mathbf {nil} \;\mid \; s ; S \end{aligned}$$

where the terms s are either roots, redexes, term or type abstractions, or closures (respectively), with heads h (which are bound indices i, constants \(c_\tau \in \Sigma \) from the signature \(\Sigma \), or closures themselves) and spines S. We support defined constants \(c_\tau \) and their expansion using directed equation axioms \((c_\tau := d_\tau )\). The spines collect arguments in a linear sequence, concatenated by the ; constructor. A substitution \(\sigma = (\sigma _{term},\sigma _{type})\) is internally represented by a pair of a term substitution and a type substitution, each of which exclusively contains substitutes for the corresponding kind of de Bruijn indices. In the current version, closures cannot occur within types: the number of type variables within current common ATP problems is typically very low (often zero) and, hence, merging of substitution runs in types is not crucial.
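A sketch of this term language as Python datatypes may help fix intuitions. This is our own simplified rendering (type annotations on constants omitted, substitutions reduced to term substitutions); the names and the tiny hash-consing pool illustrating perfect sharing are assumptions, not LeoPARD's actual code:

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

@dataclass(frozen=True)
class Bound:
    """A bound de Bruijn index used as a head."""
    index: int

@dataclass(frozen=True)
class Const:
    """A constant from the signature (its type is omitted here)."""
    name: str

@dataclass(frozen=True)
class Root:
    """A head applied to a spine of arguments: h . S."""
    head: Union["Bound", "Const", "Closure"]
    spine: Tuple["Term", ...]

@dataclass(frozen=True)
class Redex:
    """An explicit redex node: (lambda. s) . S."""
    body: "Term"
    spine: Tuple["Term", ...]

@dataclass(frozen=True)
class TermAbstr:
    """Term abstraction: lambda. s."""
    body: "Term"

@dataclass(frozen=True)
class TypeAbstr:
    """Type abstraction: Lambda. s."""
    body: "Term"

@dataclass(frozen=True)
class Closure:
    """A postponed explicit substitution: s[sigma]."""
    term: "Term"
    subst: Tuple["Term", ...]

Term = Union[Root, Redex, TermAbstr, TypeAbstr, Closure]

# Perfect sharing via hash-consing: structurally equal (sub)terms are
# represented by one and the same object in memory.
_pool: Dict[Term, Term] = {}

def shared(t: Term) -> Term:
    return _pool.setdefault(t, t)

# lambda x. f x, built twice, yields a single shared representation:
s1 = shared(TermAbstr(Root(Const("f"), (Root(Bound(1), ()),))))
s2 = shared(TermAbstr(Root(Const("f"), (Root(Bound(1), ()),))))
assert s1 is s2
```

Because nameless terms make \(\alpha \)-equivalent terms syntactically identical, the pool lookup suffices for perfect sharing; no renaming is needed.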
We extend the notion of \(\beta \)-normalization to substitutions \(\sigma = (\sigma _{term},\sigma _{type})\) componentwise: \(\sigma \!\downarrow _\beta \; := (\sigma _{term}\!\downarrow _\beta , \sigma _{type})\), where \(\sigma _{term}\!\downarrow _\beta \) denotes the substitution \(\rho \) for which \(\rho (i) = \sigma _{term}(i)\!\downarrow _\beta \) holds for every index i, i.e. all components of the substitution are \(\beta \)-normalized individually.
The type abstraction mechanism (\(\varLambda .\;s\)) is due to Girard and Reynolds, who independently developed a polymorphically typed \(\lambda \)-calculus today widely known as System F [9, 16]. We use a Church-style \(\lambda \)-calculus in which each type is considered a part of the term’s name and thus intrinsic to it. This has several advantages over the extrinsic, or Curry-style, interpretation, but comes with some downsides, e.g., wrt. typing flexibility.
3 Normalization Strategies
We now introduce the \(\beta \)-normalization strategies under consideration, two of them novel (wrt. earlier experiments in [18]), and present them along with a brief discussion of possible benefits (and downsides). Subsequently, these strategies are empirically evaluated on an extensive benchmark set. The strategies are:
1. DEFAULT (leftmost-outermost): This normalization method corresponds to the standard normal-order strategy, i.e. the leftmost-outermost redex is processed first at each step during \(\beta \)-normalization. We use DEFAULT as the starting point for the presentation and explanation of the further strategies below. The complete rules for DEFAULT can be found in Fig. 1. Here, \(s\!\downarrow _\sigma \) denotes the \(\beta \)-normalization of s relative to substitution \(\sigma \). The computation of the \(\beta \)-normal form of a term s is initiated by \(s\!\downarrow _{id}\), where id is the identity substitution \(id := \,\uparrow ^0\).
2. HSUBST n (\(n>0\), heuristic application of substitution in RxApp): If the size of the term to be prepended onto the substitution is smaller than n, it is normalized strictly; otherwise, the substitution is postponed using closures as before. The rule RxApp from Fig. 1 is thus replaced by two rules that branch on \(|t| < n\), where |t| denotes the size of term t (i.e. the number of term nodes in the internal representation).
3. WHNF (normalize substitution once WHNF is obtained): Once the weak head normal form \(c \cdot S\) of the current (sub-)term has been reached during \(\beta \)-normalization, the substitution \(\sigma \) is normalized and then used to further \(\beta \)-normalize the spine S. The rule RAtom (cf. Fig. 1) is replaced accordingly.
4. STRCOMP (strict composition of term substitutions): The standard (meta-)operation of term-substitution composition with closures is given by
$$\begin{aligned} (s_\tau \cdot \sigma _{term}) \circ \rho _{term} \longrightarrow s_\tau [\rho _{term}] \cdot \left( \sigma _{term} \circ \rho _{term}\right) \end{aligned}$$
(1)
In STRCOMP it is instead calculated strictly:
$$\begin{aligned} (s_\tau \cdot \sigma _{term}) \circ \rho _{term} \longrightarrow \left( s_\tau [\rho _{term}]\right) \!\downarrow _\beta \; \cdot \; \left( \sigma _{term} \circ \rho _{term}\right) \end{aligned}$$
(2)
In contrast to (1), the application of substitution \(\rho _{term}\) in (2) is not postponed using closures but applied immediately by \(\beta \)-normalization.
5. WEAK (weakly normalize substitutions on demand): Before application of RTermSub or RTermClos, \(\beta \)-normalize the term to be substituted, and update \(\sigma \) accordingly. This means that each time a term is about to be substituted, its \(\beta \)-normal form is substituted instead. In order to avoid re-computation, the original term is also replaced by its \(\beta \)-normal form within the substitution \(\sigma \). Rule RTermSub from Fig. 1 is replaced accordingly, and RTermClos analogously. Here, when term t is substituted for de Bruijn index i, the substitution \(\sigma \) is updated to a substitution \(\sigma ^\prime \) that holds the normalized t at position i, i.e. \(\sigma ^\prime (j) = t\!\downarrow _\beta \) iff \(j = i\) and \(\sigma ^\prime (j) = \sigma (j)\) otherwise.
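To make the differences between such strategies tangible, the following self-contained sketch (our own illustration: plain de Bruijn terms without spines or explicit substitutions, so not the paper's actual rules) counts \(\beta \)-contractions under a leftmost-outermost strategy and under a variant that, in the spirit of the stricter strategies above, fully normalizes an argument before substituting it:

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Var:
    i: int            # de Bruijn index

@dataclass(frozen=True)
class Lam:
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def shift(t: Term, d: int, cutoff: int = 0) -> Term:
    """Shift free indices >= cutoff by d (standard de Bruijn shifting)."""
    if isinstance(t, Var):
        return Var(t.i + d) if t.i >= cutoff else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, d, cutoff + 1))
    return App(shift(t.fn, d, cutoff), shift(t.arg, d, cutoff))

def subst(t: Term, j: int, s: Term) -> Term:
    """Substitute s for index j in t."""
    if isinstance(t, Var):
        return s if t.i == j else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, j + 1, shift(s, 1)))
    return App(subst(t.fn, j, s), subst(t.arg, j, s))

def contract(lam: Lam, arg: Term) -> Term:
    """Contract the redex (lambda. body) arg."""
    return shift(subst(lam.body, 0, shift(arg, 1)), -1)

def step_lo(t: Term) -> Optional[Term]:
    """Perform one leftmost-outermost beta-contraction (None if in nf)."""
    if isinstance(t, App):
        if isinstance(t.fn, Lam):
            return contract(t.fn, t.arg)
        f = step_lo(t.fn)
        if f is not None:
            return App(f, t.arg)
        a = step_lo(t.arg)
        return App(t.fn, a) if a is not None else None
    if isinstance(t, Lam):
        b = step_lo(t.body)
        return Lam(b) if b is not None else None
    return None

def step_strict(t: Term) -> Optional[Term]:
    """Like step_lo, but fully normalize an argument before substituting it."""
    if isinstance(t, App) and isinstance(t.fn, Lam):
        a = step_strict(t.arg)
        return App(t.fn, a) if a is not None else contract(t.fn, t.arg)
    if isinstance(t, App):
        f = step_strict(t.fn)
        if f is not None:
            return App(f, t.arg)
        a = step_strict(t.arg)
        return App(t.fn, a) if a is not None else None
    if isinstance(t, Lam):
        b = step_strict(t.body)
        return Lam(b) if b is not None else None
    return None

def normalize(t: Term, step):
    """Iterate a single-step function to the normal form, counting contractions."""
    n = 0
    while (t2 := step(t)) is not None:
        t, n = t2, n + 1
    return t, n

def church(n: int) -> Term:
    """Church numeral: lambda f. lambda x. f^n x."""
    b: Term = Var(0)
    for _ in range(n):
        b = App(Var(1), b)
    return Lam(Lam(b))

# 2^3 via Church exponentiation (n m normalizes to the numeral m^n):
pow_term = App(church(3), church(2))
nf_lo, steps_lo = normalize(pow_term, step_lo)
nf_st, steps_st = normalize(pow_term, step_strict)
assert nf_lo == nf_st == church(8)  # by confluence, both agree on the result
print(steps_lo, steps_st)           # ...but generally not on the step count
```

By confluence both strategies reach the same normal form, while the contraction counts reflect the trade-off the paper studies: lazy substitution may duplicate unevaluated redexes, strict normalization may evaluate work that is later discarded.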
4 Evaluation and Further Work
In order to estimate the expected effects of using different \(\beta \)-normalization strategies in practical automated reasoning scenarios, a worst-case analysis seems inappropriate and is therefore omitted. Instead, a representative set of problems for (HO) theorem proving has been chosen, for which the number of \(\beta \)-reduction steps has been compared empirically between all strategies. Since the proposed strategies do not include costly heuristics (e.g. based on structural properties of terms), a decrease in reduction count directly translates to a speed-up with respect to actual time consumption. The evaluation has been conducted with the LeoPARD system platform, in which the term data structures from Sect. 2 and the strategies from Sect. 3 have been implemented.
The Benchmarks. The benchmark problems were chosen to cover a broad range of problem characteristics: the first three benchmark domains, denoted Church I, Church II and Church III, contain reducible arithmetic terms (of the form mult(i, i), power(i, 3) and power(3, i), respectively) in polymorphic Church numeral encoding [17]. The domains S4E and S4F contain a total of 3480 HO problems, converted from propositional and first-order modal logic problems of the QMLTP library [15]. The two domains differ wrt. the details of the employed semantic embedding of the modal logic S4 in HOL [2] (see Footnote 2).
The remaining benchmarks (a total of 3246 problems) are (typed) first-order and HO problems from the TPTP problem library [19, 20]. These benchmark domains are denoted according to their problem domain name as given by the TPTP library. First-order CNF problems, as well as TPTP domains that only contain such problems, were not considered for the evaluation, since the contained formulae are already given in clause normal form, which yields internal clause representations in LeoPARD that are already \(\beta \)-normalized.
In sum, the benchmark selection constitutes a representative set of nearly 7000 practical inputs for reasoning systems and covers a heterogeneous range of (syntactic and semantic) term characteristics.
Results and Discussion. Figure 2a shows the number of benchmark problems (across all domains) that were \(\beta \)-normalized (uniquely) best using the given strategy. In our benchmark set, the DEFAULT strategy has the highest number of problems normalized with minimal reduction count (compared to the other strategies). Nevertheless, HSUBST4 and WHNF are competitive alternatives, and there are even problems that are uniquely normalized best by each of the remaining strategies. It should be pointed out again that the competing strategies are comparatively simple: they do not use sophisticated term-structure heuristics, and yet already achieve fair effectiveness in certain domains.
To give a brief idea of the amount of potential reduction count savings, Fig. 2b shows a quantitative comparison of the 14 problems from KRS with the highest reduction count differences between the default leftmost-outermost strategy and the alternative WHNF strategy. These differences reach, in the most striking cases, a factor of 4.5, which is considerable at magnitudes of \(10^6\) reduction steps and above.
More detailed results that underline our observations can be found in Table 1. Here, for selected problem domains (see Footnote 3) and each relevant \(\beta \)-normalization strategy, the numbers of problems that performed best and worst are displayed (i.e. the number of problems that had the lowest respectively highest overall reduction count for this strategy). Additionally, the number of unique problems – denoted (u) – which normalized strictly faster with this strategy than with any other strategy within the domain is given. The sum of all reduction steps, denoted \(\Sigma r_i\), over the whole problem domain, as well as the maximal number of reduction steps (for a single problem), are given. The remaining three values, \(\widetilde{r_i}\), \(\overline{r_i}\) and \(\sigma \), denote the median, the arithmetic mean and the standard deviation of the measurement results (respectively).
As an example, in the benchmark domain Church I (cf. Table 1a) STRCOMP performs drastically better than in any other domain: although DEFAULT and WHNF have the lowest minimum value, STRCOMP is by far the best strategy (in problem count and overall reduction sum) with 88 of 100 problems (uniquely) normalized best. In terms of reduction steps per problem, STRCOMP takes only roughly 70 % of the number of steps required by DEFAULT (in both mean and median). Similar results also apply to the remaining Church domains. Likewise, in the GRA domain (cf. Table 1c), the mean normalization step count \(\overline{r_i}\) is more than 100 steps lower with the WHNF strategy than with DEFAULT. These results demonstrate that the alternative normalization strategies can in fact perform better (wrt. reduction count per problem) than default leftmost-outermost in certain problem domains.
Further Work. While the present evaluation grouped problems by a (given, practically motivated) semantic classification, further investigations need to identify syntactic criteria in order to group problems that behave similarly (with respect to \(\beta \)-normalization performance) under a specific strategy.
Based on observations and some preliminary experiments, we are confident that methods based on syntactic criteria such as the following can be employed for choosing an appropriate normalization strategy at runtime:
- Recognition of regular patterns in terms
- The term's size and depth
- The number of abstractions not occurring at top-level
- The number of bound indices
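As a sketch of how such syntactic criteria could be computed, the following self-contained Python fragment (terms as plain nested tuples; all function names are our own, purely illustrative choices) extracts the last three features listed above:

```python
# Terms as nested tuples: ('var', i) for a de Bruijn index i,
# ('lam', body) for an abstraction, ('app', fn, arg) for an application.

def size(t) -> int:
    """Number of term nodes."""
    if t[0] == 'var':
        return 1
    if t[0] == 'lam':
        return 1 + size(t[1])
    return 1 + size(t[1]) + size(t[2])

def depth(t) -> int:
    """Length of the longest path from the root to a leaf."""
    if t[0] == 'var':
        return 1
    if t[0] == 'lam':
        return 1 + depth(t[1])
    return 1 + max(depth(t[1]), depth(t[2]))

def inner_abstractions(t, top: bool = True) -> int:
    """Abstractions not occurring in the top-level binder prefix."""
    if t[0] == 'var':
        return 0
    if t[0] == 'lam':
        return (0 if top else 1) + inner_abstractions(t[1], top)
    return inner_abstractions(t[1], False) + inner_abstractions(t[2], False)

def bound_occurrences(t, binders: int = 0) -> int:
    """Occurrences of de Bruijn indices bound by a binder inside the term."""
    if t[0] == 'var':
        return 1 if t[1] < binders else 0
    if t[0] == 'lam':
        return bound_occurrences(t[1], binders + 1)
    return bound_occurrences(t[1], binders) + bound_occurrences(t[2], binders)

# Example: lambda x. (lambda y. y x) x
t = ('lam', ('app', ('lam', ('app', ('var', 0), ('var', 1))), ('var', 0)))
print(size(t), depth(t), inner_abstractions(t), bound_occurrences(t))
```

Features of this kind are cheap (one traversal each) and could feed a runtime strategy selector without eating into the savings they are meant to enable.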
For future work, not only concrete (syntactic) heuristics but also machine learning techniques trained on representative sets of problems could be employed.
5 Conclusion
A sophisticated internal representation mechanism for (second-order) polymorphically typed HO terms, including a locally nameless spine notation combined with explicit substitutions and perfect term sharing, has been presented.
Using the above representation, several new \(\beta \)-normalization strategies have been introduced. These strategies vary in their extent of laziness and strictness in certain normalization rules, e.g. during composition of substitutions. They have subsequently been implemented and evaluated within the LeoPARD framework. The conducted evaluation was based on a representative benchmark set.
For logical frameworks and meta-languages, the representation of objects such as programs and proofs in \(\lambda \)Prolog has previously been studied [10]. However, a fine-grained evaluation of normalization strategies in the context of HO ATP, as reported here, has not been carried out before. Extending previous studies in a rather orthogonal manner (wrt. application domain, granularity, and system of explicit substitutions), our benchmarks reveal that there is no single best \(\beta \)-normalization strategy for a relevant set of problem classes. In particular, our findings show that the performance of a strategy depends on (syntactic) characteristics of the input problem. The reduction count difference between the default leftmost-outermost strategy and the leading strategy can, in fact, be as high as a factor of four.
Notes
1. The LeoPARD framework is freely available under BSD license and can be downloaded at https://github.com/cbenzmueller/LeoPARD.
2. The archive of semantically embedded S4-formulae from QMLTP can be found at http://page.mi.fu-berlin.de/cbenzmueller/papers/THF-S4-ALL.zip.
3. The complete evaluation results can be found at http://inf.fu-berlin.de/~lex/files/betaresults.pdf.
References
Abadi, M., Cardelli, L., Curien, P.L., Lévy, J.J.: Explicit substitutions. In: Proceedings of the 17th Symposium on Principles of Programming Languages, POPL 1990, pp. 31–46. ACM, New York, NY, USA (1990)
Benzmüller, C., Raths, T.: HOL based first-order modal logic provers. In: McMillan, K., Middeldorp, A., Voronkov, A. (eds.) LPAR-19 2013. LNCS, vol. 8312, pp. 127–136. Springer, Heidelberg (2013)
de Bruijn, N.G.: Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser theorem. Indag. Math. 34, 381–392 (1972)
Cervesato, I., Pfenning, F.: A linear spine calculus. J. Logic Comput. 13(5), 639–688 (2003)
Church, A.: A set of postulates for the foundation of logic. Ann. Math. 33(2), 346–366 (1932)
Church, A.: A set of postulates for the foundation of logic. Second Paper. Ann. Math. 34(4), 839–864 (1933)
Church, A.: A formulation of the simple theory of types. J. Symb. Log. 5(2), 56–68 (1940)
Gacek, A.: The Abella interactive theorem prover (system description). In: Armando, A., Baumgartner, P., Dowek, G. (eds.) IJCAR 2008. LNCS (LNAI), vol. 5195, pp. 154–161. Springer, Heidelberg (2008)
Girard, J.: Interprétation fonctionnelle et élimination des coupures de l’arithmétique d’ordre supérieur. Ph.D. thesis, Université Paris VII (1972)
Liang, C., Nadathur, G., Qi, X.: Choices in representation and reduction strategies for lambda terms in intensional contexts. J. Autom. Reasoning 33(2), 89–132 (2004)
Nadathur, G.: A fine-grained notation for lambda terms and its use in intensional operations. J. Funct. Logic Program. 1999(2), 1–62 (1999)
Nadathur, G., Mitchell, D.J.: System description: Teyjus – a compiler and abstract machine based implementation of \(\lambda \)Prolog. In: Ganzinger, H. (ed.) CADE 1999. LNCS (LNAI), vol. 1632, pp. 287–291. Springer, Heidelberg (1999)
Pfenning, F., Schürmann, C.: System description: Twelf - a meta-logical framework for deductive systems. In: Ganzinger, H. (ed.) CADE 1999. LNCS (LNAI), vol. 1632, pp. 202–206. Springer, Heidelberg (1999)
Pientka, B., Dunfield, J.: Beluga: a framework for programming and reasoning with deductive systems (system description). In: Giesl, J., Hähnle, R. (eds.) IJCAR 2010. LNCS, vol. 6173, pp. 15–21. Springer, Heidelberg (2010)
Raths, T., Otten, J.: The QMLTP problem library for first-order modal logics. In: Gramlich, B., Miller, D., Sattler, U. (eds.) IJCAR 2012. LNCS, vol. 7364, pp. 454–461. Springer, Heidelberg (2012)
Reynolds, J.C.: Towards a theory of type structure. In: Robinet, B. (ed.) Symposium on Programming. LNCS, vol. 19, pp. 408–423. Springer, Heidelberg (1974)
Reynolds, J.C.: An introduction to polymorphic lambda calculus. In: Logical Foundations of Functional Programming, pp. 77–86. Addison-Wesley (1994)
Steen, A.: Efficient Data Structures for Automated Theorem Proving in Expressive Higher-Order Logics. Master’s thesis, Freie Universität Berlin, Berlin (2014)
Sutcliffe, G.: The TPTP problem library and associated infrastructure: The FOF and CNF parts, v3.5.0. J. Autom. Reasoning 43(4), 337–362 (2009)
Sutcliffe, G., Benzmüller, C.: Automated reasoning in higher-order logic using the TPTP THF infrastructure. J. Formalized Reasoning 3(1), 1–27 (2010)
Wisniewski, M., Steen, A., Benzmüller, C.: LeoPARD — A generic platform for the implementation of higher-order reasoners. In: Kerber, M., Carette, J., Kaliszyk, C., Rabe, F., Sorge, V. (eds.) CICM 2015. LNCS, vol. 9150, pp. 325–330. Springer, Heidelberg (2015)
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
Cite this paper
Steen, A., Benzmüller, C. (2015). There Is No Best \(\beta \)-Normalization Strategy for Higher-Order Reasoners. In: Davis, M., Fehnker, A., McIver, A., Voronkov, A. (eds.) Logic for Programming, Artificial Intelligence, and Reasoning (LPAR 2015). Lecture Notes in Computer Science, vol. 9450. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48899-7_23
Print ISBN: 978-3-662-48898-0
Online ISBN: 978-3-662-48899-7