Many questions about meaning are straightforwardly empirical, like the question whether “Gift” means the same in German as it does in English.

Other questions about meaning are more abstract: for example the question whether in metaphors a word acquires a new meaning, or simply contributes its meaning in a different way to a phrase containing it. This is a question about the architecture of our preferred semantic theory, and perhaps not a question about facts at all. For some theories of language the notion of meaning, like the notion of cause in physics, vanishes altogether when we reach fundamentals.

One task of any semantic theory is to draw a line between semantics and grammar/syntax. Noam Chomsky hints at this when he says ([6] p. 144):

… at every point in the stream of discourse the speaker must choose a particular single word, and it makes sense to ask to what extent his choice of a particular word was governed by the grammatical structure of the language, and to what extent it was governed by other factors.

Chomsky’s phrasing suggests an equation:

$$\mathrm{Language = Syntax + (Other factors)},$$
(15.1)

where the other factors presumably include meaning. If (15.1) were an equation in an abelian group, we could solve for (Other factors) by subtracting Syntax from Language. Unfortunately the mathematical structure underlying the equation is not an abelian group, but something quite different that I describe in Section 15.2.4 below. (If mathematical readers detect some resemblance to Galois theory, that’s good.)

In broad outline, if we have a language with a given or presumed grammatical structure, and an account of how sentences of the language are used, then together these items induce a formal structure on the rest of the language. Somewhere inside this formal structure there lies a shadow of meanings. In Section 15.1 I describe the formal structure, and Section 15.2 is about how one extracts the meanings.

This paper is an extract from talks given during the last couple of years in meetings at the University of Düsseldorf, the Sorbonne, Stanford University and the Indian Institute of Technology, Bombay. I warmly thank my hosts and audiences in all those places.

15.1 Mathematical Theory

We consider a language L. For the present our grammatical assumptions about L will be thin. We assume that we can recognise expressions of L, and the grammatical constituents of these expressions. A constituent of an expression is again an expression in its own right. We know what it is to replace a constituent in an expression by another expression. To make these notions formal, we adopt the following definition.

Definition 15.1

  1. (a)

    By a constituent structure we mean an ordered pair of sets \((\mathbb{E},\mathbb{F})\), where the elements of \(\mathbb {E}\) are called the expressions and the elements of \(\mathbb {F}\) are called the frames, such that the four conditions below hold. (Here and below, e, f etc. are expressions. F, \(G(\xi)\) etc. are frames.)

    1. 1.

      \(\mathbb {F}\) is a set of nonempty partial functions on \(\mathbb {E}\). (“Nonempty” means their domains are not empty.)

    2. 2.

      (Nonempty Composition) If \(F(\xi_1,\ldots,\xi_n)\) and \(G(\eta_1,\ldots,\eta_m)\) are frames, \(1 \leqslant i \leqslant n\) and there is an expression

      $$F(e_1,\ldots,e_{i-1},G(f_1,\ldots,f_m),e_{i+1},\ldots,e_n),$$

      then

      $$F(\xi_1,\ldots,\xi_{i-1},G(\eta_1,\ldots,\eta_m),\xi_{i+1},\ldots,\xi_n)$$

      is a frame.

      Note that if \(H(\xi)\) is \(F(G(\xi))\) then the existence of an expression \(H(f)\) implies the existence of an expression \(G(f)\).

    3. 3.

      (Nonempty Substitution) If \(F(e_1,\ldots,e_n)\) is an expression, \(n > 1\) and \(1 \leqslant i \leqslant n\), then

      $$F(\xi_1,\ldots,\xi_{i-1},e_i,\xi_{i+1},\ldots,\xi_n)$$

      is a frame.

    4. 4.

      (Identity) There is a frame \(1(\xi)\) such that for each expression e, \(1(e) = e\).

  2. (b)

    We say that an expression e is a constituent of an expression f if f is \(G(e)\) for some frame G; e is a proper constituent of f if e is a constituent of f and \(e \neq f\).

  3. (c)

We refer to \(F(e_1,f,e_3)\) as the result of replacing the occurrence of \(e_2\) in second place in \(F(e_1,e_2,e_3)\) by f. (This notion depends on F, of course.)

  4. (d)

    We say that a set Y of expressions is cofinal if every expression of L is a constituent of an expression in Y.

Our only syntactic assumption on L (for the moment) is that L has a constituent structure. By the expressions of L we mean the expressions of its constituent structure.
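To fix ideas, here is a small computational sketch of a constituent structure for a toy language of my own (nothing later depends on it): expressions are nested tuples built from two atoms by a term-building construction and a sentence-building construction, and 1-ary frames are represented as “contexts”, expression-shaped tuples containing a placeholder. Since the expressions here form a term algebra, the sketch is only a special case of Definition 15.1.

```python
# A toy constituent structure (Definition 15.1); the language, names and
# representation are my own illustration, not part of the theory.
HOLE = "?"                       # placeholder marking the argument of a frame

def plus(t1, t2):                # construction building a term from two terms
    return ("plus", t1, t2)

def eq(t1, t2):                  # construction building a sentence from two terms
    return ("eq", t1, t2)

def apply_frame(context, e):
    """Fill each occurrence of HOLE in the context with the expression e."""
    if context == HOLE:
        return e
    if isinstance(context, tuple):
        return tuple(apply_frame(part, e) for part in context)
    return context

# Identity (condition 4): the bare placeholder plays the role of the frame 1(xi).
assert apply_frame(HOLE, "one") == "one"

# Nonempty Substitution (condition 3): from the expression plus("one", "two")
# we obtain the 1-ary frame plus(xi, "two").
frame = ("plus", HOLE, "two")
assert apply_frame(frame, "one") == plus("one", "two")

# Nonempty Composition (condition 2): substituting one frame into another
# yields the frame eq(plus(xi, "two"), "one").
composed = apply_frame(("eq", HOLE, "one"), ("plus", HOLE, "two"))
assert apply_frame(composed, "one") == eq(plus("one", "two"), "one")
```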

Definition 15.2

By a partial semantics for L we mean a function \(\mu: X \to Y\) where X is a set of expressions of L. (There are no further requirements on Y or μ.) A total semantics for L is a partial semantics \(\mu: X \to Y\) where X is the set \(\mathbb {E}\) of all expressions.

The constituent structure and the partial semantics μ together induce two equivalence relations on the set of expressions as follows.

Definition 15.3

  1. (a)

    Let e, f be expressions of L and μ a partial semantics for L with domain X. We write \(e \equiv_{\mu} f\) if for every 1-ary frame \(G(\xi)\),

    1. (i)

\(G(e)\) is in X if and only if \(G(f)\) is in X;

    2. (ii)

      if \(G(e)\) is in X then \(\mu(G(e)) = \mu(G(f))\).

We say e, f have the same μ-value, or for short the same fregean value, if \(e \equiv_{\mu} f\).

  2. (b)

The relation \(\sim_{\mu}\) is defined exactly as \(\equiv_{\mu}\) but with clause (ii) deleted.

We assume some choice of labels for the equivalence classes of \(\equiv_{\mu}\), and we write \(|e|_{\mu}\) for the label of the equivalence class of the expression e. We call \(|e|_{\mu}\) the fregean value of e. The function \(|.|_{\mu}\) is a total semantics for L. I call it the fregean semantics (or among computer scientists, the fully abstract semantics).
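As an illustration of how \(\equiv_{\mu}\) and the fregean values come out in a concrete case, here is a brute-force sketch for the toy language above (again my own example): X consists of equations between terms, μ gives their truth values, and two terms turn out to receive the same fregean value exactly when they denote the same number. For tractability only the frames of the form \(eq(\xi,t)\) and \(eq(t,\xi)\) are enumerated; in the full infinite language the remaining frames make no further distinctions.

```python
# Brute-force computation of the relation of Definition 15.3 on the toy
# arithmetic language (my own illustration, not code from the paper).
from itertools import product

def plus(a, b): return ("plus", a, b)
def eq(a, b):   return ("eq", a, b)

def value(t):                        # intended denotation of a term
    if t == "one": return 1
    if t == "two": return 2
    return value(t[1]) + value(t[2])

atoms = ["one", "two"]
terms = atoms + [plus(a, b) for a, b in product(atoms, repeat=2)]

X  = {eq(s, t) for s, t in product(terms, repeat=2)}   # the sentences
mu = {e: value(e[1]) == value(e[2]) for e in X}        # their truth values

# 1-ary frames G(xi), represented as functions from terms to sentences.
frames = [lambda e, t=t: eq(e, t) for t in terms] + \
         [lambda e, t=t: eq(t, e) for t in terms]

def same_fregean_value(e, f):        # the relation of Definition 15.3(a)
    for G in frames:
        if (G(e) in X) != (G(f) in X):
            return False                         # clause (i) fails
        if G(e) in X and mu[G(e)] != mu[G(f)]:
            return False                         # clause (ii) fails
    return True

classes = []                         # group the terms into equivalence classes
for t in terms:
    for c in classes:
        if same_fregean_value(t, c[0]):
            c.append(t)
            break
    else:
        classes.append([t])

for c in classes:                    # each class is labelled by a number
    print(value(c[0]), ":", c)
```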

Lemma 15.1 (Lifting Lemma)

Suppose \(F(e_1,\ldots,e_n)\) is a constituent of some expression in X, and for each i, \(e_i \equiv_{\mu} f_i\). Then:

  1. (a)

    \(F(f_1,\ldots,f_n)\) is an expression.

  2. (b)

    \(F(e_1,\ldots,e_n) \equiv_{\mu} F(f_1,\ldots,f_n)\).

Proof

By Nonempty Substitution we can make the replacements one expression at a time. So it suffices to prove the lemma when \(n = 1\).

Assume \(F(e)\) is an expression, \(H(F(e))\) is in X and \(e \equiv_{\mu} f\).

Claim (a): \(F(f)\) is an expression.

By Nonempty Composition \(H(F(\xi))\) is a frame \(G(\xi)\). Since \(e \equiv_{\mu} f\) and \(G(e)\) is in X, \(G(f)\) is in X. But \(G(f)\) is \(H(F(f))\), proving (a).

Claim (b): \(F(e) \equiv_{\mu} F(f)\).

Let \(G(\xi)\) be any 1-ary frame such that \(G(F(e))\) is an expression in X. By Nonempty Composition \(G(F(\xi))\) is a frame \(J(\xi)\). Since \(e \equiv_{\mu} f\) and \(J(e)\) is in X, \(J(f)\) is in X and \(\mu(J(e)) = \mu(J(f))\). So \(\mu(G(F(e))) = \mu(G(F(f)))\) as required, proving (b).□

Note that the Lifting Lemma holds also for \(\sim_{\mu}\) in place of \(\equiv_{\mu}\); in fact the proof of the lemma already proves this.

Lemma 15.2

Suppose the domain X of μ is cofinal (Definition 15.1(d)). Then there is, for each n-ary frame F, an n-ary map \(h_F: V^n \to V\), where V is the class of \(\equiv_{\mu}\)-values, such that whenever \(F(e_1,\ldots,e_n)\) is an expression,

$$|F(e_1,\ldots,e_n)|_{\mu} = h_F(|e_1|_{\mu},\ldots,|e_n|_{\mu}).$$

Proof

By (b) of the Lifting Lemma, if \(e_i \equiv_{\mu} f_i\) for each i then

$$F(e_1,\ldots,e_n) \equiv_{\mu} F(f_1,\ldots,f_n)$$

provided these expressions exist. So F and the fregean values of the \(e_i\) determine the fregean value of \(F(e_1,\ldots,e_n)\).□

The map \(h_F\) of the lemma is essentially unique; its values are determined on all n-tuples that actually occur as values of the constituents \(e_1, \ldots, e_n\) of an expression \(F(e_1,\ldots,e_n)\). We call this map the Hayyan function of F.

The name is from the distinguished Andalusian linguist Abu Ḥayyān al-Gharnāṭī, who was born in Granada in 1256 and worked in Egypt until his death in 1345. His claims to fame include a textbook of Turkish for Arabs [7] and a monograph on Mongolian. See [11] for details of the following quotation from him.

I find it astonishing that people allow a sentence construction in a language, even when they have never heard a construction like it [before]. Are Arabic constructions different from the words in the dictionary? Just as one can’t use newly-invented single words, so one can’t use [newly-invented] constructions. Hence all these matters are subject to convention (waḍʿ), and matters of convention require one to follow the practice of the speakers of the relevant language. The difference between syntax and lexicography is that syntax studies universal [rules], whereas lexicography studies items one at a time. These two sciences interlock in [describing] the conventions [on which language is based].

The word waḍʿ is crucial for understanding this passage. It is the term used for the assumed imposition of words on their meanings in the origins of language. Abu Ḥayyān is arguing that syntactic constructions have meanings just as words do, but the meaning of a construction is “universal”—in modern terminology, it’s a function defined for all possible inputs to the construction.
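Returning to the toy arithmetic language, if we label the fregean value of a term by the number it denotes, then the Hayyan function of the frame \(plus(\xi_1,\xi_2)\) is simply addition of those labels. A minimal standalone check (again my own illustration):

```python
# The Hayyan function of the frame plus(xi1, xi2) in the toy arithmetic
# language (my own illustration). With numerical labels for the fregean
# values of terms, Lemma 15.2 forces h_plus to be addition of labels.
from itertools import product

def plus(a, b): return ("plus", a, b)

def label(t):                         # numerical label |t|_mu of a term
    if t == "one": return 1
    if t == "two": return 2
    return label(t[1]) + label(t[2])

def h_plus(v1, v2):                   # candidate Hayyan function of plus
    return v1 + v2

terms = ["one", "two", plus("one", "two"), plus("two", "two")]
for s, t in product(terms, repeat=2):
    assert label(plus(s, t)) == h_plus(label(s), label(t))
```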

Definition 15.4

  1. (a)

    Let ≃ be an equivalence relation on expressions. We say that ≃ is compositional if for every pair of expressions \(F(e_1,\ldots,e_n)\) and \(F(f_1,\ldots,f_n)\),

$$e_1 \simeq f_1 \ \text{and} \ \ldots \ \text{and} \ e_n \simeq f_n \ \Rightarrow F(e_1,\ldots,e_n) \simeq F(f_1,\ldots,f_n).$$
  2. (b)

    Let φ be a function defined on expressions. We say that φ is compositional if for each expression \(F(e_1,\ldots,e_n)\), the value

    $$\varphi(F(e_1,\ldots,e_n))$$

    is determined by F and the values \(\varphi(e_i)\).

(Cf. Partee et al. [15] p. 318.) Parts (a) and (b) match; the fregean semantics \(|.|_{\mu}\) is compositional if and only if \(\equiv_{\mu}\) is compositional. In this terminology, Lemma 15.2 tells us that \(\equiv_{\mu}\) is in fact compositional. The same argument shows that \(\sim_{\mu}\) is compositional too.

Lemma 15.3

Suppose \(e \equiv_{\mu} f\) and e is an expression in X. Then f is in X and \(\mu(e) = \mu(f)\).

Proof

This is immediate from the definition, by applying the identity frame \(1(\xi)\).□

It follows from Lemma 15.3 that there is a function \(p_{\mu}\) such that for every expression e in X, \(\mu(e) = p_{\mu}(|e|_{\mu})\). This function \(p_{\mu}\) is uniquely determined in a similar sense to the Hayyan functions. We will call it the read-out function of μ.

Lemma 15.4

The following are equivalent:

  1. (a)

    For all e, f in X, \(e \equiv_{\mu} f\) if and only if \(\mu(e) = \mu(f)\).

  2. (b)

    For all e, f in X and every frame \(F(\eta)\),

$$\begin{array}{ll} & \mu(e) = \mu(f) \ \text{and} \ F(e) \in X \\ \Rightarrow & F(f) \in X \ \text{and} \ \mu(F(e)) = \mu(F(f)). \end{array}$$

Proof

Again this is immediate from the definition.□

When the conditions of Lemma 15.4 hold, we can assume that the representatives \(|e|_{\mu}\) with e in X were chosen so that \(|e|_{\mu} = \mu(e)\). The read-out function \(p_{\mu}\) is then the identity function.

Our next result needs a further assumption on the constituent structure.

Definition 15.5

  1. (a)

    We say that the constituent structure is well-founded if there are no infinite sequences of expressions

    $$e_0, e_1, e_2, \ldots$$

where for every n, \(e_{n+1}\) is a proper constituent of \(e_n\).

  2. (b)

    Assuming that the constituent structure is well-founded, we define the complexity \(c(e)\) of an expression e to be the least ordinal strictly greater than \(c(f)\) for each proper constituent f of e. A standard set-theoretic argument shows that c is a well-defined function from the set of expressions to the ordinal numbers. (If L is any natural language or a finitary formal language, then the complexity of each expression will be a natural number.) We say that an expression is an atom if its complexity is 0; otherwise the expression is complex.
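For finitary expressions the complexity of part (b) is simply the height of the constituent tree. A short sketch for the nested-tuple expressions of the earlier toy example (my own illustration):

```python
# Complexity c(e) of Definition 15.5(b) for the toy nested-tuple expressions
# (my own illustration). An atom has complexity 0; a compound gets the least
# ordinal above the complexities of its proper constituents, which for
# finitary expressions is one more than the maximum over its immediate parts.
def complexity(e):
    if not isinstance(e, tuple):                            # an atom
        return 0
    return 1 + max(complexity(part) for part in e[1:])      # e = (constructor, e1, ..., en)

assert complexity("one") == 0
assert complexity(("plus", "one", "two")) == 1
assert complexity(("eq", ("plus", "one", "two"), "one")) == 2
```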

Theorem 15.1 (Abstract Tarski theorem)

Let L be a language with a well-founded constituent structure, and μ a function whose domain is a cofinal set X of expressions of L. Then μ has a definition of the following form. A function ν is defined on all expressions of L by recursion on complexity. The basis clause is

  • \(\nu(e) = |e|_{\mu}\) for each atom e.

The recursion clause is

  • \(\nu(F(e_1,\ldots,e_n)) = h_F(\nu(e_1),\ldots,\nu(e_n))\) for each complex expression \(F(e_1,\ldots,e_n)\).

Then for each expression e in X,

$$\mu(e) = p_{\mu}(\nu(e)).$$
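For the toy arithmetic language the recursion of the theorem takes a very concrete shape (my own illustration): the fregean value of an atom is the number it denotes, the Hayyan functions of the two constructions are addition and equality testing, and the read-out function is the identity, so that \(\mu(e) = \nu(e)\) on sentences.

```python
# The recursion of Theorem 15.1 for the toy arithmetic language (my own
# illustration, not code from the paper).
NU_ATOM = {"one": 1, "two": 2}        # basis clause: fregean values of atoms

HAYYAN = {                            # Hayyan functions of the two frames
    "plus": lambda m, n: m + n,
    "eq":   lambda m, n: m == n,
}

def nu(e):
    if not isinstance(e, tuple):                  # basis clause
        return NU_ATOM[e]
    constructor, *args = e                        # recursion clause
    return HAYYAN[constructor](*map(nu, args))

def p_mu(v):                                      # read-out function (identity here)
    return v

sentence = ("eq", ("plus", "one", "one"), "two")  # "1 + 1 = 2"
assert p_mu(nu(sentence)) is True                 # mu(sentence) recovered
```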

This concludes our main mathematical development. Two further directions are worth mentioning.

15.1.1 Equivalence of Frames

In analogy to the definition of \(\equiv_{\mu}\) on expressions, we can define \(\equiv_{\mu}\) on frames as follows. For simplicity I give the definition for binary frames, but it extends straightforwardly. Again we suppose we have a partial semantics \(\mu: X \to Y\).

Definition 15.6

Suppose \(F(\xi_1,\xi_2)\) and \(F'(\xi_1,\xi_2)\) are frames. Then \(F(\xi_1,\xi_2) \equiv_{\mu} F'(\xi_1,\xi_2)\) if and only if for all expressions e, f and frames \(G(\eta)\),

  1. (a)

    \(G(F(e,f))\) is in X if and only if \(G(F'(e,f))\) is in X, and

  2. (b)

if \(G(F(e,f))\) is in X then \(\mu(G(F(e,f))) = \mu(G(F'(e,f)))\).

Then \(\equiv_{\mu}\) on frames is an equivalence relation, and one can prove analogues of the results for \(\equiv_{\mu}\) on expressions.

15.1.2 The Case Where X Is Not Cofinal

Lemmas 15.1 and 15.4 extend μ (under the conditions in Lemma 15.4) to a compositional semantics ν on the set of all expressions that occur as constituents of expressions in X. A dual problem is to take a compositional semantics μ on a set X of expressions that is closed under constituents, and extend μ to a total compositional semantics. There is a trivial solution if we assume that \(\mu(e) = \mu(f)\) implies \(e \sim_{\mu} f\) (a condition called husserlian in [10]). When the husserlian condition fails, the problem is much harder. Dag Westerståhl [22] gave a positive answer under the assumption that L is a subset of a term algebra.

15.2 Commentary

15.2.1 The Freedom to Choose a Grammar

Broadly speaking, the results of Section 15.1 are independent of any choice of grammatical theory (at least among the better known theories), but the data delivered (\(\equiv_{\mu}, h_F\) etc.) is highly dependent on the choice of a constituent structure for a given language.

Lemma 13 of my [10] was a version of the Lifting Lemma under the stronger assumption that the language is a subset of a term algebra. (More precisely I assumed that it’s a homomorphic image of a subset of a term algebra, but in fact the work all takes place on the term algebra itself.) In Chapter 2 of his book [18] Mark Steedman argues for a more flexible notion of constituents. For example they might overlap. Not being a linguist, I can’t comment on the force of Steedman’s arguments. But they did convince me that the Lifting Lemma in [10] was not at the right level of generality. The version in this paper is the result, and it is certainly simpler than the earlier version (which is now a special case).

Are there other approaches to grammar that won’t accommodate the Lifting Lemma? Fortunately I didn’t need to hunt around on this, because at exactly the right time Edward Keenan and Edward Stabler published a notion of “bare grammar” [12] that is designed to pick out a common core from “otherwise rather different specific theories of grammar: Relational Grammar, Arc-Pair Grammar, Categorial Grammar, Lexical Functional Grammar, Head Driven Phrase Structure Grammar, Government Binding Theory/Minimalism, and others” ([12] p. 1). They build on the two notions of constituent and category.

Every bare grammar G yields a constituent structure in either of two ways. The first way is to take as the set of expressions the “language generated by G”, which is the closure of the lexicon of G under the syntactic rules of G, and as the set of frames the closure under substitution of the “generating or structure building functions” of G ([12] p. 14f). The resulting expressions are ordered pairs consisting of (at first approximation) a phrase and the category of the phrase; for example at word level we might find two expressions

$$(\mathrm{win},\mathrm{N}),\;\;(\mathrm{win},\mathrm{VT})$$
(15.2)

for the noun “win” and the transitive verb “win”. The second way of extracting a constituent structure from the bare grammar is to ignore the categories and take the expressions to be the first terms of these pairs.

Bare grammars are general enough to handle non-configurational languages, where the word order within clauses is not significant. One of the examples in [12] (p. 59ff) is “free order Korean”, a version of Korean that the authors have deliberately souped up to make it even less configurational than Korean normally is. I mention this because I was asked – for example about Sanskrit in Mumbai – whether the results of Section 15.1 make sense for non-configurational languages. For these languages one would expect a large number of equivalences of the form \(F(\xi_1,\xi_2) \equiv_{\mu} F(\xi_2,\xi_1)\).

It’s true that Section 15.1 applies to non-configurational languages. But it’s not the whole truth. Non-configurational languages tend to decorate each word stem with a cluster of prefixes, suffixes and infixes that carry the information given in other languages by the order. In fact most languages go some way down this road; thus Gothic handus is subject and handu is object. If you want to separate out the meanings of the stem hand and the suffixes, it makes sense to adopt a grammar where hand, -us and -u are all atoms. (Then probably you need a frame for combining stem and suffix into a single word.)

English has separated out the Old Germanic stem hand by dropping the suffixes. But you may want to think of the old -us and -u as implied constituents of a modern English sentence. One way of doing this is to have atoms NOM and ACC for nominative and accusative. In the resulting grammar “hand-NOM” and “hand-ACC” will be distinct expressions, even though “NOM” and “ACC” aren’t pronounced. In the same spirit you may want to regard the English word “me” as how we spell and pronounce what is really a compound word “I-ACC”. Or as (15.2) illustrates, you might want to count one inscription as two different words (for syntactic reasons in the case of (15.2), but reasons might come from anywhere).

And so on, really. A grammar for a language doesn’t arrive ready-made. To a great extent it’s a theoretical construct from the empirical data.

In that spirit, one way of looking at the results of Section 15.1 is as follows. We have a language L, and we propose a grammar for it. The grammar, together with a function μ describing the usage of sentences, induces fregean values, and we try to make sense of these values. If they are hopelessly obscure or perversely difficult to describe, then we look for a different grammar and/or a different way of describing the usage of sentences, and we try again. This is a familiar kind of heuristic.

15.2.2 Freedom in the Choice of \(|e|_{\mu}\)

The notion \(e \equiv_{\mu} f\) conveys that e and f make exactly the same contribution to the μ-values of expressions in X that have them as constituents. Is this the only possible formalisation of this informal notion?

It seems that it is. The key point to note is that the informal notion is an equivalence relation; so any formalisation of it must be an equivalence relation too. This immediately knocks out some alternatives that one might propose. Take for example the relation \(\equiv^{\star}\) defined by:

For any expressions e and f, \(e \equiv^{\star} f\) if and only if for all frames \(F(\xi)\), if \(F(e)\) and \(F(f)\) are both in X then \(\mu(F(e)) = \mu(F(f))\).

If e and f are so unrelated that there is no frame \(F(\xi)\) for which both \(F(e)\) and \(F(f)\) are expressions, then \(e \equiv^{\star} f\). That should already make us uneasy. What truly knocks this definition on the head is that there is no reason at all to expect the defined relation \(\equiv^{\star}\) to be transitive. I think if you try out some further possibilities, you will soon convince yourself that our definition of \(\equiv_{\mu}\) is the only one that makes sense in all cases.

When we know that two expressions have the same fregean value, this information on its own doesn’t tell us what that fregean value actually is. This might seem a devastating gap in the theory of Section 15.1, but in practice it isn’t. In concrete cases, when we find out how to tell whether two expressions have different fregean values, that information normally gives us all we need for assigning sensible labels to the equivalence classes. We will see some examples below.

15.2.3 The Centrality of Sentence Meanings

The meanings of sentences of a language are decisive for the meanings of all phrases of the language. There are many arguments to support this view; here are three.

(1) We express ourselves in sentences.

(2) In practice the normal method for distinguishing the meanings of two closely related words is to give a sentence using one of them that would lose its meaning or become inappropriate if the other were used. As a typical example, Webster’s Dictionary of English Usage [8] distinguishes between “less” and “lesser” by calling on sentences from its corpus, including this from Virgil Thomson:

$$\ldots \ \text{lesser composers analyze music better than they used to}.$$
(15.3)

I’m not sure whether the result of putting “less” in place of “lesser” in (15.3) means anything, but it certainly doesn’t mean the same as (15.3).

(3) Slightly more technical: the meaning of a word includes the semantic argument structure of the word. For example you don’t know the meaning of the word “mother” until you know that “my mother” normally means a person who bears a certain relation to me, not just someone who is both mine and a mother. (This is a semantic property of “mother”; to the best of my knowledge there are no purely syntactic differences between these two uses of the phrase “my mother”.) But words taken on their own don’t show their argument structure. To discover that structure you have to understand how the words fit into phrases. Granted not all phrases are sentences; but at least the sentences containing a word give enough evidence to place its semantic argument structure.

We can almost read off the following

NECESSARY CONDITION FOR DIFFERENCE OF MEANING. Let L be a language, X the set of sentences of L and for each sentence e let \(\mu(e)\) be the meaning of e. If two expressions have different meanings then they have different fregean values.

This condition is in danger of lapsing into triviality. Arguably, if e and f are any two distinct expressions of a natural language, then there will be at least one sentence – even a sentence not using quotation – whose meaning changes if we put one of the two expressions in place of the other in it.

But there are degrees of synonymy. Phrases can have broadly similar meanings and differ in nuances; otherwise we’d have no use for the notion of a synonym. To investigate a weak kind of synonymy by means of fregean values, there are two possible moves to make. The first is to restrict to a limited part of the language, either by shedding vocabulary or by tightening the requirements of grammaticality. Dropping adverbs of degree removes a distinction between “angry” and “enraged” (you can be mildly angry but not mildly enraged, cf. Apresjan [1] p. 125). And so on. The second possible move is to weaken the notion of synonymy on sentences. For example if we take note only of when sentences are true or false, and ignore their social status, we cut away the difference in meaning between “intoxicated” and “pissed”. And so on.

In short, the apparatus of Section 15.1 gets its bite by being applied to limited parts of language – just as the physicist normally applies physical theory to bounded systems. Sections 15.2.6 and 15.2.7 will discuss some languages where Section 15.1 applies without any limitation. But these are austere formal languages of logic.

As a consequence, the Necessary Condition above will sometimes fail, but benignly. If there really is a difference between the meanings of e and f, it’s safe to assume that some difference between meanings of sentences will serve to show this, though we might have to strengthen the language or the function μ to do so. At worst we might have to accept interjections like “Ouch!” as sentences.

15.2.4 Splitting Syntax From Semantics

Assume a language L is given, together with a constituent structure and a partial semantics μ on sentences. We saw how this data yields two equivalence relations \(\sim_{\mu}\) and \(\equiv_{\mu}\) on the class of expressions. Also \(\equiv_{\mu}\) is a refinement of \(\sim_{\mu}\): if \(e \not\sim_{\mu} f\) then \(e \not\equiv_{\mu} f\) too.

Now granting that the class of sentences is syntactically distinguishable, the relation \(\sim_{\mu}\) is purely syntactic. Recall Chomsky’s Other Factors and the question of distinguishing them from syntax. Imagine that we can define a third equivalence relation \(\thickapprox_{\mu}\), where \(e \thickapprox_{\mu} f\) holds if and only if e and f agree in their meanings, i.e. those Other Factors that have a detectable effect on μ. Then we will have

$$e \equiv_{\mu} f \ \ \Leftrightarrow \ \ e \sim_{\mu} f \textup{ and } e \thickapprox_{\mu} f.$$
(15.4)

Equation (15.4) solves equation (15.1). There is always at least one solution of equation (15.4), namely where \(\thickapprox_{\mu}\) is \(\equiv_{\mu}\). It should be obvious that in general there are bound to be many other solutions. This is where semantic theory takes off.

The common claim that meaning is compositional is a claim about the proper form of \(\thickapprox_{\mu}\). By our general theory, \(\equiv_{\mu}\) and \(\sim_{\mu}\) are both compositional; but it doesn’t follow that \(\thickapprox_{\mu}\) is. Here let me make a comment in passing. Some of the arguments in favour of the compositionality of meanings, in particular those which refer to processes by which an interpreter of sentences discovers their meanings, are fully met by the observation that \(\equiv_{\mu}\) is compositional; they provide no evidence that \(\thickapprox_{\mu}\), as well as μ, must be compositional. But as we’ve seen, the compositionality of \(\equiv_{\mu}\) is a purely mathematical artefact. Unlike the compositionality of \(\thickapprox_{\mu}\), it has no empirical content at all. I believe these facts go a long way towards accounting for the feeling that compositionality of meaning is both necessary and elusive. (There are other reasons of a methodological kind for wanting meanings to be compositional.)

My own belief is that the choice of \(\thickapprox_{\mu}\) within \(\equiv_{\mu}\) is never purely empirical; it always depends on a mixture of a priori theory and plain subjective taste. This is not the sort of thing one can hope to prove. But let me discuss four types of example.

Example 15.1

The chief difference between the English phrases “in spite of” and “notwithstanding” is that “notwithstanding” is usable as a postposition, whereas “in spite of” is always a preposition:

She looked stunning, her tight schedule notwithstanding.

⋆She looked stunning, her tight schedule in spite of.

Probably most people would say at once that this is a purely syntactic difference between “in spite of” and “notwithstanding”.

We can analyse this example as follows. Take \(F(\xi_1,\xi_2)\) to be the construction putting a preposition at \(\xi_1\) in front of a noun at \(\xi_2\), and \(F'(\xi_1,\xi_2)\) to be the construction putting a postposition at \(\xi_1\) after a noun at \(\xi_2\). Then

$$F(\text{in spite of},\xi_2) \ \equiv_{\mu} \ F'(\text{notwithstanding},\xi_2)$$
(15.5)

in the sense of Definition 15.6, and hence these two 1-ary frames have the same meaning. But F and F′ themselves have no semantic content; they are pure constructors. Hence the meanings of the 1-ary frames in (15.5) come from their first arguments. Furthermore these arguments never occur in any other context. So we should reckon that the two arguments have the same meaning.

Before we rest easy with this argument, take another example. It’s artificial but I hope it makes its point. Imagine a linguist coming on English for the first time, and recognising the morphemes “dis-” and “em-”. She notes that “disgruntled” and “embittered” mean near enough the same, and she infers that the difference between “gruntled” and “bittered” is the purely syntactic one that the first takes “dis-” and the second takes “em-”. This is essentially the same argument as above. But here it doesn’t work, because the prefix “dis-” carries a semantic content not in “em-”, namely that it negates what follows it.

As this second example emphasises, the argument for Example 15.1 rests on our already having decided that in English there is no semantic difference between prefixing and postfixing. Technically, it means we have already made some decisions about ≈ μ on frames as well as on expressions. (Not that this particular decision is controversial.)

Example 15.2

Another example, already referred to, is the difference between “I” and “me”. You might want to say that these words have the same meaning, in spite of the fact that there are almost no sentences in which we can change between “I” and “me” without altering or losing the sense.

There are several possible arguments to support this view. One crude argument is that in any context where they are uttered, the two terms have the same reference. Since it’s hard to maintain that reference is the whole of meaning, this argument is weak. A stronger argument would point to the practical advantages of analysing “I” and “me” as “I-NOM” and “I-ACC”, and then observe that the difference between NOM and ACC is purely syntactic. The use of semantic theory in this argument is clear. (Also I suspect not everybody would accept that NOM and ACC are semantically neutral. They have more than a hint of agent and patient.)

My guess is that in Examples 15.1 and 15.2 most people will find it quite easy to separate the syntax from the semantics. The purpose of the examples was to show the involvement of background semantic theory even in uncontroversial cases. In the next two examples it seems to me genuinely puzzling where or how one should draw the line. Perhaps the distinction in these cases is purely within one’s semantic theory, without any empirical content to it at all.

Example 15.3

Compare “murderer” and “person who has killed someone”:

A person who has killed someone most often already knew him or her.

A murderer most often already knew him or her.

The second sentence doesn’t carry the straightforward meaning of the first.

The sentences show that the phrase “person who has killed someone” includes a term for the object, which anaphoric pronouns can latch onto, whereas “murderer” has no such term. Is this a semantic difference or a purely syntactic one? How would one decide?

This example is complicated by the fact that there is another difference between the two expressions, as shown in

That man is the murderer of Caroline.

That man is the person who has killed someone of Caroline.

The second sentence is syntactically ugly, but the main point is that it can’t mean the same as the first. Together the sentences show that “murderer” has a semantic argument place that’s not available in “person who has killed someone”. I think this is a clear difference of meaning, but I could name leading linguists who seem to disagree.

Example 15.4

Compare “liked” and “enjoyed”. The general pattern seems to be that in frames where both words are acceptable, they carry the same sense. But “enjoyed” is limited to frames where it refers to an activity. For example “He enjoyed Bach” forces us to interpret it as “He enjoyed listening to Bach”, or in some contexts “He enjoyed playing Bach”. “He enjoyed aluminium” is uninterpretable out of context. “He enjoyed Susan” strongly suggests an activity. (Pustejovsky [16] p. 135.)

With examples of this kind, the evidence for a semantic difference is that certain sentences don’t normally occur. But exactly the same evidence could be used to make a case that English syntax recognises certain distinctions (here, between activity verbs and other verbs). So which wins, syntax or semantics?

A formal description of Example 15.4 would say that there are expressions e and f such that \(e \not\sim_{\mu} f\) but \(F(e) \equiv_{\mu} F(f)\) in the many cases where \(F(e)\) and \(F(f)\) both exist. This seems to be a common phenomenon. An example with adjectives is “cold” and “cool”. Apresjan ([1] p. 52) notes that we tend not to use “cool” for felt sensations in particular parts of our bodies: “My fingers feel cool/cold”. In this case he opts for syntax, but he has an unusually articulate semantic theory in the background.

15.2.5 The Indeterminacy of Translation

Quine ([17] chapter ii) famously argues that the notion of a correct translation from one language to another is underdetermined. He allows that languages contain “observation sentences” which “wear their meanings on their sleeves”. Between sentences of this kind, translations are objectively right or wrong, up to the “normal inductive” uncertainties. But to extend the translation downwards to constituents of observation sentences is to make “analytical hypotheses”, and many different and incompatible analytical hypotheses might do the job.

Markus Werning [21], working as closely as possible from Quine’s own assumptions, argues that the Lifting Lemma (Lemma 15.1) puts serious constraints on the possible analytical hypotheses, and so very much reduces the indeterminacy of translation, at least for words and phrases that occur as constituents of observation sentences. Hannes Leitgeb [13] replies to Werning.

15.2.6 Tarski’s Definition of Truth

In 1933 Tarski [19] undertook to give a definition of the predicate “φ is a true sentence of the language L” where L is a fully interpreted formal language. One of his requirements was that the definition should use only notions from set theory and syntax, together with the notions expressible in L.

For the languages L that Tarski had in mind, the class of sentences is cofinal. Hence Theorem 15.1 applies, where X is the set of sentences and for each sentence φ, \(\mu(\varphi)\) is the truth value of φ. That theorem gives us the format of a truth definition. For Tarski’s purposes we need only check that syntax, set theory and “the notions expressible in L” suffice to define \(|.|_{\mu}\) on atoms, the Hayyan functions and the read-out function \(p_{\mu}\).

The condition of Lemma 15.4 holds for Tarski’s languages L: If a sentence φ is a constituent of a sentence ψ, and we replace this constituent by another sentence with the same truth value as φ, the result is again a sentence, and it has the same truth value as ψ. So we can ignore the read-out function. With any reasonable notion of constituents for these languages, one can show that two formulas e, f have the same fregean value if and only if (1) e and f have the same free variables and (2) if α is an assignment to the free variables of e, then α satisfies e if and only if it satisfies f. Hence we can take \(|e|_\mu\) to be the ordered pair \((FV(e),|e|)\) where \(FV(e)\) is the set of variables free in e and \(|e|\) is the set of assignments to \(FV(e)\) which satisfy e. (Except when e is false under all assignments, \(|e|\) determines \(FV(e)\) anyway.) The set \(FV(e)\) is defined syntactically, and for atomic formulas \(R(x_1,\ldots,x_n)\) (the atoms of Tarski’s paper) it’s plausible that \(|R(x_1,\ldots,x_n)|\) is a “notion expressible in L”. It remains to define the Hayyan functions; Tarski’s set-theoretic definitions of them are familiar.
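To make the last paragraph concrete, here is a small sketch (standard textbook material written out by me, not code from Tarski’s paper) that computes the pair \((FV(e),|e|)\) for formulas of a toy relational first-order language over a finite structure: the free variables of a formula together with the set of assignments to them that satisfy it.

```python
# Fregean values (FV(e), |e|) for a toy first-order language over a finite
# structure (my own illustrative rendering of the standard definitions).
from itertools import product

DOMAIN = [0, 1, 2]
REL = {"R": {(0, 1), (1, 2)}}         # one binary relation symbol R

def free_vars(phi):
    op = phi[0]
    if op in REL:      return set(phi[1:])
    if op == "not":    return free_vars(phi[1])
    if op == "and":    return free_vars(phi[1]) | free_vars(phi[2])
    if op == "exists": return free_vars(phi[2]) - {phi[1]}

def satisfies(alpha, phi):            # Tarski's satisfaction clauses
    op = phi[0]
    if op in REL:      return tuple(alpha[v] for v in phi[1:]) in REL[op]
    if op == "not":    return not satisfies(alpha, phi[1])
    if op == "and":    return satisfies(alpha, phi[1]) and satisfies(alpha, phi[2])
    if op == "exists":
        x, body = phi[1], phi[2]
        return any(satisfies({**alpha, x: a}, body) for a in DOMAIN)

def fregean_value(phi):
    fv = sorted(free_vars(phi))
    sat = {frozenset(dict(zip(fv, vals)).items())
           for vals in product(DOMAIN, repeat=len(fv))
           if satisfies(dict(zip(fv, vals)), phi)}
    return (frozenset(fv), sat)

# "exists y R(x, y)" is satisfied exactly when x is assigned 0 or 1.
phi = ("exists", "y", ("R", "x", "y"))
print(fregean_value(phi))
```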

In short, Tarski’s truth definition in [19] is in all essentials the definition of truth given by the Abstract Tarski Theorem 15.1.

Actually Tarski doesn’t determine for each formula e the set \(|e|\) consisting of those assignments to the free variables of e that satisfy e; he determines the set consisting of those assignments to all variables that satisfy e. Some model theorists have preferred to rewrite his definition using \(|e|\), but it’s not what he wrote.

We can account for this discrepancy as follows. For Tarski’s languages, \(e \sim_{\mu} f\) if and only if \(FV(e) = FV(f)\). Write \(e \thickapprox_{\mu} f\) if and only if among the assignments to all variables, those that satisfy e are those that satisfy f. Then Equation (15.4) holds. As it happens, \(\thickapprox_{\mu}\) is compositional too.

Later Tarski and Vaught [20] gave a truth definition for uninterpreted model-theoretic first-order languages. In this setting one can ask for truth definitions at several different levels of generality:

  1. (a)

    Given a particular structure A and corresponding language L, define which sentences of L are true in A.

  2. (b)

Given a particular signature σ of relation symbols and corresponding language L, define for each sentence φ of L the class of σ-structures in which φ is true.

  3. (c)

    The same as (b), but for all signatures simultaneously.

The Abstract Tarski Theorem applies as before. Case (a) is a special case of the 1933 definition. For case (b) one defines \(\mu(\varphi)\) to be the class of all σ-structures in which φ is true. It turns out that two formulas with the same signature have the same fregean value if and only if they have the same set of free variables and are logically equivalent. So the Abstract Tarski Theorem delivers the usual model-theoretic definition of satisfaction.

Case (c) can be an exercise, as can the question how to deal with constant and function symbols if they are included as atoms. The Abstract Tarski Theorem brings home the bacon again. In cases (b) and (c) there is a set-theoretical problem about the definition of μ and the definition of the resulting fregean value function \(|.|_{\mu}\). But if you know enough set theory to see the problem, you almost certainly know enough set theory to fix it.

15.2.7 Tarski-Style Semantics for Other Languages

People have sometimes asked whether this or that logical language “has a Tarski-style truth definition”. In cases where the language has infinitary relations, or its constituent structure is not well-founded, the Abstract Tarski Theorem doesn’t apply (at least in its present form). But in the vast majority of cases the Abstract Tarski Theorem does guarantee that there is a Tarski-style truth definition. That is, unless you have your own notion of what Tarski’s style is, in which case you should tell us.

One well-known case is the branching quantifier logic of Leon Henkin. Jon Barwise asserts on the first page of [2] that for this logic the meaning of a certain formula “cannot be defined inductively in terms of simpler formulas, by explaining away one quantifier at a time”. As it happens, the formula is tree-shaped. Our notion of constituent structure can cope with that, but there is no need, since Jaakko Hintikka translated Henkin’s formulas into linear form within his IF logic. The Abstract Tarski Theorem applies straightforwardly to IF logic, so we know that Barwise’s claim must be wrong. (We can write down the fregean values for Hintikka’s IF logic explicitly, as in [9].)

At the end of his paper Barwise proves his claim. What he actually proves (p. 75f with some small changes of notation) is that

The relation “φ is true in A” is not an inductive verifiability relation in the sense of Barwise-Moschovakis [3].

I don’t think anybody with any experience of Henkin quantifiers would have expected otherwise. In fact it’s possible to show [5] that fregean values for Henkin’s language can’t be given in terms of any notion of satisfaction by assignments of elements to variables.

Nevertheless the fregean semantics for IF and similar logics is a close analogue of Tarski’s; the Hayyan functions have an obvious resemblance to his. The semantics is natural enough that Rohit Parikh and Jouko Väänänen [14] can use it as a basis for a semantics for a “finite information logic”. The semantics uses sets of assignments rather than single assignments, but it’s similar enough to standard model-theoretic semantics to serve for defining fixed point operators in IF and similar logics, as Julian Bradfield [4] showed.

The existence of a tractable fregean semantics for IF logic also has a consequence for natural language semantics: A discrepancy between syntactic scope and semantic scope of quantifiers is no longer automatically a barrier to a Tarski-style formal semantics.