Nonmonotonic Logics: a Preferential Approach

https://doi.org/10.1016/S1874-5857(07)80010-0

Introduction

A logic is called non-monotonic, if it is so in the first argument: if |∼ is the consequence relation, then T |∼ ϕ need not imply T′ |∼ ϕ for T ⊆ T′.

Seen from classical logic, this is a surprising property, which is, however, imposed by the intended application. Non-monotonic logics are used for (among other things) reasoning with information of different quality. For instance, to take the most common example, the sentence “birds fly” of common sense reasoning does not mean that all birds fly, with “all” the classical quantifier, but that the majority of birds fly, the interesting ones fly, or something of the kind. It is general information, which we are prepared to give up in the face of more specific or more reliable (i.e. of better quality) information. Knowing that Tweety is a bird, and that it is a penguin, will make us believe that the property of the more specific class, penguins, namely being unable to fly, overrides the general property. Thus bird(x) |∼ fly(x), but bird(x) ∧ penguin(x) |∼ fly(x) fails, and even bird(x) ∧ penguin(x) |∼ ¬fly(x) holds.

So, we can summarize: non-monotonic logics are an abstraction of principled reasoning with information of different quality (among other things).

Thus, they have their justification as a logic of artificial intelligence, which tries to imitate aspects of common sense reasoning.

There are several types of non-monotonic logics, the principal ones are perhaps:

  • defeasible inheritance

  • defaults

  • others, as

    • (1)

      autoepistemic logic

    • (2)

      circumscription

    • (3)

      logic programming and Prolog

    • (4)

      preferential reasoning

    • (5)

      theory revision

    • (6)

      theory update

The last two, theory revision and theory update, stand out, as their consequence relation takes two arguments on the left, e.g. for theory revision K and ϕ, and looks at the consequences of K ∗ ϕ, the result of revising K by ϕ. This property is, however, at least in the traditional AGM approach (due to Alchourron, Gärdenfors, Makinson, [1985]), only a superficial distinction, as K will be fixed. As a matter of fact, theory revision and update find their place naturally in the general nonmonotonic context.

Defeasible inheritance, however, is radically different from the other formalisms: the reasoning follows predetermined paths, it does not have the flexibility of the other systems, and, while deceptively simple, it is ridden with deep problems, like extensions versus direct scepticism, and so on. It will not be treated here; the reader is referred to the book by Touretzky [1986], with some discussion also to be found in the author's [1997a].

Defaults take an intermediate position: they are based on classical logic, enriched with rules which work with consistency tests. They will also not be treated here; see Reiter [1980] for the original paper.

Concerning the remaining logics, the author of these lines has never done serious work, i.e. research, on autoepistemic logic, circumscription, logic programming and Prolog, so he simply does not feel competent enough for a deeper presentation. Theory update is very close to counterfactual conditionals, which are relatively well known, and not very far from theory revision, though the quantifier is distributed differently at a decisive point: in the semantics of distance based theory revision, we look at all ϕ-models which are globally closest to the set of all K-models, whereas in theory update, we look at all ϕ-models which are closest to some K-model (this makes it monotone in the first argument). We will not treat theory update here, either.
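
To make the quantifier difference concrete, here is a minimal sketch (not from the chapter): it works with finite propositional models and uses the Hamming distance, which only appears later in this introduction, purely as a convenient concrete distance. The function and variable names are ours.

```python
# Sketch: distance-based revision vs. update on finite propositional models.
# Models are tuples of truth values; the distance is the Hamming distance.

from itertools import product

def hamming(m1, m2):
    """Number of propositional variables on which two models differ."""
    return sum(1 for a, b in zip(m1, m2) if a != b)

def revise(K_models, phi_models):
    """Revision: the phi-models globally closest to the set of all K-models."""
    dist = {m: min(hamming(k, m) for k in K_models) for m in phi_models}
    best = min(dist.values())
    return {m for m, d in dist.items() if d == best}

def update(K_models, phi_models):
    """Update: for each K-model, the phi-models closest to *that* model."""
    result = set()
    for k in K_models:
        best = min(hamming(k, m) for m in phi_models)
        result |= {m for m in phi_models if hamming(k, m) == best}
    return result

# Two variables p, q; a model is a pair (value of p, value of q).
all_models = set(product([False, True], repeat=2))
K_models   = {(True, True), (False, False)}      # models of K
phi_models = {m for m in all_models if m[0]}     # models of p

print(revise(K_models, phi_models))   # {(True, True)}: only the globally closest
print(update(K_models, phi_models))   # {(True, True), (True, False)}: closest to some K-model
```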

So we will focus on preferential reasoning and theory revision. In these introductory remarks, we will restrict ourselves further and discuss only preferential reasoning. The reason is simple: the basic problems and approaches are similar, so one can stand for the other.

There are two, even three approaches to preferential reasoning.

The first is by structural semantics: the consequences are not those formulas which hold in all (classical) models of a (classical) theory T, but those which hold in the preferred models of that theory — where preference is determined by a binary relation between classical models. As the preferred models are a subset of all models, we have a strengthening of classical logic. Note that we have a superstructure over the set of classical models, just as a Kripke structure is built on top of a set of classical models. This construction is the natural and basically very simple idea of preferential structures. They were first introduced in a different context, for deontic logic, by Hansson in [1971], and then rediscovered by Shoham [1987] and Siegel [1985] for non-monotonic logic, generalizing circumscription.

The second approach is by examination of natural rules for non-monotonic logics. Note that the important rule of monotony obviously fails, so reasoning becomes less regulated, and the need for other laws is strongly felt. Such rules are e.g. AND (if T |∼ ϕ and T |∼ ϕ′ then T |∼ ϕ ∧ ϕ′), OR (if ϕ |∼ ψ and ϕ′ |∼ ψ then also ϕ ∨ ϕ′ |∼ ψ), etc. Such laws were first examined by Gabbay [1985] and Makinson [1994]. The connection between the two approaches was seen quite soon, and published in the seminal papers [Kraus et al., 1990] and [Lehmann and Magidor, 1992].

Finally, the third approach is intermediate, and considers the abstract algebraic properties of the choice functions defined by preference. The most important property is X ⊆ Y → μ(Y) ∩ X ⊆ μ(X); its validity is obvious: if y ∈ X is minimal or preferred in Y, y ∈ μ(Y), i.e. there is no y′ < y in Y, then there is no such y′ in X, so it must be minimal in X, too. (For historical reasons, preference increases downwards.) It is intermediate in the following sense: such algebraic properties of the choice functions carry (almost) directly over to the logical properties of the generated consequence relation, and the hard work in representation is the construction of the relation from the properties of the choice function. Such choice functions were considered in social choice, see the work by Aizerman, Arrow, Chernoff, Malishevski, Sen, [Aizerman, 1985; Aizerman and Malishevski, 1981; Arrow, 1959; Chernoff, 1954; Sen, 1970], and were rediscovered in the context of (possibly infinite) representation problems by the present author, [Schlechta, 1992]. The connection was pointed out by Lehmann, [2001].
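
The following small sketch (an illustration under the assumption of a finite base set; the relation and the function names are ours) computes μ(X) as the set of ≺-minimal elements of X and checks the above property on all pairs X ⊆ Y:

```python
# Sketch: minimal-element choice functions and the coherence property
#   X ⊆ Y  ⟹  μ(Y) ∩ X ⊆ μ(X)
# for an arbitrary binary relation on a finite base set.

from itertools import chain, combinations

def mu(X, rel):
    """Minimal elements of X: those with no strictly preferred element in X."""
    return {x for x in X if not any((y, x) in rel for y in X if y != x)}

def subsets(S):
    S = list(S)
    return [set(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

U = {1, 2, 3, 4}
rel = {(1, 2), (2, 3), (1, 3)}        # 1 preferred to 2 preferred to 3; 4 incomparable

for Y in subsets(U):
    for X in subsets(Y):              # every X ⊆ Y
        assert mu(Y, rel) & X <= mu(X, rel)

print("μ(Y) ∩ X ⊆ μ(X) holds for all X ⊆ Y ⊆", U)
```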

Where are the problems? Apart from general problems facing researchers in representation questions, i.e. finding the right construction techniques, there are specific issues to treat here: The first one is hidden in the “(almost)” of the preceding paragraph. In the general infinite case, it may well be that μ(M(T)), the set of minimal models of a theory T, does not correspond to any theory, i.e. it is not definable by a theory. In this case, the tight connection between semantics and logics is loosened, and the usual characterizations fail and cannot be recovered. The second one has to do with domain closure properties: if, e.g., the domain of definable model sets is not closed under finite unions (it is in classical logic, but not in all other logics, when we build preferential structures on top of their models), this has far-reaching consequences for possible characterizations. This is the subject of still ongoing research.

We begin with a short discussion of some concepts underlying nonmonotonic logics, and their development. The main emphasis of this text is, however, on formal results in Section 2, and on proof techniques, advanced problems and solutions in Section 3. In the latter part, we will in particular discuss definability preservation and domain closure properties, which are certainly also important for other fields of non-classical logic, and thus go beyond the framework of this Chapter.

Since the beginning of nonmonotonic logics, there has been a development in two directions:

  • from fast and dirty common sense reasoning to reasoning about normality,

  • from rules to semantics.

Roughly speaking, the second development was not followed by researchers who wanted to create a working system rapidly; it was followed by that part of the community which was more foundations oriented, and wanted to understand what those new logics were about. And there seems to be no better way than a formal semantics to understand a logic.

In the beginning, there was hope that somehow bundling information into normality would make it possible to simplify reasoning. Of course, this is partly true: we can subsume many cases under “normal cases” — and exceptions under “abnormal cases” — but this leaves two fundamental problems unsolved:

  • (1)

    is reasoning with normal cases more efficient?

  • (2)

    how do we know whether assuming normality is justified?

Solving the second problem via consistency in some non-trivial logic — an often adopted idea — is notoriously inefficient. As a consequence, researchers have turned to the perhaps more accessible question of what “normality” is, or, better, what its properties are.

The author has followed both re-directions, and this chapter will reflect it.

When we look at formal semantics of logics which are more or less tightly related to the question of reasoning about normality, we see that two basic concepts stand out: size and distance. These do not necessarily mean size and distance as we use them in every day life, or in usual mathematics, but they are sufficiently close to the usual concepts to merit their name. Size and distance can be used to define other notions too, like certainty, utility, etc., but this discussion would lead beyond this handbook chapter, and we refer the reader to [Schlechta, 2004] instead.

It is natural to interpret “normality” by some sort of “size”: “normality” might just mean “majority” (perhaps with different weight given to different cases), or something like “a big subset”. The standard abstraction of “big” is the notion of a filter (or, dually, an ideal is the abstraction of “small”). We include immediately a modification, the weak versions, to be discussed below. They seem to be minimal in the following sense: A reasonable abstract notion of size without the properties of weak filters seems difficult to imagine: The full set seems the best candidate for a “big” subset, “big” should cooperate with inclusion, and, finally, no set should be big and small at the same time.

Fix a base set X.

A (weak) filter on or over X is a set F ⊆ P(X), P(X) the power set of X, s.t. (F1)-(F3) ((F1), (F2), (F3′) respectively) hold:

  • (F1)

    X ∈ F

  • (F2)

    A ⊆ B ⊆ X, A ∈ F imply B ∈ F

  • (F3)

    A, B ∈ F imply A ∩ B ∈ F

  • (F3′)

    A, B ∈ F imply A ∩ B ≠ ∅.

So a weak filter satisfies (F3′) instead of (F3).

A (weak) ideal on or over X is a set I ⊆ P(X), s.t. (I1)-(I3) ((I1), (I2), (I3′) respectively) hold:

  • (I1)

    ∅ ∈ I

  • (I2)

    A ⊆ B ⊆ X, B ∈ I imply A ∈ I

  • (I3)

    A, B ∈ I imply A ∪ B ∈ I

  • (I3′)

    A, B ∈ I imply A ∪ B ≠ X.

So a weak ideal satisfies (I3′) instead of (I3).

Elements of a filter on X are called big subsets of X, their complements are called small, and the rest have “medium size”. The set of the X-complements of the elements of a filter forms an ideal, and vice versa.

Due to the finite intersection property, filters and ideals work well with logics: If ϕ holds normally, as it holds in a big subset, and so does ϕ′, then ϕ ∧ ϕ′ will normally hold, too, as the intersection of two big subsets is big again. This is a nice property, but not justified in all situations; consider e.g. size defined by simple counting of a finite set. (The question has a name, “lottery paradox”: normally no single participant wins, but someone wins in the end.) This motivates the weak versions, see Section 2.3 below for more details.
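
The following minimal sketch (an illustration, not a definition from this chapter) shows why “big” defined by simple counting, i.e. a strict majority of a finite base set, yields only a weak filter: (F3′) holds, since two majorities always overlap, but (F3) can fail, and iterating over the lottery sets makes the failure drastic.

```python
# Sketch: counting-based "size" satisfies (F3') but not (F3) - the lottery paradox.

X = set(range(10))                            # ten lottery tickets

def is_big(A):
    """'Big' = strict majority of X (an assumption made for this illustration)."""
    return len(A & X) > len(X) / 2

# (F3) fails: two big sets whose intersection is not big.
A = {0, 1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8, 9}
print(is_big(A), is_big(B), is_big(A & B))    # True True False

# (F3') holds: two majorities always intersect, since |A| + |B| > |X|.
print(A & B != set())                         # True

# Lottery paradox: each set "ticket i does not win" is big,
# yet the intersection of all of them is empty.
losers = [X - {i} for i in X]
print(all(is_big(L) for L in losers))         # True
print(set.intersection(*losers))              # set()
```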

Normality defined by (weak or not) filters is a local concept: the filter defined on X and the one defined on X′ might be totally independent. Consider, however, the following two situations: Let Y′ be a big subset of X′, X ⊆ X′, and Y′ ⊆ X. If “size” has any absolute meaning, then Y′ should be a big subset of X, too. On the other hand, let X and X′ be big subsets of Y; then there are good reasons (analogous to those justifying the intersection property of filters) to assume that X ∩ X′ is also a big subset of X′. These set properties are strongly connected to logical properties: For instance, if the latter property holds, we can deduce the logical property Cautious Monotony (see below for a formal definition): If ψ normally implies ϕ and ϕ′, because the sets X and X′ of ψ ∧ ϕ-models and ψ ∧ ϕ′-models are big subsets of the set Y of ψ-models, then ψ ∧ ϕ′ will normally imply ϕ too, as the set X ∩ X′ of ψ ∧ ϕ ∧ ϕ′-models will be a big subset of the set X′ of ψ ∧ ϕ′-models.
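
The argument can be restated compactly as follows (a sketch, writing M(·) for the set of models of a formula, and assuming the second coherence property above: if X and X′ are big in Y, then X ∩ X′ is big in X′):

```latex
% From the coherence property "X, X' big in Y  =>  X ∩ X' big in X'"
% to Cautious Monotony.
\begin{align*}
\psi \mathrel{|\sim} \phi,\ \psi \mathrel{|\sim} \phi'
  &\Rightarrow\ M(\psi\wedge\phi),\ M(\psi\wedge\phi') \text{ are big subsets of } M(\psi)\\
  &\Rightarrow\ M(\psi\wedge\phi)\cap M(\psi\wedge\phi') = M(\psi\wedge\phi\wedge\phi')
      \text{ is a big subset of } M(\psi\wedge\phi')\\
  &\Rightarrow\ \psi\wedge\phi' \mathrel{|\sim} \phi \qquad \text{(Cautious Monotony)}
\end{align*}
```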

Seen more abstractly, such set properties allow the transfer of big subsets from one base set to another (and of the conclusions drawn on this basis), and we call them “coherence properties”. They are very important, not only for working with a logic which respects them, but also for soundness and completeness questions; often they are at the core of such problems. The reader is invited to read the articles by Ben-David and Ben-Eliyahu [1994] and Friedman and Halpern [1995], which treat essentially the same questions in different languages (and perhaps their comparison by the author in [Schlechta, 1997b] and [Schlechta, 2004]).

Suppose we have a (by some criterion) ideal situation — be it realistic or not. “Normality” might then be defined via some distance: normal situations are the cases among those considered which have minimal distance from the ideal ones. “Distance” need not be a metric, it might be symmetric or not, it might respect identity (only x has distance 0 to x), it might respect the triangle inequality or not, it may even fail to be a total order: the distance from x to y might be incomparable to the distance from x′ to y′.

We define distance or pseudo-distance for our purposes as:

d : U × U → Z is called a pseudo-distance on U iff (d1) holds:

  • (d1)

    Z is totally ordered by a relation <.

    If, in addition, Z has a <-smallest element 0, and (d2) holds, we say that d respects identity:

  • (d2)

    d(a, b) = 0 iff a = b.

    If, in addition, (d3) holds, then d is called symmetric:

  • (d3)

    d(a, b)= d(b, a).

    (For any a, b ∈ U.)

    Let ≤ stand for < or =.

Note that we can force the triangle inequality to hold trivially (if we can choose the values in the real numbers): It suffices to choose the values in the set {0} ∪ [0.5, 1], i.e. in the interval from 0.5 to 1, or as 0. This remark is due to D. Lehmann. (Usually, we will only be interested in the comparison of distances, not in their absolute values, so we can thus make the triangle inequality hold trivially.)
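
The following small sketch (ours, under the assumption that the pseudo-distance respects identity and takes only finitely many real values) illustrates the remark: mapping the non-zero values order-preservingly into [0.5, 1] keeps all comparisons of distances intact, and the triangle inequality then holds trivially, since every single distance is at most 1 = 0.5 + 0.5.

```python
# Sketch: rescaling distance values into {0} ∪ [0.5, 1] preserves comparisons
# and forces the triangle inequality.

def rescale(values):
    """Order-preserving map: 0 stays 0, non-zero values are spread over [0.5, 1]."""
    nonzero = sorted(v for v in set(values) if v != 0)
    step = 0.5 / max(len(nonzero) - 1, 1)
    mapping = {0: 0.0}
    mapping.update({v: 0.5 + i * step for i, v in enumerate(nonzero)})
    return mapping

U = ["a", "b", "c"]
d = {("a", "a"): 0, ("b", "b"): 0, ("c", "c"): 0,     # symmetric, respects identity
     ("a", "b"): 1, ("b", "a"): 1,
     ("b", "c"): 3, ("c", "b"): 3,
     ("a", "c"): 7, ("c", "a"): 7}

m = rescale(d.values())
d2 = {pair: m[v] for pair, v in d.items()}

# Comparisons of distances are preserved ...
assert all((d[p] < d[q]) == (d2[p] < d2[q]) for p in d for q in d)
# ... and the triangle inequality now holds everywhere.
assert all(d2[(x, z)] <= d2[(x, y)] + d2[(y, z)] for x in U for y in U for z in U)
print(sorted(set(d2.values())))    # [0.0, 0.5, 0.75, 1.0]
```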

A preference relation is, in its most general form, just an arbitrary binary relation ≺, expressing different degrees of normality or (for historical reasons, better:) abnormality. We will then not so much consider all elements of a (model) set, but only the “best” or ≺-minimal ones, and reason with these “best” elements. We thus define a logic by T |∼ ϕ iff ϕ holds in the ≺-best models of T. (It is reasonable to assume here for the moment that such best models always exist, if there are any T-models at all.) Preferential models are formally defined in Definitions 7 and 8 below for the “minimal” version, and in Definition 75 for the “limit version” — see there for an explanation.

To see the conceptual connection between distance and preference, consider the following argument: a is preferred to b iff the distance from an ideal point ∞ to a is smaller than the distance from ∞ to b.

This might be the moment to make our “situations” more precise: In most cases, they will just be classical propositional models, (almost) as in Kripke semantics for modal and similar logics, or as in Stalnaker-Lewis semantics for counterfactual conditionals (which, by the way, work with distances, too).

A natural distance for such classical models is (at least in the finite case) the Hamming distance: the distance between m and m′ is the number of propositional variables (or atoms) in which they differ.
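
As a small illustration (ours, not the chapter's formal machinery), the following sketch computes the Hamming distance between propositional models over three variables, and turns the distance to an arbitrarily chosen “ideal” model into a preference relation, along the lines of the remark above; T |∼ ϕ then holds iff ϕ is true in all closest (most preferred) models of T.

```python
# Sketch: Hamming distance between propositional models, and a preference
# relation induced by the distance to an "ideal" model.

from itertools import product

VARS = ["p", "q", "r"]

def all_models():
    return [dict(zip(VARS, vals)) for vals in product([False, True], repeat=len(VARS))]

def hamming(m1, m2):
    """Number of propositional variables on which m1 and m2 differ."""
    return sum(1 for v in VARS if m1[v] != m2[v])

ideal = {v: True for v in VARS}        # the "ideal" model, chosen arbitrarily here

def preferred(models):
    """The models closest to the ideal point, i.e. the minimal ones."""
    best = min(hamming(ideal, m) for m in models)
    return [m for m in models if hamming(ideal, m) == best]

# T = "p or q": its preferred models, and a consequence check.
T_models = [m for m in all_models() if m["p"] or m["q"]]
best = preferred(T_models)
print(best)                            # only the model with p = q = r = True
print(all(m["r"] for m in best))       # True, i.e. T |~ r in this structure
```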

Finally, when we consider e.g. situations developing over several steps, e.g. for iterated update, we might be interested in forming sums of distances between situations (now, of course, absolute values will matter). Here, well-known algorithms for solving systems of (in)equalities between sums are useful to investigate representation problems. The reader is referred to [Schlechta, 2004] for details.

Before we turn to historical remarks to conclude this introduction, we will introduce some definitions which are basic for the rest of this Chapter of the Handbook.

We will assume the Axiom of Choice throughout this chapter.

We use P to denote the power set operator, Π{X_i : i ∈ I} := {g : I → ∪{X_i : i ∈ I} : ∀i ∈ I . g(i) ∈ X_i} is the general cartesian product, card(X) shall denote the cardinality of X, and V the set-theoretic universe we work in, the class of all sets. Given a set of pairs χ and a set X, we set χ⌈X := {⟨x, i⟩ ∈ χ : x ∈ X}.

A ⊆ B will denote that A is a subset of B or equal to B, and A ⊂ B that A is a proper subset of B; likewise for A ⊇ B and A ⊃ B.

Given some fixed set U we work in, and X ⊆ U, then C(X) := U − X.

≺* will denote the transitive closure of the relation ≺. If a relation <, ≺, or similar is given, a ⊥ b will express that a and b are <- (or ≺-) incomparable — context will tell.

A child (or successor) of an element x in a tree t will be a direct child in t. A child of a child, etc. will be called an indirect child. Trees will be supposed to grow downwards, so the root is the top element.

A subsequence σ_i : i ∈ I ⊆ μ of a sequence σ_i : i ∈ μ is called cofinal iff for all i ∈ μ there is i′ ∈ I with i ≤ i′.

Unless said otherwise, we always work in propositional logic.

If L is a propositional language, v(L) will be the set of its variables, M_L the set of its classical models, ϕ, etc. shall denote formulas, T, etc. theories in L, and M(T) or M_T ⊆ M_L the models of T, likewise for ϕ.

A theory will just be an arbitrary set of formulas, without any closure conditions.

For any classical model m, let Th(m) be the set of formulas valid in m, likewise Th(M) := {ϕ : m ⊨ ϕ for all m ∈ M}, if M is a set of classical models. ⊨ is the sign of classical validity. For two theories T and T′, let T ∨ T′ := {ϕ ∨ ψ : ϕ ∈ T, ψ ∈ T′}. ⊥ stands for falsity (the double use will be unambiguous).

T̄ will denote the closure of T under classical logic, and ⊨ the classical consequence relation, thus T̄ := {ϕ : T ⊨ ϕ}. Given some other logic |∼, T̿ will denote the set of consequences of T under that logic, i.e. T̿ := {ϕ : T |∼ ϕ}.

Con(T) will say that T is classically consistent, likewise Con(ϕ), etc.

Note that the double bar notation does not really conflict with the single bar notation: closing twice under classical logic makes no sense from a pragmatic point of view, as the classical consequence operator is idempotent.

D_L ⊆ P(M_L) shall be the set of definable subsets of M_L, i.e. A ∈ D_L iff there is some T ⊆ L s.t. A = M_T. If the context is clear, we omit the subscript L from D_L.

For X ⊆ P(M_L), a function μ : X → P(M_L) will be called definability preserving iff μ(Y) ∈ D_L for all Y ∈ D_L ∩ X.

We recall the following basic facts about definable sets. The reader should be familiar with such properties.

  • (1)

    ∅, M_L ∈ D_L.

  • (2)

    D_L contains all singletons, and is closed under arbitrary intersections and finite unions.

  • (3)

    If v(L) is infinite, and m is any model for L, then M′ := M_L − {m} is not definable by any theory T. (Proof: Suppose it were, and let ϕ hold in M′, but not in m, so ¬ϕ holds in m; but as ϕ contains only finitely many propositional variables, there is a model m′ in M′ which coincides with m on all propositional variables of ϕ, so ¬ϕ holds in m′, too, a contradiction.)

  • (4)

    There is an easy cardinality argument which shows that in the infinite case, there are many more non-definable than definable model sets: If k = card(v(L)), then k is also the size of the set of L-formulas, so there are 2^k L-theories, and thus at most 2^k definable model sets. Yet there are 2^k different models, so 2^(2^k) model sets. This argument complements (3) above: one is constructive, the other shows the cardinality difference.

We conclude this introduction by some very short historical remarks, and by putting our approach into a more general perspective.

Preferential structures or models were first considered by Hansson as a semantics for deontic logic, see [Hansson, 1971], where the relation expresses moral quality of a situation. They were re-discovered independently by Shoham [1987] and Siegel [1985] (in the latter case, in the limit version, see below in Section 3.4) as an abstract semantics for reasoning about normal cases. Distance based semantics were, to the author's knowledge, first considered in the Stalnaker-Lewis semantics for counterfactuals (see [Lewis, 1973]), and introduced as a semantics for theory revision (see below) by Lehmann, Magidor, and the author, see [Lehmann et al., 2001]. Filter or weak filter based semantics for reasoning with and about normality were introduced by the author in a first order setting, and re-discovered independently by Ben-David and Ben-Eliyahu [1994] and Friedman and Halpern [1995], who treat essentially the same questions in different languages. The various properties of choice functions in a finite setting were considered by economists in the context of social choice, and re-discovered independently by the author in the infinite setting for his representation proofs.

The content of Section 2.4 is, of course, due to Alchourron, Gärdenfors, and Makinson, see [1985]; part of the work of Kraus, Lehmann, and Magidor, [Kraus et al., 1990] and [Lehmann and Magidor, 1992], is described in Section 2.2. Plausibility logic in Section 3.3 is due to Lehmann, [1992a; 1992b]. The rest of the material — with some small exceptions, indicated locally — is due to the author, and presented in detail in his book, [Schlechta, 2004].

It might have become clear implicitly that we concentrate on the model side, more precisely, we will consider as consequences of a formula or theory those formulas which hold in some set of classical models — chosen in an adequate manner. As this approach works with some choice of model sets, as do Kripke semantics for modal logic, we call such logics generalized modal logics. This covers preferential logics, the logics of counterfactual conditionals, theory revision and theory update, and, of course, usual modal logic.

This general approach is quite liberal, but it already has some consequences, and thus excludes some ways of reasoning: the set of consequences will be closed under classical logic (and will contain the original information if the chosen set is a subset of the models of this original information — as will often be the case). On the other hand, e.g. obligations cannot be modelled this way, as classical weakening will hold in our approach, but from the obligation not to steal, we cannot conclude the obligation not to steal or to kill our grandmother.


Acknowledgements

I would like to thank David Makinson for valuable discussion and suggestions.

Daniel Lehmann kindly gave permission to re-use the material of the joint paper [Lehmann et al., 2001].

The following editors kindly gave permission to re-use the following material published before:

Almost all of the material presented here was published before by Elsevier in the author's [Schlechta, 2004].

The material on a nonsmooth model of cumulativity (in Section 2.2) was published by Oxford University Press in
