Epistemic Utility Arguments for Probabilism

First published Fri Sep 23, 2011

Our beliefs come in degrees; we believe some more strongly than others. We call the strengths of our beliefs our credences. Suppose I know that a die is to be rolled, and I believe that it will land on six more strongly than I believe that it will land on an even number. In this case, we would say that there is something wrong with my credences, for if it lands on six, it has landed on an even number. I ought not to believe a proposition more strongly than I believe any of its logical consequences. This is a consequence of a popular doctrine in epistemology called Probabilism, which says that our credences at a given time ought to satisfy the axioms of the probability calculus (given in detail below). Since this says something about how our credences ought to be rather than how they in fact are, we call this an epistemic norm.

Suppose next that I satisfy Probabilism before the die is rolled, and that I divide my credences equally over the possible outcomes of the roll: that is, I assign a credence of 1/6 to each. The die is then rolled and I learn that it landed either on one or on two. If my credence that the die landed on one becomes 1/3, whilst my credence that it landed on two becomes 2/3, we would again say that there is something wrong with my credences. This time it is not Probabilism that I have violated, but Conditionalization, an epistemic norm that governs how we should update our credences when we learn new information: in the example, the information is about how the die landed. Roughly speaking, Conditionalization says that I should remove all my credence from the outcomes that I have learned did not happen and divide this amongst the remaining outcomes in proportion to the initial credence I had in each. We state Conditionalization precisely below.

In this entry, we explore a particular strategy that we might deploy when we wish to establish an epistemic norm such as Probabilism or Conditionalization. It is called epistemic utility theory, or sometimes cognitive decision theory. I will use the former. Epistemic utility theory is inspired by traditional utility theory, so let's begin with a quick summary of that.

Traditional utility theory (also known as decision theory) explores a particular strategy for establishing the norms that govern which actions it is rational for us to perform in a given situation. The framework for the theory includes states of the world, actions, and, for each agent, a utility function, which takes a state of the world and an action and returns a measure of the extent to which the agent values the outcome of performing that action at that world. We call this measure the utility of the outcome at the world. For example, there might be just two relevant states of the world: one in which it rains and one in which it does not. And there might be just two relevant actions from which to choose: take an umbrella when one leaves the house or don't. Then my utility function will measure how much I value the outcomes of each action at each state of the world: that is, it will give the value of being in the rain without an umbrella, being in the rain with an umbrella, being with an umbrella when there is no rain, and being without an umbrella when there is no rain. Sometimes, we also have, for each agent, a credence function, which takes a state of the world and returns a measure of the agent's credence that the world is in that state. In our example, the credence function would give my credence that it will rain and my credence that it will not.

With this framework in hand, we can state certain very general norms of action in terms of it. For instance, we might say that an agent ought to perform a particular action if, for every possible state of the world, that action has the highest utility at that state of the world amongst all possible actions. This norm is called Dominance. Or we might say that an agent ought to perform an action that has maximal expected utility, where the expected utility of an action is obtained by weighting its utility at each state of the world by the credence assigned to that state of the world, and summing. This norm is called Maximize Expected Utility. Thus, for instance, it would be rational for me to take my umbrella if I believe that it will rain exactly as strongly as I believe that it won't and if I dislike being in the rain without an umbrella more than I like being without an umbrella when it isn't raining.
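The two norms just described can be sketched in a few lines of Python. This is only an illustration: the utility numbers, the names STATES, ACTIONS, expected_utility and dominant_actions are assumptions introduced here, and the remaining utilities are chosen so that the umbrella example comes out as the text describes.

```python
# Illustrative sketch only. All utility values are hypothetical assumptions,
# chosen so that being in the rain without an umbrella is disliked more than
# being without an umbrella in dry weather is liked.
STATES = ["rain", "no_rain"]
ACTIONS = ["umbrella", "no_umbrella"]

# utility[action][state]
utility = {
    "umbrella":    {"rain": 2.0,   "no_rain": -1.0},
    "no_umbrella": {"rain": -10.0, "no_rain": 3.0},
}

# Equal credence in rain and no rain, as in the example.
credence = {"rain": 0.5, "no_rain": 0.5}


def expected_utility(action, credence, utility):
    """Weight the utility at each state by the credence in that state, and sum."""
    return sum(credence[s] * utility[action][s] for s in STATES)


def dominant_actions(utility):
    """Actions that have the highest utility at every state of the world."""
    return [
        a for a in ACTIONS
        if all(utility[a][s] >= utility[other][s]
               for s in STATES for other in ACTIONS)
    ]


# Maximize Expected Utility: pick an action with maximal expected utility.
best = max(ACTIONS, key=lambda a: expected_utility(a, credence, utility))
print(best)
print(dominant_actions(utility))
```

With these assumed numbers no action is dominant, so Dominance is silent, whereas Maximize Expected Utility recommends taking the umbrella.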

In epistemic utility theory, the states of the world remain the same, but the possible actions an agent might perform are replaced by the possible epistemic states she might adopt, and the utility function is replaced, for each agent, by an epistemic utility function, which takes a state of the world and a possible epistemic state and returns a measure of the purely epistemic value that the agent would attach to being in that epistemic state at that state of the world. So, in epistemic utility theory, we can appeal to epistemic utility to ask which of a range of possible epistemic states it is rational to adopt, just as in traditional utility theory we appeal to utility to ask which of a range of possible actions it is rational to perform.

Again, certain very general norms may be stated, such as the obvious analogues of Dominance and Maximize Expected Utility from above. Thus, before the die is rolled, we might ask whether I should adopt an epistemic state in which I believe that the die will land on six more strongly than I believe that it will land on an even number. And we might be able to show that I shouldn't because there is some other epistemic state I could adopt instead that will have greater epistemic utility however the world turns out. In this case, we appeal to the epistemic version of Dominance to show what is wrong with my credences. This is an example of how epistemic utility theory might come to justify Probabilism. As we will see, arguments just like this have indeed been given. And arguments that appeal to an epistemic version of Maximize Expected Utility have been given to justify Probabilism and Conditionalization. In this entry, we explore these arguments.

1. Modelling Epistemic States

In formal epistemology, epistemic states are modelled in many different ways. Given an epistemic agent and a time t, we might model her epistemic state at t using any of the following: the set of propositions she believes at t; the set of propositions she believes at t together with an entrenchment ordering, which specifies the order in which she is prepared to abandon her beliefs in the light of conflicting evidence; her credence function at t; the set of credence functions, each of which is a precisification of her otherwise vague credences at t; her upper and lower probability functions at t. Epistemic utility theory may be applied to any one of these ways of modelling epistemic states. Whichever we choose, we define an epistemic utility function to be a function that takes an epistemic state modelled in this way, together with a state of the world, to a real number that measures the epistemic utility of having that epistemic state at that world.

However, the vast majority of work carried out so far in epistemic utility theory has taken an agent's epistemic state at time t to be modelled by her credence function at t. And, in any case, the Bayesian norms that interest us here govern agents modelled in this way. Thus, we focus on this case.

So, henceforth, we model an agent's epistemic state at t by her credence function at t. We assume that the set of propositions about which an agent has an opinion forms an algebra F: that is, it contains a contradictory proposition (⊥) and a tautologous proposition (⊤), and it is closed under disjunction, conjunction, and negation. Thus, if the agent has only an opinion about whether or not it will rain, then the algebra contains the contradictory proposition, the tautologous proposition, and the propositions It will rain and It will not rain. We then assume that our agent's credence in a proposition in F can be measured by a real number. Then her credence function at t is a function b from F to the real numbers ℜ. If A is in F, then b(A) is our agent's credence in A at t. Throughout, we denote by BF the set of possible credence functions b : F → ℜ.

Much of what follows depends on certain assumptions about the algebra F. For instance, many of the arguments presented will assume that F is finite. That is, our credence functions model agents who have opinions only about a finite set of propositions. They will also often assume that the propositions that form the algebra are non-indexical or non-self-locating. By this, we mean that they say something only about the world; they do not also say something about where or when in the world the agent is located. Thus, the proposition It rains in Bristol at noon on 1st January 2011 is non-indexical, while the proposition It rains here now is indexical. I will indicate when these assumptions can be relaxed.

So, an epistemic utility function EU takes a credence function b, together with a model of the way the world might be, to a measure of the epistemic utility of having that credence function if the world were that way. In fact, we call such an epistemic utility function a global epistemic utility function, since it applies to the whole epistemic state. A local epistemic utility function applies only to a certain sort of proper part of an epistemic state. For instance, a local epistemic utility function might take a particular credence in a particular proposition, together with a model of the way the world might be, to a measure of the epistemic utility of having that credence in that proposition if the world were that way.

Although it is not an essential feature of epistemic utility theory, in this article, we model a way the world might be as a classically consistent assignment of truth values to the propositions in F, that is, as a classical valuation function. We let VF denote the set of such consistent truth assignments or valuations v : F → {0, 1}. Note that VF ⊆ BF. That is, each truth assignment to F is a credence function on F; indeed, it is the credence function of a maximally opinionated agent. Note also that, in algebras that contain atomic propositions, there is a natural one-one correspondence between the atomic propositions of the algebra and the consistent assignments of truth values to the propositions of the algebra. After all, the atoms of F are maximally opinionated propositions. Thus, given an atomic proposition, there is exactly one valuation that makes it true; and given a valuation, there is exactly one atomic proposition that it makes true. We will often abuse notation and use the same notation for a truth assignment and its corresponding atomic proposition (when F has atoms). Since we typically assume that the propositions in F are non-indexical, the worlds represented by the valuation functions v in VF are ordinary possible worlds; they are not so-called centred worlds, which we might think of as pairs consisting of a world and a spatiotemporal position in it (Quine 1969).

2. The Form of Arguments in Epistemic Utility Theory

In epistemic utility theory, we attempt to justify an epistemic norm N using the following two ingredients:

  • Q. A norm of standard utility theory/decision theory, which is to be applied to epistemic utility functions.
  • E. A set of conditions that an epistemic utility function must satisfy.

Typically, the inference from Q and E to N goes via a mathematical theorem, which shows that, applied to any epistemic utility function that satisfies the conditions E, the norm Q entails the norm N.

Given that the extant arguments of epistemic utility theory share this common form, we might organize these arguments by the norms they attempt to justify, or by the norms of standard utility theory they employ, or by the set of rational constraints on epistemic utility functions they impose. We will take the last course in this survey. In the next section, we state Probabilism and Conditionalization precisely: listing them here, we can refer back to them later. Then, in the three sections that follow, we consider different rational constraints on epistemic utility functions and we present the arguments for the norms that they have been used to give, as well as the objections that have been raised against these arguments.

3. The Putative Epistemic Norms

The most famous putative norms for credence functions are those that comprise orthodox Bayesianism: they are Probabilism, Countable Additivity, and Conditionalization. Probabilism and Countable Additivity are synchronic norms: that is, each states a condition of rationality for individual credence functions that represent a single agent's epistemic state at a particular time. Conditionalization, on the other hand, is a diachronic norm: that is, it states a condition of rationality for pairs of credence functions that represent a single agent's epistemic state at different times. These are the norms on which we will focus below, though we will consider others in passing. We state each here so that we may refer back to them later. Those familiar with these norms may wish to skip to the next section.

Probabilism is a coherence constraint on credence functions. It is often likened to the consistency constraint on sets of full beliefs.

Probabilism. An agent's credence function b at a given time ought to be a probability function on the algebra F:

  • b(⊥) = 0 and b(⊤) = 1;
  • 0 ≤ b(A) ≤ 1, for all A in F;
  • b(A ∨ B) = b(A) + b(B), for all mutually exclusive A and B in F.

Henceforth, we denote the set of such functions PF. Clearly, VF ⊆ PF ⊆ BF.
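As a rough illustration of the three conditions, the following Python sketch checks whether a credence function on a small finite algebra is a probability function. It assumes, purely for illustration, that propositions are modelled as sets of die outcomes and that the algebra is the full power set; the names powerset, is_probability, uniform and incoherent are introduced here and are not part of the entry.

```python
from itertools import combinations


def powerset(worlds):
    """All propositions over the given worlds, each modelled as a frozenset."""
    ws = list(worlds)
    return [frozenset(c) for r in range(len(ws) + 1) for c in combinations(ws, r)]


def is_probability(b, worlds, tol=1e-9):
    """Check the three conditions of Probabilism on the power-set algebra."""
    algebra = powerset(worlds)
    bottom, top = frozenset(), frozenset(worlds)
    # b(contradiction) = 0 and b(tautology) = 1
    if abs(b[bottom]) > tol or abs(b[top] - 1) > tol:
        return False
    # 0 <= b(A) <= 1 for every proposition A
    if any(b[A] < -tol or b[A] > 1 + tol for A in algebra):
        return False
    # Finite additivity for mutually exclusive A and B
    for A in algebra:
        for B in algebra:
            if not (A & B) and abs(b[A | B] - (b[A] + b[B])) > tol:
                return False
    return True


WORLDS = range(1, 7)  # outcomes of a single die roll

# Credence divided equally over the six outcomes.
uniform = {A: len(A) / 6 for A in powerset(WORLDS)}

# Same credences, except that "six" is believed more strongly than "even".
incoherent = dict(uniform)
incoherent[frozenset({6})] = 0.6  # exceeds the credence 0.5 in {2, 4, 6}

print(is_probability(uniform, WORLDS))
print(is_probability(incoherent, WORLDS))
```

The second credence function fails the additivity condition, which is the formal counterpart of believing a proposition more strongly than one of its logical consequences.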

Note that any agent who satisfies Probabilism must be logically omniscient: that is, she must be certain of every tautology.

Countable additivity is a further property that a credence function may or may not have when the algebra F on which it is defined is a σ-algebra: that is, F is closed under countable disjunction. The norm Countable Additivity says that an agent's credence function at any time ought to have this property:

Countable Additivity. An agent's credence function b at a given time ought to be a countably additive probability function on the σ-algebra F: b ought to be a probability function satisfying

  • b(∨_{i in I} Ai) = Σ_{i in I} b(Ai) for all countable sets {Ai : i in I} of mutually exclusive propositions in F. (A set is countable if it can be put into one-one correspondence with the set of natural numbers {1, 2, … }.)

Thus, for instance, suppose that there is a lottery with a countable infinity of tickets t1, t2, … . And suppose that the algebra F contains the proposition Ticket tn will win for every n = 1, 2, … . Then it is a consequence of Countable Additivity that, if an agent's credence function is defined on F, then her credence in the proposition Some ticket will win ought to be equal to the infinite sum of her credences in each of the individual propositions Ticket tn will win. It follows from this that no agent who satisfies this norm and is certain that some ticket will win can assign to each proposition Ticket tn will win the same credence: if that common credence is 0, the infinite sum is 0 rather than 1, and if it is positive, the sum diverges.

Conditionalization is an updating rule. That is, it describes a way of updating one's credence function in the light of a piece of evidence that comes in the form of a proposition learned with certainty. The norm Conditionalization says that an agent ought to update by following that rule:

Conditionalization. Suppose that, between times t and t′, our agent learns proposition E with certainty, and nothing more. And suppose that b and b′ are the agent's credence functions at t and t′ respectively. Then, if b(E) > 0, it ought to be the case that b′(•) = b(• | E).

Intuitively, Conditionalization says that, when we learn E with certainty, we remove all credence that we originally assigned to the worlds in which E is false and distribute it over the worlds in which E is true in proportion to our original credence in them and in such a way that the resulting credence function obeys Probabilism. It follows that we multiply our credence in each of these worlds by a factor of 1/b(E).
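The intuitive description can be made concrete for the die example from the introduction. The sketch below is a minimal illustration, assuming a finite set of worlds and a credence function given world by world; the function name conditionalize is introduced here.

```python
from fractions import Fraction


def conditionalize(prior, evidence):
    """Update a world-by-world credence function on evidence learned with certainty.

    prior:    dict mapping each world to the credence assigned to it
    evidence: set of worlds at which the learned proposition E is true
    """
    b_E = sum(p for w, p in prior.items() if w in evidence)
    if b_E == 0:
        # The norm as stated applies only when b(E) > 0.
        raise ValueError("Conditionalization is silent when b(E) = 0")
    # Worlds outside E lose all credence; worlds inside E are scaled by 1/b(E).
    return {w: (p / b_E if w in evidence else Fraction(0)) for w, p in prior.items()}


# Credence 1/6 in each outcome before the die is rolled.
prior = {w: Fraction(1, 6) for w in range(1, 7)}

# Learn that the die landed on one or on two.
posterior = conditionalize(prior, {1, 2})
print(posterior[1], posterior[2], sum(posterior.values()))
```

On this sketch the updated credences in one and in two are both 1/2, so the assignment of 1/3 and 2/3 described in the introduction violates the norm.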

4. The Arguments of Epistemic Utility Theory

In the next three sections, we describe the arguments that have been given for these three putative epistemic norms using the strategy of epistemic utility theory. As mentioned above, we group them according to the rational constraints they place on epistemic utility functions. In fact, in epistemic utility theory, it turns out to be much easier to deal with epistemic disutility functions, rather than epistemic utility functions. The two are interchangeable. If EU is a utility function, then −EU is a disutility function, and vice versa. In sections 5 and 6, we identify a specific epistemic goal and treat epistemic disutility functions as measures of the distance of an epistemic state from that goal in a given situation; we lay down conditions that it is claimed all such measures must satisfy. In section 7, we take an alternative route: we lay down putative general conditions on any epistemic disutility function, which it is claimed such a function must satisfy regardless of whether or not it is a measure of distance from a specified epistemic goal.

5. Calibration Arguments

In this section, we consider the conditions imposed on an epistemic disutility function when we treat it as a measure of the distance of an epistemic state from the goal of being actually or hypothetically calibrated (Shimony 1988), (van Fraassen 1983), (Lange 1999). We say that a credence function is actually calibrated at a particular possible world if the credence it assigns to a proposition matches the relative frequencies with which propositions of that kind are true at that world. Thus, credence 0.2 in proposition A is actually calibrated if one-fifth of propositions like A are actually true. And we say that it is hypothetically calibrated if the credence it assigns to a proposition matches the limiting relative frequency with which propositions of that kind would be true were there more propositions of that kind. Thus, credence 0.2 in proposition A is hypothetically calibrated if, as we move to worlds with more and more propositions like A, the proportion of such propositions that are true approaches 0.2 in the limit. According to the calibration arguments, matching the relative frequencies or limiting relative frequencies is an epistemic goal. And they attempt to justify Probabilism and Conditionalization by appealing to this goal and measures of distance from it.

5.1 Calibration measures

First, we must make precise what we mean by actual and hypothetical calibration; then we can say which functions will count as measuring distance from these putative goals. We treat actual calibration first. Since we are talking of relative frequencies, we will need to assign to each proposition in F its reference class: that is, the set of relevantly similar propositions. Thus, we require an equivalence relation ∼ on F, where A ∼ B iff A and B are relevantly similar. For instance, if our algebra of propositions contains Heads on first toss of coin, Heads on second toss of coin, and Six on first roll of die, we might plausibly say that the first two are relevantly similar, but neither first nor second is relevantly similar to the third. Proponents of calibration arguments do not claim to give an account of how the equivalence relation is determined. Nor do they claim that there is a single, objectively correct equivalence relation on a given algebra of propositions: this is the notorious problem of the reference class that haunts frequentist interpretations of objective probability. Rather they treat the equivalence relation as a component of the agent's epistemic state, along with her credence function. Indeed, for van Fraassen, it is determined entirely by the credence function together with the form of the propositions in F (van Fraassen 1983, p. 299). However, they do impose some rational constraints on ∼ in order to establish their conclusion. We will not discuss these conditions in any detail. Rather we denote them C(∼), and keep in mind that this is a placeholder for a full account of conditions on ∼. Detailed accounts of these conditions have been given by (van Fraassen 1983) and (Shimony 1988). We say that a credence function b, together with an equivalence relation ∼, is perfectly calibrated or not relative to a way the world might be, which we model by a consistent truth assignment v in VF. We are now ready to give our first definitions; but we preface these with an example.

Suppose a coin is to be flipped 1000 times. And suppose that A is the proposition Heads on toss 1. And suppose that the propositions that are relevantly similar to A in algebra F are: Heads on toss 1, …, Heads on toss 1000. Finally, suppose that v is a consistent truth assignment that represents a possible world. Then the relative frequency of A at v (written Freq(F, A, ∼, v)) is the proportion of the propositions relevantly similar to A that are true at v: that is, the frequency of heads amongst the 1000 coin tosses at that world. For instance, if every second toss lands heads, then Freq(F, A, ∼, v) = ½.

Now we give the definition in full generality. Suppose ∼ is an equivalence relation on F, and v is in VF. Then:

  • For each A in F, the relative frequency of truths amongst propositions like A is

    Freq(F, A, ∼, v) := |{X in F : X ∼ A and v(X) = 1}| / |{X in F : X ∼ A}|

    where |X| is the cardinality of the set X.

  • Relative to ∼, the credence r in proposition A is actually calibrated at v if r = Freq(F, A, ∼, v).

The idea is that, if ∼ satisfies constraints C(∼), then the function Freq(F, •, ∼, v) is always a probability function on F.
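The definition of relative frequency can be illustrated with the coin example. The sketch below assumes, for illustration only, that the reference class of a proposition is supplied by a function similar and that a world is a dictionary of truth values; these representations and the name freq are introduced here.

```python
from fractions import Fraction


def freq(A, similar, v):
    """Relative frequency of truths amongst the propositions relevantly similar to A.

    similar: function returning the reference class of a proposition (a list)
    v:       dict mapping each proposition to its truth value, 0 or 1
    """
    reference_class = similar(A)
    true_count = sum(v[X] for X in reference_class)
    return Fraction(true_count, len(reference_class))


# The 1000 propositions "Heads on toss 1", ..., "Heads on toss 1000".
tosses = [f"Heads on toss {i}" for i in range(1, 1001)]

# A world at which every second toss lands heads.
v = {name: (1 if i % 2 == 0 else 0) for i, name in enumerate(tosses, start=1)}


# Assumption: all toss propositions are relevantly similar to one another.
def similar(A):
    return tosses


print(freq("Heads on toss 1", similar, v))
```

Relative to this world, credence 1/2 in Heads on toss 1 is actually calibrated, even though that toss lands tails.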

It is clear from this definition that the calibration arguments will work only for finite algebras F. For an infinite algebra, the definition just given will often make no sense, since the cardinalities of the two sets involved in the ratio will often be infinite.

Next, we treat hypothetical calibration. For this, we need the notion of the limiting relative frequency of truths amongst propositions of a certain sort. The idea is that, for each proposition A in F, there is not just a fact of the matter about what the frequency of truths amongst propositions like A actually is; there is also a fact of the matter about what the frequency of truths amongst propositions like A would be, if there were more propositions like A. For instance, there is not just a fact of the matter about how many actual tosses of a given coin will land heads; there is also a fact of the matter about the frequency of heads amongst hypothetical further tosses of the same coin. In general, suppose we have a consistent truth assignment v in VF (representing a possible world), an extension F′ of F (containing new propositions like A), and an extension ∼′ of ∼ to cover the new propositions in F′. Then there is a single unique number Freq(F′, A, ∼′, v) that gives what the relative frequency of truths amongst propositions like A would be were there all the propositions in F′ and were the relation of similarity amongst them given by ∼′, where this counterfactual is evaluated at the world represented by v. Again, let us illustrate this using our example of the coin toss from above.

Suppose again that A is the proposition Heads on toss 1 and that the propositions in F that are relevantly similar to A according to ∼ are Heads on toss 1, …, Heads on toss 1000. Now suppose that F1 extends F by introducing a new proposition about a further hypothetical toss of the coin (as well as perhaps other propositions). That is, it introduces Heads on toss 1001 (and closes out under negation, disjunction, and conjunction). And suppose that ∼1 extends ∼, so that the new proposition Heads on toss 1001 is considered relevantly similar to each Heads on toss 1, …, Heads on toss 1000. Then those who appeal to hypothetical limiting frequencies must claim that there is a unique number that gives what the frequency of heads would be, were the coin tossed 1001 times. They denote this number Freq(F1, A, ∼1, v). Now suppose that F2 extends F1 by adding the new proposition Heads on toss 1002 and ∼2 extends ∼1, so that the new proposition Heads on toss 1002 is considered relevantly similar to each Heads on toss 1, …, Heads on toss 1001. And so on. Then the limiting relative frequency of A at v (written LimFreq(F, A, ∼, v)) is the number towards which the following sequence tends:

Freq(F, A, ∼, v), Freq(F1, A, ∼1, v), Freq(F2, A, ∼2, v), …

In general, for each algebra F and equivalence relation ∼, there is an infinite sequence

(F, ∼) = (F0, ∼0), (F1, ∼1), (F2, ∼2), …

of algebras and equivalence relations such that each Fi+1 is an extension of Fi and each ∼i+1 is an extension of ∼i and, for all i, C(∼i). Using this, we can define the notion of limiting relative frequency and the associated notion of hypothetical calibration in full generality. Suppose ∼ is an equivalence relation on F and v in VF. And suppose (F, ∼) = (F0, ∼0), (F1, ∼1), (F2, ∼2), … is the sequence just mentioned.

  • For each A in F, the limiting relative frequency of truths amongst propositions like A is

    LimFreq(F, A, ∼, v) = lim_{n → ∞} Freq(Fn, A, ∼n, v)

    That is, the limiting relative frequency of A is the number approached arbitrarily closely by the hypothetical relative frequencies of truths as we extend the algebra F to include more and more propositions like A.
  • Relative to ∼, the credence r in proposition A is hypothetically calibrated at v if r = LimFreq(F, A, ∼, v).

According to some calibration arguments, actual calibration is an epistemic goal; according to others, hypothetical calibration is the goal. Whichever it is, a local epistemic disutility function ought to measure the distance of an epistemic state from this epistemic goal in a given situation. This gives rise to the following definition of a particular sort of local epistemic disutility function:

  • An actual calibration measure is a function of the form Cal(r, A, F, ∼, v) = H(|Freq(F, A, ∼, v) − r|) where H : ℜ → ℜ is a strictly increasing continuous function with H(0) = 0.
  • A hypothetical calibration measure is a function of the form LimCal(r, A, F, ∼, v) = H(|LimFreq(F, A, ∼, v) − r|) where again H : ℜ → ℜ is a strictly increasing continuous function with H(0) = 0.
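A calibration measure of this form might be sketched as follows. The choice of H as the identity function is an assumption made for simplicity (it is strictly increasing, continuous, and sends 0 to 0); the name cal and the example world are likewise introduced here for illustration.

```python
from fractions import Fraction


def cal(r, frequency, H=lambda x: x):
    """Calibration measure H(|frequency - r|) for credence r in a proposition.

    frequency: the relative (or limiting relative) frequency of truths amongst
               propositions like the one in question
    H:         assumed here to be the identity; any strictly increasing
               continuous function with H(0) = 0 would do
    """
    return H(abs(frequency - r))


# Assumed world: of 1000 relevantly similar toss propositions, exactly one is true.
frequency = Fraction(1, 1000)

print(cal(Fraction(1, 1000), frequency))  # credence matching the frequency
print(cal(Fraction(9, 10), frequency))    # credence far from the frequency
```

The measure is 0 exactly when the credence matches the frequency, and it grows as the credence moves away from the frequency in either direction.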

Our next task is to identify the norms of standard decision theory/utility theory that are deployed in conjunction with this characterization to derive Probabilism and Conditionalization.

5.2 Calibration arguments for Probabilism

Let's take Probabilism first. Here's a putative norm of standard decision theory (van Fraassen 1983, p. 297):

Possibility of vindication. An agent ought to act in such a way that she does not thereby preclude the possibility of attaining minimal disutility, when such a minimum exists.

That is: Suppose U is a disutility function, W is the set of possible worlds, and A the set of possible actions. Then an agent ought to choose an action a0 in A such that there is a possible world w0 in W such that

U(a0, w0) = min{U(a, w) : a in A and w in W}

when such a minimum exists.
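For a finite decision problem, the actions permitted by this norm can be computed directly. The following sketch uses a hypothetical disutility table; the action names, world names, numbers and the function vindicable_actions are all assumptions introduced for illustration.

```python
def vindicable_actions(disutility):
    """Actions that attain the overall minimum disutility at some possible world.

    disutility: dict mapping each action to a dict from worlds to disutilities
    """
    global_min = min(u for row in disutility.values() for u in row.values())
    return [a for a, row in disutility.items() if global_min in row.values()]


# Hypothetical table: U(action, world).
U = {
    "bold":     {"w1": 0, "w2": 10},
    "cautious": {"w1": 2, "w2": 3},
}

print(vindicable_actions(U))
```

With these assumed numbers only the first action leaves open the possibility of vindication, since it alone attains the overall minimum at some world.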

It can be shown that, together with the characterization of measures of actual calibration given above, suitable constraints C(∼) on the equivalence relation ∼, and the assumption that actual calibration is the sole epistemic goal, this norm entails something stronger than Probabilism. It entails:

Rational-valued Probabilism. At any time t, an agent's credence function b ought to be a probability function on the algebra F that takes only values in ℚ (where ℚ is the set of rational numbers).

This is a consequence of the following theorem:

Theorem 1. Suppose Cal is a calibration measure and suppose C(∼). Then the following are equivalent:

  1. b is a probability function on F that takes only values in ℚ.
  2. There is a world at which b is actually calibrated. That is, there is a world v in VF such that, for all A in F, Cal(b(A), A, F, ∼, v) = 0.

Different versions of this theorem result from different constraints C(∼) on the equivalence relation ∼ (Shimony 1988), (van Fraassen 1983), but the result is not surprising. An agent will satisfy Possibility of vindication just in case her credences match the relative frequencies at some world. And those relative frequencies will satisfy the probability axioms if C(∼) and if we have specified that condition correctly. That they will be rational numbers follows from the definition of the relative frequency of a proposition at a world.

Most proponents of the calibration argument are reluctant to accept a norm that rules out all credences given by irrational numbers. To establish the weaker norm of Probabilism, there are two strategies they might adopt. The first is to appeal to the epistemic goal of hypothetical calibration instead of actual calibration. This, together with Possibility of vindication, gives us Probabilism via the following theorem:

Theorem 2. Suppose LimCal is a hypothetical calibration measure, and suppose C(∼). Then the following are equivalent:

  1. b is a probability function on F.
  2. There is a world at which b is hypothetically calibrated. That is, there is a world v in VF such that, for all A in F, LimCal(b(A), A, F, ∼, v) = 0.

The reason is that, while relative frequencies are always rational numbers, the limit of an infinite sequence of rational numbers may be an irrational number. And in fact, for any irrational number, there is a sequence of rational numbers that approaches it in the limit (indeed, there are infinitely many such sequences).
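The point about limits can be illustrated numerically. The sketch below builds a hypothetical sequence of coin tosses whose running relative frequencies of heads, each a rational number, approach a target value. The greedy construction and the name greedy_frequencies are assumptions introduced here, and the target is a floating-point stand-in for the irrational number 1/√2.

```python
import math
from fractions import Fraction


def greedy_frequencies(target, n):
    """Running relative frequencies of heads over n hypothetical tosses.

    Toss i lands heads exactly when the frequency so far is below the target,
    which keeps the running frequency within 1/i of the target after i tosses.
    """
    heads = 0
    freqs = []
    for i in range(1, n + 1):
        tossed = i - 1
        current = Fraction(heads, tossed) if tossed else Fraction(0)
        if current < target:
            heads += 1
        freqs.append(Fraction(heads, i))  # always a rational number
    return freqs


target = 1 / math.sqrt(2)  # float approximation standing in for an irrational
freqs = greedy_frequencies(target, 10_000)

for n in (10, 100, 1_000, 10_000):
    print(n, freqs[n - 1], abs(float(freqs[n - 1]) - target))
```

Every entry in the sequence is a ratio of whole numbers, yet the gap between the running frequency and the target shrinks as more hypothetical tosses are added.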

An alternative route to Probabilism changes the decision-theoretic norm to which we appeal, rather than the sort of calibration from which we wish our epistemic disutility function to measure distance. The alternative norm is:

Possibility of arbitrary closeness to vindication. An agent ought to act in such a way that there are possible worlds in which her disutility is arbitrarily close to being minimal.

That is: Suppose U is a disutility function, W is the set of possible worlds, and A the set of possible actions. Then an agent ought to choose an action a0 in A such that, for any ε > 0, there is a possible world wε in W such that

|U(a0, wε) − min{U(a, w) : a in A and w in W}| < ε

when this minimum exists.

Together with the characterization of calibration measures given above, suitable constraints C(∼) on the equivalence relation ∼, and two extra assumptions, this norm does establish Probabilism (van Fraassen 1983, Shimony 1988). The extra assumptions are these: First, if our agent has a credence function b in BF, the possible worlds that we are considering include not only all (consistent) truth assignments to F, but also any (consistent) truth assignments to any (finite) algebra F′ that extends F. And, second, given any such F′, the equivalence relation ∼ can be extended in any possible way, providing the extension ∼′ of ∼ satisfies C(∼′).

Theorem 3. Suppose Cal is a calibration measure and C(∼). Then the following are equivalent:

  • b is a probability function on F.
  • For all ε > 0, there is a finite extension F′ of F and an extension ∼′ of ∼ that satisfies C(∼′), and a possible world v′ in VF′ such that, for all A in F, Cal(b(A), A, F′, ∼′, v′) < ε.

Thus, if our agent satisfies Probabilism, then however close she would like to be to actual calibration, there is some possible world at which she is that close. And conversely.

These are the calibration arguments for Probabilism. In the next section, we consider objections that may be raised against them.

5.3 Objections to calibration arguments for Probabilism

  • Calibration is not an epistemic goal.
    It may be objected that neither actual nor hypothetical calibration measures are truth-directed epistemic disutility functions, where this is taken to be a necessary condition on such a function (Joyce 1998), (Seidenfeld 1985). We say that a local disutility function is truth-directed if it assigns a higher disutility to one credence in a proposition than another credence in that proposition exactly when the first is further from the truth value than the second. Calibration measures do not necessarily do this. Let us return to our toy example: the propositions Heads on toss 1, …, Heads on toss 1000 are in F and they are all relevantly similar according to ∼. Now suppose that the first coin toss lands heads, but all the others land tails. Then credence 0.001 in Heads on toss 1 is further from the truth than any higher credence in that proposition, but closer to calibration; indeed, it is actually calibrated since exactly one out of one-thousand relevantly similar propositions is true. However, this objection seems rather question-begging. Proponents of the calibration argument will simply reject the claim that an epistemic disutility function must be truth-directed.

  • Limiting relative frequencies are not well-defined
    To define the limiting relative frequency of A at a world v, we require that there is a unique sequence of extensions of the algebra that contain more and more propositions that are relevantly similar to A, and a corresponding sequence of relative frequencies of truths amongst the propositions like A in the corresponding algebra. But the assumption of such a unique sequence is extremely controversial and the problems to which it gives rise have haunted hypothetical frequentism about objective probability (Hájek 2009).

  • Neither Possibility of vindication nor Possibility of arbitrary closeness to vindication is a norm
    It might be that the only actions that give rise to the possibility of vindication or of arbitrary closeness to vindication also give rise to the possibility of maximal distance from vindication. And it might be that there are actions that do not give rise to the possibility of vindication or of arbitrary closeness to vindication, but do limit the distance from vindication that is risked by choosing that action. In such cases, it is not at all clear that it is rationally required of an agent that she ought to risk maximal distance from vindication in order to leave open the possibility of vindication or of arbitrary closeness to vindication.

  • The constraints on ∼ are ill-motivated
    This objection will vary with the constraints C(∼) that are imposed on ∼. One uncontroversial constraint is this: If A ∼ B, then b(A) = b(B). The further constraints imposed in (van Fraassen 1983) and (Shimony 1988) are more controversial (Joyce 1998). Moreover, they limit the application of the result, since they involve assumptions about the form of the propositions in F. Thus, the calibration arguments do not show in general, of any finite algebra F, that a credence function on F ought to be a probability function, since not every such algebra will contain propositions with the form required by the constraints C(∼).
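
The coin-toss example in the first objection above can be checked numerically. What follows is only a minimal sketch: it assumes that calibration is assessed by comparing a credence with the relative frequency of truths among the relevantly similar propositions, and the function names are illustrative. The calibration measures discussed above need not take exactly this form.

```python
from fractions import Fraction

# Toy example: 1000 relevantly similar propositions "Heads on toss i".
# Assumed world: toss 1 lands heads, every other toss lands tails.
truth_values = [1] + [0] * 999

def relative_frequency(values):
    """Proportion of truths among the relevantly similar propositions."""
    return Fraction(sum(values), len(values))

def distance_from_truth(credence, truth_value):
    """Absolute difference between a credence and a truth value."""
    return abs(credence - truth_value)

freq = relative_frequency(truth_values)  # 1/1000

# Two candidate credences in "Heads on toss 1" (the second is hypothetical).
low, high = Fraction(1, 1000), Fraction(9, 10)

# The low credence matches the relative frequency exactly ...
calibration_gap_low = abs(low - freq)
calibration_gap_high = abs(high - freq)

# ... but it is much further from the truth value of "Heads on toss 1".
truth_gap_low = distance_from_truth(low, truth_values[0])
truth_gap_high = distance_from_truth(high, truth_values[0])
```

On these assumptions the two orderings come apart: the lower credence is better calibrated but less accurate, which is the point on which the objection turns.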

5.4 Calibration arguments for Conditionalization

In this section, we identify the norm of decision theory that is deployed in conjunction with the above characterization of hypothetical calibration measures to derive Conditionalization. And we give the argument for Conditionalization.

The decision theoretic norm is familiar and uncontroversial:

Minimize disutility. If there is an action that has minimal disutility in all worlds that are not ruled out by the agent's epistemic state, then the agent ought to perform such an action.

That is: Suppose U is a disutility function, W is the set of epistemically possible worlds, and A the set of possible actions. And suppose there is an action a in A such that, for all a′ in A and all w in W, we have U(a, w) ≤ U(a′, w). Then the agent ought to perform an action with that property.
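
The formal statement of Minimize Disutility can be sketched as a simple search over a finite set of actions and worlds. The disutility table below is hypothetical and serves only to illustrate the quantifier structure of the norm.

```python
def dominant_actions(actions, worlds, disutility):
    """Return the actions a with U(a, w) <= U(a2, w) for all actions a2 and worlds w."""
    return [
        a for a in actions
        if all(disutility(a, w) <= disutility(a2, w)
               for a2 in actions for w in worlds)
    ]

# Hypothetical disutility table U(a, w); not drawn from the text.
table = {
    ("a1", "w1"): 1, ("a1", "w2"): 2,
    ("a2", "w1"): 3, ("a2", "w2"): 2,
    ("a3", "w1"): 1, ("a3", "w2"): 5,
}
actions = ["a1", "a2", "a3"]
worlds = ["w1", "w2"]

best = dominant_actions(actions, worlds, lambda a, w: table[(a, w)])
```

Here only a1 has minimal disutility at every world. If the returned list is empty, the antecedent of the norm fails and the norm is silent about what the agent ought to do.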

Suppose that, between t and t′, an agent learns proposition E with certainty and nothing more. And suppose that b and b′ are her credence functions at t and t′ respectively. Then, in choosing the epistemic state she will adopt at time t′, she can only appeal to her epistemic state at time t and the new evidence E. How can this prior epistemic state and new evidence guide her? According to Lange, there are two ways (Lange 1999).

First, if ∼ and ∼′ are the equivalence relations relative to which relative frequencies are calculated at t and at t′ respectively, then there is a way in which ∼ together with E must guide ∼′. Lange claims that two propositions A and B are relevantly similar at t (that is, A ∼ B) just in case the evidence the agent has for A at t is the same as the evidence that the agent has for B at t. Since learning E adds new evidence to the agent's stock, ∼′ is a more fine-grained relation. In fact, Lange imposes a number of further constraints on each ∼ and the relations between them. We denote these constraints D(∼).

Second, there is a way that her credence function b, together with E and Minimize Disutility, can guide her. The idea is adapted from van Fraassen: According to Lange, to be in an epistemic state is to be committed to acting as if that epistemic state is hypothetically calibrated. That is, an agent with a credence function b should act as if b is hypothetically calibrated. Thus, in conjunction with Minimize Disutility, an agent who learns E and nothing more between t and t′ ought to choose a credence function b′ at t′ with the following property: at all worlds v at which b is hypothetically calibrated relative to ∼ and at which E is true, her epistemic state at t′ is hypothetically calibrated relative to ∼′.

Together these entail Conditionalization via the following theorem (Lange 1999):

Theorem 4. Suppose D(∼) and D(∼′). And suppose that, for all v in VF, if b is hypothetically calibrated at v relative to ∼ and v(E) = 1, then b′ is hypothetically calibrated at v relative to ∼′. Then b′(•) = b(• | E), providing b(E) > 0.
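
To illustrate the conclusion of the theorem (though not Lange's argument for it), here is a minimal sketch of the update b′(•) = b(• | E) applied to the die example from the introduction. The representation of credences as a dictionary over worlds is an assumption made for the illustration.

```python
from fractions import Fraction

def conditionalize(prior, evidence):
    """Return b(. | E) over worlds, given prior credences over worlds and a set E of worlds."""
    b_E = sum(prior[w] for w in evidence)
    if b_E == 0:
        # The theorem only applies when b(E) > 0.
        raise ValueError("Conditionalization is undefined when b(E) = 0")
    return {w: (prior[w] / b_E if w in evidence else Fraction(0)) for w in prior}

# Equal credence of 1/6 in each outcome of the roll.
prior = {face: Fraction(1, 6) for face in range(1, 7)}

# The agent learns that the die landed on one or two.
posterior = conditionalize(prior, {1, 2})
```

The posterior assigns 1/2 to each of the two remaining outcomes, rather than the 1/3 and 2/3 of the faulty update described in the introduction.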

This argument inherits the objections to the calibration arguments for Probabilism: see section 5.3.

6. Accuracy Arguments

In this section, we consider the rational constraints imposed on an epistemic disutility function when we treat it as a measure of the accuracy of an epistemic state at a possible world; that is, as a measure of the distance of an epistemic state from the epistemic goal of being true or being correct (Joyce 1998), (Leitgeb and Pettigrew 2010a), (Leitgeb and Pettigrew 2010b). We say that a credence function on algebra F is true or correct at a possible world if it assigns credence 1 to all propositions that are true at that world and credence 0 to all propositions that are false. Thus, representing possible worlds using valuation functions v in VF, we say that a credence function b in BF is true or correct at a possible world v in VF just in case v(A) = b(A) for all A in F. According to accuracy arguments, matching the truth values is an epistemic goal. And they attempt to justify Probabilism, Countable Additivity, and Conditionalization, along with some other putative epistemic norms, by appealing to this goal and measures of distance from it.

Henceforth, we drop the subscript on VF, BF, and PF. That is, we will keep our algebra F fixed throughout.

6.1 Gradational inaccuracy measures

In this section, we consider three different ways in which we might characterize those epistemic disutility functions that measure inaccuracy. In the first, some of the functions that satisfy the characterizing conditions are known, but it is an open question whether any others do. In the second and third attempts, the conditions imposed are strict enough to narrow the class of inaccuracy measures to a single familiar function known as the Brier score (or a closely related function), which is defined as

α²(b, v) = ΣA in F |b(A) − v(A)|²
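
As a concrete illustration, the following sketch computes the Brier score for a credence function on the four-element algebra generated by two worlds. The worlds "rain" and "dry", the representation of propositions as sets of worlds, and the particular credences are assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations

def algebra(worlds):
    """All propositions over the given worlds, each represented as a frozenset of worlds."""
    ws = list(worlds)
    return [frozenset(c) for r in range(len(ws) + 1) for c in combinations(ws, r)]

def truth_value(proposition, world):
    """v(A): 1 if A is true at the world, 0 otherwise."""
    return 1 if world in proposition else 0

def brier(credence, world):
    """Sum over all A in F of |b(A) - v(A)|^2."""
    return sum((credence[A] - truth_value(A, world)) ** 2 for A in credence)

worlds = ("rain", "dry")
F = algebra(worlds)

# A probabilistic credence function: 7/10 in rain, 3/10 in dry.
b = {A: Fraction(0) for A in F}
b[frozenset({"rain"})] = Fraction(7, 10)
b[frozenset({"dry"})] = Fraction(3, 10)
b[frozenset(worlds)] = Fraction(1)

score_if_rain = brier(b, "rain")  # (3/10)^2 + (3/10)^2 = 9/50
score_if_dry = brier(b, "dry")    # (7/10)^2 + (7/10)^2 = 49/50
```

The score is lower at the world to which the credence function gives more weight, as one would expect of a measure of inaccuracy.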

6.1.1 Learning from the Brier score

The Brier score is a plausible measure of the inaccuracy of a credence function at a world. Indeed, it is the measure often used by meteorologists in order to measure the inaccuracy of their probabilistic weather forecasts (Brier 1950). But it is not clear that it is the only plausible measure. The first attempt to characterize the class of (global) epistemic disutility functions G : B × V → ℜ that measure the inaccuracy of a credence function at a world attempts to extract the properties of the Brier score α² that we would like any measure of inaccuracy to share with that function (Joyce 1998).

We consider such properties now (though the properties I list below differ from those listed in (Joyce 1998) in certain respects). In each case, I will state the property formally, and then give an informal gloss.

Strong Non-Triviality. If b ≠ v, then G(v, v) < G(b, v).

This says that the true or correct credence function v in V that is maximally certain of the truth of all truths and the falsity of all falsehoods at v is the only minimally inaccurate credence function at v.

Proposition-wise continuity. For all v in V, G(•, v) is proposition-wise continuous on B. That is, for all b in B and all ε > 0, there is δ > 0 such that, for all b′ in B, if |b(A) − b′(A)| < δ for all A in F, then |G(b, v) − G(b′, v)| < ε.

This says that the inaccuracy of a credence function should not be able to ‘jump’ without any corresponding ‘jump’ in the credences it assigns to propositions. Some have argued that there might be reason for allowing inaccuracy measures that violate this condition (Schervish et al 2009).

Unboundedness. For any A in F, G(b, v) → ∞ as b(A) → ∞ or b(A) → − ∞.

This says that inaccuracy has no upper bound and it increases without bound as the credence increases or decreases without bound.

Truth-directedness. If |b(A) − v(A)| ≤ |b′(A) − v′(A)|, for all A in F, then G(b, v) ≤ G(b′, v′).

It might be argued that this is part of what it means for an epistemic disutility function to measure inaccuracy. If the credences of b are all at least as close to truth values at v as the credences of b′ are at v′, then b is at most as inaccurate as b′.

Strong Convexity. If b ≠ b′ and G(b, v) = G(b′, v), then G(½b + ½b′, v) < G(b, v), G(b′, v).

Given two credence functions b, b′ in B and given 0 < λ < 1, the credence function λb + (1- λ)b′ is defined pointwise: that is, (λb + (1-λ)b′)(A) = λb(A) + (1-λ)b′(A). It is plausible to think of λb + (1-λ)b′ as a compromise between b and b′. The compromise is biased towards b if ½ < λ ≤ 1 and biased towards b′ if 0 ≤ λ < ½. It is unbiased if λ = ½. Thus, Strong Convexity says that, if two credence functions are equally inaccurate, the unbiased compromise between them is more accurate than both.

This is the first truly contentious condition. Various arguments have been proposed in its favour (Joyce 1998), (Joyce 2009). It should be noted that the so-called absolute value measure, defined as

α1(b, v) = ΣA in F |b(A) − v(A)|,

violates this condition, and yet seems initially to be a plausible measure of inaccuracy (Maher 2002). Thus, the proponent of Strong Convexity will have to say what is wrong with the absolute value measure.
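
A small worked case may help to show how the absolute value measure fails Strong Convexity while the Brier score does not. The sketch below restricts attention to credences in two propositions, the first true and the second false at v; the particular numbers are my own and are chosen only for illustration.

```python
from fractions import Fraction as Fr

def absolute_value_measure(b, v):
    """Sum of |b(A) - v(A)| over the propositions considered."""
    return sum(abs(x - t) for x, t in zip(b, v))

def brier(b, v):
    """Sum of |b(A) - v(A)|^2 over the propositions considered."""
    return sum((x - t) ** 2 for x, t in zip(b, v))

def mix(lam, b, b2):
    """Pointwise compromise lam*b + (1 - lam)*b2."""
    return tuple(lam * x + (1 - lam) * y for x, y in zip(b, b2))

v = (1, 0)                  # truth values of the two propositions
b = (Fr(4, 5), Fr(0))       # hypothetical credences
b2 = (Fr(1), Fr(1, 5))      # a distinct, equally inaccurate alternative
mid = mix(Fr(1, 2), b, b2)  # the unbiased compromise (9/10, 1/10)

abs_scores = [absolute_value_measure(c, v) for c in (b, b2, mid)]
brier_scores = [brier(c, v) for c in (b, b2, mid)]
```

On the absolute value measure all three scores are 1/5, so the unbiased compromise is no more accurate than the functions it compromises between. On the Brier score the compromise scores 1/50 against 1/25 for each of b and b2, as Strong Convexity requires.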

Symmetry. If G(b, v) = G(b′, v), then for any 0 ≤ λ ≤ 1, we have G(λb + (1-λ)b′, v) = G((1-λ)b + λb′, v).

If a putative inaccuracy measure were to violate Symmetry, then it would be biased towards one part of the space of possible credence functions over another. After all, there would be two equally inaccurate credence functions between which there is a compromise biased towards one by a certain amount that is more accurate than the compromise biased towards the other by the same amount (Joyce 1998). Interestingly, the absolute value measure also violates Symmetry. However, in this case, it seems that the condition might be more plausible than the claim that the absolute value score is a good measure of inaccuracy.
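
The following sketch exhibits one violation of Symmetry by the absolute value measure. It assumes, as the Unboundedness condition suggests, that credence functions in B may take values outside the interval from 0 to 1; the example uses a credence of 6/5. The numbers are illustrative and are not taken from the literature cited.

```python
from fractions import Fraction as Fr

def absolute_value_measure(b, v):
    """Sum of |b(A) - v(A)| over the propositions considered."""
    return sum(abs(x - t) for x, t in zip(b, v))

def mix(lam, b, b2):
    """Pointwise compromise lam*b + (1 - lam)*b2."""
    return tuple(lam * x + (1 - lam) * y for x, y in zip(b, b2))

v = (1, 0)                   # truth values of two propositions
b = (Fr(3, 5), Fr(0))        # hypothetical credences
b2 = (Fr(6, 5), Fr(1, 5))    # equally inaccurate, with one credence above 1
lam = Fr(1, 4)

score_b = absolute_value_measure(b, v)
score_b2 = absolute_value_measure(b2, v)
biased_to_b2 = absolute_value_measure(mix(lam, b, b2), v)      # lam*b + (1-lam)*b2
biased_to_b = absolute_value_measure(mix(1 - lam, b, b2), v)   # (1-lam)*b + lam*b2
```

Both b and b2 score 2/5, yet the compromise biased towards b2 scores 1/5 while the compromise biased towards b by the same amount scores 3/10.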

The final condition also concerns the inaccuracy of compromises (there is no analogue of this condition in (Joyce 1998)).

Dominating compromise. Suppose b, b′, c, c′ in B. Then, if

  1. G(b, v) ≤ G(c, v) and G(b′, v) ≤ G(c′, v)
  2. For all A in F, |b(A) − b′(A)| = |c(A) − c′(A)|

then we have

G(λb + (1-λ)b′, v) ≤ G(λc + (1-λ)c′, v) for all 0 ≤ λ ≤ 1.

This says that, if (i) b is at most as inaccurate as c and b′ is at most as inaccurate as c′ and (ii) b is ‘as far’ from b′ as c is from c′, then any compromise between b and b′ is at most as inaccurate as the corresponding compromise between c and c′. Initially, we might think that a stronger condition should be imposed, which results from removing condition (ii) from the antecedent. However, this is too strong. Indeed, it is inconsistent with Truth-Directed and Strongly Non-Trivial. Thus, we restrict ourselves to those cases in which b is ‘as far’ from b′ as c is from c′.

It is not immediately clear that these conditions are consistent with one another. We can see that they are by showing that the Brier score satisfies them, as do other epistemic disutility functions that are obtained from the Brier score by weighting the summands. However, it is not known whether any further functions satisfy the conditions.

It is clear that none of the conditions listed in this section depends for its statement or for its plausibility on the finitude of F nor on the propositions of F being non-indexical. Thus, they apply equally to inaccuracy measures on sets of credence functions on a countable algebra, if they apply at all; and similarly for an algebra that includes indexical propositions.

In section 6.2.1, we will see how these conditions might be put to use to give an argument for Probabilism. First, however, we consider two alternative sets of conditions that we may wish to impose on epistemic disutility functions conceived as measures of inaccuracy. These both serve to narrow the field to the Brier score alone.

6.1.2 Inaccuracy and urgency

In the previous section, we considered a particular global epistemic disutility function called the Brier score and we tried to extract the features we would like any inaccuracy measure to share with it. In this section, we consider both local and global epistemic disutility functions that measure the inaccuracy of individual credences and whole credence functions, respectively. And we consider two conditions on how these two sorts of functions are related. Suppose L : F × V × ℜ → ℜ is a local inaccuracy measure, and G : B × V → ℜ is a global inaccuracy measure. Then the first condition demands that L and G take a particular form. And the second demands that they interact in a certain way: in particular, it demands that they both give rise to a measure of the urgency with which an agent ought to change her credence in the light of the inaccuracy she can expect it to have, and that these local and global measures of urgency agree in all situations (Leitgeb and Pettigrew 2010a).

Local and Global Comparability. There is a strictly increasing function f : ℜ → ℜ such that f(0) = 0 and

L(A, v, r) = f(|v(A) − r|) and G(b, v) = f(||b − v||)

where ||b − v|| is the Euclidean distance between the two vectors b = (b(v1), …, b(vn)) and v = (v(v1), …, v(vn)), where v1, …, vn are the atoms of F.

This says that local inaccuracy supervenes in a certain way on the difference between credence and truth value; and global inaccuracy supervenes in the same way on the Euclidean distance between credence function as applied to atoms of F and truth assignment to those atoms. (Note that this condition can only be imposed if F has atoms. In fact, as the second condition assumes that, furthermore, F is finite, we must have that F has atoms.)

Why might we wish to impose this condition on our local and global epistemic utility functions when we conceive of them as measures of inaccuracy? One possible answer is this: We might hold that, although we are not sure that the difference between credence and truth value is the correct measure of the inaccuracy of the credence, we are at least sure that it gives the correct ordering of credence-truth value pairs by their accuracy. That is, credence r in A is at least as accurate at v as r′ in A′ is at v′ if, and only if, |r − v(A)| ≤ |r′ − v′(A′)|. If this is the case, we will want our local inaccuracy measure to be a strictly increasing function of such differences. And similarly, perhaps, for the global epistemic utility function. You may think that the Euclidean distance at least gets it right with respect to the ordering of credence function-world pairs by their accuracy. But this leaves a question unanswered: Why should the same strictly increasing function take us from differences to local epistemic utility functions and from Euclidean distances to global epistemic utility functions? Here, we might say that accuracy is a dimensionless quantity. That is, it is applicable in any space of credence functions, regardless of dimension. Thus, once the appropriate distance measure is imposed on the space of credence functions, we apply the same function to turn it into an inaccuracy measure (Leitgeb and Pettigrew 2010a).

To state the second condition considered in this section, we must explain how a local or global inaccuracy measure can give rise to a measure of the urgency with which an agent ought to alter her credence in a proposition due to the inaccuracy she expects it to have. First, we need to define what we mean by the expected local and global inaccuracy of an individual credence or a credence function, respectively. Suppose an agent uses local inaccuracy measure L and global inaccuracy measure G; suppose she has credence function b; and suppose her total evidence is that proposition E is true. Then we can define the local inaccuracy she expects credence r in proposition A to have as follows (where ΣE denotes the sum over valuation functions v in V that make E true, and where we abuse notation and let v denote both valuation and the corresponding atomic proposition):

LExpL, E(r, A | b) = ΣE b(v)L(A, v, r)

And the global inaccuracy she expects the credence function b′ to have is:

GExpG, E(b′ | b) = ΣE b(v)G(b′, v)

Thus, the local inaccuracy she expects b′ to have is given by the sum of the local inaccuracies at each world in which E is true weighted by her credence in that world. And similarly for the global inaccuracy she expects it to have. It is important to note that, if we were to allow indexical propositions into our algebra F, so that the valuation functions v in VF represent centred worlds rather than ordinary uncentred possible worlds, then it would no longer be clear that this is the correct definition of expected local and global inaccuracies (Kierland and Monton 2005). So it would no longer be clear that this argument would work. Thus, our assumption that the algebra contains only non-indexical propositions is crucial here.
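
The two expectation formulas can be sketched as follows. The sketch assumes quadratic local and global measures purely for the sake of having something definite to compute; the definitions of expected inaccuracy themselves do not depend on that choice. Worlds are identified with atoms, and the credences and evidence are hypothetical.

```python
from fractions import Fraction as Fr

ATOMS = (1, 2, 3)  # hypothetical worlds, identified with the atoms of F

def local_inaccuracy(A, v, r):
    """Assumed local measure: squared gap between credence r and the truth value of A at v."""
    return (r - (1 if v in A else 0)) ** 2

def global_inaccuracy(b_prime, v):
    """Assumed global measure: squared Euclidean distance over the atoms."""
    return sum((b_prime[a] - (1 if a == v else 0)) ** 2 for a in ATOMS)

def expected_local(r, A, b, E):
    """Sum over worlds v in E of b(v) * L(A, v, r)."""
    return sum(b[v] * local_inaccuracy(A, v, r) for v in E)

def expected_global(b_prime, b, E):
    """Sum over worlds v in E of b(v) * G(b', v)."""
    return sum(b[v] * global_inaccuracy(b_prime, v) for v in E)

b = {1: Fr(1, 2), 2: Fr(1, 4), 3: Fr(1, 4)}        # current credences in the worlds
E = {1, 2}                                         # total evidence: world 3 is ruled out
b_prime = {1: Fr(1, 2), 2: Fr(1, 2), 3: Fr(0)}     # candidate credence function

exp_local = expected_local(Fr(1, 2), {1}, b, E)    # 1/2 * 1/4 + 1/4 * 1/4 = 3/16
exp_global = expected_global(b_prime, b, E)        # 1/2 * 1/2 + 1/4 * 1/2 = 3/8
```

Note that, following the definitions in the text, the weights b(v) are the agent's current credences and are not renormalized by b(E).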

Second, we need to use these expectation values to define the local and global measures of the urgency with which an agent ought to change her credence. The local directed urgency with which our agent ought to change credence x in proposition vi is given by:

LUrgL, E(x, vi | b) = ∂/∂x LExpL, E(x, vi | b)

where it exists. That is, it is the rate at which the expected local inaccuracy of her credence changes. And the global directed urgency with which an agent ought to change credence in proposition vi whilst having credences given by credence function b′ for other propositions is given by:

GUrgG, E(b′, x, vi | b) = ∂/∂x GExpG, E(b′(x / vi) | b)

where b′(x / vi) agrees with b′ on all propositions except vi where it takes x. That is, it is the rate at which the expected global inaccuracy of her credence in vi changes.

With these definitions in hand, we state our condition on local and global inaccuracy measures:

Agreement on urgency. For all vi, all b′, and all x in ℜ, we have that LUrgL, E(x, vi | b) and GUrgG, E(b′, x, vi | b) exist, they are continuous at x, and

LUrgL, E(x, vi | b) = GUrgG, E(b′, x, vi | b)

That is, any local and global inaccuracy measures give rise to measures of local and global urgency that are defined everywhere, are continuous, and agree everywhere.

The idea is that a pair of local and global inaccuracy measures should not give rise to a dilemma for an agent who uses them. But, if they disagree on the urgency with which an agent must change her credence in the light of her expected inaccuracy, she will face such a dilemma. Hence, Agreement on urgency.
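
As a check on one direction only, the sketch below confirms in a single hypothetical case that quadratic local and global measures agree on urgency. It does not show that only such measures agree; that is the content of the theorem stated below. The scaling constant, the atoms, and the credences are assumptions. Because the expected inaccuracies are quadratic in x, a central difference computed with exact fractions gives the derivative exactly.

```python
from fractions import Fraction as Fr

LAM = Fr(1)        # assumed scaling constant
ATOMS = (0, 1, 2)  # hypothetical atoms of F
E = (0, 1)         # worlds at which the evidence E is true
b = {0: Fr(1, 2), 1: Fr(1, 4), 2: Fr(1, 4)}          # current credences in the atoms
b_prime = {0: Fr(1, 5), 1: Fr(3, 10), 2: Fr(1, 2)}   # arbitrary candidate credence function

def truth(v, atom):
    """Truth value of an atom at world v."""
    return 1 if v == atom else 0

def l_exp(x, atom):
    """Expected local inaccuracy of credence x in an atom, with L = LAM * |v(A) - r|^2."""
    return sum(b[v] * LAM * (truth(v, atom) - x) ** 2 for v in E)

def g_exp(cred):
    """Expected global inaccuracy of cred, with G = LAM * sum of squared gaps over atoms."""
    return sum(b[v] * LAM * sum((truth(v, a) - cred[a]) ** 2 for a in ATOMS) for v in E)

def derivative(f, x, h=Fr(1)):
    """Central difference; exact for quadratic f when evaluated with Fractions."""
    return (f(x + h) - f(x - h)) / (2 * h)

atom, x = 0, Fr(2, 5)
l_urg = derivative(lambda y: l_exp(y, atom), x)
g_urg = derivative(lambda y: g_exp({**b_prime, atom: y}), x)
```

Both urgencies come out as 2λ(b(E)x − b(vi)), which is −2/5 on the numbers chosen, and the value does not depend on the credences b′ assigns to the other atoms.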

From these two conditions, we can infer that local and global inaccuracy measures must be closely related to the Brier score. Indeed, given the local inaccuracy measure characterized by these two conditions, the Brier score is obtained by taking the local inaccuracy of the credence in each proposition and summing. And the global inaccuracy measure characterized by these two conditions is obtained by taking the local inaccuracy of the credence in each atomic proposition and summing (Leitgeb and Pettigrew 2010a).

Theorem 5. Local and Global Comparability and Agreement on Urgency entail

  1. L(A, v, r) = λ|v(A) − r|²
  2. G(b, v) = λΣ|v(vi) − b(vi)|², where again v1, …, vn are the atoms of F.
Supplement on Proof of Theorem 5

6.1.3 Inaccuracy and distance

We turn now to the final attempt to characterize the inaccuracy measures. Note, however, that its ambition is less than that of previous characterizations. This characterization applies only to functions that measure the inaccuracy of probabilistic credence functions p in P. It says nothing about more general inaccuracy measures that also measure the inaccuracy of non-probabilistic credence functions b in B − P. The characterization is based on the following idea (Selten 1984). The inaccuracy of a credence function at a world is something like a measure of distance of that credence function from the truth at that world. There is a natural way in which to extend this distance measure from a measure of the distance of a credence function from a world to a measure of the distance of one credence function from another. Given a global inaccuracy measure G, let the distance of p′ from p be given by GExpG(p′ | p) − GExpG(p | p). That is, it is the expected inaccuracy of p′ by the lights of p corrected so that the distance of p from itself is zero. (Note that GExpG(p | v) − GExpG(v | v) = G(p, v) if G(v, v) = 0. That is, on plausible assumptions, this new measure of distance genuinely extends the old measure G.) However, if this is going to provide a distance measure, it must be symmetric. That is, the distance of p from p′ must always be the same as the distance of p′ from p. This is the first condition considered in this section:

Perspective Indifference. For all p, p′ in P,

GExpG(p′ | p) − GExpG(p | p) = GExpG(p | p′) − GExpG(p′ | p′)

It turns out that this, together with the two weak conditions on G listed below, characterizes the same global inaccuracy measures as were characterized by the conditions considered in the previous section, when they are restricted to measuring the inaccuracy of credence functions in P.
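
It is easy to check in a particular case that the Brier score over atoms satisfies Perspective Indifference: the corrected expected inaccuracy works out to the squared Euclidean distance between p and p′, which is symmetric. The sketch below does this for two hypothetical probability functions over three atoms, with no evidence beyond the tautology.

```python
from fractions import Fraction as Fr

def brier_atoms(q, world):
    """Brier score of q at a world, summing over atoms (indexed 0..n-1)."""
    return sum((q[j] - (1 if j == world else 0)) ** 2 for j in range(len(q)))

def expected_inaccuracy(q, p):
    """Expected inaccuracy of q by the lights of p, with trivial evidence."""
    return sum(p[w] * brier_atoms(q, w) for w in range(len(p)))

def divergence(q, p):
    """Distance of q from p: expected inaccuracy of q by p's lights, corrected by p's own."""
    return expected_inaccuracy(q, p) - expected_inaccuracy(p, p)

# Two hypothetical probability functions over three atoms.
p = (Fr(1, 2), Fr(1, 3), Fr(1, 6))
q = (Fr(1, 5), Fr(1, 5), Fr(3, 5))

d_qp = divergence(q, p)
d_pq = divergence(p, q)
squared_distance = sum((x - y) ** 2 for x, y in zip(p, q))
```

Both divergences equal the squared Euclidean distance between the two functions. This confirms only that the Brier score meets the condition in this case; the theorem below runs in the other direction.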

World Indifference. If v, v′, w, w′ in V then

  • If v ≠ v′ and w ≠ w′, then G(v, v′) = G(w, w′)
  • If v = v′ and w = w′, then G(v, v′) = G(w, w′)

Thus, all worlds are equally inaccurate relative to one another. And each world is equally inaccurate relative to itself.

Weakly Non-Trivial. If v, v′ in V and v ≠ v′, then G(v, v) < G(v, v′).

Thus, a world is more accurate relative to itself than relative to another world.

We have the following theorem (Selten 1984):

Theorem 6. Perspective Indifference, World Indifference, and Weakly Non-Trivial entail

G(p, v) = λΣ |v(vi) − p(vi)|²

for all p in P, where, again, v1, …, vn are the atoms of F.

Supplement on Proof of Theorem 6

This theorem characterizes the legitimate global inaccuracy measures restricted to P. Thus, it cannot be used to justify Probabilism. However, it can be used to justify Conditionalization and other diachronic norms, as well as more restrictive synchronic norms.

What is notable from the last two sections is that quite different conditions motivated by quite different philosophical considerations characterize roughly the same global epistemic disutility functions as the correct measures of inaccuracy. That is, they both characterize the Brier score (or some slight variant). We seem to be triangulating towards that measure of inaccuracy.

6.2 Accuracy arguments for Probabilism

In the previous section, we saw three different sets of conditions that we might impose upon a global epistemic disutility function to ensure that it is a measure of inaccuracy. In this section, we put these characterizations to work justifying the synchronic norm Probabilism. We consider two arguments, which are distinguished by the norms of standard decision theory that they employ.

6.2.1 The Accuracy Dominance Argument for Probabilism

The first argument for Probabilism appeals to the following norm of standard decision theory. First, some terminology. If A and A′ are actions, we say that

  • A weakly dominates A′ if A is at least as good as A′ at all worlds and better than A′ at some world.
  • A strongly dominates A′ if A is better than A′ at all worlds.

Now, the norm:

Weak Act-Type Dominance. Suppose an agent is choosing between two sorts of action. Suppose further that the following hold:

  1. For every action of the first sort, there is an action of the second sort that strongly dominates it.
  2. For any action of the second sort, there is no other action of either sort that even weakly dominates it.

In this situation, an agent ought to choose an action of the second sort.

In section 7.2, we will consider a stronger version of this norm, but the weaker version will suffice for present purposes.

Now suppose that we characterize the inaccuracy measures as we did in Section 6.1.1. Then Weak Act-Type Dominance together with the following theorem gives an argument for Probabilism (and Countable Additivity, when it is applied to countable algebras). We call it the Accuracy Dominance Argument for Probabilism (Joyce 1998).

Theorem 7. Suppose G satisfies:

  • Strongly Non-Trivial,
  • Proposition-Wise Continuity,
  • Unbounded,
  • Truth-Directed,
  • Strong Convexity,
  • Symmetry, and
  • Dominating Compromise.

Then:

  1. For every non-probabilistic b in B − P, there is a probabilistic p in P that strongly dominates it.
  2. For every probabilistic p in P, there is no credence function b in B that weakly dominates it.
Supplement on Proof of Theorem 7

Thus, if our global epistemic utility function satisfies the conditions listed in Section 6.1.1, then two things follow. First: For any non-probabilistic credence function b, there is a probabilistic credence function c that is more accurate than b however the world turns out; that is, c strongly dominates b. Now, it might seem that this alone ought to be enough to establish Probabilism. However, for all that has been said so far, it might be that, for every probabilistic credence function c, there is another credence function d that strongly dominates c. If this were the case, then we couldn't conclude Probabilism, since the probabilistic credence functions would suffer from the same epistemic vice as the non-probabilistic ones: that is, they would also be strongly dominated. The second part of Theorem 7 shows that this isn't so. No probabilistic credence function is even weakly dominated. From this, we conclude Probabilism.
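
The two parts of the theorem can be illustrated for the Brier score on the algebra generated by two worlds. The sketch below is a sketch under stated assumptions: the credence functions are hypothetical, and the check for the second part is a finite spot check over a grid of rivals, not a proof.

```python
from fractions import Fraction as Fr
from itertools import combinations

WORLDS = (0, 1)
# The four propositions over two worlds, each a frozenset of worlds.
F = [frozenset(c) for r in range(3) for c in combinations(WORLDS, r)]

def brier(cred, w):
    """Sum over all A in F of |cred(A) - v_w(A)|^2."""
    return sum((cred[A] - (1 if w in A else 0)) ** 2 for A in F)

def strongly_dominates(c, b):
    """c is less inaccurate than b at every world."""
    return all(brier(c, w) < brier(b, w) for w in WORLDS)

def weakly_dominates(c, b):
    """c is at most as inaccurate as b at every world and less inaccurate at some world."""
    return (all(brier(c, w) <= brier(b, w) for w in WORLDS)
            and any(brier(c, w) < brier(b, w) for w in WORLDS))

def credence(on_0, on_1):
    """Credence function with 0 in the contradiction and 1 in the tautology."""
    return {frozenset(): Fr(0), frozenset({0}): on_0,
            frozenset({1}): on_1, frozenset(WORLDS): Fr(1)}

b = credence(Fr(3, 10), Fr(3, 10))  # non-probabilistic: 3/10 + 3/10 != 1
p = credence(Fr(1, 2), Fr(1, 2))    # probabilistic

p_dominates_b = strongly_dominates(p, b)

# Spot check only: no rival on a coarse grid weakly dominates p.
grid = [credence(Fr(i, 10), Fr(j, 10)) for i in range(11) for j in range(11)]
p_undominated_on_grid = not any(weakly_dominates(c, p) for c in grid)
```

At each world b has Brier score 29/50 while p has 1/2, so p is more accurate than b however the world turns out, and none of the grid rivals does at least as well as p at both worlds.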

6.2.2 Objections to the Accuracy Dominance Argument for Probabilism

The conditions on inaccuracy measures are too strong. The first objection raises problems with the conditions imposed on global inaccuracy measures. As noted above, Strong Convexity is the most controversial of the conditions required by the Accuracy Dominance Argument for Probabilism. In particular, it is not clear why equal compromises between equally inaccurate credence functions must be more accurate rather than just as accurate. However, it is not possible to weaken the condition in this way and retain the conclusion of the previous theorem (Maher 2002). Of course, it is difficult to adjudicate disputes over the veracity of what might be taken to be bedrock claims about our concept of accuracy.

Accuracy isn't the only virtue. The second sort of objection to the Accuracy Dominance Argument questions the assumption that an agent's epistemic utility function ought to measure only the accuracy of her credences. After all, there are other features of credences that we value. For instance, the simplicity of our epistemic state is typically taken to be a virtue, as is its informativeness, its explanatory power, and its verisimilitude, to name just a few. Surely our epistemic utility function ought also to measure the degree to which our epistemic states have these virtues. And perhaps once the epistemic utility function has been altered to reflect this variety of epistemic virtues, we will no longer be able to use it to argue for Probabilism. The proponent of the Accuracy Dominance Argument typically responds to this charge in one of three ways. First, she might argue that some of these apparent epistemic virtues are really pragmatic virtues: thus, she might say that explanatory power and simplicity are really pragmatic virtues because we value them for their usefulness in drawing inferences from our epistemic state and deciding how to act quickly on the basis of our epistemic state. Second, she might argue that those other virtues that are genuinely epistemic and not pragmatic are to be understood in terms of the virtue of accuracy. See, for instance, the discussion of verisimilitude in (Joyce 1998), where Joyce argues that an epistemic state that enjoys greater verisimilitude will typically also enjoy greater accuracy. It is hoped that a similar story can be told about the other putative epistemic virtues. And finally, third, she might argue that, while these other virtues are genuine, they are always trumped by considerations of accuracy (Leitgeb and Pettigrew 2010a).

No credence function that dominates on every measure. The third objection to the Accuracy Dominance Argument concerns the normative force of the argument. While the decision-theoretic norm Weak Act-Type Dominance seems compelling, the normative force of the Accuracy Dominance Argument can still be questioned. Aaron Bronfman originally raised the following problem in an unpublished manuscript entitled “A Gap in Joyce's Proof of Probabilism”; it has been discussed in (Hájek 2008) and (Pettigrew 2010). The conditions on a global inaccuracy measure on which this argument is based don't characterize a single function; they characterize a family of functions. But, for all the theorem tells us, it may well be that, for a given non-probabilistic credence function b, different functions in this family of global inaccuracy measures will give different probabilistic credence functions that strongly dominate b. Thus, an agent with a non-probabilistic credence function b might be faced with a range of probability functions, each of which strongly dominates b relative to a different global inaccuracy measure. Moreover, it may be that any probability function that strongly dominates b relative to G does not strongly dominate b relative to G′ and indeed risks very high inaccuracy at some world relative to G′, and vice versa. In this situation, it is plausible that the agent ought not to move from her non-probabilistic credence function to any probabilistic credence function.

There are two replies to this objection. According to the first, the objection relies on a false meta-normative claim; according to the second, it misunderstands the purpose of Joyce's conditions.

The meta-normative claim on which the objection relies is the following: For a norm to hold, there must be specific advice available to those who violate that norm concerning how to improve their behaviour. Bronfman's objection begins with the observation that, for any specific advice that one might give to a non-probabilistic agent concerning which credence function she should adopt in favour of her own, there will be inaccuracy measures that satisfy Joyce's conditions, but don't sanction this advice; indeed, there will be inaccuracy measures relative to which that advice is very bad. Thus, the Accuracy Dominance Argument violates the meta-normative claim. But, the reply submits, the meta-normative claim is false: for a norm to hold, it is sufficient that there is a serious defect suffered by those who violate the norm that is not shared by those who satisfy the norm; it is not also required that there should be advice on which specific action an agent should perform to improve her behaviour. And Joyce's argument satisfies this sufficient condition. One ought to satisfy Probabilism because non-probabilistic credence functions suffer from a serious epistemic defect (namely, accuracy domination) that does not beset probabilistic ones. And this fact is ‘supertrue’, so to speak: that is, it is true on any precisification of the notion of accuracy that obeys Joyce's minimal conditions on an inaccuracy measure.

The second reply to this objection does not take issue with the meta-normative claim mentioned above; indeed, on the understanding of the Accuracy Domination Argument it proposes, the argument satisfies the necessary condition imposed by that claim. That is, according to this reply, the Accuracy Domination Argument, properly understood, does in fact provide specific advice to non-probabilistic agents. The idea is this: There are (at least) three ways to understand the purpose of Joyce's conditions on inaccuracy measures. First, we might think that the notion of inaccuracy is vague; and we might say that any inaccuracy measure that satisfies the conditions is a legitimate precisification of it. This is a supervaluationist approach. On this approach, there is no specific advice available to non-probabilistic agents that is sanctioned by all precisifications. Second, we might think that the notion of inaccuracy is precise, but that we have only limited knowledge about it, and that the sum total of our knowledge is embodied in the conditions. This is an epistemicist approach. On this approach, there is specific advice, but it is not available to us. Third, we might think that there is no objectively correct inaccuracy measure; rather, any inaccuracy measure that satisfies the conditions is rationally permissible. But nonetheless, any particular agent has only one such measure. This is a subjectivist approach. On this understanding, there is specific advice for any non-probabilistic agent. Any such agent uses an inaccuracy measure that satisfies Joyce's conditions. And this gives, for any non-probabilistic credence function, a probabilistic credence function that strongly dominates it. So the specific advice is this: adopt one of the probabilistic credence functions that strongly dominates your non-probabilistic credence function relative to your favoured measure of inaccuracy. This gives us Probabilism and does so without violating the meta-normative claim on which Bronfman's objection relies.

6.2.3 The Expected Inaccuracy Argument for Probabilism

In the Accuracy Dominance Argument, we consider the epistemic norms that we can derive by imposing conditions on global inaccuracy measures. In the arguments considered in this section, we consider the effect of imposing conditions on local inaccuracy measures instead. In particular, we consider the effect of imposing conditions that narrow the class of legitimate local inaccuracy measures to the following quadratic family:

L(A, v, r) = λ|v(A) − r|²

where λ > 0. And we exploit the following familiar norm from standard decision theory:

Minimize expected disutility. An agent ought to choose an act that has minimal expected disutility relative to her current credence function and in the light of the total evidence she currently possesses.

That is: If an agent has the credence function b and the strongest proposition she knows is E, then she ought to choose an action a₀ such that

ΣE b(w)U(a₀, w) ≤ ΣE b(w)U(a, w)

for all a in A, where again we write ΣE to denote the sum over all worlds w in which E is true.
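
As a concrete illustration of Minimize expected disutility, the following Python sketch computes the evidence-restricted sum of b(w)U(a, w) for each act and keeps the acts that minimize it. The worlds, acts, credences, and disutility values are invented for the example and are not drawn from the literature; as in the formula above, the sum over E is not renormalized by b(E).

```python
# Hypothetical toy decision problem: all names and numbers are illustrative.
def expected_disutility(act, credence, disutility, evidence):
    """Sum b(w) * U(act, w) over the worlds w in which the evidence E is true."""
    return sum(credence[w] * disutility[act][w] for w in evidence)

def minimizing_acts(acts, credence, disutility, evidence):
    """Return every act whose expected disutility is minimal (ties are kept)."""
    scores = {a: expected_disutility(a, credence, disutility, evidence) for a in acts}
    best = min(scores.values())
    return [a for a in acts if scores[a] == best]

credence = {"w1": 0.5, "w2": 0.3, "w3": 0.2}   # the agent's credence function b
disutility = {
    "a": {"w1": 1.0, "w2": 4.0, "w3": 0.0},
    "c": {"w1": 2.0, "w2": 1.0, "w3": 10.0},
}
all_worlds = {"w1", "w2", "w3"}
evidence = {"w1", "w2"}                        # strongest proposition known, E

print(minimizing_acts(["a", "c"], credence, disutility, all_worlds))  # before learning E
print(minimizing_acts(["a", "c"], credence, disutility, evidence))    # after learning E
```

With these numbers the recommended act switches from a to c once E rules out w3, since the large disutility of c at w3 no longer enters the sum.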

Now, in standard decision theory, this norm imposes no constraints on the credence function that an agent may have, although that function is typically assumed to lie in P. The norm says how credence functions together with utility functions should inform action; it says nothing about what should inform credence functions or utility functions. However, in epistemic utility theory, the acts are the epistemic states. So there may be credence functions that it is not rational for an agent to have since it may be that those credence functions do not minimize expected epistemic disutility relative to themselves and in the light of certain evidence (Gibbard 2007). That is, by adopting them, an agent immediately violates Minimize expected disutility. As the following theorem shows, if we measure local inaccuracy by one of the quadratic local inaccuracy measures, then all and only the probabilistic credence functions p in P such that p(E) = 1 minimize expected inaccuracy relative to themselves and in the light of evidence E (Leitgeb and Pettigrew 2010b).

Theorem 8. Suppose L(A, v, r) = λ|v(A) − r|². Suppose b in B and E in F. Then the following are equivalent:

  1. For all A in F and r in ℜ,

    LExpL, E(b(A), A | b) ≤ LExpL, E(r, A | b)

  2. b is a probability function and b(E) = 1.
Supplement on Proof of Theorems 8 and 9

Thus, any agent whose credence function is not a probability function, or does not assign credence 1 to E, will expect some other credence function to be better, epistemically speaking, than she expects her own to be. From this, we might argue for Probabilism. We call this argument the Expected Inaccuracy Argument.
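
The equivalence in Theorem 8 can be checked on a small case. The sketch below assumes that the local expected inaccuracy has the weighted-sum form, namely the sum over worlds w in E of b({w}) · λ|v_w(A) − r|², and it represents a credence function as a Python dictionary on all subsets of a hypothetical three-world frame. Under that assumption the quadratic in r has the closed-form minimizer used in `minimizing_credence`; the helper names are illustrative only.

```python
from itertools import chain, combinations

WORLDS = (0, 1, 2)  # hypothetical three-world frame

def propositions(worlds=WORLDS):
    """All subsets of the set of worlds, i.e. the full algebra F."""
    subsets = chain.from_iterable(combinations(worlds, k) for k in range(len(worlds) + 1))
    return [frozenset(s) for s in subsets]

def local_expected_inaccuracy(r, A, E, b, lam=1.0):
    """Assumed form of LExp: sum over w in E of b({w}) * lam * |v_w(A) - r|^2."""
    return sum(b[frozenset({w})] * lam * ((1.0 if w in A else 0.0) - r) ** 2 for w in E)

def minimizing_credence(A, E, b):
    """Closed-form minimizer in r of the quadratic above (needs positive total weight on E)."""
    total = sum(b[frozenset({w})] for w in E)
    return sum(b[frozenset({w})] for w in A & E) / total

def expects_itself_best(b, E, tol=1e-9):
    """Condition 1 of Theorem 8: b(A) minimizes expected local inaccuracy for every A."""
    return all(abs(b[A] - minimizing_credence(A, E, b)) < tol for A in propositions())

def additive(weights):
    """Extend credences in single worlds additively to every proposition."""
    return {A: sum(weights[w] for w in A) for A in propositions()}

E = frozenset({0, 1})
p = additive({0: 0.25, 1: 0.75, 2: 0.0})   # probability function with p(E) = 1
q = additive({0: 0.2, 1: 0.3, 2: 0.5})     # probability function with q(E) = 0.5
c = dict(p)
c[E] = 0.9                                 # breaks additivity: c(E) != c({0}) + c({1})

print(expects_itself_best(p, E), expects_itself_best(q, E), expects_itself_best(c, E))
```

Only p passes the check: q is probabilistic but fails to assign credence 1 to E, and c is not additive, so each of them expects some other credence in at least one proposition to be less inaccurate than its own.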

Again, we might object to the Expected Inaccuracy Argument by objecting to the characterization of the legitimate local inaccuracy measures. For instance, we might object to the heavy use of geometric assumptions in the characterization of L from Local and Global Comparability and Agreement on Directed Urgency. In that characterization, it is assumed that the global inaccuracy of a credence function at a world supervenes on the Euclidean distance between two closely-related vectors: Why the Euclidean metric? Why not some other metric? (Leitgeb and Pettigrew 2010a). Another objection is that the definition of expected values of random variables as weighted sums is legitimate only for probability functions and thus cannot be used in an argument for Probabilism (Joyce 1998).

6.3 Accuracy arguments for Conditionalization

In this section, we focus on arguments for Conditionalization that appeal to local measures of inaccuracy. Again, we appeal to Minimize expected disutility. When an agent receives new evidence in the form of a proposition learned with certainty, she ought to choose her new credence function so as to minimize its expected inaccuracy relative to her old credence function and in the light of her total evidence, which incorporates her new evidence. As the following theorem shows, if we measure local inaccuracy by one of the quadratic local inaccuracy measures, and if we assume Probabilism, this leads to Conditionalization (Leitgeb and Pettigrew 2010b):

Theorem 9. Suppose L(A, v, r) = λ|v(A) − r|². Suppose b, b′ in B are probability functions, E in F, and b(E) > 0. Then the following are equivalent:

  1. For all A in F and r in ℜ,

    LExpL, E(b′(A), A | b) ≤ LExpL, E(r, A | b)

  2. b′( •) = b( • | E).
Supplement on Proof of Theorems 8 and 9

That is, if our agent has credence function b prior to learning E, then the credence function she will expect to be best, epistemically speaking, in the light of evidence E is the credence function obtained from b by conditionalizing on E. The objections to this argument are the same as to the Expected Inaccuracy Argument for Probabilism.
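
The die example from the introduction gives a quick numerical check of Theorem 9. The sketch below again assumes the weighted-sum form of local expected inaccuracy with λ = 1, searches a grid of candidate credences r, and compares the grid minimizer with b(A | E). Exact rational arithmetic is used so that no rounding is involved; the function names are illustrative.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)
prior = {w: Fraction(1, 6) for w in OUTCOMES}   # equal credence in each face
E = {1, 2}                                      # evidence: the die landed on one or two

def local_expected_inaccuracy(r, A, E, b, lam=1):
    """Assumed form of LExp: sum over w in E of b(w) * lam * |v_w(A) - r|^2."""
    return sum(b[w] * lam * ((1 if w in A else 0) - r) ** 2 for w in E)

def best_credence_on_grid(A, E, b, steps=1000):
    """Search r = 0, 1/steps, ..., 1 for the value with least expected inaccuracy."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return min(grid, key=lambda r: local_expected_inaccuracy(r, A, E, b))

def conditional(A, E, b):
    """b(A | E), computed from the credences in individual worlds."""
    return sum(b[w] for w in A & E) / sum(b[w] for w in E)

for A in ({1}, {1, 2}, {3, 4}):
    print(sorted(A), best_credence_on_grid(A, E, prior), conditional(A, E, prior))
```

In each case the grid minimizer coincides with the conditional credence (1/2, 1, and 0 respectively). The update from the introduction that moves the credence in "one" to 1/3 has expected inaccuracy 5/54 on this measure, against 1/12 for the conditionalized value 1/2.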

Note that this argument for Conditionalization appeals crucially to the standard definition of expected utility. As noted above, this is only legitimate because the propositions in the algebra F are non-indexical, and so the worlds represented by the valuation functions are not centred worlds. If such worlds were allowed, then the definition of expected utility might change, and thus the correct update rule might be different. This is as it should be, since there are numerous concerns about Conditionalization in the presence of indexical propositions (Arntzenius 2003).

6.4 Accuracy arguments for other norms

Accuracy Arguments have also been given for a handful of further epistemic norms, including an alternative to Richard Jeffrey's generalization of Conditionalization (Leitgeb and Pettigrew 2010b) and both Halfer and Thirder solutions to the Sleeping Beauty puzzle (Kierland and Monton 2005).

7 Propriety Arguments

In the previous two sections, we considered particular apparent epistemic vices, namely, distance from calibration and distance from truth. And we explored what conditions an epistemic disutility function must have in order to count as a measure of these vices, and which norms could be justified by appealing to these functions. In this section, we consider conditions that we might impose on any epistemic disutility function, regardless of which epistemic vice or collection of epistemic vices it is intended to measure.

7.1 General epistemic disutility functions

There are two sorts of conditions that we might require any epistemic disutility function to satisfy. We treat them in this section.

7.1.1 Propriety, Strict Propriety, and Admissibility

The first sort of general condition on an epistemic disutility function stems from the following idea: There are some credence functions that we know it is rationally permissible to have in the presence of certain evidence. For instance, we might hold that, in the absence of any evidence, it is at least rationally permissible to have the probabilistic credence function that assigns the same credence to each possible world, even if this is not rationally required. Therefore, no legitimate epistemic disutility function should rule out these credence functions as irrational in the presence of that evidence. Depending on the class of credence functions to be preserved as rationally permissible, this condition can narrow the class of legitimate disutility functions enough to allow us to argue for Probabilism or Conditionalization.

Suppose, for instance, that, in the absence of any evidence, any credence function in a given set C ⊆ B is rationally permissible. Then you might wish to impose one of the following three conditions on an epistemic disutility function:

Propriety for C. For all p in C and b in B, if b ≠ p, then, prior to any evidence, p expects itself to have at most as great epistemic disutility relative to U as it expects b to have.

That is, for all p in C and b in B, if b ≠ p, then GExpU, ⊤(p|p) ≤ GExpU, ⊤(b|p)

If this isn't the case, there is a credence function p in C that expects itself to be epistemically worse than it expects another credence function b to be. Together with Minimize Expected Disutility, this gives that p is not rationally permitted.

Strict Propriety for C. For all p in C and b in B, if b ≠ p, then, prior to any evidence, p expects itself to have less epistemic disutility relative to U than it expects b to have.

That is, for all p in C and b in B, if b ≠ p, then GExpU, ⊤(p|p) < GExpU, ⊤(b|p)

If this isn't the case, there is a credence function p in C that expects itself to be at most as epistemically good as it expects another credence function b to be. On its own, Minimize Expected Disutility does not declare p irrational on this basis. According to that norm, it is rationally permissible for an agent to choose an action that she expects to be at most as good as another. However, we might justify Strict Propriety nonetheless, if we are prepared to argue for a claim that we might call Conservatism, which says that, if an agent is in a rationally permissible epistemic state, then it is never rational for her to shift epistemic state in the absence of any new evidence (Oddie 1997). If Strict Propriety fails, then p expects b to be just as good as it expects itself to be. Thus, by Minimize Expected Disutility, it is rationally permissible for our agent to shift from p to b without any new evidence. It follows from this and Conservatism that p cannot be rationally permissible in the first place, which contradicts our assumption that all credence functions in C are rationally permissible.

Admissibility for C. For all p in C and b in B, if b ≠ p, then there is v in V such that U(p, v) < U(b, v).

If this isn't the case, there is p in C that is epistemically at least as bad as another credence function b at all worlds. Again, since this would permit a shift from p to b without any new evidence, Conservatism entails that p is not rationally permitted, which contradicts our assumption.
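
To see what Strict Propriety demands, it may help to compare two candidate disutility functions on a two-cell partition. The sketch below assumes that the expected global disutility prior to any evidence is the p-weighted sum of U(b, w) over worlds w; the quadratic (Brier) and absolute-value measures, and the particular p, are chosen for illustration. For the quadratic measure, a probabilistic p expects any rival b to be worse by exactly the squared Euclidean distance between b and p; the absolute-value measure, by contrast, makes p = (0.7, 0.3) expect the extreme credences (1, 0) to do better than itself.

```python
import random

def brier(b, w):
    """Quadratic (Brier) disutility of credences b over the cells when cell w is true."""
    return sum(((1.0 if i == w else 0.0) - x) ** 2 for i, x in enumerate(b))

def absolute(b, w):
    """Absolute-value disutility, used here as a contrast case."""
    return sum(abs((1.0 if i == w else 0.0) - x) for i, x in enumerate(b))

def expected_disutility(U, b, p):
    """Assumed form of GExp prior to evidence: sum over worlds w of p(w) * U(b, w)."""
    return sum(p_w * U(b, w) for w, p_w in enumerate(p))

p = [0.7, 0.3]

# Brier: p expects every rival b to be worse, by exactly the squared distance to p.
random.seed(0)
for _ in range(1000):
    b = [random.random(), random.random()]
    gap = expected_disutility(brier, b, p) - expected_disutility(brier, p, p)
    assert abs(gap - sum((x - y) ** 2 for x, y in zip(b, p))) < 1e-9

# Absolute value: p expects the extreme credences (1, 0) to do better than itself.
print(expected_disutility(absolute, p, p), expected_disutility(absolute, [1.0, 0.0], p))
```

The two printed values are roughly 0.84 and 0.6, so the absolute-value measure violates even Propriety for any set C that contains this p.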

7.1.2 Separability

The second sort of general condition we might impose on any epistemic disutility function is this:

Separability. Suppose that F′ is a subset of the algebra F and, for all b, b′, c, c′ in B,

  1. b(A) = b′(A) and c(A) = c′(A) for all A in F′
  2. b(A) = c(A) and b′(A) = c′(A) for all A in F − F′.

Then U(b, v) > U(c, v) if, and only if, U(b′, v) > U(c′, v).

This requirement militates against certain sorts of holism about the epistemic disutility of credence functions. It rules out, for instance, an epistemic disutility function that takes into account only the maximum local epistemic disutility when determining the global epistemic disutility, or accords weight to the variance amongst the credences assigned to propositions. It is not clear whether this is a problem for Separability, for it might be that, whenever we seem to value a global or holistic feature of a credence function, we really value some set of complex local facts.

7.2 Propriety arguments for Probabilism

There is a series of theorems that might underwrite arguments for Probabilism from the three conditions Propriety for C, Strict Propriety for C, and Admissibility for C. In each case, the set C of credence functions that we must accept as rationally permitted prior to any evidence is the set P of probabilistic credence functions. We consider this assumption below. But first we must state the stronger version of Weak Act-Type Dominance.

Strong Act-Type Dominance. Suppose an agent is choosing between two sorts of action. Suppose further that the following hold:

  1. For every action of the first sort, there is an action of the second sort that weakly dominates it.
  2. For any action of the second sort, there is no other action of either sort that even weakly dominates it.

In this situation, an agent ought to choose an action of the second sort.

The difference between Weak and Strong Act-Type Dominance lies in the first condition, which is weaker in the stronger version of the norm. We turn now to the theorems that underpin the Propriety Arguments for Probabilism.

Theorem 10. Propriety for P, Separability, and Proposition-Wise Continuity entail that

  1. For every non-probabilistic b in B − P, there is a probabilistic p in P that weakly dominates it.
  2. For every probabilistic p in P, there is no credence function b in B that weakly dominates it.

Together with Strong Act-Type Dominance, this entails Probabilism (Predd, et al. 2009).

Theorem 11. Strict Propriety for P, Separability, and Proposition-Wise Continuity entail that

  1. For every non-probabilistic b in B − P, there is a probabilistic p in P that strongly dominates it.
  2. For every probabilistic p in P, there is no credence function b in B that weakly dominates it.

Together with Weak Act-Type Dominance, this entails Probabilism (Predd, et al. 2009). Thus, by imposing the stronger condition of Strict Propriety for P on our inaccuracy measures, we need only appeal to the weaker norm of Weak Act-Type Dominance to establish Probabilism.
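
The two dominance claims can be made concrete for one familiar measure. The sketch below uses the quadratic (Brier) measure on credences in a proposition A and its negation, which is an illustrative choice and not the general setting of the theorems. For this measure, the Euclidean projection of a non-probabilistic b onto the probabilistic credence functions strongly dominates b, and an exact grid search finds nothing that weakly dominates a sample probabilistic credence function. Rational arithmetic avoids spurious ties from rounding.

```python
from fractions import Fraction as Fr

def brier(b, w):
    """Brier disutility of credences b = (b(A), b(not-A)) at world w (0: A true, 1: A false)."""
    truth = (1, 0) if w == 0 else (0, 1)
    return sum((t - x) ** 2 for t, x in zip(truth, b))

def strongly_dominates(p, b):
    """p has strictly lower disutility than b at every world."""
    return all(brier(p, w) < brier(b, w) for w in (0, 1))

def weakly_dominates(c, p):
    """c is never worse than p and is strictly better at some world."""
    return (all(brier(c, w) <= brier(p, w) for w in (0, 1))
            and any(brier(c, w) < brier(p, w) for w in (0, 1)))

def project(b):
    """Euclidean projection of b (with entries in [0, 1]) onto the credences (t, 1 - t)."""
    t = (1 + b[0] - b[1]) / 2
    return (t, 1 - t)

b = (Fr(3, 5), Fr(3, 5))            # non-probabilistic: credences sum to 6/5
p = project(b)                      # (1/2, 1/2)
print(p, strongly_dominates(p, b))

# Exact grid search: no credence pair on the grid weakly dominates a probabilistic one.
grid = [Fr(k, 20) for k in range(21)]
q = (Fr(3, 10), Fr(7, 10))
print(any(weakly_dominates((x, y), q) for x in grid for y in grid))
```

Here b has Brier disutility 13/25 at both worlds while its projection has 1/2 at both, and the grid search returns False. A grid search is of course only a spot check, not a proof of the second clause.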

The following is a conjecture rather than a theorem:

Conjecture 1. Admissibility for P, Truth-Directedness, and Proposition-Wise Continuity entail that

  1. For every non-probabilistic b in B − P, there is a probabilistic p in P that strongly dominates it.
  2. For every probabilistic p in P, there is no credence function b in B that weakly dominates it.

Together with Weak Act-Type Dominance, this would entail Probabilism (if it were true). The conjecture has been proved for cases in which F is not an algebra of propositions, but rather a finite set of mutually exclusive and exhaustive propositions (Joyce 2009). If it could be proved in general, we would no longer need to use expected disutilities in our argument for Probabilism, though it is not clear whether they are really problematic as they occur in propriety arguments, since they are always applied to probabilistic credence functions.

7.3 Objections to propriety arguments for Probabilism

Of course, these arguments require a premise that seems rather strong at first sight. In order to establish Probabilism, which says that it is a necessary condition on rationality to have a credence function in P, they must assume that the credence functions in P are all rationally permissible prior to any evidence, and thus that it is a sufficient condition on rationality to have a credence function in P, at least prior to obtaining any evidence. But is this assumption justified? The problem is that many philosophers wish to claim that Probabilism is not the strongest necessary condition on rationality in the absence of evidence. For instance, we might say that rationality requires further that our agent satisfies David Lewis' Principal Principle, which says that her credence in proposition A conditional on the proposition that the objective chances are given by the function ch ought to be ch(A) (Hájek 2008). Or we might say that an agent's credence function must obey Strict Coherence, which says that it must assign credence 0 only to necessary falsehoods and credence 1 only to necessary truths. Or we might say that our credence function ought to encode minimal information when that is measured by Shannon's entropy function: this norm is called Maximize Entropy. And so on. The point is that, if these more restrictive requirements for rationality hold, then it is not true that every probabilistic credence function is rationally permitted prior to any evidence. After all, the probabilistic credence functions that violate these further norms are not rationally permitted at all! And if this is true, the arguments above will fail.

However, the theorems above do show that, even if some more restrictive norm than Probabilism is true, there can be no argument from, for instance, Strict Propriety for C (where C is a proper subset of P) via Strong Act-Type Dominance to the conclusion that an agent ought to have a credence function in C. After all, any epistemic disutility function that satisfies Strict Propriety for P will satisfy Strict Propriety for C. And, for these functions, we have that all credence functions in B − P are strongly dominated while no credence function in P is even weakly dominated. That is, they don't militate in favour of credence functions in C particularly.

7.4 Propriety arguments for Conditionalization

Strict Propriety for P, this time in conjunction with Minimize Expected Disutility, gives us an argument for Conditionalization (Greaves and Wallace 2006).

Theorem 12. Strict Propriety for P entails that, for all b, b′ in P and E in F, if b(E) > 0 and b′ ≠ b( • | E) then

GExpU, E(b( • | E) | b) < GExpU, E(b′ | b)

Supplement on Proof of Theorem 12

That is, if our epistemic disutility function satisfies Strict Propriety for P, conditionalizing on a piece of evidence E minimizes expected disutility by the lights of the agent's original credence function b and in the presence of E. This is a generalization of Theorem 9.

Of course, the same objections apply to Strict Propriety for P as we saw in the previous section.
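
Because Theorem 12 is not tied to the quadratic measures, it can be illustrated with a different disutility function. The sketch below uses the logarithmic score over worlds, which is strictly proper when rivals are restricted to probability functions (as they are in the theorem); this choice, the prior, and the evidence are assumptions made for the example. The expected disutility given E is taken to be the sum over w in E of b(w)U(b′, w).

```python
import math
import random

def log_disutility(b, w):
    """Logarithmic disutility of credence function b (over worlds) at world w."""
    return -math.log(b[w])

def expected_disutility(b_new, b_old, E):
    """Assumed form of GExp given E: sum over w in E of b_old(w) * U(b_new, w)."""
    return sum(b_old[w] * log_disutility(b_new, w) for w in E)

def conditionalize(b, E):
    """b( . | E): zero outside E, renormalized inside."""
    total = sum(b[w] for w in E)
    return [b[w] / total if w in E else 0.0 for w in range(len(b))]

def random_probability(n):
    """A random strictly positive probability function over n worlds."""
    weights = [random.random() + 1e-6 for _ in range(n)]
    s = sum(weights)
    return [x / s for x in weights]

prior = [0.1, 0.2, 0.3, 0.4]
E = {1, 2}
posterior = conditionalize(prior, E)          # approximately [0, 0.4, 0.6, 0]
best = expected_disutility(posterior, prior, E)

random.seed(1)
rivals = [random_probability(4) for _ in range(1000)]
print(all(expected_disutility(r, prior, E) > best for r in rivals))
```

Every sampled rival has strictly greater expected disutility than the conditionalized credence function, as the theorem leads us to expect. Random sampling only probes the claim; the general result rests on the proof in the supplement.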

8 Future Work

Epistemic utility theory has already proved itself a powerful tool in formal epistemology. In this survey, we have focussed only on the arguments for the core Bayesian norms. But there are many areas in which it hasn't yet been exploited. In this concluding section, we suggest some of the many questions that it might be used to answer:

  • As is reflected in this entry, epistemic utility theory has been employed mainly in the study of credence functions and their norms. But, as noted at the beginning, there are many other ways in which we may wish to model epistemic states. For instance, we might model them as sets of credence functions. And if we do, there are well-known problems concerning how we should update in the light of new evidence (Seidenfeld and Wasserman 1993). Perhaps epistemic utility theory can shed light on this. Of course, in order to employ epistemic utility theory to explore norms governing epistemic states modelled in a particular way, we require a standard decision theory based on that sort of model. And, in the case of sets of credence functions, this is still controversial (White 2010; Elga 2010).

  • Above, we mentioned briefly that epistemic utility theory has been employed a little in the theory of self-locating beliefs (Kierland and Monton 2005). But the account to which it gives rise is not fully general. Again, in order to make it fully general, we require a standard decision theory in the self-locating framework, and again this is controversial (Piccione and Rubinstein 1997).

  • We have focussed only on the norms of Probabilism, Countable Additivity, and Conditionalization. There are many other norms, both synchronic and diachronic, that we might try to justify using epistemic utility theory: Strict Coherence (or Regularity), Principal Principle (or New Principle), and Maximize Entropy to name a few.

  • Epistemic utility theory has been oddly absent from the discussion of the norms that govern epistemic states modelled as sets of full beliefs. When we say that an agent ought to have classically consistent full beliefs, we tend to justify this by pointing out that if she does not, there is no possible way the world might be on which her beliefs are all true. That is, if we take truth to be the correct notion of vindication for full beliefs, we appeal implicitly to Possibility of vindication. However, as we saw, there are concerns about this norm. Do the same epistemic norms for full beliefs follow if we consider different norms from decision theory, such as Weak or Strong Act-Type Dominance?

Bibliography

  • Arntzenius, F., 2003, “Some Problems for Conditionalization and Reflection,” Journal of Philosophy, 100: 356–70.
  • Brier, G. W., 1950, “Verification of Forecasts Expressed in Terms of Probability,” Monthly Weather Review, 78(1): 1–3.
  • Elga, A., 2010, “Subjective Probabilities Should be Sharp,” Philosophers' Imprint, 10(5): 1–11.
  • Gibbard, A., 2007, “Rational Credence and the Value of Truth,” in T. Szabo Gendler and J. Hawthorne (eds.), Oxford Studies in Epistemology (volume 2), New York: Oxford University Press. 143–164.
  • Greaves, H. and D. Wallace, 2006, “Justifying Conditionalization: Conditionalization Maximizes Expected Epistemic Utility,” Mind, 115(459): 607–632.
  • Hájek, A., 2008, “Arguments For—Or Against—Probabilism?,” British Journal for the Philosophy of Science, 59: 793–819.
  • –––, 2009, “Fifteen Arguments Against Hypothetical Frequentism,” Erkenntnis, 70: 211–235.
  • Joyce, J. M., 1998, “A Nonpragmatic Vindication of Probabilism,” Philosophy of Science 65: 575–603.
  • –––, 2009, “Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief”, in F. Huber and C. Schmidt-Petri (eds.), Degrees of Belief, Springer. 263–297.
  • Kierland, B. and B. Monton, 2005, “Minimizing Inaccuracy for Self-Locating Beliefs,” Philosophy and Phenomenological Research, 70(2): 384–395.
  • Lange, M., 1999, “Calibration and the Epistemological Role of Bayesian Conditionalization,” Journal of Philosophy 96: 294–324.
  • Leitgeb, H. and R. Pettigrew, 2010a, “An Objective Justification of Bayesianism I: Measuring Inaccuracy”, Philosophy of Science, 77: 201–235.
  • –––, 2010b, “An Objective Justification of Bayesianism II: The Consequences of Minimizing Inaccuracy”, Philosophy of Science, 77: 236–272.
  • Maher, P., 2002, “Joyce's Argument for Probabilism”, Philosophy of Science 69(1): 73–81.
  • Oddie, G., 1997, “Conditionalization, Cogency, and Cognitive Value,” British Journal for the Philosophy of Science, 48: 533–41.
  • Pettigrew, R., 2010, “Modelling Uncertainty,” Grazer Philosophische Studien 80: 309–316.
  • Piccione, M. and A. Rubinstein, 1997, “On the Interpretation of Decision Problems with Imperfect Recall,” Games and Economic Behavior 20: 3–24.
  • Predd, J. B., R. Seiringer, E. H. Lieb, D. N. Osherson, H. V. Poor, and S. R. Kulkarni, 2009, “Probabilistic Coherence and Proper Scoring Rules,” IEEE Transactions on Information Theory 55(10): 4786–4792.
  • Quine, W. V. O., 1969, “Propositional Objects” in Ontological Relativity and Other Essays, New York: Columbia University Press. 139–160.
  • Schervish, M. J., T. Seidenfeld, and J. B. Kadane, 2009, “Scoring Rules, Dominated Forecasts, and Coherence,” Decision Analysis 6(4): 202–221.
  • Seidenfeld, T., 1985, “Calibration, Coherence, and Scoring Rules,” Philosophy of Science 52: 274–294.
  • Seidenfeld, T. and L. Wasserman, 1993, “Dilation for Sets of Probabilities,” Annals of Statistics, 21(3): 1139–1154.
  • Shimony, A., 1988, “An Adamite Derivation of the Calculus of Probability,” in J. H. Fetzer (ed.), Probability and Causality, Dordrecht: Reidel. 79–90.
  • van Fraassen, B. C., 1983, “Calibration: A Frequency Justification for Personal Probability”, in R. S. Cohen and L. Laudan (eds.), Physics, Philosophy, and Psychoanalysis: Essays in Honor of Adolf Grünbaum, Dordrecht: Reidel. 295–315.
  • White, R., 2010, “Evidential Symmetry and Mushy Credence”, in T. Szabo Gendler and J. Hawthorne (eds.), Oxford Studies in Epistemology (volume 3), New York: Oxford University Press. 161–186.

Copyright © 2011 by
Richard Pettigrew <Richard.Pettigrew@bris.ac.uk>

This is a file in the archives of the Stanford Encyclopedia of Philosophy.
Please note that some links may no longer be functional.