Stanford Encyclopedia of Philosophy
This is a file in the archives of the Stanford Encyclopedia of Philosophy.

Bayesian Epistemology

First published Thu Jul 12, 2001; substantive revision Wed Mar 26, 2008

‘Bayesian epistemology’ became an epistemological movement in the 20th century, though its two main features can be traced back to the eponymous Reverend Thomas Bayes (c. 1701-61). Those two features are: (1) the introduction of a formal apparatus for inductive logic; (2) the introduction of a pragmatic self-defeat test (as illustrated by Dutch Book Arguments) for epistemic rationality as a way of extending the justification of the laws of deductive logic to include a justification for the laws of inductive logic. The formal apparatus itself has two main elements: the use of the laws of probability as coherence constraints on rational degrees of belief (or degrees of confidence) and the introduction of a rule of probabilistic inference, a rule or principle of conditionalization.

Bayesian epistemology did not emerge as a philosophical program until the first formal axiomatizations of probability theory in the first half of the 20th century. One important application of Bayesian epistemology has been to the analysis of scientific practice in Bayesian Confirmation Theory. In addition, a major branch of statistics, Bayesian statistics, is based on Bayesian principles. In psychology, an important branch of learning theory, Bayesian learning theory, is also based on Bayesian principles. Finally, the idea of analyzing rational degrees of belief in terms of rational betting behavior led to the 20th century development of a new kind of decision theory, Bayesian decision theory, which is now the dominant theoretical model for both the descriptive and normative analysis of decisions. The combination of its precise formal apparatus and its novel pragmatic self-defeat test for justification makes Bayesian epistemology one of the most important developments in epistemology in the 20th century, and one of the most promising avenues for further progress in epistemology in the 21st century.


1. Deductive and Probabilistic Coherence and Deductive and Probabilistic Rules of Inference

There are two ways that the laws of deductive logic have been thought to provide rational constraints on belief: (1) Synchronically, the laws of deductive logic can be used to define the notion of deductive consistency and inconsistency. Deductive inconsistency so defined determines one kind of incoherence in belief, which I refer to as deductive incoherence. (2) Diachronically, the laws of deductive logic can constrain admissible changes in belief by providing the deductive rules of inference. For example, modus ponens is a deductive rule of inference that requires that one infer Q from premises P and P → Q.

Bayesians propose additional standards of synchronic coherence — standards of probabilistic coherence — and additional rules of inference — probabilistic rules of inference — in both cases, to apply not to beliefs, but degrees of belief (degrees of confidence). For Bayesians, the most important standards of probabilistic coherence are the laws of probability. For more on the laws of probability, see the following supplementary article:

Supplement on Probability Laws

For Bayesians, the most important probabilistic rule of inference is given by a principle of conditionalization.

2. A Simple Principle of Conditionalization

If unconditional probabilities (e.g. P(S)) are taken as primitive, the conditional probability of S on T can be defined as follows:

Conditional Probability:
P(S/T) = P(S&T)/P(T).

By itself, the definition of conditional probability is of little epistemological significance. It acquires epistemological significance only in conjunction with a further epistemological assumption:

Simple Principle of Conditionalization:
If one begins with initial or prior probabilities Pi, and one acquires new evidence which can be represented as becoming certain of an evidentiary statement E (assumed to state the totality of one's new evidence and to have initial probability greater than zero), then rationality requires that one systematically transform one's initial probabilities to generate final or posterior probabilities Pf by conditionalizing on E — that is: Where S is any statement, Pf(S) = Pi(S/E).[1]

In epistemological terms, this Simple Principle of Conditionalization requires that the effects of evidence on rational degrees of belief be analyzed in two stages: The first is non-inferential. It is the change in the probability of the evidence statement E from Pi(E), assumed to be greater than zero and less than one, to Pf(E) = 1. The second is a probabilistic inference of conditionalizing on E from initial probabilities (e.g., Pi(S)) to final probabilities (e.g., Pf(S) = Pi(S/E)).
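The two stages can be illustrated with a minimal sketch, assuming a finite set of possible "worlds" over which the initial probabilities are defined. The four worlds, the prior numbers, and the helper names `prob` and `conditionalize` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical initial probabilities Pi over four worlds; each world settles S and E.
# The numbers are made up for illustration.
prior = {
    ("S", "E"): Fraction(3, 10),
    ("S", "not-E"): Fraction(2, 10),
    ("not-S", "E"): Fraction(1, 10),
    ("not-S", "not-E"): Fraction(4, 10),
}

def prob(dist, holds):
    """Probability of the statement that is true in exactly the worlds where holds(world)."""
    return sum(p for world, p in dist.items() if holds(world))

def conditionalize(dist, evidence):
    """Return the final probabilities obtained by conditionalizing dist on evidence."""
    p_e = prob(dist, evidence)
    if p_e == 0:
        raise ValueError("evidence must have initial probability greater than zero")
    # Worlds incompatible with the evidence drop to 0; the rest are rescaled by 1/Pi(E).
    return {w: (p / p_e if evidence(w) else Fraction(0)) for w, p in dist.items()}

is_S = lambda w: w[0] == "S"
is_E = lambda w: w[1] == "E"

# Stage 1 (non-inferential): E becomes certain. Stage 2: conditionalize on E.
posterior = conditionalize(prior, is_E)

# Pi(S/E) computed directly from the definition P(S/T) = P(S&T)/P(T).
p_S_given_E = prob(prior, lambda w: is_S(w) and is_E(w)) / prob(prior, is_E)
```

On these assumed numbers, `prob(posterior, is_E)` is 1 and `prob(posterior, is_S)` equals `p_S_given_E`, which is 3/4, so Pf(S) = Pi(S/E) as the principle requires.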

Problems with the Simple Principle (to be discussed below) have led many Bayesians to qualify the Simple Principle by limiting its scope. In addition, some Bayesians follow Jeffrey in generalizing the Simple Principle to apply to cases in which one's new evidence is less than certain (also discussed below). What unifies Bayesian epistemology is a conviction that conditionalizing (perhaps of a generalized sort) is rationally required in some important contexts — that is, that some sort of conditionalization principle is an important principle governing rational changes in degrees of belief.

3. Dutch Book Arguments

Many arguments have been given for regarding the probability laws as coherence conditions on degrees of belief and for taking some principle of conditionalization to be a rule of probabilistic inference. The most distinctively Bayesian are those referred to as Dutch Book Arguments. Dutch Book Arguments represent the possibility of a new kind of justification for epistemological principles.

A Dutch Book Argument relies on some descriptive or normative assumptions to connect degrees of belief with willingness to wager — for example, a person with degree of belief p in sentence S is assumed to be willing to pay up to and including $p for a unit wager on S (i.e., a wager that pays $1 if S is true) and is willing to sell such a wager for any price equal to or greater than $p (one is assumed to be equally willing to buy or sell such a wager when the price is exactly $p).[2] A Dutch Book is a combination of wagers which, on the basis of deductive logic alone, can be shown to entail a sure loss. A synchronic Dutch Book is a Dutch Book combination of wagers that one would accept all at the same time. A diachronic Dutch Book is a Dutch Book combination of wagers that one will be motivated to enter into at different times.
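A simple synchronic case can be sketched as follows, assuming the betting behavior just described. The credences (3/5 in S and 3/5 in not-S, which together violate the probability laws) and the function name `net_payoff` are hypothetical.

```python
from fractions import Fraction

# Hypothetical incoherent degrees of belief: they sum to 6/5 rather than 1.
credence = {"S": Fraction(3, 5), "not-S": Fraction(3, 5)}

def net_payoff(credence, s_is_true):
    """Net gain for an agent who buys a unit wager on S and a unit wager on not-S,
    paying a price equal to the corresponding degree of belief for each."""
    cost = credence["S"] + credence["not-S"]
    # Exactly one of the two wagers pays $1, whichever way S turns out.
    winnings = (1 if s_is_true else 0) + (0 if s_is_true else 1)
    return winnings - cost

outcomes = {s_is_true: net_payoff(credence, s_is_true) for s_is_true in (True, False)}
```

Both entries of `outcomes` are -1/5: the agent loses 20 cents whether or not S is true, and this can be determined by logic alone before the truth of S is known.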

Ramsey and de Finetti first employed synchronic Dutch Book Arguments in support of the probability laws as standards of synchronic coherence for degrees of belief. The first diachronic Dutch Book Argument in support of a principle of conditionalization was reported by Teller, who credited David Lewis. The Lewis/Teller argument depends on a further descriptive or normative assumption about conditional probabilities due to de Finetti: An agent with conditional probability P(S/T) = p is assumed to be willing to pay any price up to and including $p for a unit wager on S conditional on T. (A unit wager on S conditional on T is one that is called off, with the purchase price returned to the purchaser, if T is not true. If T is true, the wager is not called off and the wager pays $1 if S is also true.) On this interpretation of conditional probabilities, Lewis, as reported by Teller, was able to show how to construct a diachronic Dutch Book against anyone who, on learning only that T, would predictably change his/her degree of belief in S to Pf(S) > Pi(S/T); and how to construct a diachronic Dutch Book against anyone who, on learning only that T, would predictably change his/her degree of belief in S to Pf(S) < Pi(S/T). For illustrations of the strategy of the Ramsey/de Finetti and the Lewis/Teller arguments, see the following supplementary article:

Supplement on Dutch Book Arguments

There has been much discussion of exactly what it is that Dutch Book Arguments are supposed to show. On the literal-minded interpretation, their significance is that they show that those whose degrees of belief violate the probability laws or those whose probabilistic inferences predictably violate a principle of conditionalization are liable to enter into wagers on which they are sure to lose. There is very little to be said for the literal-minded interpretation, because there is no basis for claiming that rationality requires that one be willing to wager in accordance with the behavioral assumptions described above. An agent could simply refuse to accept Dutch Book combinations of wagers.

One of the main motivations for Jeffrey's new approach to the foundations of decision theory in Logic of Decision was his dissatisfaction with the identification of subjective probability with betting ratios. For example, no matter what one's degree of belief in the proposition that all human life will be destroyed within the next ten years, it would not be rational to offer to buy a bet on its truth. Williamson extends de Finetti's Dutch Book Argument for a finite additivity constraint on rational degrees of belief to produce an argument for a countable additivity constraint on degrees of belief, but the argument is better interpreted as a reductio of the literal-minded interpretation of Dutch Book Arguments than as an argument for the rationality of a countable additivity constraint. The rational response to offers to bet on the proposition that all life will be destroyed within the next ten years or to bet on a single possible outcome in a countably infinite set of equiprobable possible outcomes is simply not to.

A more plausible interpretation of Dutch Book Arguments is that they are to be understood hypothetically, as symptomatic of what has been termed pragmatic self-defeat. On this interpretation, Dutch Book Arguments are a kind of heuristic for determining when one's degrees of belief have the potential to be pragmatically self-defeating. The problem is not that one who violates the Bayesian constraints is likely to enter into a combination of wagers that constitute a Dutch Book, but that, on any reasonable way of translating one's degrees of belief into action, there is a potential for one's degrees of belief to motivate one to act in ways that make things worse than they might have been, when, as a matter of logic alone, it can be determined that alternative actions would have made things better (on one's own evaluations of better and worse).

Another way of understanding the problem of susceptibility to a Dutch Book is due to Ramsey: Someone who is susceptible to a Dutch Book evaluates identical bets differently based on how they are described. Putting it this way makes susceptibility to Dutch Books sound irrational. But this standard of rationality would make it irrational not to recognize all the logical consequences of what one believes. This is the assumption of logical omniscience (discussed below).

If successful, Dutch Book Arguments would reduce the justification of the principles of Bayesian epistemology to two elements: (1) an account of the appropriate relationship between degrees of belief and choice; and (2) the laws of deductive logic. Because it would seem that the truth about the appropriate relationship between the degrees of belief and choice is independent of epistemology, Dutch Book Arguments hold out the potential of justifying the principles of Bayesian epistemology in a way that requires no other epistemological resources than the laws of deductive logic. For this reason, it makes sense to think of Dutch Book Arguments as indirect, pragmatic arguments for according the principles of Bayesian epistemology much the same epistemological status as the laws of deductive logic. Dutch Book Arguments are a truly distinctive contribution made by Bayesians to the methodology of epistemology.

It should also be mentioned that some Bayesians have defended their principles more directly, with non-pragmatic arguments. In addition to reporting Lewis's Dutch Book Argument, Teller offers a non-pragmatic defense of Conditionalization. There have been many proposed non-pragmatic defenses of the probability laws (e.g., van Fraassen; Shimony). The most compelling is due to Joyce. All such defenses, whether pragmatic or non-pragmatic, produce a puzzle for Bayesian epistemology: The principles of Bayesian epistemology are typically proposed as principles of inductive reasoning. But if the principles of Bayesian epistemology depend ultimately for their justification solely on the laws of deductive logic, what reason is there to think that they have any inductive content? That is to say, what reason is there to believe that they do anything more than extend the laws of deductive logic from beliefs to degrees of belief? It should be mentioned, however, that even if Bayesian epistemology only extended the laws of deductive logic to degrees of belief, that alone would represent an extremely important advance in epistemology.

4. Bayes' Theorem and Bayesian Confirmation Theory

This section reviews some of the most important results in the Bayesian analysis of scientific practice — Bayesian Confirmation Theory. It is assumed that all statements to be evaluated have prior probability greater than zero and less than one.

Bayes' Theorem and a Corollary

Bayes' Theorem is a straightforward consequence of the probability axioms and the definition of conditional probability:

Bayes' Theorem:
P(S/T) = P(T/S) × P(S)/P(T) [where P(T) is assumed to be greater than zero]

The epistemological significance of Bayes' Theorem is that it provides a straightforward corollary to the Simple Principle of Conditionalization. Where the final probability of a hypothesis H is generated by conditionalizing on evidence E, Bayes' Theorem provides a formula for the final probability of H in terms of the prior or initial likelihood of H on E (Pi(E/H)) and the prior or initial probabilities of H and E:

Corollary of the Simple Principle of Conditionalization:
Pf(H) = Pi(H/E) = Pi(E/H) × Pi(H)/Pi(E).

Due to the influence of Bayesianism, likelihood is now a technical term of art in confirmation theory. As used in this technical sense, likelihoods can be very useful. Often, when the conditional probability of H on E is in doubt, the likelihood of H on E can be computed from the theoretical assumptions of H.
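As a worked illustration of the corollary, the following sketch computes Pf(H) from Pi(H) and the two likelihoods Pi(E/H) and Pi(E/~H). It assumes that Pi(E) is obtained from the law of total probability, a step the text above does not spell out; the function name `posterior` and all of the numbers are hypothetical.

```python
from fractions import Fraction

def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Pf(H) = Pi(E/H) * Pi(H) / Pi(E), with Pi(E) expanded by total probability."""
    # Assumed step: Pi(E) = Pi(E/H) * Pi(H) + Pi(E/~H) * Pi(~H)
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    if p_e == 0:
        raise ValueError("Pi(E) must be greater than zero")
    return p_e_given_h * prior_h / p_e

# Made-up inputs: H starts out improbable, but E is much more likely on H than on ~H.
p_final = posterior(Fraction(1, 10), Fraction(9, 10), Fraction(1, 5))
```

With these inputs Pi(E) = 27/100 and `p_final` is 1/3, so conditionalizing on E raises the probability of H from 1/10 to 1/3.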

Bayesian Confirmation Theory

A. Confirmation and disconfirmation. In Bayesian Confirmation Theory, it is said that evidence E confirms (or would confirm) hypothesis H (to at least some degree) just in case the prior probability of H conditional on E is greater than the prior unconditional probability of H: Pi(H/E) > Pi(H). E disconfirms (or would disconfirm) H if the prior probability of H conditional on E is less than the prior unconditional probability of H.

This is a qualitative conception of confirmation. There is no general agreement in the literature on a quantitative measure of degree of confirmation or degree of evidential support. Earman (chap. 5) and Fitelson both provide a good overview of the various proposals. It might be thought that the degree to which evidence E supports (or would support) hypothesis H could be defined as Pi(H/E) − Pi(H). One potential problem with this proposal is that it has the consequence that no evidence can provide much evidential support to a hypothesis that is antecedently very probable, because as the probability of H approaches one, the difference goes to zero. Eells and Fitelson have argued that this apparently counterintuitive consequence can be avoided by distinguishing the historical question of how much a piece of evidence E actually contributed to the confirmation of H (which, of course, would have to be small if H were antecedently highly probable) from the question of the degree of evidential support E provides for H, the answer to which, they propose, is relative to the background information. So even if H is very probable at the time that evidence E is acquired, we can ask how much evidential support E would provide for H if we had no other evidence supporting H. Eells and Fitelson have also provided a useful framework for evaluating the various proposals in the literature, a framework within which most of them are found to be wanting.
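The worry about the difference measure can be made explicit with a one-line bound. It uses only the fact that no probability exceeds one; the value 0.99 is an illustrative assumption.

```latex
\[
  P_i(H/E) - P_i(H) \;\le\; 1 - P_i(H),
  \qquad \text{so if } P_i(H) = 0.99 \text{ then } P_i(H/E) - P_i(H) \le 0.01 .
\]
```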

B. Confirmation and disconfirmation by entailment. Whenever a hypothesis H logically entails evidence E, E confirms H. This follows from the fact that to determine the truth of E is to rule out a possibility assumed to have non-zero prior probability that is incompatible with H — the possibility that ~E. A corollary is that, where H entails E, ~E would disconfirm H, by reducing its probability to zero. The most influential model of explanation in science is the hypothetico-deductive model (e.g., Hempel). Thus, one of the most important sources of support for Bayesian Confirmation Theory is that it can explain the role of hypothetico-deductive explanation in confirmation.
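The same two results can also be read off Bayes' Theorem. The following derivation is a sketch that relies on the standing assumption of this section that all statements evaluated, including H, E, and ~E, have prior probability strictly between zero and one.

```latex
\[
  H \models E \;\Rightarrow\; P_i(E/H) = 1
  \;\Rightarrow\; P_i(H/E) = \frac{P_i(E/H)\,P_i(H)}{P_i(E)} = \frac{P_i(H)}{P_i(E)} > P_i(H)
  \quad \text{since } 0 < P_i(E) < 1 .
\]
\[
  H \models E \;\Rightarrow\; P_i(\lnot E/H) = 0
  \;\Rightarrow\; P_i(H/\lnot E) = \frac{P_i(\lnot E/H)\,P_i(H)}{P_i(\lnot E)} = 0 .
\]
```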

C. Confirmation of logical equivalents. If two hypotheses H1 and H2 are logically equivalent, then evidence E will confirm both equally. This follows from the fact that logically equivalent statements always are assigned the same probability.

D. The confirmatory effect of surprising or diverse evidence. From the corollary above, it follows that whether E confirms (or disconfirms) H depends on whether E is more probable (or less probable) conditional on H than it is unconditionally — that is, on whether:

(b1) P(E/H)/P(E) > 1.

An intuitive way of understanding (b1) is to say that it states that E would be more expected (or less surprising) if it were known that H were true. So if E is surprising, but would not be surprising if we knew H were true, then E will significantly confirm H. Thus, Bayesians explain the tendency of surprising evidence to confirm hypotheses on which the evidence would be expected.

Similarly, because it is reasonable to think that evidence E1 makes other evidence of the same kind much more probable, after E1 has been determined to be true, other evidence of the same kind E2 will generally not confirm hypothesis H as much as other diverse evidence E3, even if H is equally likely on both E2 and E3. The explanation is that where E1 makes E2 much more probable than E3 (Pi(E2/E1) >> Pi(E3/E1)), there is less potential for the discovery that E2 is true to raise the probability of H than there is for the discovery that E3 is true to do so.

E. Relative confirmation and likelihood ratios. Often it is important to be able to compare the effect of evidence E on two competing hypotheses, Hj and Hk, without having also to consider its effect on other hypotheses that may not be so easy to formulate or to compare with Hj and Hk. From the first corollary above, the ratio of the final probabilities of Hj and Hk would be given by:

Ratio Formula:
Pf(Hj)/Pf(Hk) = [Pi(E/Hj) × Pi(Hj)]/[Pi(E/Hk) × Pi(Hk)]

If the odds of Hj relative to Hk are defined as the ratio of their probabilities, then from the Ratio Formula it follows that, in a case in which change in degrees of belief results from conditionalizing on E, the final odds (Pf(Hj)/Pf(Hk)) result from multiplying the initial odds (Pi(Hj)/Pi(Hk)) by the likelihood ratio (Pi(E/Hj)/Pi(E/Hk)). Thus, in pairwise comparisons of the odds of hypotheses, the likelihood ratio is the crucial determinant of the effect of the evidence on the odds.
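A minimal sketch of the odds form of the update, with made-up numbers and a hypothetical function name `update_odds`. Note that neither Pi(E) nor the probabilities of any other competing hypotheses are needed.

```python
from fractions import Fraction

def update_odds(initial_odds, p_e_given_hj, p_e_given_hk):
    """Final odds of Hj relative to Hk = initial odds * likelihood ratio."""
    if p_e_given_hk == 0:
        raise ValueError("Pi(E/Hk) must be greater than zero")
    return initial_odds * (p_e_given_hj / p_e_given_hk)

# Made-up priors: Pi(Hj) = 1/10 and Pi(Hk) = 2/5, so the initial odds are 1/4.
initial = Fraction(1, 10) / Fraction(2, 5)

# Made-up likelihoods: Pi(E/Hj) = 4/5 and Pi(E/Hk) = 1/10, a likelihood ratio of 8.
final = update_odds(initial, Fraction(4, 5), Fraction(1, 10))
```

Here the initial odds of 1/4 are multiplied by the likelihood ratio of 8, giving final odds of 2: after conditionalizing on E, Hj is twice as probable as Hk.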

F. Subjective and Objective Bayesianism. Are there constraints on prior probabilities other than the probability laws? Consider a situation in which you are to draw a ball from an urn filled with red and black balls. Suppose you have no other information about the urn. What is the prior probability (before drawing a ball) that, given that a ball is drawn from the urn, the drawn ball will be black? The question divides Bayesians into two camps:

(a) Subjective Bayesians emphasize the relative lack of rational constraints on prior probabilities. In the urn example, they would allow that any prior probability between 0 and 1 might be rational (though some Subjective Bayesians (e.g., Jeffrey) would rule out the two extreme values, 0 and 1). The most extreme Subjective Bayesians (e.g., de Finetti) hold that the only rational constraint on prior probabilities is probabilistic coherence. Others (e.g., Jeffrey) classify themselves as subjectivists even though they allow for some relatively small number of additional rational constraints on prior probabilities. Since subjectivists can disagree about particular constraints, what unites them is that their constraints rule out very little. For Subjective Bayesians, our actual prior probability assignments are largely the result of non-rational factors—for example, our own unconstrained, free choice or evolution or socialization.

(b) Objective Bayesians (e.g., Jaynes and Rosenkrantz) emphasize the extent to which prior probabilities are rationally constrained. In the above example, they would hold that rationality requires assigning a prior probability of 1/2 to drawing a black ball from the urn. They would argue that any other probability would fail the following test: Since you have no information at all about which balls are red and which balls are black, you must choose prior probabilities that are invariant with a change in label (“red” or “black”). But the only prior probability assignment that is invariant in this way is the assignment of prior probability of 1/2 to each of the two possibilities (i.e., that the ball drawn is black or that it is red).
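The invariance test in the urn example can be written out as a short calculation, assuming that every ball is either red or black, so that the two probabilities sum to one. The symbol p is introduced here only for illustration.

```latex
\[
  p = P_i(\text{black}), \qquad 1 - p = P_i(\text{red}).
\]
Swapping the labels sends the assignment $(p,\, 1 - p)$ to $(1 - p,\, p)$, so invariance requires
\[
  p = 1 - p \;\Longrightarrow\; p = \tfrac{1}{2}.
\]
```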

In the limit, an Objective Bayesian would hold that rational constraints uniquely determine prior probabilities in every circumstance. This would make the prior probabilities logical probabilities determinable purely a priori. None of those who identify themselves as Objective Bayesians holds this extreme form of the view. Nor do they all agree on precisely what the rational constraints on degrees of belief are. For example, Williamson does not accept Conditionalization in any form as a rational constraint on degrees of belief. What unites all of the Objective Bayesians is their conviction that in many circumstances, symmetry considerations uniquely determine the relevant prior probabilities and that even when they don't uniquely determine the relevant prior probabilities, they often so constrain the range of rationally admissible prior probabilities, as to assure convergence on the relevant posterior probabilities. Jaynes identifies four general principles that constrain prior probabilities: group invariance, maximum entropy, marginalization, and coding theory, but he does not consider the list exhaustive. He expects additional principles to be added in the future. However, no Objective Bayesian claims that there are principles that uniquely determine rational prior probabilities in all cases.

By introducing symmetry constraints on prior probabilities, the Objective Bayesians inherit the difficulties of the classical Principle of Indifference, so-named by Keynes, but usually attributed to Laplace. The simple example of the urn illustrates how invariance considerations can be used to give content to the Principle of Indifference. There the objectivist is able to uniquely determine the prior probabilities from the requirement that the rational prior probabilities should be invariant under switching the labels used to classify the balls in the urn.

However, it is generally agreed by both objectivists and subjectivists that ignorance alone cannot be the basis for assigning prior probabilities. The reason is that in any particular case there must be some information to pick out which parameters or which transformations are the ones among which one is to be indifferent. Without such information, indifference considerations lead to paradox. Objective Bayesians have been quite creative in finding ways to resolve many of the paradoxes (e.g., Jeffreys' solution to Bertrand's Paradox, Jaynes's solution to Buffon's Needle Paradox, or Mikkelson's solution to von Mises' Paradox). But there are always more paradoxes. Charles, Höcker, Lacker, Le Diberder, and T'Jampens provide an actual example from physics where maximum entropy yields conflicting results depending on parameterization and where a frequentist approach seems to be superior to any Objective Bayesian approach that employs any form of Conditionalization.

G. The typical differential effect of positive evidence and negative evidence. Hempel first pointed out that we typically expect the hypothesis that all ravens are black to be confirmed to some degree by the observation of a black raven, but not by the observation of a non-black, non-raven. Let H be the hypothesis that all ravens are black. Let E1 describe the observation of a non-black, non-raven. Let E2 describe the observation of a black raven. Bayesian Confirmation Theory actually holds that both E1 and E2 may provide some confirmation for H. Recall that E1 supports H just in case Pi(E1/H)/Pi(E1) > 1. It is plausible to think that this ratio is ever so slightly greater than one. On the other hand, E2 would seem to provide much greater confirmation to H, because, in this example, it would be expected that Pi(E2/H)/Pi(E2) >> Pi(E1/H)/Pi(E1).

These are only a sample of the results that have provided support for Bayesian Confirmation Theory as a theory of rational inference for science. For further examples, see Howson and Urbach. It should also be mentioned that an important branch of statistics, Bayesian statistics, is based on the principles of Bayesian epistemology.

5. Bayesian Social Epistemology

One of the important developments in Bayesian epistemology has been the exploration of the social dimension to inquiry. The obvious example is scientific inquiry, because it is the community of scientists, rather than any individual scientist, who determine what is or is not accepted in the discipline. In addition, scientists typically work in research groups and even those who work alone rely on the reports of other scientists to be able to design and carry out their own work. Other important examples of the social dimension to knowledge include the use of juries to make factual determinations in the legal system and the decentralization of knowledge over the Internet.

There are two ways that Bayesian epistemology can be applied to social inquiry:

(1) Bayesian epistemology of testimony (understood generally, to include not only personal testimony but all media sources of information). Goldman has developed a Bayesian epistemology of testimony and applied it to social entities such as science and the legal system. In any such approach, a crucial issue is how to evaluate the reliability of the reports one receives. Goldman's approach is to focus on institutional design to motivate the production of reliable reports. Bovens and Hartmann instead try to model how, when there are reports from multiple sources, a Bayesian agent can use probabilistic reasoning to judge the reliability of the reports, and thus, how much credence to place in them. The idea that in evaluating the probability of a report we are implicitly evaluating the reliability of the reporter is developed by Barnes as a potential explanation of the prediction/accommodation asymmetry, discussed in the next section.

(2) Aggregate Bayesianism. If scientific knowledge or jury deliberations produce a group product, it is natural to consider whether the group's knowledge can be represented in aggregate form. In Bayesian terms, the question is whether the individuals' probability assignments can be usefully aggregated into a single probability assignment that reflects the group's knowledge. Although Seidenfeld, Kadane, and Schervish have shown that there is generally no way to define an aggregate Bayesian expected utility maximizer to represent the Pareto preferences of a group of two or more individual Bayesian expected utility maximizers, there is no impossibility result precluding the aggregation of individual probability assignments into a group probability assignment. However, there is no generally agreed upon rule for doing so. If a group of Bayesian individuals all had begun from the same initial probabilities, then simply sharing their evidence would lead them all to the same final probabilities. It may seem unfortunate that unanimity in science and other social endeavors cannot be achieved so easily, but Kitcher has argued that this is a mistake, because cognitive diversity plays an important role in scientific progress.
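The claim about shared initial probabilities can be illustrated with a small sketch, assuming two agents who update by simple conditionalization over a finite set of worlds. The eight worlds, their weights, and the names `agent_a`, `agent_b`, and `prob_h` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Worlds are triples (H, E1, E2) of truth values; the weights 1/36 ... 8/36 are made up.
worlds = list(product([True, False], repeat=3))
common_prior = {w: Fraction(k, 36) for k, w in enumerate(worlds, start=1)}

def conditionalize(dist, evidence):
    """Final probabilities from conditionalizing dist on the statement evidence(world)."""
    p_e = sum(p for w, p in dist.items() if evidence(w))
    if p_e == 0:
        raise ValueError("evidence must have probability greater than zero")
    return {w: (p / p_e if evidence(w) else Fraction(0)) for w, p in dist.items()}

def prob_h(dist):
    """Probability of H, the first component of each world."""
    return sum(p for w, p in dist.items() if w[0])

e1 = lambda w: w[1]
e2 = lambda w: w[2]

# Each agent starts from the common prior and privately learns a different statement.
agent_a = conditionalize(common_prior, e1)
agent_b = conditionalize(common_prior, e2)

# The agents then share their evidence and conditionalize on what the other learned.
agent_a_shared = conditionalize(agent_a, e2)
agent_b_shared = conditionalize(agent_b, e1)
```

Before sharing, the two agents disagree about H (3/14 versus 1/4 on these weights); after sharing, `agent_a_shared` and `agent_b_shared` are the same assignment, with probability 1/6 for H.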

The fruitfulness of Bayesian social epistemology may ultimately depend on whether or not the idealizations of Bayesian theory are too unrealistic. For example, if one of the important effects of jury deliberations is that they tend to provide a way for the group to correct for the irrationality of individual members, then no model of jurors as ideal Bayesians is likely to be able to explain that feature of the jury system.

6. Potential Problems

This section reviews some of the most important potential problems for Bayesian Confirmation Theory and for Bayesian epistemology generally. No attempt is made to evaluate their seriousness here, though there is no generally agreed upon Bayesian solution to any of them.

6.1 Objections to the Probability Laws as Standards of Synchronic Coherence

A. The assumption of logical omniscience. The assumption that degrees of belief satisfy the probability laws implies omniscience about deductive logic, because the probability laws require that all deductive logical truths have probability one, all deductive inconsistencies have probability zero, and the probability of any conjunction of sentences be no greater than the probability of any of its deductive consequences. This seems to be an unrealistic standard for human beings. Hacking and Garber have made proposals to relax the assumption of logical omniscience. Because relaxing that assumption would block the derivation of almost all the important results in Bayesian epistemology, most Bayesians maintain the assumption of logical omniscience and treat it as an ideal to which human beings can only more or less approximate.

B. The special epistemological status of the laws of classical logic. Even if the assumption of logical omniscience is not too much of an idealization to provide a useful model for human reasoning, it has another potentially troubling consequence. It commits Bayesian epistemology to some sort of a priori/a posteriori distinction, because there could be no Bayesian account of how empirical evidence might make it rational to adopt a theory with a non-classical logic. In this respect, Bayesian epistemology carries over the presumption from traditional epistemology that the laws of logic are immune to revision on the basis of empirical evidence.

It is open to the Bayesian to try to downplay the significance of this consequence, by articulating an a priori/a posteriori distinction that aims to be pragmatic rather than metaphysical (e.g., Carnap's analytic/synthetic distinction). However, any such account must address Quine's well-known holistic challenge to the analytic-synthetic distinction.

6.2 Objections to The Simple Principle of Conditionalization as a Rule of Inference and Other Objections to Bayesian Confirmation Theory

A. The problem of uncertain evidence. The Simple Principle of Conditionalization requires that the acquisition of evidence be representable as changing one's degree of belief in a statement E to one — that is, to certainty. But many philosophers would object to assigning probability of one to any contingent statement, even an evidential statement, because, for example, it is well-known that scientists sometimes give up previously accepted evidence. Jeffrey has proposed a generalization of the Principle of Conditionalization that yields that principle as a special case. Jeffrey's idea is that what is crucial about observation is not that it yields certainty, but that it generates a non-inferential change in the probability of an evidential statement E and its negation ~E (assumed to be the locus of all the non-inferential changes in probability) from initial probabilities between zero and one to Pf(E) and Pf(~E) = [1 − Pf(E)]. Then on Jeffrey's account, after the observation, the rational degree of belief to place in an hypothesis H would be given by the following principle:

Principle of Jeffrey Conditionalization:
Pf(H) = Pi(H/E) × Pf(E) + Pi(H/~E) × Pf(~E) [where E and H are both assumed to have prior probabilities between zero and one]

Counting in favor of Jeffrey's Principle is its theoretical elegance. Counting against it is the practical problem that it requires that one be able to completely specify the direct non-inferential effects of an observation, something it is doubtful that anyone has ever done. Skyrms has given Jeffrey's Principle a Dutch Book defense.
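As a rough illustration of the Principle of Jeffrey Conditionalization stated above, the following Python sketch computes Pf(H) from the initial conditional probabilities and the new probability of E. The function name and all numerical values are hypothetical, chosen only for illustration.

```python
def jeffrey_update(pi_h_given_e, pi_h_given_not_e, pf_e):
    """Return Pf(H) = Pi(H/E) * Pf(E) + Pi(H/~E) * Pf(~E).

    Pf(~E) is taken to be 1 - Pf(E), as in the text.
    """
    for value in (pi_h_given_e, pi_h_given_not_e, pf_e):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie between zero and one")
    return pi_h_given_e * pf_e + pi_h_given_not_e * (1.0 - pf_e)


# Hypothetical initial conditional probabilities (assumptions, not from the text).
PI_H_GIVEN_E = 0.8
PI_H_GIVEN_NOT_E = 0.3

# An uncertain observation raises the probability of E to 0.9 without making it certain.
uncertain = jeffrey_update(PI_H_GIVEN_E, PI_H_GIVEN_NOT_E, pf_e=0.9)

# When Pf(E) = 1, the result is Pi(H/E), the value that simple
# conditionalization on E would deliver.
certain = jeffrey_update(PI_H_GIVEN_E, PI_H_GIVEN_NOT_E, pf_e=1.0)

print(uncertain, certain)
```

With Pf(E) set to one, the second call reduces to Pi(H/E), which is the sense in which simple conditionalization is a special case.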

B. The problem of old evidence. On a Bayesian account, the effect of evidence E in confirming (or disconfirming) a hypothesis is solely a function of the increase in probability that accrues to E when it is first determined to be true. This raises the following puzzle for Bayesian Confirmation Theory, discussed extensively by Glymour: Suppose that E is an evidentiary statement that has been known for some time (that is, it is old evidence), and suppose that H is a scientific theory that has been under consideration for some time. One day it is discovered that H implies E. In scientific practice, the discovery that H implies E would typically be taken to provide some degree of confirmatory support for H. But Bayesian Confirmation Theory seems unable to explain how a previously known evidentiary statement E could provide any new support for H. For conditionalization to come into play, there must be a change in the probability of the evidence statement E. Where E is old evidence, there is no change in its probability. Some Bayesians (e.g., Garber) have tried to solve this problem by weakening the logical omniscience assumption to allow for the possibility of discovering logical relations (e.g., that H and suitable auxiliary assumptions imply E). As mentioned above, relaxing the logical omniscience assumption threatens to block the derivation of almost all of the important results in Bayesian epistemology. Other Bayesians (e.g., Lange) employ the Bayesian formalism as a tool in the rational reconstruction of the evidentiary support for a scientific hypothesis, where it is irrelevant to the rational reconstruction whether the evidence was discovered before or after the theory was initially formulated. Joyce and Christensen agree that discovering new logical relations between previously accepted evidence and a theory cannot raise the probability of the theory. However, they suggest that using Pi(H/E) − Pi(H/~E) as a measure of support can at least explain how evidence that has probability one could still support a theory. Eells and Fitelson have criticized this proposal and argued that the problem is better addressed by distinguishing two measures, the historical measure of the degree to which a piece of evidence E actually confirmed an hypothesis H and the ahistorical measure of how much a piece of evidence E would support an hypothesis H, on given background information B. The second measure enables us to ask the ahistorical question of how much E would support H if we had no other evidence supporting H.
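The arithmetic behind the puzzle can be checked with a small Python sketch. It assumes a toy joint distribution over the truth values of H and E; the function names and numbers are hypothetical. When E already has probability one, conditionalizing on E leaves the probability of H where it was.

```python
def prob_h(joint):
    """Unconditional probability of H in a joint distribution keyed by (h, e)."""
    return joint[(True, True)] + joint[(True, False)]


def prob_h_given_e(joint):
    """P(H/E) by the ratio definition; undefined if P(E) is zero."""
    p_e = joint[(True, True)] + joint[(False, True)]
    if p_e == 0.0:
        raise ValueError("P(E) is zero, so P(H/E) is undefined")
    return joint[(True, True)] / p_e


def boost_from_conditionalizing(joint):
    """Change in the probability of H produced by conditionalizing on E."""
    return prob_h_given_e(joint) - prob_h(joint)


# Hypothetical case 1: E is new evidence, with initial probability 0.5.
new_evidence = {(True, True): 0.3, (True, False): 0.1,
                (False, True): 0.2, (False, False): 0.4}

# Hypothetical case 2: E is old evidence, already held with probability one.
old_evidence = {(True, True): 0.4, (True, False): 0.0,
                (False, True): 0.6, (False, False): 0.0}

print(boost_from_conditionalizing(new_evidence))  # positive: E confirms H
print(boost_from_conditionalizing(old_evidence))  # zero: no change in P(H)
```

Note that in the second case the ratio definition leaves P(H/~E) undefined, since ~E has probability zero; the sketch does not attempt to model proposals that rely on that quantity.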

C. The problem of rigid conditional probabilities. When one conditionalizes, one applies the initial conditional probabilities to determine final unconditional probabilities. Throughout, the conditional probabilities themselves do not change; they remain rigid. Examples of the Problem of Old Evidence are but one of a variety of cases in which it seems that it can be rational to change one's initial conditional probabilities. Thus, many Bayesians reject the Simple Principle of Conditionalization in favor of a qualified principle, limited to situations in which one does not change one's initial conditional probabilities. There is no generally accepted account of when it is rational to maintain rigid initial conditional probabilities and when it is not.
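The sense in which conditional probabilities stay rigid can be made concrete with a short Python sketch. It assumes a toy joint distribution over H and E and a Jeffrey-style shift in the probability of E; the function names and numbers are hypothetical. The update changes the unconditional probability of H while leaving the probabilities of H given E and given ~E untouched.

```python
def prob_e(joint):
    """Probability of E in a joint distribution keyed by (h, e)."""
    return joint[(True, True)] + joint[(False, True)]


def prob_h(joint):
    """Unconditional probability of H."""
    return joint[(True, True)] + joint[(True, False)]


def prob_h_given(joint, e_value):
    """P(H/E) when e_value is True, P(H/~E) when e_value is False."""
    cell = joint[(True, e_value)] + joint[(False, e_value)]
    if cell == 0.0:
        raise ValueError("conditioning event has probability zero")
    return joint[(True, e_value)] / cell


def shift_probability_of_e(joint, pf_e):
    """Rescale the E and ~E cells so that E receives probability pf_e.

    Assumes the initial probability of E is strictly between zero and one.
    """
    p_e = prob_e(joint)
    if not 0.0 < p_e < 1.0:
        raise ValueError("initial P(E) must be strictly between zero and one")
    updated = {}
    for (h, e), p in joint.items():
        scale = pf_e / p_e if e else (1.0 - pf_e) / (1.0 - p_e)
        updated[(h, e)] = p * scale
    return updated


# Hypothetical initial degrees of belief.
initial = {(True, True): 0.3, (True, False): 0.1,
           (False, True): 0.2, (False, False): 0.4}
final = shift_probability_of_e(initial, pf_e=0.9)

print(prob_h(initial), prob_h(final))                            # P(H) changes
print(prob_h_given(initial, True), prob_h_given(final, True))    # P(H/E) does not
print(prob_h_given(initial, False), prob_h_given(final, False))  # nor does P(H/~E)
```

A case in which it is rational to revise Pi(H/E) itself, as the objection envisages, falls outside anything this kind of update can represent.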

D. The problem of prediction vs. accommodation. Related to the problem of Old Evidence is the following potential problem: Consider two different scenarios. In the first, theory H was developed in part to accommodate (i.e., to imply) some previously known evidence E. In the second, theory H was developed at a time when E was not known. It was because E was derived as a prediction from H that a test was performed and E was found to be true. It seems that E's being true would provide a greater degree of confirmation for H if the truth of E had been predicted by H than if H had been developed to accommodate the truth of E. There is no general agreement among Bayesians about how to resolve this problem. Some (e.g., Horwich) argue that Bayesianism implies that there is no important difference between prediction and accommodation, and try to defend that implication. Others (e.g., Maher) argue that there is a way to understand Bayesianism so as to explain why there is an important difference between prediction and accommodation.

E. The problem of new theories. Suppose that there is one theory H1 that is generally regarded as highly confirmed by the available evidence E. It is possible that the mere introduction of an alternative theory H2 can erode H1's support. It is plausible to think that Copernicus' introduction of the heliocentric hypothesis had this effect on the previously unchallenged Ptolemaic earth-centered astronomy. This sort of change cannot be explained by conditionalization. It is for this reason that many Bayesians prefer to focus on probability ratios of hypotheses (see the Ratio Formula above), rather than their absolute probability; but it is clear that the introduction of a new theory could also alter the probability ratio of two hypotheses, for example, if it implied one of them as a special case.

F. The problem of the priors. Are there constraints on prior probabilities other than the probability laws? This is the issue that divides the Subjective from the Objective Bayesians, as discussed above. Consider Goodman's “new riddle of induction”: In the past all observed emeralds have been green. Do those observations provide any more support for the generalization that all emeralds are green than they do for the generalization that all emeralds are grue (green if observed before now; blue if observed later); or do they provide any more support for the prediction that the next emerald observed will be green than for the prediction that the next emerald observed will be grue (i.e., blue)? Almost everyone agrees that it would be irrational to have prior probabilities that were indifferent between green and grue, and thus made predictions of greenness no more probable than predictions of grueness. But there is no generally agreed upon explanation of this constraint.

The problem of the priors identifies an important issue between the Subjective and Objective Bayesians. If the constraints on rational inference are so weak as to permit any or almost any probabilistically coherent prior probabilities, then there would be nothing to make inferences in the sciences any more rational than inferences in astrology or phrenology or in the conspiracy reasoning of a paranoid schizophrenic, because all of them can be reconstructed as inferences from probabilistically coherent prior probabilities. Some Subjective Bayesians believe that their position is not objectionably subjective, because of results (e.g., Doob or Gaifman and Snir) proving that even subjects beginning with very different prior probabilities will tend to converge in their final probabilities, given a suitably long series of shared observations. These convergence results are not completely reassuring, however, because they only apply to agents who already have significant agreement in their priors and they do not assure convergence in any reasonable amount of time. Also, they typically only guarantee convergence on the probability of predictions, not on the probability of theoretical hypotheses. For example, Carnap favored prior probabilities that would never raise above zero the probability of a generalization over a potentially infinite number of instances (e.g., that all crows are black), no matter how many observations of positive instances (e.g., black crows) one might make without finding any negative instances (i.e., non-black crows). In addition, the convergence results depend on the assumption that the only changes in probabilities that occur are those that are the non-inferential results of observation on evidential statements and those that result from conditionalization on such evidential statements. But almost all subjectivists allow that it can sometimes be rational to change one's prior probability assignments.
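The flavor of the convergence results, and of one of their limitations, can be seen in a toy Python sketch. It is not a rendering of the Doob or Gaifman and Snir theorems: it assumes only three rival hypotheses about the bias of a coin, a fixed hypothetical sequence of tosses, and made-up priors. Two agents who both give every hypothesis a nonzero prior end up close together, while an agent who starts with probability zero on one hypothesis can never raise it by conditionalization.

```python
def conditionalize_on_tosses(prior, biases, tosses):
    """Conditionalize a prior over coin-bias hypotheses on a sequence of tosses.

    prior[i] is the initial probability that the chance of heads is biases[i];
    each toss is True for heads and False for tails.
    """
    weights = list(prior)
    for heads in tosses:
        weights = [w * (b if heads else 1.0 - b) for w, b in zip(weights, biases)]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize after each toss
    return weights


def max_gap(p, q):
    """Largest disagreement between two agents over any single hypothesis."""
    return max(abs(x - y) for x, y in zip(p, q))


# Hypothetical setup: three candidate chances of heads.
BIASES = [0.2, 0.5, 0.8]
agent_a = [0.8, 0.1, 0.1]    # initially favors the 0.2 hypothesis
agent_b = [0.1, 0.1, 0.8]    # initially favors the 0.8 hypothesis
dogmatist = [0.5, 0.5, 0.0]  # rules out the 0.8 hypothesis from the start

# A fixed, shared run of 100 tosses with 80 heads (no randomness involved).
tosses = [True, True, True, True, False] * 20

post_a = conditionalize_on_tosses(agent_a, BIASES, tosses)
post_b = conditionalize_on_tosses(agent_b, BIASES, tosses)
post_dogmatist = conditionalize_on_tosses(dogmatist, BIASES, tosses)

print(max_gap(agent_a, agent_b))  # large initial disagreement
print(max_gap(post_a, post_b))    # much smaller after shared evidence
print(post_dogmatist[2])          # still zero for the excluded hypothesis
```

The sketch says nothing about how long convergence takes in less tidy cases, nor about hypotheses that go beyond a finite list of statistical alternatives.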

Because there is no generally agreed upon solution to the Problem of the Priors, it is an open question whether Bayesian Confirmation Theory has inductive content, or whether it merely translates the framework for rational belief provided by deductive logic into a corresponding framework for rational degrees of belief.

7. Other Principles of Bayesian Epistemology

Other principles of Bayesian epistemology have been proposed, but none has garnered anywhere near a majority of support among Bayesians. The most important proposals are merely mentioned here. It is beyond the scope of this entry to discuss them in any detail.

A. Other principles of synchronic coherence. Are the probability laws the only standards of synchronic coherence for degrees of belief? Van Fraassen has proposed an additional principle (Reflection or Special Reflection), which he now regards as a special case of an even more general principle (General Reflection).[3]

B. Other probabilistic rules of inference. There seem to be at least two different concepts of probability: the probability that is involved in degrees of belief (epistemic or subjective probability) and the probability that is involved in random events, such as the tossing of a coin (chance). De Finetti thought this was a mistake and that there was only one kind of probability, subjective probability. For Bayesians who believe in both kinds of probability, an important question is: What is (or should be) the relation between them? The answer can be found in the various proposals for principles of direct inference in the literature. Typically, principles of direct inference are proposed as principles for inferring subjective or epistemic probabilities from beliefs about objective chance (e.g., Pollock). Lewis reverses the direction of inference, and proposes to infer beliefs about objective chance from subjective or epistemic probabilities, via his (Reformulated) Principal Principle.[4] Strevens argues that it is Lewis's Principal Principle that gives Bayesianism its inductive content.

C. Principles of rational acceptance. What is the relation between beliefs and degrees of belief? Jeffrey proposes to give up the notion of belief (at least for empirical statements) and make do with only degrees of belief. Other authors (e.g., Levi, Maher, Kaplan) propose principles of rational acceptance as part of accounts of when it is rational to accept a statement as true, not merely to regard it as probable.

Bibliography

Related Entries

Bayes' Theorem | logic: inductive | probability, interpretations of

Acknowledgments

In the preparation of this article, I have benefited from comments from Marc Lange, Stephen Glaister, Laurence BonJour, and James Joyce.