Inductive Logic > The Effect on EQI of Partitioning the Outcome Space More Finely — and Proof of the Nonnegativity of EQI Theorem (Stanford Encyclopedia of Philosophy/Summer 2010 Edition)

Supplement to Inductive Logic

The Effect on EQI of Partitioning the Outcome Space More Finely — and Proof of the Nonnegativity of EQI Theorem

Here again we will only explicitly treat the case where condition-independence is assumed. If result-independence holds as well, all occurrences of ‘(c^k−1·e^k−1)’ may be dropped, which gives the theorem stated in the text. If neither independence condition holds, all occurrences of ‘c_k·(c^k−1·e^k−1)’ here are replaced by ‘cⁿ·e^k−1’, and occurrences of ‘b·c^k−1’ are replaced by ‘b·cⁿ’.

Given some experiment or observation (or series of them) c, is there any special advantage to parsing the space of possible outcomes O into more, rather than fewer, alternatives? Couldn't we do as well at confirming hypotheses by parsing the space of outcomes into only two or three alternatives: for example, one possible outcome that h_i says is very likely and h_j says is rather unlikely (e.g., describing a rejection region for h_j), one that h_i says is rather unlikely and h_j says is very likely (e.g., describing a rejection region for h_i), and perhaps a third outcome on which h_i and h_j pretty much agree? The answer is no; we cannot generally do as well at confirming hypotheses this way. In general, parsing the space of outcomes into more empirically distinct alternatives results in a better measure of confirmation. To see this intuitively, suppose some outcome description q can be parsed into two distinct outcome descriptions, q₁ and q₂ (where q is equivalent to (q₁∨q₂)), and suppose that h_i differs from h_j much more on the likelihood of q₁ than on the likelihood of q₂. Then, intuitively, when q is found to be true, whichever of the more precise descriptions, q₁ or q₂, is true should make a difference in how strongly the hypotheses are supported. So reporting whichever of q₁ or q₂ occurs will be more informative than simply reporting q. If the outcome of the experiment is only described as q, relevant information is lost.

It turns out that EQI measures how well possible outcomes can distinguish between hypotheses in a way that reflects the intuition that a finer partition of outcomes is more informative. The numerical value of EQI is always made larger by parsing the outcome space more finely, provided that the likelihoods for outcomes in the finer parsing differ at least a bit from the likelihoods for outcomes of a less refined parsing. This is important for our main convergence result because in that theorem we want EQI to be positive, and the larger the better.
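The effect of a finer parsing can be illustrated numerically. The following is only a minimal sketch, not part of the original supplement: the function name `eqi` and the example likelihoods are hypothetical, natural logarithms are assumed (another base would only rescale the values), and every outcome is assumed to have positive likelihood on both hypotheses.

```python
import math

def eqi(r, s):
    """Sum of r_u * log(r_u / s_u), where r holds the likelihoods of the
    outcomes according to h_i and s holds their likelihoods according to h_j."""
    return sum(ru * math.log(ru / su) for ru, su in zip(r, s))

# Hypothetical likelihoods for three outcomes q1, q2, q3 (illustrative values only).
r_fine = [0.5, 0.2, 0.3]   # likelihoods according to h_i
s_fine = [0.2, 0.3, 0.5]   # likelihoods according to h_j

# Coarser description: report only q = (q1 or q2), together with q3.
r_coarse = [r_fine[0] + r_fine[1], r_fine[2]]
s_coarse = [s_fine[0] + s_fine[1], s_fine[2]]

eqi_fine = eqi(r_fine, s_fine)
eqi_coarse = eqi(r_coarse, s_coarse)
print("coarse parsing:", eqi_coarse)
print("fine parsing:  ", eqi_fine)
```

With these assumed values the hypotheses disagree much more about q₁ than about q₂, so merging the two outcomes into q discards information and the coarse value comes out smaller than the fine one.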

The following Partition Theorem implies the Nonnegativity of EQI theorem. It shows that each EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)] must be non-negative, and will be positive just in case for at least one possible outcome o_ku, P[o_ku | h_j·b·c_k·(c^k−1·e^k−1)] ≠ P[o_ku | h_i·b·c_k·(c^k−1·e^k−1)]. It also shows that EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)] generally becomes larger with finer partitionings of the outcome space.

Notice that this result (when proved) implies that

EQI[c_k | h_i/h_j | b·c^k−1] = ∑_{e^k−1} EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)] · P[e^k−1 | h_i·b·c^k−1]

must be non-negative, and will be positive iff for at least one possible outcome o_ku,

P[o_ku | h_j·b·c_k·(c^k−1·e^k−1)] ≠ P[o_ku | h_i·b·c_k·(c^k−1·e^k−1)].

And since,

EQI[cⁿ | h_i/h_j | b] = ∑_{k=1}^{n} EQI[c_k | h_i/h_j | b·c^k−1],

we also get that the average EQI, EQI[cⁿ | h_i/h_j | b], must be non-negative, and must be positive iff for some k and some possible outcome o_ku,

P[o_ku | h_j·b·c_k·(c^k−1·e^k−1)] ≠ P[o_ku | h_i·b·c_k·(c^k−1·e^k−1)];

and it becomes larger as finer partitionings make the component EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)] larger.

Partition Theorem:
For any positive real numbers r₁, r₂, s₁, s₂:

(1) if r₁/s₁ ≠ (r₁+r₂)/(s₁+s₂), then (r₁+r₂) log[(r₁+r₂)/(s₁+s₂)] < r₁ log[r₁/s₁] + r₂ log[r₂/s₂]; and

(2) if r₁/s₁ = (r₁+r₂)/(s₁+s₂), then r₁ log[r₁/s₁] + r₂ log[r₂/s₂] = (r₁+r₂) log[(r₁+r₂)/(s₁+s₂)].
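Before turning to the proof, both clauses of the theorem can be spot-checked on sample numbers. This is an illustrative sketch only: the helper names `merged` and `split` and the two test tuples are hypothetical, and natural logarithms are assumed. A numerical check of this kind is not a substitute for the proof.

```python
import math

def merged(r1, r2, s1, s2):
    """(r1+r2) * log[(r1+r2)/(s1+s2)]: the term for the unsplit outcome."""
    return (r1 + r2) * math.log((r1 + r2) / (s1 + s2))

def split(r1, r2, s1, s2):
    """r1*log[r1/s1] + r2*log[r2/s2]: the terms for the two finer outcomes."""
    return r1 * math.log(r1 / s1) + r2 * math.log(r2 / s2)

# Ratios differ: r1/s1 = 3 while (r1+r2)/(s1+s2) = 0.8.
unequal = (0.3, 0.1, 0.1, 0.4)
# Ratios agree: r1/s1 = r2/s2 = (r1+r2)/(s1+s2) = 2.
equal = (0.2, 0.4, 0.1, 0.2)

gap_unequal = split(*unequal) - merged(*unequal)
gap_equal = split(*equal) - merged(*equal)
print("gap when ratios differ:", gap_unequal)   # strictly positive
print("gap when ratios agree: ", gap_equal)     # zero up to rounding error
```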

For the Proof, first notice that

r₁/s₁ = (r₁+r₂)/(s₁+s₂) iff r₁s₁ + r₁s₂ = s₁r₁ + s₁r₂

iff r₁/s₁ = r₂/s₂.

We establish case (2) first. Suppose the antecedent of case (2) holds. Then, by the equivalences just noted, r₁/s₁ = r₂/s₂ = (r₁+r₂)/(s₁+s₂), so

r₁ log[r₁/s₁] + r₂ log[r₂/s₂]

= r₁ log[(r₁+r₂)/(s₁+s₂)] + r₂ log[(r₁+r₂)/(s₁+s₂)]

= (r₁ + r₂) log[(r₁+r₂)/(s₁+s₂)].

To get case (1), consider the following function of p: f(p) = p log[p/u] + (1−p) log[(1−p)/v], where we only assume that u > 0, v > 0, and 0 < p < 1. This function has its minimum value when p = u/(u+v). (This is easily verified by setting the derivative of f(p) with respect to p equal to 0 to find the minimum value of f(p); and it is easy to verify that this is a minimum rather than a maximum value.) At this minimum, where p = u/(u+v), we have

f(p) = −u/(u+v) log[u+v] − v/(u+v) log[u+v]

= −log[u+v].

Thus, for all values of p other than u/(u+v),

−log[u+v] < f(p)

= p log[p/u] + (1−p) log[(1−p)/v].

That is, for p ≠ u/(u+v), −log[u+v] < p log[p/u] + (1−p) log[(1−p)/v]. Now, let p = r₁/(r₁+r₂), let u = s₁/(r₁+r₂), and let v = s₂/(r₁+r₂). Plugging into the previous formula, and multiplying both sides by (r₁+r₂), we get:

if
r₁/(r₁+r₂) ≠ s₁/(s₁+s₂) (i.e., if r₁/s₁ ≠ (r₁+r₂)/(s₁+s₂)),
then
(r₁+r₂) log[(r₁+r₂)/(s₁+s₂)] < r₁ log[r₁/s₁] + r₂ log[r₂/s₂].

This completes the proof of the theorem.
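The key step of the proof, that f(p) attains the minimum value −log[u+v] at p = u/(u+v), can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: the values of u and v are arbitrary hypothetical choices, natural logarithms are assumed, and the grid search only samples f at finitely many points.

```python
import math

def f(p, u, v):
    """f(p) = p*log[p/u] + (1-p)*log[(1-p)/v], for 0 < p < 1 and u, v > 0."""
    return p * math.log(p / u) + (1 - p) * math.log((1 - p) / v)

u, v = 0.8, 1.7            # hypothetical positive values
p_star = u / (u + v)       # claimed location of the minimum
f_min = f(p_star, u, v)    # claimed to equal -log[u+v]

# Sample f on a grid of p values strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]
smallest_on_grid = min(f(p, u, v) for p in grid)

print("f at u/(u+v):      ", f_min)
print("-log[u+v]:         ", -math.log(u + v))
print("smallest grid value:", smallest_on_grid)
```

No sampled value falls below f(u/(u+v)), which agrees with the closed-form value −log[u+v] up to rounding error.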

To apply this result to EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)] recall that

EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)]

= ∑_{u: P[o_ku | h_j·b·c_k·(c^k−1·e^k−1)] > 0} log[P[o_ku | h_i·b·c_k·(c^k−1·e^k−1)] / P[o_ku | h_j·b·c_k·(c^k−1·e^k−1)]] · P[o_ku | h_i·b·c_k·(c^k−1·e^k−1)].

Suppose c_k has m alternative outcomes o_ku on which both

P[o_ku | h_j·b·c_k·(c^k−1·e^k−1)] > 0

and

P[o_ku | h_i·b·c_k·(c^k−1·e^k−1)] > 0.

Let's label their likelihoods relative to h_i (i.e., their likelihoods P[o_ku | h_i·b·c_k·(c^k−1·e^k−1)]) as r₁, r₂, …, r_m. And let's label their likelihoods relative to h_j as s₁, s₂, …, s_m. In terms of this notation,

EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)] = ∑_{u=1}^{m} r_u·log[r_u/s_u].

Notice also that (r₁+r₂+r₃+…+r_m) = 1 and (s₁+s₂+s₃+…+s_m) = 1.

Now, think of EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)] as generated by applying the theorem in successive steps:

0 = 1· log[1/1]

= (r₁+r₂+r₃+…+r_m)·log[(r₁+r₂+r₃+…+r_m)/(s₁+s₂+s₃+…+s_m)]

≤ r₁·log[r₁/s₁] + (r₂+r₃+…+r_m)· log[(r₂+r₃+…+r_m)/(s₂+s₃+…+s_m)]

≤ r₁·log[r₁/s₁] + r₂·log[r₂/s₂] + (r₃+…+r_m)·log[(r₃+…+r_m)/(s₃+…+s_m)]

≤ …

≤ ∑_{u=1}^{m} r_u·log[r_u/s_u]

= EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)].
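The chain of inequalities can be traced on a concrete case. The following sketch is illustrative only: the likelihood vectors `r` and `s` are hypothetical, natural logarithms are assumed, and all likelihoods are taken to be positive. Each entry of `chain` is the value reached after one more outcome has been split off from the remaining disjunction.

```python
import math

def term(r, s):
    """A single summand r * log[r/s]."""
    return r * math.log(r / s)

# Hypothetical likelihoods of m = 4 outcomes according to h_i (r) and h_j (s).
r = [0.5, 0.2, 0.2, 0.1]
s = [0.1, 0.3, 0.2, 0.4]

# chain[t] = sum of the first t split-off terms, plus one merged term
# for the disjunction of all remaining outcomes.
chain = []
for t in range(len(r)):
    split_part = sum(term(r[u], s[u]) for u in range(t))
    rest_r, rest_s = sum(r[t:]), sum(s[t:])
    chain.append(split_part + term(rest_r, rest_s))

eqi = sum(term(ru, su) for ru, su in zip(r, s))

for step, value in enumerate(chain):
    print("after", step, "splits:", value)
print("EQI:", eqi)
```

The first entry is 1·log[1/1] = 0 up to rounding, the last entry coincides with the full sum, and no step decreases the value.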

The theorem also says that at each step equality holds just in case

r_u/s_u = (r_u+r_{u+1}+…+r_m)/(s_u+s_{u+1}+…+s_m),

which itself holds just in case

r_u/s_u = (r_{u+1}+…+r_m)/(s_{u+1}+…+s_m).

So,

EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)] = 0

just in case

1 = (r₁+r₂+r₃+…+r_m)/(s₁+s₂+s₃+…+s_m)

= r₁/s₁

= (r₂+r₃+…+r_m)/(s₂+s₃+…+s_m)

= r₂/s₂

= (r₃+…+r_m)/(s₃+…+s_m)

= r₃/s₃

= …

= r_m/s_m.

That is,

EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)] = 0

just in case for all o_ku such that P[o_ku | h_j·b·c_k·(c^k−1·e^k−1)] > 0 and P[o_ku | h_i·b·c_k·(c^k−1·e^k−1)] > 0,

P[o_ku | h_i·b·c_k·(c^k−1·e^k−1)]/P[o_ku | h_j·b·c_k·(c^k−1·e^k−1)] = 1.

Otherwise,

EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)] > 0;

and for each successive step in partitioning the outcome space to generate EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)], if

r_u/s_u ≠ (r_u+r_{u+1}+…+r_m)/(s_u+s_{u+1}+…+s_m),

we have the strict inequality:

(r_u+r_{u+1}+…+r_m)·log[(r_u+r_{u+1}+…+r_m)/(s_u+s_{u+1}+…+s_m)] < r_u·log[r_u/s_u] + (r_{u+1}+…+r_m)·log[(r_{u+1}+…+r_m)/(s_{u+1}+…+s_m)].

So each such partitioning of (o_ku∨o_{k,u+1}∨…∨o_km) into two separate propositions, o_ku and (o_{k,u+1}∨…∨o_km), adds a strictly positive contribution to the size of EQI[c_k | h_i/h_j | b·(c^k−1·e^k−1)].
