
Supplement to Inductive Logic

Proof of the Non-Falsifying Refutation Theorem

Here again we explicitly treat the case where only condition-independence is assumed. If result-independence holds as well, all occurrences of ‘(ck−1·ek−1)’ may be dropped, which gives the results stated in the text. If neither independence condition holds, all occurrences of ‘ck·(ck−1·ek−1)’ are replaced by ‘cn·ek−1’ and occurrences of ‘b·ck−1’ are replaced by ‘b·cn’.

The proof of Convergence Theorem 2 requires the introduction of one more concept, that of the variance in the quality of information for a sequence of experiments or observations, VQI[cn | hi/hj | b]. The quality of information QI from a specific outcome sequence en will vary somewhat from the expected quality of information for conditions cn. A common statistical measure of how widely individual values tend to vary from an expected value is the expected squared distance from the expected value, a quantity called the variance.

Definition: VQI — the Variance in the Quality of Information.
For hj outcome-compatible with hi on ck, define
\[
\mathrm{VQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})] = \sum_{u} \Big(\mathrm{QI}[o_{ku} \mid h_i/h_j \mid b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] - \mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})]\Big)^2 \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})].
\]

Next define

\[
\mathrm{VQI}[c_k \mid h_i/h_j \mid b\cdot c^{k-1}] = \sum_{\{e^{k-1}\}} \mathrm{VQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})] \cdot P[e^{k-1} \mid h_i\cdot b\cdot c^{k-1}].
\]

For a sequence cn of observations on which hj is outcome-compatible with hi, define

\[
\mathrm{VQI}[c^n \mid h_i/h_j \mid b] = \sum_{\{e^n\}} \Big(\mathrm{QI}[e^n \mid h_i/h_j \mid b\cdot c^n] - \mathrm{EQI}[c^n \mid h_i/h_j \mid b]\Big)^2 \cdot P[e^n \mid h_i\cdot b\cdot c^n].
\]

Clearly VQI will be positive unless hi and hj agree on the likelihoods of all possible outcome sequences in the evidence stream, in which case both EQI[cn | hi/hj | b] and VQI[cn | hi/hj | b] equal 0.
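To make the definition concrete, here is a minimal Python sketch that computes QI, EQI, and VQI for a single experiment. The two-outcome experiment and its likelihoods are invented for illustration, and the conditioning on b·ck is left implicit:

```python
import math

# Invented likelihoods for a single experiment c_k with two outcomes:
# P[o_ku | h_i·b·c_k] and P[o_ku | h_j·b·c_k].
p_i = [0.8, 0.2]
p_j = [0.5, 0.5]

# QI for each outcome: the log of the likelihood ratio favoring h_i over h_j.
qi = [math.log(pi / pj) for pi, pj in zip(p_i, p_j)]

# EQI: the expected value of QI, weighted by the likelihoods h_i assigns.
eqi = sum(q * pi for q, pi in zip(qi, p_i))

# VQI: the expected squared distance of QI from EQI, again weighted by h_i.
vqi = sum((q - eqi) ** 2 * pi for q, pi in zip(qi, p_i))

print(f"EQI = {eqi:.4f}, VQI = {vqi:.4f}")
```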

VQI[cn | hi/hj | b] does not generally decompose into the sum of the VQI for individual experiments or observations ck. However, when both independence conditions hold, the decomposition into the sum does follow.

Theorem: The VQI Decomposition Theorem for Independent Evidence:
Suppose both condition-independence and result-independence hold. Then

\[
\mathrm{VQI}[c^n \mid h_i/h_j \mid b] = \sum_{k=1}^{n} \mathrm{VQI}[c_k \mid h_i/h_j \mid b].
\]

For the proof we employ the following abbreviations (subscript k marks the individual experiment ck and its outcome, superscript k marks the initial sequence ck and its outcome sequence ek):

Q[e_k] = QI[e_k | h_i/h_j | b·c_k]
Q[e^k] = QI[e^k | h_i/h_j | b·c^k]
E[c_k] = EQI[c_k | h_i/h_j | b]
E[c^k] = EQI[c^k | h_i/h_j | b]
V[c_k] = VQI[c_k | h_i/h_j | b]
V[c^k] = VQI[c^k | h_i/h_j | b]

The equation stated by the theorem may be derived as follows:

\[
\begin{aligned}
V[c^n] &= \sum_{\{e^n\}} \big(Q[e^n] - E[c^n]\big)^2 \cdot P[e^n \mid h_i\cdot b\cdot c^n] \\
&= \sum_{\{e^{n-1}\}} \sum_{\{e_n\}} \big((Q[e_n] + Q[e^{n-1}]) - (E[c_n] + E[c^{n-1}])\big)^2 \cdot P[e_n \mid h_i\cdot b\cdot c_n] \cdot P[e^{n-1} \mid h_i\cdot b\cdot c^{n-1}] \\
&= \sum_{\{e^{n-1}\}} \sum_{\{e_n\}} \big((Q[e_n] - E[c_n]) + (Q[e^{n-1}] - E[c^{n-1}])\big)^2 \cdot P[e_n \mid h_i\cdot b\cdot c_n] \cdot P[e^{n-1} \mid h_i\cdot b\cdot c^{n-1}] \\
&= \sum_{\{e^{n-1}\}} \sum_{\{e_n\}} \big((Q[e_n] - E[c_n])^2 + (Q[e^{n-1}] - E[c^{n-1}])^2 + 2\,(Q[e_n] - E[c_n])\cdot(Q[e^{n-1}] - E[c^{n-1}])\big) \cdot P[e_n \mid h_i\cdot b\cdot c_n] \cdot P[e^{n-1} \mid h_i\cdot b\cdot c^{n-1}] \\
&= V[c_n] + V[c^{n-1}] + 2 \sum_{\{e^{n-1}\}} \sum_{\{e_n\}} \big(Q[e_n]\cdot Q[e^{n-1}] - Q[e_n]\cdot E[c^{n-1}] - E[c_n]\cdot Q[e^{n-1}] + E[c_n]\cdot E[c^{n-1}]\big) \cdot P[e_n \mid h_i\cdot b\cdot c_n] \cdot P[e^{n-1} \mid h_i\cdot b\cdot c^{n-1}] \\
&= V[c_n] + V[c^{n-1}] + 2\,\big(E[c_n]\cdot E[c^{n-1}] - E[c_n]\cdot E[c^{n-1}] - E[c_n]\cdot E[c^{n-1}] + E[c_n]\cdot E[c^{n-1}]\big) \\
&= V[c_n] + V[c^{n-1}] \\
&= V[c_n] + V[c_{n-1}] + \cdots + V[c_1] \quad\text{(iterating the same argument on } V[c^{n-1}]\text{)} \\
&= \sum_{k=1}^{n} \mathrm{VQI}[c_k \mid h_i/h_j \mid b].
\end{aligned}
\]

The second equality uses the independence conditions to factor P[e^n | h_i·b·c^n] into P[e_n | h_i·b·c_n]·P[e^{n−1} | h_i·b·c^{n−1}] and to split Q[e^n] and E[c^n] into the contributions of the final experiment and of the first n−1.
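The decomposition is easy to check numerically. The following sketch uses invented likelihoods for two experiments; independence is modeled by multiplying per-experiment probabilities, so QI adds across outcomes. It confirms that the VQI of the two-experiment sequence equals the sum of the individual VQIs:

```python
import math
from itertools import product

def vqi(p_i, p_j):
    """VQI of a single experiment from its outcome likelihoods under h_i, h_j."""
    q = [math.log(a / b) for a, b in zip(p_i, p_j)]
    e = sum(x * a for x, a in zip(q, p_i))
    return sum((x - e) ** 2 * a for x, a in zip(q, p_i))

# Invented outcome likelihoods for two independent experiments (h_i, h_j).
exp1 = ([0.8, 0.2], [0.5, 0.5])
exp2 = ([0.6, 0.3, 0.1], [0.2, 0.4, 0.4])

# VQI of the two-experiment sequence, computed directly over outcome pairs:
# probabilities multiply across independent experiments, QI values add.
qs, ps = [], []
for (a1, b1), (a2, b2) in product(zip(*exp1), zip(*exp2)):
    qs.append(math.log((a1 * a2) / (b1 * b2)))
    ps.append(a1 * a2)
e_seq = sum(q * p for q, p in zip(qs, ps))
v_seq = sum((q - e_seq) ** 2 * p for q, p in zip(qs, ps))

print(abs(v_seq - (vqi(*exp1) + vqi(*exp2))) < 1e-12)  # True: VQI decomposes
```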

By averaging the values of VQI[cn | hi/hj | b] over the number of observations n we obtain a measure of the average variance in the quality of the information due to cn. We represent this average by underlining ‘VQI’.

Definition: The Average Variance in the Quality of Information

\[
\underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b] = \mathrm{VQI}[c^n \mid h_i/h_j \mid b] \,/\, n.
\]

\underline{VQI} is only a true average, a sum of n terms divided by n, when the independent evidence conditions hold. But the definition just given does not presuppose independence, and the notion of “averaging” VQI by dividing it by the number of experiments and observations turns out to be useful even when the evidence is not independent.

We are now in a position to state a very general version of the second part of the Likelihood Ratio Convergence Theorem. It applies to all evidence streams not containing possibly falsifying outcomes for hj. That is, it applies to all evidence streams for which hj is outcome-compatible with hi on each ck in the stream. This theorem is essentially a specialized version of Chebyshev's Theorem, which is a so-called Weak Law of Large Numbers. This version of the theorem presupposes neither of the independence conditions.

Theorem 2*: Non-falsifying Likelihood Ratio Convergence Theorem
Choose positive ε < 1, as small as you like, but large enough that (for the number of observations n being contemplated) the value of the average expected quality of information, \underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid h_i\cdot b] = \mathrm{EQI}[c^n \mid h_i/h_j \mid h_i\cdot b]/n, exceeds −(log ε)/n. Then

\[
P\big[\vee\{e^n :\, P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] < \varepsilon\} \mid h_i\cdot b\cdot c^n\big] \;\ge\; 1 - \frac{1}{n}\cdot\frac{\underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b]}{\big(\underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid h_i\cdot b] + (\log\varepsilon)/n\big)^2}.
\]

Thus, provided that the average expected quality of the information, \underline{EQI}[cn | hi/hj | hi·b], for the stream of experiments and observations cn doesn't get too small as n increases, and provided that the average variance, \underline{VQI}[cn | hi/hj | b], doesn't blow up (e.g., it is bounded above), hypothesis hi says it is highly likely that the outcomes of cn will be such as to make the likelihood ratio of hj against hi as small as you like, as n increases.
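To see how fast the bound approaches 1, here is a small sketch that evaluates the right-hand side of Theorem 2* for invented, n-independent values of the average EQI and average VQI (the numbers are purely illustrative):

```python
import math

# Illustrative evaluation of the Theorem 2* lower bound
#   1 - (1/n) · avg_VQI / (avg_EQI + log(eps)/n)^2
# with assumed constant averages (invented values).
avg_eqi = 0.1    # assumed average EQI; must exceed -log(eps)/n for the bound
avg_vqi = 0.5    # assumed average VQI, bounded above
eps = 0.01       # target bound on the likelihood ratio

for n in [100, 400, 1600, 6400]:
    if avg_eqi <= -math.log(eps) / n:
        print(f"n = {n:5d}: the epsilon condition fails; bound not applicable")
        continue
    bound = 1 - (1 / n) * avg_vqi / (avg_eqi + math.log(eps) / n) ** 2
    print(f"n = {n:5d}: P[likelihood ratio < eps] >= {bound:.4f}")
```

With these inputs the bound is vacuous at n = 100 but climbs past 0.99 by n = 6400, illustrating the convergence the theorem describes.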

Proof: Let

V = VQI[c^n | h_i/h_j | b]
E = EQI[c^n | h_i/h_j | h_i·b]
Q[e^n] = QI[e^n | h_i/h_j | b·c^n] = log(P[e^n | h_i·b·c^n]/P[e^n | h_j·b·c^n])

Choose any small ε > 0, and suppose (for n large enough) that E > −log ε, i.e., that E/n > −(log ε)/n, as the theorem requires. Then we have

\[
\begin{aligned}
V &= \sum_{\{e^n:\, P[e^n \mid h_j\cdot b\cdot c^n] > 0\}} \big(E - Q[e^n]\big)^2 \cdot P[e^n \mid h_i\cdot b\cdot c^n] \\
&\ge \sum_{\{e^n:\, P[e^n \mid h_j\cdot b\cdot c^n] > 0 \;\&\; Q[e^n] \le -\log\varepsilon\}} \big(E - Q[e^n]\big)^2 \cdot P[e^n \mid h_i\cdot b\cdot c^n] \\
&\ge \big(E + \log\varepsilon\big)^2 \cdot \sum_{\{e^n:\, P[e^n \mid h_j\cdot b\cdot c^n] > 0 \;\&\; Q[e^n] \le -\log\varepsilon\}} P[e^n \mid h_i\cdot b\cdot c^n] \\
&= \big(E + \log\varepsilon\big)^2 \cdot P\big[\vee\{e^n:\, P[e^n \mid h_j\cdot b\cdot c^n] > 0 \;\&\; Q[e^n] \le \log(1/\varepsilon)\} \mid h_i\cdot b\cdot c^n\big] \\
&= \big(E + \log\varepsilon\big)^2 \cdot P\big[\vee\{e^n:\, P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] \ge \varepsilon\} \mid h_i\cdot b\cdot c^n\big],
\end{aligned}
\]

where the second inequality holds because E + log ε > 0 and, on the indicated set, E − Q[e^n] ≥ E + log ε.

So, writing \underline{V} = V/n and \underline{E} = E/n for the corresponding averages,

\[
\frac{\underline{V}}{n\,\big(\underline{E} + (\log\varepsilon)/n\big)^2} = \frac{V}{(E + \log\varepsilon)^2} \;\ge\; P\big[\vee\{e^n:\, P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] \ge \varepsilon\} \mid h_i\cdot b\cdot c^n\big] = 1 - P\big[\vee\{e^n:\, P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] < \varepsilon\} \mid h_i\cdot b\cdot c^n\big].
\]

Thus, for any small ε > 0,

\[
P\big[\vee\{e^n:\, P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] < \varepsilon\} \mid h_i\cdot b\cdot c^n\big] \;\ge\; 1 - \frac{\underline{V}}{n\,\big(\underline{E} + (\log\varepsilon)/n\big)^2}.
\]

(End of Proof)
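As a sanity check on the central inequality of the proof, the following sketch enumerates all outcomes of n invented Bernoulli trials (hi: heads with probability 0.6; hj: heads with probability 0.4) and confirms that the probability, under hi, that the likelihood ratio stays at or above ε never exceeds the Chebyshev-style bound V/(E + log ε)²:

```python
from math import comb, log

# Invented Bernoulli hypotheses and trial count; note E > -log(eps) here,
# so the supposition of the proof is satisfied.
pi, pj, n, eps = 0.6, 0.4, 400, 0.01

# Per-toss QI values (tails, heads); the evidence is i.i.d., so the total
# EQI and VQI are just n times the per-toss values.
q = [log((1 - pi) / (1 - pj)), log(pi / pj)]
e1 = (1 - pi) * q[0] + pi * q[1]
v1 = (1 - pi) * (q[0] - e1) ** 2 + pi * (q[1] - e1) ** 2
E, V = n * e1, n * v1

# Exact probability, under h_i, that the likelihood ratio of h_j over h_i
# is still >= eps, i.e. that the total QI is <= -log(eps).
p_ge_eps = sum(comb(n, k) * pi**k * (1 - pi)**(n - k)
               for k in range(n + 1)
               if k * q[1] + (n - k) * q[0] <= -log(eps))

chebyshev = V / (E + log(eps)) ** 2
print(p_ge_eps, "<=", chebyshev, ":", p_ge_eps <= chebyshev)  # True
```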

The previous theorem shows that when the average variance \underline{VQI} is bounded above, a sufficiently long stream of evidence will very likely result in the refutation of false competitors of a true hypothesis. This claim holds regardless of whether the evidence can be chunked into independent pieces. However, we can use the independence conditions to describe a very simple provision under which \underline{VQI} is indeed bounded above. This gives us the theorem stated in the main text.

Likelihood Ratio Convergence Theorem 2: The Non-falsifying Refutation Theorem.
Suppose that the independent evidence conditions hold, that γ > 0 is a number smaller than 1/e² (≈ 0.135), and that for each possible outcome oku of each observation condition ck in cn, either

\[
P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] = 0
\]

or

\[
P[o_{ku} \mid h_j\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] \,/\, P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] \;\ge\; \gamma.
\]

Choose positive ε < 1, as small as you like, but large enough (for the number of observations n being contemplated) that the value of \underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid h_i\cdot b] > −(log ε)/n. Then

\[
P\big[\vee\{e^n :\, P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] < \varepsilon\} \mid h_i\cdot b\cdot c^n\big] \;>\; 1 - \frac{1}{n}\cdot\frac{(\log\gamma)^2}{\big(\underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid h_i\cdot b] + (\log\varepsilon)/n\big)^2}.
\]
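Before the proof, a quick illustration of the theorem as a planning tool: given γ, ε, a target for the bound, and an assumed constant value for the average EQI (all inputs invented), a small search finds the least n for which the explicit bound above reaches the target:

```python
import math

# Hypothetical inputs: gamma bounds the likelihood ratios from below
# (gamma < 1/e^2), eps is the refutation threshold, avg_eqi > 0 is an
# assumed constant average EQI, target is the desired probability bound.
gamma, eps, avg_eqi, target = 0.05, 0.01, 0.1, 0.99

n = 1
while True:
    denom = avg_eqi + math.log(eps) / n
    if denom > 0:  # the epsilon condition of the theorem holds
        bound = 1 - (1 / n) * math.log(gamma) ** 2 / denom ** 2
        if bound >= target:
            break
    n += 1
print(f"n = {n} observations suffice for a bound of {target}")
```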

Proof: This follows from Theorem 2* together with the following observation, which holds given the independence conditions:

If for each ck in cn, for each of its possible outcomes oku, either P[oku | hj·b·ck] = 0 or P[oku | hj·b·ck]/P[oku | hi·b·ck] ≥ γ > 0, where γ ≤ 1/e² (as the theorem guarantees), then \underline{VQI}[c^n | h_i/h_j | b] ≤ (log γ)².

To see that this observation holds, assume its antecedent.

  1. First notice that when 0 < P[ek | hj·b·ck] < P[ek | hi·b·ck], the antecedent gives γ ≤ P[ek | hj·b·ck]/P[ek | hi·b·ck] < 1, so 0 < log[P[ek | hi·b·ck]/P[ek | hj·b·ck]] ≤ −log γ, and hence

    \[
    \big(\log[P[e_k \mid h_i\cdot b\cdot c_k]/P[e_k \mid h_j\cdot b\cdot c_k]]\big)^2 \cdot P[e_k \mid h_i\cdot b\cdot c_k] \le (\log\gamma)^2 \cdot P[e_k \mid h_i\cdot b\cdot c_k].
    \]

    So we only need to establish that when P[ek | hj·b·ck] > P[ek | hi·b·ck] > 0, a corresponding bound holds as well; in this case

    \[
    \big(\log[P[e_k \mid h_i\cdot b\cdot c_k]/P[e_k \mid h_j\cdot b\cdot c_k]]\big)^2 \cdot P[e_k \mid h_i\cdot b\cdot c_k] \le (\log\gamma)^2 \cdot P[e_k \mid h_j\cdot b\cdot c_k].
    \]

    (Then it will follow easily that \underline{VQI}[c^n | h_i/h_j | b] ≤ (log γ)², and we'll be done.)

  2. To establish the needed relationship, suppose that P[ek | hj·b·ck] > P[ek | hi·b·ck] > 0. Notice that for all p and q between 0 and 1, the function g(p) = (log(p/q))² · p has a minimum at p = q, where g(p) = 0, and (for p < q) has a maximum value at p = q/e², i.e., at p/q = 1/e². (To get this, take the derivative: dg/dp = (log(p/q))² + 2·log(p/q) = log(p/q)·(log(p/q) + 2), which equals 0 at p = q and at p = q/e²; the latter is the maximum for p < q. A numeric check of this maximum appears after the proof below.)

    So, for 0 < P[ek | hi·b·ck] < P[ek | hj·b·ck] we have

    \[
    \big(\log(P[e_k \mid h_i\cdot b\cdot c_k]/P[e_k \mid h_j\cdot b\cdot c_k])\big)^2 \cdot P[e_k \mid h_i\cdot b\cdot c_k] \;\le\; \big(\log(1/e^2)\big)^2 \cdot P[e_k \mid h_j\cdot b\cdot c_k] \;\le\; (\log\gamma)^2 \cdot P[e_k \mid h_j\cdot b\cdot c_k]
    \]

    (since, for γ ≤ 1/e², we have log γ ≤ log(1/e²) < 0; so (log γ)² ≥ (log(1/e²))² > 0).

  3. Now (assuming the antecedent of the theorem), for each ck, abbreviate QI[oku | hi/hj | b·ck] as QI[oku] and EQI[ck | hi/hj | b] as EQI[ck]. Then

    \[
    \begin{aligned}
    \mathrm{VQI}[c_k \mid h_i/h_j \mid b] &= \sum_{\{o_{ku}:\, P[o_{ku} \mid h_j\cdot b\cdot c_k] > 0\}} \big(\mathrm{EQI}[c_k] - \mathrm{QI}[o_{ku}]\big)^2 \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k] \\
    &= \sum_{\{o_{ku}:\, P[o_{ku} \mid h_j\cdot b\cdot c_k] > 0\}} \big(\mathrm{EQI}[c_k]^2 - 2\cdot \mathrm{QI}[o_{ku}]\cdot \mathrm{EQI}[c_k] + \mathrm{QI}[o_{ku}]^2\big) \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k] \\
    &= \sum_{\{o_{ku}:\, P[o_{ku} \mid h_j\cdot b\cdot c_k] > 0\}} \mathrm{QI}[o_{ku}]^2 \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k] \;-\; \mathrm{EQI}[c_k]^2 \\
    &\le \sum_{\{o_{ku}:\, P[o_{ku} \mid h_j\cdot b\cdot c_k] > 0\}} \mathrm{QI}[o_{ku}]^2 \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k] \;\le\; (\log\gamma)^2,
    \end{aligned}
    \]

    by the bounds established in steps 1 and 2.

So, given independence,

\[
\underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b] = \frac{1}{n}\sum_{k=1}^{n} \mathrm{VQI}[c_k \mid h_i/h_j \mid b] \;\le\; (\log\gamma)^2.
\]
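Finally, the calculus claim in step 2 is easy to check numerically. This sketch (with an arbitrary choice of q) confirms that g(p) = (log(p/q))²·p, restricted to 0 < p < q, peaks at p = q/e² with value (4/e²)·q, which is what licenses the (log γ)² bound:

```python
import math

# Numeric check of step 2: for fixed q, g(p) = (log(p/q))^2 · p on (0, q)
# attains its maximum at p = q/e^2, where g = (4/e^2)·q.
q = 0.7
grid = [q * t / 10000 for t in range(1, 10000)]   # p values in (0, q)
g = lambda p: math.log(p / q) ** 2 * p

p_best = max(grid, key=g)
print(f"numeric argmax ~ {p_best:.5f}, q/e^2 = {q / math.e**2:.5f}")
print(f"numeric max    ~ {g(p_best):.5f}, (4/e^2)·q = {4 * q / math.e**2:.5f}")
```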
