Wednesday, July 3, 2013

Please show all your work for full credit...

(I promised a chapter-by-chapter critique of "A General Theory of Information Cost incurred by Successful Search". This is quite tedious work, so I wanted to make this little point up front - for full reading pleasure you have to be acquainted with (some of) the definitions used in the paper.)
(Nota Bene: In a reply to this post, Winston Ewert wrote on Uncommon Descent: "Dieb objects that the quasi-Bayesian calculation on Page 56 is incorrect, although it obtains the correct result. However, the calculation is called a quasi-Bayesian calculation because it engages in hand-waving rather than presenting a rigorous proof. The text in question is shortly after a theorem and is intended to explicate the consequences of that theorem rather than rigorously prove its result. The calculation is not incorrect, but rather deliberately oversimplified.")

On page 55 of their article "A General Theory of Information Cost incurred by Successful Search" (free download as pdf), the authors W. A. Dembski, W. Ewert and R. J Marks II (in future I'll refer to them as DEM) write:

To see how the probability costs ossociated with null and alternative searches relate, it is instructive to consider the following two quasi-Bayesian ways of reckoning these costs:
$\mathbf{P}$(locating $T$ via null search)=$\mathbf{P}$(null search locates T & null search is available)
=$\mathbf{P}$(null search locates T|null search is avail.) × $\mathbf{P}$(null search is avail.)
=$\mathbf{U}(T) \times 1$ [because the availability of null search is taken for granted]
=$p$.

$\mathbf{P}$(locating $T$ via alt. search)=$\mathbf{P}$(alt. search locates T & alt. search is available)
=$\mathbf{P}$(alt. search locates T|alt. search is avail.) × $\mathbf{P}$(alt search is avail.)
=$\mu(T) \times \overline{\mathbf{U}}(\overline{T}_q)$
$\le q\,\times\,p/q$
=$p$.
I have no problems with the results - at least if we can assume that the uniform measure is apt to be used on $\mathbf{M}(\Omega)$. But the equations seems to be a little bit fishy. Let me explain what I mean, using the most simple setting possible: Let $\Omega = \{0,1\}$, a set with two elements, and let $T=1$ be our target. Then $\mathbf{M}(\Omega)$ can be represented by the interval $[0;1]$: for $x \in [0;1]$, $\mu_x = (1-x) \delta_0 + x \delta_1$ is the measure with $\mu_x(\{1\}) = x$. We can even introduce an associated search $S_x := S_(\mu_x)$, which is in fact just a single guess on $\Omega$ distributed according to $\mu_x$. Ergo $\mathbf{E}(S_x) = x$. Now we can perform an experiment in two steps:
  1. Choose a measure $\mu_x \in \mathbf{M}(\Omega)$ at random.
  2. Try to locate $T$ using $S_x$.
This experiment can be represented by choosing (X,Y) on $[0;1] \times [0;1]$ according to the uniform distribution on the unit square: We look up a number x, which represents our measure, then a number y: if $y \le x$, we have located our target using $S_x$, otherwise not.
The picture displays the situation and allows us two answer some questions easily:
  • What's the probability to locate our target using the process above? Well, it's $p = 1/2$, represented by the whole green area
  • For a fixed $q$, what is the probability to choose $\mu_x$ and find our target? That's zero (or nil): the red line symbolizes this event, which is a null-set.
Dembski, Ewert, and Marks (DEM) obviously don't want to have this, that's why they don't look at $\{\theta \in \mathbf{M}(Q) | \theta(T) = q \}$, but at $\overline{T}_q = \{\theta \in \mathbf{M}(Q) | \theta(T) \ge q \}$. (pp. 53-54)
  • What's the probability to choose a measure for which the associated guess finds the target with a probability of at least q, i.e., $\overline{\mathbf{U}}(\overline{T}_q) $? That would be (1-q), easily to be seen in our case, but much more difficult to calculate for more complicated arrangements. DEM give a upper limit for this probability of $p/q$.
  • What is the probability of finding our target when we have chosen a measure in $\overline{\mathbf{U}}(\overline{T}_q)$? That depends on the measure, but it is at least $q$. On average, it is $\frac{1+q}{2}$: we get this by examining the darker green area...
  • What is the probability of choosing a measure in $\overline{\mathbf{U}}(\overline{T}_q)$ and finding the target? This is given by the darker green area, ergo $(1-q)\frac{1+q}{2}=\frac{1-q^2}{2}$
Now, the darker green area will always be smaller than the whole green area, not only in this simple example, but for all others, too. Therefore the statement: $$\mathbf{P}\text{(locating }T\text{ via alt. search}) \le p$$ is absolutely (and trivially) correct, as $\mathbf{P}\text{(locating }T\text{ via alt. search})$ is the probability of choosing an element of $\overline{\mathbf{U}}(\overline{T}_q)$ and finding the target using that element. But there is a problem in the equality $$\mu(T) \times \overline{\mathbf{U}}(\overline{T}_q) \le q\,\times\,p/q$$ While $\overline{\mathbf{U}}(\overline{T}_q) \le p/q$ we find that $$\mu(T) \ge q:$$ A measure taken from $\overline{T}_q$ will result in a search which finds the target with a probability of at least $q$. Above, we have seen, that the probability is on average $\frac{1+q}{2} > q$. So we cannot say anything about the size of $\mu(T) \times \overline{\mathbf{U}}(\overline{T}_q)$!. The shaded area in the picture shows $q\,\times\,p/q$: it has nothing to do with the probabilities which one can see so neatly in the graphic, it just happens to have the right area of $p$...
BTW: I don't think that we can split $\mathbf{P}$(alt. search locates T & alt. search is available) neatly into a product used by DEM, a little integrating would be necessary...
So, I wouldn't give full marks for this exercise, but perhaps I'm wrong?

No comments:

Post a Comment