Friday, October 2, 2009

Which parameters did Dawkins use for his weasel?

If R. Dawkins used a weasel algorithm as understood by virtually everyone who read his book (and tried to program it), he had two parameters to play with:
  1. the size of the population S and
  2. the probability µ of changing a letter
So, which values did he use? That, I can't answer. But I can say which values he should have used:

Obviously, the greater the size of the population for a fixed µ, the smaller the number of generations it takes to find the target. On the other hand, for a fixed population size, no similar rule for µ can be given - one probably wants to keep the parent in the next generation, so µ has to be quite small for small populations, and can be larger for big ones.
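Spelled out, the algorithm most readers implemented looks roughly like this - a minimal sketch in Python, assuming the parent survives into the next generation and that a "mutation" may reproduce the same letter (all function names are my own):

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # 26 letters plus the space: 27 symbols
TARGET = "METHINKS IT IS LIKE A WEASEL"

def fitness(s):
    """Number of positions at which s matches the target."""
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(parent, mu):
    """Copy the parent, replacing each letter with probability mu by a random one."""
    return "".join(random.choice(ALPHABET) if random.random() < mu else c
                   for c in parent)

def weasel(size, mu):
    """Run the weasel; return the number of generations until the target is hit."""
    parent = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
    generations = 0
    while parent != TARGET:
        # keep the parent, so the best fitness never decreases
        children = [parent] + [mutate(parent, mu) for _ in range(size)]
        parent = max(children, key=fitness)
        generations += 1
    return generations
```

For instance, `weasel(200, 0.044)` typically reaches the target within a few dozen generations.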

Here, the coloured lines represent the expected values of the number of generations for different sizes of population as a function of µ. On each curve, I marked the minimal expected number - and I joined these minima by a black line. So, a population size of 200 results at best in 49 generations on average - and this minimum is reached for µ ~ 0.044.

But if you want to have a fast algorithm, you'd look at the number of queries, not the number of generations, i.e., the total number of children, or how often the fitness function has to be calculated:

(Yep, I like this pic :-) Here, you can see that for getting the best results, the size of the population is surprisingly small: 20 individuals. But though it takes over 325 generations on average to reach the target, the fitness of only ~6514 individuals has to be calculated - that is, if you choose µ ~ 0.041.
Another interesting effect: for population sizes from 20 to 200, the optimal µ doesn't change very much, it's just a little bit over 0.04.
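The arithmetic behind that count is simple: every generation costs one fitness evaluation per child, so the number of queries is just population size times number of generations - a back-of-the-envelope check with the rounded figures from above:

```python
size, generations = 20, 326      # "over 325 generations on average"
queries = size * generations     # one fitness evaluation per child per generation
print(queries)                   # 6520, in line with the reported ~6514
```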

Of course, no one does these calculations before running the program: you write your weasel and play with it. But the natural behaviour of this algorithm will lead you to population sizes from 20 to 200, and a µ of 0.04 - 0.05.

8 comments:

  1. I assume you got the plots by way of Markov chain analysis. Does "mutation" always change the character? If not, the effective mutation rate is µ * 26 / 27.

    You mentioned on my blog that you have been working in R. Would you be willing to share your code with us?

  2. 1. Yes, I modeled a Markov chain
    2. Yes, indeed, the effective mutation rate is µ*26/27. I'm aware of this inconsistency: when I talked about Dembski's algorithm called Random Mutation, I used the effective rate, while elsewhere I allowed a "change" to leave the letter unchanged :-)
    3. Though my code is not elegant, I'll share it, of course. I'm just looking for a place to do so.
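Regarding point 1, the kind of computation involved can be sketched as follows - a Python illustration (not the original R code), assuming the state is the number of correct letters, the parent is kept, and, as in point 2, a "change" keeps the letter with probability 1/27:

```python
from math import comb

L, A = 28, 27   # length of the target phrase, size of the alphabet

def binom_pmf(n, p):
    """Binomial(n, p) probability mass function as a list."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def child_pmf(k, mu):
    """Distribution of correct letters in one child of a parent with k correct:
    a correct letter is spoiled with prob. mu*(A-1)/A, a wrong one fixed with mu/A."""
    lose = binom_pmf(k, mu * (A - 1) / A)
    gain = binom_pmf(L - k, mu / A)
    pmf = [0.0] * (L + 1)
    for i, pl in enumerate(lose):
        for j, pg in enumerate(gain):
            pmf[k - i + j] += pl * pg
    return pmf

def expected_generations(size, mu):
    """Expected generations until all L letters are correct (parent kept),
    solving the hitting-time equations backwards from the target state."""
    E = [0.0] * (L + 1)                      # E[L] = 0: target reached
    for k in range(L - 1, -1, -1):
        pmf = child_pmf(k, mu)
        cdf = [sum(pmf[:j + 1]) for j in range(L + 1)]
        # P(next state <= j) = P(all children <= j); states below k are unreachable
        nxt = [cdf[j] ** size if j >= k else 0.0 for j in range(L + 1)]
        step = [nxt[0]] + [nxt[j] - nxt[j - 1] for j in range(1, L + 1)]
        E[k] = (1 + sum(step[j] * E[j] for j in range(k + 1, L + 1))) / (1 - step[k])
    initial = binom_pmf(L, 1 / A)            # correct letters in a random start
    return sum(p * E[k] for k, p in enumerate(initial))
```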

  3. DiEb,

    Indeed, who wants to do anything to beat the dead Weasel but hack out the code. I was too lazy to do anything like generate plots. Yours are fantastic. You really do have a gift for visualization. (I've looked at your Wikipedia versus Conservapedia graphics.)

    If you send the R code to the email address in my profile, I'll gladly upload it to my website and provide you with links. They'll be good for at least two years.

  4. By the way, the last two analyses in the IEEE SMC paper of Dembski and Marks are the (1,2)-EA and (1+1)-EA [evolutionary algorithm] solving the ONEMAX problem. I know for a fact that Marks knew the established names for the algorithms, and that he knew that there are many analyses of them in the literature. Yet the article cites nothing. With five minutes of googling, I found a paper from 8 years ago that gives an exact analysis that allows for the target to drift due to mutation: Stefan Droste, "On the Analysis of the (1+1) EA for a Dynamically Changing ONEMAX." Obviously a fixed target is a special case.

  5. Six and one-half years later, I declare this the most useful post ever on the Analysis of Id.

  6. Thanks, I think... How come?

    Replies
    1. 1. A key feature of Iddish deceit is to fob off Dawkins's monkey/Shakespeare model of cumulative selection as a model of natural selection. (In hindsight, it seems that we've inadvertently assisted the Iddites by referring to the model as the Weasel program.) Dembski goes to the extreme, in his latest book, of presenting the model in a chapter titled "Natural Selection." You don't have to do fancy Web scraping to confirm that the Iddites and the Udders rarely enter into Biomorph Land.

      2. Joe Felsenstein recently characterized the monkey/Shakespeare model of cumulative selection as a simulation of the Wright-Fisher model in a special case (one survivor per generation, selection coefficient of infinity). He provided some theory for finite selection coefficients. I wrote a direct simulation of the model that Joe described, and tested not only against Joe's predictions, but also against yours (with the selection coefficient set very large to approximate infinity).

      3. I'm in the process of responding to an Id-friendly conference paper, "Parameter Dependence in Cumulative Selection," published by University of Ulster lecturer David Glass (once an active Udder, I believe) in the proceedings of a computing/engineering conference. I've tested my implementation of his genetic algorithm by (a) making the fitness function $(1+s)^n$, where $n$ is the number of loci in which the string matches the target string; (b) setting the selection coefficient $1+s$ very large, so that fitness-proportionate selection yields, with high probability, only the fittest of competing strings; (c) disabling recombination; and (d) comparing results for various parameter settings to your plots. This exercise in validation of my code in fact leads me to an interesting question I hadn't previously considered, namely, how the mean first hitting time changes when recombination is added to a mutation-only algorithm.
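For concreteness, steps (a) and (b) amount to something like the following sketch (a hypothetical Python illustration with names of my own, not Glass's actual code):

```python
import random

def matches(s, target):
    """Number of loci at which the string matches the target string."""
    return sum(a == b for a, b in zip(s, target))

def fitness(s, target, w):
    """Fitness (1+s)^n, with w standing for 1+s and n the number of matches."""
    return w ** matches(s, target)

def select_one(population, target, w):
    """Fitness-proportionate (roulette-wheel) selection of a single string."""
    weights = [fitness(p, target, w) for p in population]
    return random.choices(population, weights=weights, k=1)[0]
```

With w very large, the weight of a fittest string dwarfs all others, so the roulette wheel returns a fittest string with high probability - the limit in which the comparison to the weasel plots makes sense.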

      In sum, your post continues to be very valuable as a response to the Id Movement, because the Iddites continue to make ridiculously much of the monkey/Shakespeare model of cumulative selection. I've also found some nonobvious applications of it, and I thank you for your assistance.

  7. At first, I thought that you were commenting on the general state of ID "research" - and the counterarguments: somehow, nothing much has happened over the last six years, we are still discussing the weasel...

    I haven't read Joe Felsenstein's article at The Skeptical Zone yet; I'll do so.

    But I've read Glass's paper (haven't we talked about him once?). I was not impressed; it seemed quite superficial.

    I thought about modelling his approach as a Markov chain, but I was not sure how unrealistic it is to use just 29 states, as this assumes that the positions of the right letters in the two parents are independent, which is obviously not true.
