The Relationship Between Occam's Razor and Convergent Guessing
David H. Wolpert
Theoretical Division and Center for Nonlinear Studies,
Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Abstract
Occam's razor, the principle of parsimony, is a tool that finds application in many areas of science. Its validity has never been proven in full generality from first principles. Convergent guessing is the property that as more examples of an input--output mapping are provided, one would expect the modelling of that mapping to become more and more accurate. It too is widely used and has not been proven from first principles. In this paper it is shown that Occam's razor and convergent guessing are not independent---if convergent guessing holds, so does Occam's razor. (The converse of this statement is also true, providing some extra conditions are met.) Therefore, if you have reason to believe that your guesses are getting more accurate as you are fed more data, you also have reason to believe that application of Occam's razor will likely result in better guesses. Rather than attributes concerning how an architecture works (e.g., its coding length, or its number of free parameters), this paper is concerned exclusively with how the architecture guesses (which is, after all, what we're really interested in). In this context Occam's razor means that one should guess according to the "simplicity'' of an architecture's guessing behavior (as opposed to according to the simplicity of how the architecture works). This paper deduces an optimal measure of the "simplicity'' of an architecture's guessing behavior. Given this optimal simplicity measure, this paper then establishes the aforementioned relationship between Occam's razor and convergent guessing. This paper goes on to elucidate the many other advantages, both practical and theoretical, of using the optimal simplicity measure. Finally, this paper ends by exploring the ramifications of this analysis for the question of how best to measure the "complexity'' of a system.