A Sleepless Night and a Funny Distribution

Hi All

Last night I was woken by Mr Darcy’s new habit of whistling while he sleeps. Once awake I could not fall back asleep, so I started thinking about uniform distributions. And somehow, I started thinking about iteratively truncating a line segment. To be clear, I cannot imagine a purpose for this work.

The process is to start with the segment [0,1], then draw a random number x_0 uniformly from [0,1] and replace the line segment with [0,x_0]. The line segment is then further truncated by drawing a new uniform random number (x_1) over the remaining range. The procedure is then repeated forever! The first question I had was: what is the distribution of the n’th number pulled in this way? The second question: given a number or set of numbers, what is the most likely generation/iteration it came from?
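The truncation process can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `truncate_segment` and its `seed` parameter are made up for this example.

```python
import random

def truncate_segment(n_generations, seed=None):
    """Return [x_0, x_1, ...] from repeatedly truncating the segment [0, 1].

    Hypothetical helper: each draw is uniform on [0, current upper end],
    and the draw then becomes the new upper end.
    """
    rng = random.Random(seed)
    upper = 1.0
    draws = []
    for _ in range(n_generations):
        upper = rng.uniform(0.0, upper)  # new endpoint of the segment
        draws.append(upper)
    return draws

# Each value can be no larger than the one before it.
print(truncate_segment(5, seed=0))
```

Since "forever" is not an option in practice, the sketch stops after a fixed number of generations.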

The first question I could solve in bed; the second one was a bit harder for me. To get the distribution, consider the relationships between joint, conditional and marginal distributions; i.e., f_{a,b}(a,b) = f_{a|b}(a)*f_b(b) and f_a(a) = \int db f_{a,b}(a,b). Thus, combining the two equations, f_a(a) = \int db f_{a|b}(a)*f_b(b). Here x_0 \sim 1 = f_0(x_0) and x_1|x_0 \sim 1/x_0 = f_{1|0}(x_1). Thus the marginal distribution can be calculated as f_1(x_1) = \int_0^1 dx_0 1/x_0.

At this point my brain rebelled: that integral diverges.

After far too long I realized that the problem is actually simple: I had incorrectly stated the conditional distribution. Give yourself full marks if you already noticed that! The corrected distribution has an indicator function: x_1|x_0 \sim I[x_0>x_1]/x_0 = f_{1|0}(x_1). Thus the marginal distribution can be calculated as f_1(x_1) = \int_0^1 dx_0 I[x_0>x_1]/x_0 = \int_{x_1}^1 dx_0 1/x_0 = -\log(x_1).
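As a quick sanity check of the corrected integral, here is a rough numerical sketch. It approximates \int_0^1 dx_0 I[x_0>x_1]/x_0 with a plain midpoint rule and prints it next to -\log(x_1). The function name `marginal_f1` and the step count are arbitrary choices for this example.

```python
import math

def marginal_f1(x1, steps=100_000):
    """Midpoint-rule estimate of the integral of I[x0 > x1] / x0 for x0 in [0, 1]."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x0 = (i + 0.5) * width  # midpoint of the i-th cell
        if x0 > x1:             # the indicator function I[x0 > x1]
            total += width / x0
    return total

for x1 in (0.1, 0.5, 0.9):
    print(x1, marginal_f1(x1), -math.log(x1))
```

The two printed columns should agree closely; without the indicator the sum would keep growing as the grid gets finer, which is the divergence that caused the trouble.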

So we are done, right? Nope, Mr Darcy still had noises to make; besides, what happens after more generations of the process? The conditional distribution is f_{2|1}(x_2) = I[x_1>x_2]/x_1, so the integration process can be repeated: f_2(x_2) = -\int_0^1 dx_1 \log(x_1)*I[x_1>x_2]/x_1 = -\int_{x_2}^1 dx_1 \log(x_1)/x_1 = [\log(x_2)]^2/2. Then, having spent some time feeling satisfied with myself for having been able to remember this calculus identity, I realized that the process is indeed repeatable because \int da \log^n(a)/a = [\log(a)]^{n+1}/(n+1). The basic point is that the [\log(x_n)]^n in f_n will be multiplied by a 1/x_n from the next conditional distribution, and thus return to the required [\log(x_n)]^n/x_n form. The successive n+1 factors in the denominators will build up into a 1/n! term. That seemed intuitive to my sleep-deprived brain because the region of integration is a simplex and has volume = 1/n!.
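The repeated step can be written out as a short induction. This is a sketch that assumes f_n already has the form (-\log x_n)^n/n!, which holds for f_0 = 1 and f_1 = -\log(x_1):

```latex
\begin{align*}
f_{n+1}(x_{n+1})
  &= \int_0^1 dx_n \, \frac{I[x_n > x_{n+1}]}{x_n} \, \frac{(-\log x_n)^n}{n!} \\
  &= \frac{1}{n!} \int_{x_{n+1}}^1 dx_n \, \frac{(-\log x_n)^n}{x_n} \\
  &= \frac{1}{n!} \left[ -\frac{(-\log x_n)^{n+1}}{n+1} \right]_{x_n = x_{n+1}}^{x_n = 1} \\
  &= \frac{(-\log x_{n+1})^{n+1}}{(n+1)!}
\end{align*}
```

Writing the density in terms of -\log(x), which is positive on (0,1), keeps the sign bookkeeping out of the way.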

So the distribution for the n’th generation according to 3 AM me is $latex f_n(x_n) = | \log(x_n) |^n/(n!) $.

Now to test this because, believe it or not, I have had ideas at 3 AM which proved to be incorrect. Figure 1 shows a histogram of simulated values along with the theoretical values. So it turns out I was right!

Figure 1: Histogram of simulated x_n values (black) and the theoretical distribution (blue).
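A comparison along these lines can be reproduced without any plotting. The sketch below is not the code behind Figure 1; it is a standard-library approximation that prints a binned density estimate next to |\log(x)|^n/n! at each bin centre. The helper names, the seed, the sample size and the choice n = 2 are all assumptions made for illustration.

```python
import math
import random

def draw_x_n(n, rng):
    """Draw x_n by truncating [0, 1] a total of n + 1 times (x_0 is the first draw)."""
    x = 1.0
    for _ in range(n + 1):
        x = rng.uniform(0.0, x)
    return x

def theoretical_density(x, n):
    """The 3 AM formula |log(x)|^n / n!."""
    return abs(math.log(x)) ** n / math.factorial(n)

def histogram_density(samples, bins):
    """Density estimates for equal-width bins covering [0, 1]."""
    counts = [0] * bins
    for s in samples:
        counts[min(int(s * bins), bins - 1)] += 1
    return [c * bins / len(samples) for c in counts]

rng = random.Random(2016)  # arbitrary seed, only for repeatability
n, bins = 2, 20
samples = [draw_x_n(n, rng) for _ in range(200_000)]
for i, estimate in enumerate(histogram_density(samples, bins)):
    centre = (i + 0.5) / bins
    print(f"{centre:.3f}  simulated={estimate:.3f}  theory={theoretical_density(centre, n):.3f}")
```

The first bin is expected to match worst, since for n of 1 or more the density blows up near zero and the value at the bin centre is a poor stand-in for the bin average.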

For the second question, which value of n maximizes |\log(x_n)|^n/(n!), I don’t have a clear answer. But after consulting the internet, it turns out that this problem is actually hard.
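For a single observed value and integer n, a brute-force scan is at least cheap to try. The sketch below assumes every generation is equally likely beforehand and simply picks the n in a finite range with the largest |\log(x)|^n/n!. The function names and the cut-off `max_n` are made up for this example, and the comparison is done on the log scale to avoid overflowing n!.

```python
import math

def log_generation_density(x, n):
    """log of |log(x)|^n / n! for 0 < x < 1, computed via lgamma for stability."""
    lam = -math.log(x)
    return n * math.log(lam) - math.lgamma(n + 1)

def most_likely_generation(x, max_n=200):
    """Integer n in [0, max_n] with the largest density at x (first one on ties)."""
    return max(range(max_n + 1), key=lambda n: log_generation_density(x, n))

for x in (0.9, 0.3, 0.01, 1e-6):
    print(x, most_likely_generation(x))
```

The ratio of consecutive terms is |\log(x)|/n, so the scan should peak around the integer part of |\log(x)|. This does not attempt anything beyond a single value and whole-number generations.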

So that’s it for now. Tune in next time for a special linear modeling edition.