# A Sleepless Night and Funny Distribution

Hi All

Last night I was woken by Mr Darcy’s new habit of whistling while he sleeps. Once awake I could not fall back asleep, so I started thinking about uniform distributions. And somehow, I started thinking about iteratively truncating a line segment. To be clear, I cannot imagine a purpose for this work.

The process is to start with the segment [0,1], then draw a random number $x_0$ uniformly from [0,1] and replace the line segment with [0,$x_0$]. The line segment is then further truncated by drawing a new uniform random number ($x_1$) over the remaining range. The procedure is then repeated forever! The first question I had was: what is the distribution of the $n$‘th number drawn in this way? The second question: given a number, or set of numbers, what is the most likely generation/iteration it came from?
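As a sanity check on the setup, here is a minimal simulation of the process (my own sketch; the one trick is that drawing uniformly from [0, x] is the same as multiplying x by a fresh Uniform(0,1) draw):

```python
import random

def truncation_sample(n, rng=random.random):
    """Draw one sample of x_n from the iterative truncation process.

    Start with the segment [0, 1]; at each step draw uniformly from the
    current segment [0, x] and shrink the segment down to the draw.
    Returns the (n+1)'th draw, i.e. x_n with x_0 ~ Uniform(0, 1).
    """
    x = 1.0
    for _ in range(n + 1):
        x = x * rng()  # Uniform(0, x) == x * Uniform(0, 1)
    return x

# e.g. one sample of x_2 (the third draw)
print(truncation_sample(2))
```

Since $x_n$ is just a product of $n+1$ independent uniforms, its mean is $(1/2)^{n+1}$, which makes for an easy spot check of the simulator.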

The first question I could solve in bed; the second one was a bit harder for me. To get the distribution, consider the relationships between joint, conditional, and marginal distributions; i.e., $f_{a,b}(a,b) = f_{a|b}(a)*f_b (b)$ and $f_a (a) = \int db\, f_{a,b}(a,b)$.  Thus, combining the two equations, $f_a (a) = \int db\, f_{a|b}(a)*f_b (b)$. Here $x_0 \sim 1 = f_0 (x_0)$ and $x_1|x_0 \sim 1/x_0 = f_{1|0}(x_1)$.  Thus the marginal distribution can be calculated as $f_1(x_1) = \int_0^1 dx_0\, 1/x_0$.

Now at this part my brain rebelled.

After far too long I realized that the problem is actually simple; I had incorrectly stated the conditional distribution. Give yourself full marks if you already noticed that! The corrected distribution has an indicator function: $x_1|x_0 \sim (1/x_0) I[x_0>x_1] = f_{1|0}(x_1)$.  Thus the marginal distribution can be calculated as $f_1(x_1) = \int_0^1 dx_0\, I[x_0>x_1]/x_0 = \int_{x_1}^1 dx_0\, 1/x_0 = -\log(x_1)$.
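A quick numerical check of that marginal (my own sketch): the density $f_1(x) = -\log(x)$ implies the CDF $F_1(t) = \int_0^t -\log(u)\, du = t - t\log(t)$, which is easy to compare against simulation:

```python
import math
import random

def sample_x1():
    # x_0 ~ Uniform(0, 1), then x_1 ~ Uniform(0, x_0)
    x0 = random.random()
    return x0 * random.random()

random.seed(1)
n, t = 200_000, 0.3
empirical = sum(sample_x1() <= t for _ in range(n)) / n
theoretical = t - t * math.log(t)  # CDF implied by the density -log(x)
print(f"empirical {empirical:.4f} vs theoretical {theoretical:.4f}")
```

The two numbers should agree to a couple of decimal places at this sample size.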

So we are done, right? Nope, Mr Darcy still had noises to make; besides, what happens after more generations of the process? The conditional distribution is $f_{2|1}(x_2) = (1/x_1) I[x_1>x_2]$, so the integration process can be repeated:  $f_2(x_2) = \int_0^1 dx_1\, -\log(x_1)*I[x_1>x_2]/x_1 = -\int_{x_2}^1 dx_1\, \log(x_1)/x_1 = [\log(x_2)]^2/2$. Then, having spent some time feeling satisfied with myself for having been able to remember this calculus identity, I realized that the process is indeed repeatable because $\int da\, \log^n (a)/a = [\log(a)]^{n+1}/(n+1)$. The basic point is that the $[\log(x_{n+1})]^{n+1}$ will be multiplied by a $1/x_{n+1}$ at the next step and thus return to the required $[\log(x_{n+1})]^{n+1}/x_{n+1}$ form. The $(n+1)$’s in the denominators accumulate into a $1/n!$ term. That seemed intuitive to my sleep-deprived brain because the region of integration is a simplex, which has volume $1/n!$.
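Written out, the induction step (in my notation, with the inductive hypothesis $f_n(x_n) = (-\log x_n)^n/n!$, which equals $|\log x_n|^n/n!$ on $(0,1)$) is:

$$f_{n+1}(x_{n+1}) = \int_{x_{n+1}}^{1} \frac{(-\log x_n)^n}{n!\, x_n}\, dx_n = \left[ -\frac{(-\log x_n)^{n+1}}{(n+1)!} \right]_{x_{n+1}}^{1} = \frac{(-\log x_{n+1})^{n+1}}{(n+1)!}.$$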

So the distribution for the $n$’th generation, according to 3 AM me, is $|\log(x_n)|^n/n!$.

Now to test this, because, believe it or not, I have had ideas at 3 AM which proved to be incorrect. Figure 1 shows a histogram of simulated values along with the theoretical values. So it turns out I was right!

Figure 1: black, a histogram of simulated $x_n$ values; blue, the theoretical distribution.
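For anyone who wants to reproduce the check behind Figure 1, here is a rough version of it (my own sketch, comparing binned sample frequencies against the integrated density rather than plotting):

```python
import math
import random

def sample_xn(n):
    # x_n after n+1 truncation draws: a product of n+1 uniforms
    x = 1.0
    for _ in range(n + 1):
        x *= random.random()
    return x

def density(x, n):
    # conjectured density |log(x)|^n / n!
    return abs(math.log(x)) ** n / math.factorial(n)

random.seed(2)
n = 3
samples = [sample_xn(n) for _ in range(100_000)]
for lo, hi in [(0.0, 0.1), (0.1, 0.3), (0.3, 1.0)]:
    frac = sum(lo < s <= hi for s in samples) / len(samples)
    # crude midpoint Riemann sum for the bin's theoretical mass
    steps = 10_000
    h = (hi - lo) / steps
    mass = sum(density(lo + (i + 0.5) * h, n) * h for i in range(steps))
    print(f"({lo}, {hi}]: empirical {frac:.3f}, theoretical {mass:.3f}")
```

The empirical and theoretical columns should agree to a couple of decimal places; note how much of the mass piles up near zero even at $n = 3$.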

The second question, which value of $n$ maximizes $|\log(x_n)|^n/n!$, I don’t have a clear answer to. But consulting the internet, it turns out that this problem is actually hard.
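For a single observation, one can at least brute-force the maximizing $n$ numerically (a sketch, not a general answer: as a function of $n$, $\lambda^n/n!$ with $\lambda = -\log(x)$ is an unnormalized Poisson mass function, so the maximum sits near $\lfloor \lambda \rfloor$):

```python
import math

def most_likely_generation(x, n_max=100):
    """Brute-force the n in [0, n_max] maximizing |log(x)|**n / n!.

    Works in log space (n*log(lam) - lgamma(n+1)) to avoid overflow.
    """
    lam = -math.log(x)  # lam > 0 for x in (0, 1)
    return max(range(n_max + 1),
               key=lambda n: n * math.log(lam) - math.lgamma(n + 1))

# e.g. x = 0.01 gives lam = -log(0.01) ≈ 4.6, so generation 4 wins
print(most_likely_generation(0.01))
```

This only scores a single draw, though; combining a whole set of numbers into one most-likely generation is where it gets harder.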

So that is it for now. Tune in next time for a special linear modeling edition.