Hi all! I am just going to expand on something I noticed in the last post, namely posterior predictive uncertainty. I will continue with the scheduled summary of linear modeling and generalized linear models as used in business intelligence afterwards.

What caught my attention was the predictive uncertainty of a new observation given the observations that I have so far. That is, if I observe $n$ data points $y$ from a Gaussian linear system with sensitivity matrix $X$, model parameters $\beta$, and residual variance $\sigma^2$, what is the distribution of $\tilde{n}$ new data points $\tilde{y}$ with sensitivity matrix $\tilde{X}$?

In a Bayesian framework, conceptually this can be solved by computing the integral $p(\tilde{y} \mid y) = \iint p(\tilde{y} \mid \beta, \sigma^2)\, p(\beta, \sigma^2 \mid y)\, d\beta\, d\sigma^2$, where $y$ is the original data set and $\tilde{y}$ is the new data set. The reason this gives the answer is that $p(\tilde{y} \mid \beta, \sigma^2)\, p(\beta, \sigma^2 \mid y) = p(\tilde{y}, \beta, \sigma^2 \mid y)$, and the integral marginalizes that distribution down to $p(\tilde{y} \mid y)$.

Figure 1 shows my actual derivation (although not the first attempt). Just as an aside, buying old calendars and using their backs is a very inexpensive way to have access to larger pieces of paper that are perfect for cumbersome derivations. I also just want to highlight that I do derive everything I present here myself, mostly by hand. One of the main reasons that I went through this exercise is that the actual answer seems to be strangely missing from Bayesian Data Analysis.

In real life I would just simulate from $p(\tilde{y} \mid y)$, as it is trivial to draw random variables from a Gaussian distribution. However, because I like to pretend that I still know how to do some math, for this post I have derived it analytically. The rest of this post is just going to transcribe Fig. 1 into a more legible format.
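Since simulation is the practical route, here is a minimal sketch of that Monte Carlo approach. All data and variable names are illustrative, and I assume the standard noninformative prior $p(\beta, \sigma^2) \propto \sigma^{-2}$, under which $\sigma^2 \mid y$ is inverse-gamma and $\beta \mid \sigma^2, y$ is Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: n observations, p parameters, n_new new points (illustrative).
n, p, n_new, n_draws = 50, 3, 5, 10_000
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.7, size=n)
X_new = rng.normal(size=(n_new, p))

# Posterior quantities under the noninformative prior p(beta, sigma^2) ∝ 1/sigma^2.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                   # least-squares estimate
S = (y - X @ beta_hat) @ (y - X @ beta_hat)    # residual sum of squares

# Draw sigma^2 ~ Inv-Gamma((n - p)/2, S/2) by inverting gamma draws,
# then beta | sigma^2, y ~ N(beta_hat, sigma^2 (X^T X)^-1),
# then y_new | beta, sigma^2 ~ N(X_new beta, sigma^2 I).
sigma2 = (S / 2) / rng.gamma((n - p) / 2, size=n_draws)
L = np.linalg.cholesky(XtX_inv)
beta = beta_hat + (rng.normal(size=(n_draws, p)) @ L.T) * np.sqrt(sigma2)[:, None]
y_new = beta @ X_new.T + rng.normal(size=(n_draws, n_new)) * np.sqrt(sigma2)[:, None]

print(y_new.mean(axis=0))  # should be close to X_new @ beta_hat
```

The rows of `y_new` are draws from the posterior predictive distribution, so any uncertainty summary (quantiles, intervals) can be read off them directly.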

Starting with $p(y \mid \beta, \sigma^2)$, the distribution of the data. This is proportional to $\sigma^{-n} \exp\!\left(-\frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta)\right)$, which can be rewritten as $\sigma^{-n} \exp\!\left(-\frac{1}{2\sigma^2}\left[(\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta}) + S\right]\right)$ using the completing-the-square trick, where $\hat{\beta} = (X^T X)^{-1} X^T y$ and $S = (y - X\hat{\beta})^T(y - X\hat{\beta})$.
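The completing-the-square step is easy to check numerically; a quick sketch with arbitrary illustrative data and an arbitrary $\beta$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)  # any parameter value works; the identity is exact

# Least-squares estimate and residual sum of squares.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
S = (y - X @ beta_hat) @ (y - X @ beta_hat)

# (y - X b)^T (y - X b)  ==  (b - b_hat)^T X^T X (b - b_hat) + S
lhs = (y - X @ beta) @ (y - X @ beta)
rhs = (beta - beta_hat) @ (X.T @ X) @ (beta - beta_hat) + S
print(np.isclose(lhs, rhs))  # True
```

The cross term vanishes because the residual $y - X\hat{\beta}$ is orthogonal to the columns of $X$, which is why the identity holds exactly.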

Similarly (a word that hides a wealth of mathematical sins), $p(\tilde{y} \mid \beta, \sigma^2)$ can also be written in the same format, i.e., $\sigma^{-\tilde{n}} \exp\!\left(-\frac{1}{2\sigma^2}\left[(\beta - \tilde{\beta})^T \tilde{X}^T \tilde{X} (\beta - \tilde{\beta}) + \tilde{S}\right]\right)$, where $\tilde{\beta} = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T \tilde{y}$ and $\tilde{S} = (\tilde{y} - \tilde{X}\tilde{\beta})^T(\tilde{y} - \tilde{X}\tilde{\beta})$.

The product of the two can be expressed as proportional to a Gaussian in $\beta$ by defining a few terms. These are $A = X^T X + \tilde{X}^T \tilde{X}$ and $b = X^T y + \tilde{X}^T \tilde{y}$. Thus $p(y \mid \beta, \sigma^2)\, p(\tilde{y} \mid \beta, \sigma^2) \propto \sigma^{-(n + \tilde{n})} \exp\!\left(-\frac{1}{2\sigma^2}\left[(\beta - \mu)^T A (\beta - \mu) + R\right]\right)$, where $\mu = A^{-1} b$ and $R = y^T y + \tilde{y}^T \tilde{y} - \mu^T A \mu$.
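That the two quadratic forms really do combine into a single Gaussian kernel plus a constant can also be verified numerically. Again the data are illustrative; I write the combined terms as $A = X^T X + \tilde{X}^T \tilde{X}$, $\mu = A^{-1}(X^T y + \tilde{X}^T \tilde{y})$, and $R = y^T y + \tilde{y}^T \tilde{y} - \mu^T A \mu$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_new, p = 30, 5, 4
X, X_new = rng.normal(size=(n, p)), rng.normal(size=(n_new, p))
y, y_new = rng.normal(size=n), rng.normal(size=n_new)
beta = rng.normal(size=p)  # arbitrary; the identity holds for any beta

# Combined quadratic-form terms.
A = X.T @ X + X_new.T @ X_new
mu = np.linalg.solve(A, X.T @ y + X_new.T @ y_new)
R = y @ y + y_new @ y_new - mu @ A @ mu

# Sum of the two sums of squares == single quadratic form in beta + constant R.
lhs = (y - X @ beta) @ (y - X @ beta) + (y_new - X_new @ beta) @ (y_new - X_new @ beta)
rhs = (beta - mu) @ A @ (beta - mu) + R
print(np.isclose(lhs, rhs))  # True
```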

The Gaussian integral is known, $\int \exp\!\left(-\frac{1}{2\sigma^2}(\beta - \mu)^T A (\beta - \mu)\right) d\beta = (2\pi\sigma^2)^{p/2}\, |A|^{-1/2}$, so (taking the standard noninformative prior $p(\beta, \sigma^2) \propto \sigma^{-2}$) $p(\tilde{y}, \sigma^2 \mid y) \propto |A|^{-1/2} (\sigma^2)^{-(n + \tilde{n} - p)/2 - 1} \exp\!\left(-\frac{R}{2\sigma^2}\right)$, which has the form of an inverse-gamma distribution in $\sigma^2$. Thus, integrating over $\sigma^2$, the posterior predictive distribution is $p(\tilde{y} \mid y) \propto \Gamma\!\left(\frac{n + \tilde{n} - p}{2}\right) |A|^{-1/2}\, R(\tilde{y})^{-(n + \tilde{n} - p)/2}$.
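The inverse-gamma integral used in this last step, $\int_0^\infty x^{-a-1} e^{-c/x}\, dx = \Gamma(a)\, c^{-a}$, can be sanity-checked with numerical quadrature (the constants below are arbitrary):

```python
import numpy as np
from scipy import integrate
from scipy.special import gamma

a, c = 3.5, 2.0  # arbitrary positive constants

# Numerically integrate the inverse-gamma kernel over (0, inf).
numeric, _ = integrate.quad(lambda x: x ** (-a - 1) * np.exp(-c / x), 0, np.inf)
analytic = gamma(a) * c ** (-a)

print(np.isclose(numeric, analytic))  # True
```

Here $x$ plays the role of $\sigma^2$, $a = (n + \tilde{n} - p)/2$, and $c = R/2$, which is where the gamma function and the $R^{-(n + \tilde{n} - p)/2}$ factor in the predictive density come from.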

The final form of the distribution is not clear to me, largely because of the $\mu^T A \mu$ term in $R$, which does not seem to be easily simplified. Also, ironically, because of the gamma function the analytic answer is probably less efficient than just simulating the result. So this whole exercise may be a waste of time; that could explain why this is not presented in any of the textbooks I used in my undergrad. Regardless, I hope it was at least fun to follow along.

Tune in next time for more on linear models.