Rotating to the maximum correlated space

Hi All

This week an extension to principal component analysis (PCA) is described. This has nothing to do with Bayesian statistics in any way. The idea came to me a few months ago and I have found it quite useful since then.

Principal component analysis is a way of rotating a cloud of data onto its eigenvectors, ordered by the size of their corresponding eigenvalues. In general this is a good way to visualize high-dimensional data, as the maximum data variance is kept at the selected level of dimension reduction.
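To make the rotation concrete, here is a minimal sketch of plain PCA in base R. The data are simulated purely for illustration, and the names n, W, scores and scores2 are my own assumptions rather than anything from a real data set.

```r
# Minimal PCA sketch in base R; the data are simulated for illustration only.
set.seed(1)
n <- 200
X <- cbind(rnorm(n, sd = 3), rnorm(n, sd = 1), rnorm(n, sd = 0.5))
X <- scale(X, center = TRUE, scale = FALSE)   # centre each column

# crossprod(X) is t(X) %*% X; eigen() returns eigenvalues in decreasing order
e <- eigen(crossprod(X), symmetric = TRUE)
W <- e$vectors          # columns are the orthonormal directions
scores <- X %*% W       # the rotated cloud of data

# Keep the first two columns for a two-dimensional view
scores2 <- scores[, 1:2]
```

The sum of squares of each column of scores equals the corresponding eigenvalue, which is why truncating to the leading columns keeps as much variance as possible.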

My extension is to rotate the data to the directions that maximize correlation with a known ranking vector (R). For example, I am almost always interested in how much money users have spent in my games. So instead of rotating the cloud of game data to the vectors that have maximum variance, I would like to rotate the data to the set of orthogonal vectors that have maximum correlation with each user’s spend.

Mathematically this is quite simple: instead of finding the arg-max over {\bf w} of ({\bf w}^t X^t X {\bf w}) / ({\bf w}^t {\bf w}), the arg-max over {\bf w} of ({\bf w}^t [X^t R] [X^t R]^t {\bf w}) / ({\bf w}^t {\bf w}) is found. That is, the X^t X in the center is replaced by X^t R R^t X, which is X^t X with R R^t inserted between X^t and X.
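A rough sketch of this in base R follows. The spend vector is simulated and the names v, M, w and score are hypothetical, so treat this as an illustration under those assumptions rather than production code.

```r
# Sketch of the extension in base R; X and the "spend" vector R are simulated.
set.seed(2)
n <- 500
X <- matrix(rnorm(n * 4), n, 4)
R <- 2 * X[, 1] - X[, 3] + rnorm(n)   # hypothetical spend, driven by columns 1 and 3

X <- scale(X, center = TRUE, scale = FALSE)   # centre the columns
R <- R - mean(R)                              # centre the ranking vector

v <- as.vector(crossprod(X, R))   # X^t R
M <- outer(v, v)                  # [X^t R][X^t R]^t, i.e. X^t R R^t X

e <- eigen(M, symmetric = TRUE)
w <- e$vectors[, 1]               # leading direction (its sign is arbitrary)
score <- as.vector(X %*% w)       # data projected onto that direction

cor(score, R)                     # how strongly the new axis tracks spend
```

Two things are worth knowing when R is a single vector. First, the center matrix has rank one, so only the first eigenvalue is non-zero and the leading eigenvector is just X^t R scaled to unit length; the remaining eigenvectors are an arbitrary orthogonal basis of what is left. Second, with {\bf w} of unit length the quotient is the squared inner product of X {\bf w} with R, so strictly speaking it is the covariance with R that is maximized rather than the correlation.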

The noted racist and tobacco company flunky showed that the eigenvectors of the center matrix (e.g., X^t X) are the solution set for problems of this class.

That’s it, tune in next time for Bayesian Neural Networks!

P.S. I have not appended any code because this is simply an application of the “eigen” function in R (i.e. “eigen(t(X)%*%X)” gives you basically the whole thing for ordinary PCA, and swapping in “eigen(t(X)%*%R%*%t(R)%*%X)” gives you the extension!)