Frailty part 1

One of the larger problems in modern statistics is that the telemetry (i.e., the data source) is chosen before the analysis is conducted.

[Yes it can be updated, yes a lot of good work goes into choices where events are recorded and how that data is recorded, and yes I really don’t care about the point you are trying to make right now… just read my article!]

The result of incomplete telemetry is that even if there are adequate model selection procedures (e.g., rjMCMC), there is still no guarantee that the data needed to accurately answer the parameter estimation question is available. This problem has been discussed on this blog before, but that was in the context of GLM modelling.

This post explores the same problem within the context of survival analysis where, by some bizarre accident of history, dispersion is called frailty.

To be more precise, frailty is when two phone calls from the same population (i.e., with the same predictive statistics) have different true hang-up time distributions. There could either be unobservable sub-groups or a continuous spectrum.

The frailty model is S_i^{\star}(t_i) = \int_{\text{range}} du_i S_i(t_i | u_i) f(u_i), where u_i is a parameter of S_i(), the survival function for the ith observation. There is an obvious parallel to Bayes’ theorem here: if f() is thought of as the prior distribution and S_i() as the likelihood, then S_i^{\star}() is the evidence.

Note that I don’t really know why the custom is to express frailty models in terms of the survival function. For example, S_i^{\star}(t_i) = \int_{\text{range}} du_i S_i(t_i | u_i) f(u_i) = \int_{\text{range}} du_i [1 - G_i(t_i | u_i)] f(u_i), where G_i() is the cdf of the distribution of the ith observation. Thus S_i^{\star}(t_i) = \int_{\text{range}} du_i [1 - \int_0^{t_i} ds \ g_i(s | u_i)] f(u_i), where g_i() is the corresponding pdf and s is a dummy time variable. Since f() integrates to one, swapping the order of integration gives S_i^{\star}(t_i) = 1 - \int_0^{t_i} ds \ g_i^{\star}(s), which implies that the process could have been started with the specification of the frail pdf of the ith observation, g_i^{\star}(t_i) = \int_{\text{range}} du_i g_i(t_i | u_i) f(u_i).
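The step from the survival form to the pdf form can be written out line by line. This is only a sketch, assuming hang-up times start at zero, that f() is a proper density (it integrates to one), and that the order of integration may be exchanged; s is a dummy time variable introduced here for illustration.

```latex
\begin{align*}
S_i^{\star}(t_i)
  &= \int_{\text{range}} du_i \, \left[ 1 - \int_0^{t_i} ds \, g_i(s \mid u_i) \right] f(u_i) \\
  &= \int_{\text{range}} du_i \, f(u_i)
     - \int_0^{t_i} ds \int_{\text{range}} du_i \, g_i(s \mid u_i) f(u_i) \\
  &= 1 - \int_0^{t_i} ds \, g_i^{\star}(s),
\qquad
g_i^{\star}(s) = \int_{\text{range}} du_i \, g_i(s \mid u_i) f(u_i).
\end{align*}
```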

An informative (simple-ish) example is the exponential-gamma frailty model. Consider a population of phone calls with exponentially distributed hang-up times; i.e., g_i(t_i|\lambda_i) = \lambda_i \exp(-\lambda_i t_i). The complication is that the \lambda_i are distributed gamma(\alpha, \beta), with shape \alpha and rate \beta; i.e., f(\lambda) = [ \beta^\alpha / \Gamma(\alpha)] \lambda^{\alpha -1} e^{- \beta \lambda }.
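To get a feel for what this population looks like, the generative story can be simulated directly: draw a rate for each call from the gamma distribution, then draw that call's hang-up time from an exponential with that rate. The sketch below uses only the Python standard library. The function names, the parameter values (\alpha = 3, \beta = 2) and the comparison against a single-rate exponential are illustrative assumptions, not part of the original model description.

```python
import math
import random


def simulate_hangup_times(n, alpha, beta, seed=0):
    """Draw one rate per call from gamma(shape=alpha, rate=beta), then an
    exponential hang-up time with that rate."""
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        # random.gammavariate expects a scale, so a rate of beta is a scale of 1/beta.
        lam = rng.gammavariate(alpha, 1.0 / beta)
        times.append(rng.expovariate(lam))
    return times


def fraction_longer_than(times, t):
    """Empirical survival: share of simulated calls still going at time t."""
    return sum(1 for x in times if x > t) / len(times)


if __name__ == "__main__":
    alpha, beta = 3.0, 2.0  # illustrative values, not taken from the post
    times = simulate_hangup_times(100_000, alpha, beta)
    mean_rate = alpha / beta  # mean of the gamma distribution of rates
    for t in (0.5, 1.0, 2.0, 4.0):
        frail = fraction_longer_than(times, t)
        no_frailty = math.exp(-mean_rate * t)  # every call shares the mean rate
        print(f"t={t}: simulated {frail:.4f}, single-rate exponential {no_frailty:.4f}")
```

The simulated fraction should sit above the single-rate exponential value at every t, because averaging \exp(-\lambda t) over a spread of rates can only fatten the tail (Jensen's inequality). That extra spread is the dispersion that survival analysis calls frailty.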

To get the frailty model, substitute these into the frail pdf: g^{\star}(t_i) = \int_{\text{range}} du \ g_i(t_i | u) f(u) = \int_0^{\infty} d\lambda \ \lambda \exp(-\lambda t_i) [ \beta^\alpha / \Gamma(\alpha)] \lambda^{\alpha -1} e^{- \beta \lambda }. This integral can be evaluated to give g^{\star}(t_i) = [ \beta^\alpha / \Gamma(\alpha)] \int_0^{\infty} d\lambda \ \lambda^{\alpha } e^{- [\beta+t_i] \lambda } = [ \beta^\alpha / \Gamma(\alpha)] [ \Gamma(\alpha + 1) / (\beta + t_i)^{\alpha +1} ] = \alpha \beta^\alpha / (\beta + t_i)^{\alpha +1}.
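A quick way to sanity-check an integral like this is to evaluate it numerically and compare against the closed form. The sketch below does that with a plain midpoint rule from the standard library. The function names, the truncation point `upper` and the step count are assumptions chosen for illustration, and the truncation only makes sense when \beta + t_i is not tiny.

```python
import math


def frail_pdf_closed_form(t, alpha, beta):
    """Closed-form frail pdf: alpha * beta^alpha / (beta + t)^(alpha + 1)."""
    return alpha * beta**alpha / (beta + t) ** (alpha + 1)


def frail_pdf_numeric(t, alpha, beta, upper=40.0, steps=40_000):
    """Midpoint-rule approximation of the integral over lambda of
    (exponential pdf) * (gamma pdf), truncated at `upper`."""
    h = upper / steps
    norm = beta**alpha / math.gamma(alpha)
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h  # midpoint of the k-th sub-interval
        exp_pdf = lam * math.exp(-lam * t)
        gamma_pdf = norm * lam ** (alpha - 1) * math.exp(-beta * lam)
        total += exp_pdf * gamma_pdf
    return total * h


if __name__ == "__main__":
    alpha, beta = 3.0, 2.0  # illustrative values, not taken from the post
    for t in (0.0, 1.0, 5.0):
        print(t, frail_pdf_numeric(t, alpha, beta), frail_pdf_closed_form(t, alpha, beta))
```

The two columns should agree to several decimal places. The closed form is the density of a Lomax (Pareto type II) distribution with shape \alpha and scale \beta, so the frail hang-up times have a power-law tail rather than an exponential one.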

That is it for now. Just a word of warning: you still want a prior for \alpha and \beta, as they will not be constrained by the data. So don’t just MCMC sample from this thing yet!

