# Frailty part 1

One of the larger problems in modern statistics is that the telemetry (i.e., the data source) is chosen before the analysis is conducted.

[Yes, it can be updated; yes, a lot of good work goes into choices about where events are recorded and how that data is recorded; and yes, I really don’t care about the point you are trying to make right now… just read my article!]

The result of incomplete telemetry is that even if there are adequate model selection procedures (e.g., rjMCMC), there is still no guarantee that the data needed to estimate the parameters accurately is available. This problem has been discussed on this blog before, but that was in the context of GLM modelling.

This post explores the same problem within the context of survival analysis where, by some bizarre accident of history, dispersion is called frailty.

To be more precise, frailty is when two phone calls from the same population (i.e., with the same predictive statistics) have different true hang-up time distributions. There could either be unobservable sub-groups or a continuous spectrum.

The frailty model is $S_i^{\star}(t_i) = \int_{\text{range}} du_i \, S_i(t_i | u_i) f(u_i)$, where $u_i$ is a parameter of $S_i()$, the survival function for the ith observation, and $f()$ is the distribution of $u_i$ across the population. There is an obvious parallel to Bayes' theorem here: if $f()$ is thought of as the prior distribution and $S_i()$ as the likelihood, then $S_i^{\star}()$ is the evidence.

Note that I don’t really know why the custom is to express frailty models in terms of the survival function. Writing $G_i()$ for the cdf of the ith observation, $S_i^{\star}(t_i) = \int_{\text{range}} du_i \, S_i(t_i | u_i) f(u_i) = \int_{\text{range}} du_i \, [1 - G_i(t_i | u_i)] f(u_i)$. Since the cdf is the integral of the pdf $g_i()$ over time, $S_i^{\star}(t_i) = \int_{\text{range}} du_i \, [1 - \int_0^{t_i} ds \, g_i(s | u_i)] f(u_i)$. Differentiating with respect to $t_i$ and changing the sign turns the left-hand side into a pdf, which implies that the process could have been started with the specification of the frail pdf of the ith observation, $g_i^{\star}(t_i) = \int_{\text{range}} du_i \, g_i(t_i | u_i) f(u_i)$.

An informative (simple-ish) example is the exponential-gamma frailty model. Consider a population of phone calls with exponentially distributed hang-up times; i.e., $g_i(t_i|\lambda_i) = \lambda_i \exp(-\lambda_i t_i)$. The complication is that the $\lambda_i$ are distributed gamma$(\alpha, \beta)$ with shape $\alpha$ and rate $\beta$; i.e., $f(\lambda) = [ \beta^\alpha / \Gamma(\alpha)] \lambda^{\alpha -1} e^{- \beta \lambda }$.
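To make the two-stage structure concrete, here is a minimal simulation sketch, assuming NumPy. The function name `simulate_hangup_times` and the values $\alpha = 3$, $\beta = 2$ are illustrative choices of mine, not anything from the model itself. Each call first draws its own rate from the gamma distribution, then its hang-up time from an exponential with that rate.

```python
import numpy as np


def simulate_hangup_times(alpha, beta, n, seed=0):
    """Draw n (rate, hang-up time) pairs from the exponential-gamma frailty model."""
    rng = np.random.default_rng(seed)
    # NumPy parameterises the gamma by shape and *scale*; beta here is a rate,
    # so the scale is 1 / beta.
    lam = rng.gamma(shape=alpha, scale=1.0 / beta, size=n)
    # Each call has its own rate; NumPy's exponential also takes a scale = 1 / rate.
    t = rng.exponential(scale=1.0 / lam)
    return lam, t


# Illustrative (hypothetical) parameter values.
lam, t = simulate_hangup_times(alpha=3.0, beta=2.0, n=100_000)
print("mean rate:", lam.mean())          # should sit near alpha / beta
print("mean hang-up time:", t.mean())    # should sit near beta / (alpha - 1) when alpha > 1
```

Only `t` would be observed in practice; the per-call rates `lam` are exactly the part the telemetry never records.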

Thus, to get the frailty model, $g^{\star}(t_i) = \int_{\text{range}} du \, g_i(t_i | u) f(u) = \int_0^{\infty} d\lambda \, \lambda \exp(-\lambda t_i) [ \beta^\alpha / \Gamma(\alpha)] \lambda^{\alpha -1} e^{- \beta \lambda }$. This integral can be evaluated by recognising the integrand as an unnormalised gamma density in $\lambda$: $g^{\star}(t_i) = [ \beta^\alpha / \Gamma(\alpha)] \int_0^{\infty} d\lambda \, \lambda^{\alpha } e^{- [\beta+t_i] \lambda } = [ \beta^\alpha / \Gamma(\alpha)] [ \Gamma(\alpha + 1) / (\beta + t_i)^{\alpha +1} ] = \alpha \beta^\alpha / ( \beta + t_i)^{\alpha +1}$.
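As a sanity check on the algebra, the closed form can be compared against direct numerical integration. This is a sketch assuming SciPy; the function names and test values are mine. The result $\alpha \beta^\alpha / (\beta + t)^{\alpha + 1}$ is also the density of the Lomax (Pareto type II) distribution with shape $\alpha$ and scale $\beta$, so `scipy.stats.lomax` gives a third, independent comparison.

```python
import numpy as np
from scipy import integrate, stats


def frail_pdf_closed(t, alpha, beta):
    """Closed-form frail pdf: alpha * beta^alpha / (beta + t)^(alpha + 1)."""
    return alpha * beta**alpha / (beta + t) ** (alpha + 1)


def frail_pdf_numeric(t, alpha, beta):
    """Integrate the exponential pdf against the gamma(alpha, rate=beta) density."""

    def integrand(lam):
        exponential_pdf = lam * np.exp(-lam * t)
        # SciPy's gamma uses shape `a` and `scale`, so scale = 1 / beta.
        gamma_pdf = stats.gamma.pdf(lam, a=alpha, scale=1.0 / beta)
        return exponential_pdf * gamma_pdf

    value, _abs_err = integrate.quad(integrand, 0.0, np.inf)
    return value


alpha, beta = 3.0, 2.0  # illustrative (hypothetical) values
for t in (0.1, 1.0, 5.0):
    print(
        t,
        frail_pdf_closed(t, alpha, beta),
        frail_pdf_numeric(t, alpha, beta),
        stats.lomax.pdf(t, c=alpha, scale=beta),
    )
```

The three columns should agree to within the quadrature tolerance for any $\alpha, \beta > 0$ and $t \ge 0$.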

That is it for now. Just a word of warning: you still want a prior for $\alpha$ and $\beta$, as they will not be constrained by the data. So don’t just MCMC sample from this thing yet!
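For orientation only, here is a rough sketch of what a log-posterior for $(\alpha, \beta)$ could look like once a prior is attached, assuming NumPy. The log-likelihood follows from the frail pdf above; the independent exponential priors and the `prior_rate` parameter are placeholders I have made up to show where a prior enters, not a recommendation.

```python
import numpy as np


def log_likelihood(alpha, beta, t):
    """Sum of log g*(t_i) = log(alpha) + alpha*log(beta) - (alpha + 1)*log(beta + t_i)."""
    t = np.asarray(t, dtype=float)
    return np.sum(np.log(alpha) + alpha * np.log(beta) - (alpha + 1.0) * np.log(beta + t))


def log_prior(alpha, beta, prior_rate=1.0):
    """Hypothetical independent exponential(prior_rate) priors, up to an additive constant."""
    return -prior_rate * (alpha + beta)


def log_posterior(alpha, beta, t, prior_rate=1.0):
    """Unnormalised log-posterior; -inf outside the support alpha > 0, beta > 0."""
    if alpha <= 0.0 or beta <= 0.0:
        return -np.inf
    return log_likelihood(alpha, beta, t) + log_prior(alpha, beta, prior_rate)


# Made-up hang-up times, purely to show the call signature.
times = [0.3, 1.2, 0.05, 4.0]
print(log_posterior(3.0, 2.0, times))
```

A function of this shape is what an MCMC sampler would evaluate; the choice of prior is the part that still needs real thought.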