# Frailty part 1

One of the larger problems in modern statistics is that that telemetry (i.e., the data source) is chosen before analysis conducted.

[Yes it can be updated, yes a lot of good work goes into choices where events are recorded and how that data is recorded, and yes I really don’t care about the point you are trying to make right now… just read my article!]

The result of incomplete telemetry is that even if there are adequate model selection procedures (e.g., rjMCMC) there is still no guarantee that the data is available to accurately answer the parameter estimate question. This problem is has been discussed on this blog before; but that was in the context of GLM modelling.

This post explores the same problem within the context of survival analysis where for some bizarre accident of history dispersion is called frailty.

To be more precise, frailty is when two phone calls form the same population (i.e., have the same predictive statistics) have different true hang up time distributions. There could ether be unobservable sub-groups or a continuous spectrum.

The frailty model is $S_i^{\star}(t_i) = \int_{\text{range}} du_i S_i(t_i | u_i) f(u_i)$ where $u_i$ is a parameter of $S_i()$ the survival function for the ith observation. There is an obvious parallel to Bayes theorem here. If $f()$ is thought of as the prior distribution and $S_i()$ the likelihood then $S_i^{\star}()$ is the evidence.

Note that I don’t really know why the custom is to express frailty models in terms of survival function. For example $S_i^{\star}(t_i) = \int_{\text{range}} du_i S_i(t_i | u_i) f(u_i) = \int_{\text{range}} du_i [1 - G_i(t_i | u_i)] f(u_i)$ note that $G_i()$ is the cdf of the distribution of the ith observation. Thus $S_i^{\star}(t_i) = \int_{\text{range}} du_i [1 - \int_{\text{range}} du_i g_i(t_i | u_i)] f(u_i)$. Which implies that the process could have been started with the specification of the frail pdf of the ith observation $g_i^{\star} = \int_{\text{range}} du_i g_i(t_i | u_i) f(u_i)$.

An informative (simple-ish) example is the exponential-gamma frailty model. Consider a a population of phone calls with exponentially distributed hang-up times; i.e., $g_i(t_i|\lambda_i) = \lambda_i \exp(-\lambda_i t_i)$. The complication is that the $\lambda_i$ are distributed gamma($\alpha, \beta)$; i.e, $f(\lambda) = [ \beta^\alpha / \Gamma(\alpha)] \lambda^{\alpha -1} e^{- \beta \lambda }$

Thus to get the frailty model $g^{\star} = \int_{\text{range}} du \ g_i(t_i | u) f(u) = \int_0^{\inf} d\lambda \lambda \exp(-\lambda t_i) [ \beta^\alpha / \Gamma(\alpha)] \lambda^{\alpha -1} e^{- \beta \lambda }$. this integral can be evaluated to be $g^{\star}(t_i) = [ \beta^\alpha / \Gamma(\alpha)] \int_0^{\inf} d\lambda \lambda^{\alpha } e^{- [\beta+t_i] \lambda } = [ \beta^\alpha / \Gamma(\alpha)] [ \Gamma(\alpha + 1) / (\beta + t_i)^{\alpha +1} ] = \beta^\alpha \alpha/( \beta + t_i)^{\alpha +1})$

That is it for now. Just a word of warning you still want a prior for $\alpha$ and $\beta$ as they will not be constrained by the data. So don’t just MCMC sample from this thing yet!

As humans we know that Misfortune stalk us at every turn. But how much hazard are we  truly in?

This post answers that important question.

Recall that the hazard function [$h(t)$] is the ration of the event time pdf [$f(t)$] to the survival function [$S(t)$]; i.e., it the instantaneous probability of failure for an observation given that survived at least that long. Also recall that a Kaplan-Meier curve (KMC) is reasonable ways to estimate the survival function….

So what is the point of this post? Can’t the hazard function be gotten directly from the observed KMC? surprisingly no. The basic problem is that a KMC is not a smooth estimate of the $S(t)$ thus $f(t) = d/dt [1- S(t)]$ is not really defined.

The simplest way to get around this problem is to use the piece wise estimate $\hat{h}(t) = d_j / (r_j \Delta_j)$ where $t \in [t_{j-1}, t_{j})$, $r_j$ is the number of observations at risk in time interval $[t_{j-1}, t_{j})$,, $d_j$ is the number of “hanging ups” at time $t_j$, and $\Delta_j = t_{j} - t_{j-1}$.

In practice this method is not satisfying as its uncertainty [$\text{Var}(\hat{h}[t]) = \hat{h}(t)^2 (r_j - d_j)/(r_j d_j)$] tends to be large and estimates for adjacent segments tend to vary largely.

So here a Bayesian approach is proposed! Yes that is right I am finally moving in to a Bayesian survival analysis problem!

As noted before Bayesian methods tend to start with a likelihood function (yes ABC does not but it sort of does because it is based on defining an approximate likelihood function… do you feel smug for pointing that out? do you wake up in the night and hug yourself? No one cares keener!) . Here Order statistics are exploited to produce the likelihood function. The percentile of the $k$‘th of $n$ observation has a beta distribution because there are $k -1$ observations smaller and $n - k$ observations bigger. The percentiles can them be fit using a collections of weighted cdfs. The number of cdf and their parameters are found using rjMCMC. I know this is a fairly vague description (frankly it is a vague reminder) however I would like to get this post done and I beleive that if you have been reading the other posts this is not that far out of an idea.

To explore the idea I have made a simulation study. In the study a collection of observed exponential random variables  are used to create a estimated a hazard function. Truncated Gaussian distributions are used for the kernel of the rjMCMC inversion. In practice it would probably be better to use truncated generalized Gaussian distributions. However then the kernel could exactly fit the simulated data which is unlikely to be possible in practice. The choice of simulated exponential distributed data set is made for ease of interpretation as the exponential has a constant hazard function.  Figure 1 left shows simulated exponential random variables (circles) plotted vs their percentiles; Fig. 1 right gives a color histogram of the posterior hazard function. The posterior distribution is distinct from the true hazard function (the constant like $y = 1$) for much of the range zero to two however it is close. And it should be noted that the data set is constantly lower than the theoretical percentile for that range.

Figure 1: Left the simulated exponential random variables (circles) plotted vs their percentiles. The blue line is the theoretical values. Right is a color histogram of the posterior hazard function. The horizontal line is the hazard function of the simulation distribution; i.e., the “true value”.

That is it for this week. I hope this post is interesting As I had this idea in my head for a while before I could find time to write it.  The code for the simulated inversion is append.

Tune in next time for more Bayesian Business Intelligence! Same Bayesian Channel, Same Bayesian Time!

g.sim.data <-function(N = 1000, lambda = 1)
{
t  = rexp(N, rate = lambda )
t. = sort(t)

index = 1:N
cdf   = (1:N)/N

out = data.frame(t., index, cdf)
return(out)

}
g.sim.data(N= 100)

#  LL  j, P1, P2,  Mu1  Mu2, S1   S2
g.predict<-function(MOD=c(225.870633018 ,  3.000000000 ,  1.000000000 ,  0.408442587 ,  0.173129261 ,  0.006416128 ,  0.992600670 ,  1.333872850 ,  0.488384568 ,  1.279163793,   0.261276892 ), X = g.sim.data(N=100), plot.line=0 )
{
# upack the pars

j      = MOD[2]
z      = MOD[3:(j+2)]
mu     = MOD[(j+3):(2*j + 2)]
sigma  = MOD[(2*j + 3):(3*j + 2)]

#print(z)

# get the probs
z.star = c(z, 0)
prob = 1:j
for (i in 1:j)
{
prob[i] = z.star[i]-z.star[i+1]
}

if (min(prob) < 0 ) {return(-Inf) } #print(prob) t.     = X$t. n = nrow(X) k = 1:n alpha = k beta = n - k + 1 #print("-----------") #print(prob[1]) CDF.hat = prob[1]*(pnorm(t., mean = mu[1], sd=sigma[1]) - pnorm(0,mean = mu[1], sd=sigma[1]))/( 1- pnorm(0,mean = mu[1], sd=sigma[1]) ) if(j >= 2) { for ( i in 2:(j)) { #print("-----------") #print(prob[i]) temp = prob[i]*(pnorm(t., mean = mu[i], sd=sigma[i]) - pnorm(0,mean = mu[i], sd=sigma[i]))/( 1- pnorm(0,mean = mu[i], sd=sigma[i]) ) CDF.hat = CDF.hat + temp } } LL = dbeta(CDF.hat, shape1 = alpha, shape2 = beta, ncp = 0, log = TRUE) LL = sum(LL, na.rm=T) #print(CDF.hat) if (plot.line == 1 ) { plot(t., X$cdf)
lines(t., CDF.hat)
}

return(LL)
}
g.predict(plot.line=1)

g.perterb<-function(M=c(-Inf, 3, 0,  0.6, 0.2,   0, 0.5,1,   1, 0.5, 0.1 ), LB=c(0, -10, 0), UB = c(1, 10, 10), Qsd = c(0.01, 0.01, 0.01)  , data=g.sim.data(N=100)) { # unpacking hte parameters LL     = M[1] j      = M[2] z      = M[3:(j+2)] mu     = M[(j+3):(2*j + 2)] sigma  = M[(2*j + 3):(3*j + 2)]    #print(M)  # make the proposal model   z.prime      = z mu.prime     = mu sigma.prime  = sigma   # make the proposal model index  = sample(1:j, 1 ) if(index >1) {z.prime[index]     = z.prime[index]     + rnorm(1, mean = 0, sd= Qsd[1])} # add random noise to old model
mu.prime[index]                  = mu.prime[index]    + rnorm(1, mean = 0, sd= Qsd[2])
sigma.prime[index]               = sigma.prime[index] + rnorm(1, mean = 0, sd= Qsd[3])

SI = sort(z, index.return=T, decreasing = T)$ix z.prime = z.prime[SI] mu.prime = mu.prime[SI] sigma.prime = sigma.prime[SI] M.prime =c(LL, j, z.prime, mu.prime, sigma.prime) #print(M.prime) #print(z.prime) #print(mu.prime) #print(sigma.prime) #print(z.prime[index]) #print(mu.prime[index]) #print(sigma.prime[index]) if ((z.prime[index] >= LB[1]) & (z.prime[index] <= UB[1]) & # check the bounds (mu.prime[index] >= LB[2]) & (mu.prime[index] <= UB[2]) & (sigma.prime[index] >= LB[3]) & (sigma.prime[index] <= UB[3]) #&(sum(prob.prime ) <= 1) ) { LL.prime = g.predict(M.prime, X = data ) # compute loglikihood M.prime[1] = LL.prime # save LL r = runif(1) # random uniform MH = exp(LL.prime - LL) # Metropolis-hasting acceptance probability value if (r <= MH) { M = M.prime } } return(M) } g.birth<-function(M=c(-Inf, 3, 0, 0.6, 0.2, 0, 0.5,1, 1, 0.5, 0.1 ), LB=c(0, -10, 0), UB = c(1, 10, 10), Qsd = c(0.01, 0.01, 0.01) , data=g.sim.data(N=100)) { # unpacking hte parameters LL = M[1] j = M[2] z = M[3:(j+2)] mu = M[(j+3):(2*j + 2)] sigma = M[(2*j + 3):(3*j + 2)] #print(M) #print(z) if (j <= 14 ) { # make the proposal model LL.prime = LL j.prime = j + 1 z.prime = c(z , runif(1, min = LB[1], max = UB[1]) ) mu.prime = c(mu, runif(1, min = LB[2], max = UB[2]) ) sigma.prime = c(sigma , runif(1, min = LB[3], max = UB[3]) ) SI = sort(z.prime, index.return=T, decreasing = T)$ix       # resort the mods in prob
z.prime     = z.prime[SI]
mu.prime    = mu.prime[SI]
sigma.prime = sigma.prime[SI]

M.prime      = c(LL.prime, j.prime, z.prime, mu.prime, sigma.prime)
#print("------ gavin this line ------------")
#print(M.prime)
LL.prime     = g.predict(M.prime, X = data)                  # get predicted values
M.prime[1]   = LL.prime

r = runif(1)                                          # random uniform
MH = exp(LL.prime - LL)                               # Metropolis-hasting acceptance probability value note that the prior and proposal cancel
if (r <= MH) {M = M.prime}                            # if accepted
}
return(M)

}

g.death<-function(M=c(-Inf, 3, 0,  0.6, 0.2,   0, 0.5,1,   1, 0.5, 0.1 ), LB=c(0, -10, 0), UB = c(1, 10, 10), Qsd = c(0.01, 0.01, 0.01)  , data=g.sim.data(N=100)) { # unpacking hte parameters LL     = M[1] j      = M[2] z      = M[3:(j+2)] mu     = M[(j+3):(2*j + 2)] sigma  = M[(2*j + 3):(3*j + 2)] #print(M) if (j >= 3 )
{
# make the proposal model
index        = sample(2:j, 1)
LL.prime     = LL
j.prime      = j - 1
z.prime      = z[-index]
mu.prime     = mu[-index]
sigma.prime  = sigma[-index]
M.prime      = c(LL.prime, j.prime, z.prime, mu.prime, sigma.prime)
LL.prime     = g.predict(M.prime, X = data)                  # get predicted values
M.prime[1]   = LL.prime
r = runif(1)                                          # random uniform
MH = exp(LL.prime - LL)                               # Metropolis-hasting acceptance probability value note that the prior and proposal cancel
if (r <= MH) {M = M.prime}                            # if accepted
}
return(M)

}

g.explore<-function(old, d)
{
Qsd. = c(0.005, 0.005, 0.005)
LB.  = c(0, -10, 0)
UB.  = c(1, 10, 10)

move.type = sample(1:3, 1) # the type of move i.e., perterb, birth, death
#print("------ this line")
#print(move.type)

if (move.type == 1) {old = g.perterb(M=old, Qsd =Qsd., LB = LB., UB=UB., data= d)}
if (move.type == 2) {old = g.birth(M=old,  Qsd =Qsd., LB = LB., UB=UB., data = d)}
if (move.type == 3) {old = g.death(M=old, Qsd =Qsd., LB = LB., UB=UB., data = d )}

return(old)
}

g.rjMCMC<-function( Nsamp = 20000, BURN = 2000)
{

data = g.sim.data(N= 100, lambda = 1)

#x11(height=4, width = 5)
par(mfrow= c(1,2))
plot(data$t., data$cdf, xlab = "time", main = "Simulated Data", ylab= "Cumlative Probability", xlim=c(0, 5))
t.f = seq(0, 7, length.out = 1001)
lines(t.f,1-exp(-t.f)  , col="blue", lwd=3 )
points(data$t., data$cdf)

Mod.old  =c(-Inf, 3, 1,  0.6, 0.2,   0, 0.5,1,   1, 0.5, 0.1 )

for(i in 1:BURN) # the burn in
{
Mod.old = g.explore(old = Mod.old, d = data)
}
print(Mod.old)

REC = list()
for(i in 1:(Nsamp-1)) # the burn in
{
Mod.old = g.explore(old = Mod.old, d = data)
REC[[i]] = Mod.old
rownames(REC) = NULL
}

#print(table(REC[,2]))
#best = which.max(REC[,1])
#g.predict(MOD=REC[best,], X = data , plot.line=1)
#lines(data$t.,1-exp(-data$t.)  , col="blue", lwd=3 )
return(REC)

}

g.Haz<-function(N = 19999, REC= samps) { hist.mat = matrix(0, nrow= N, ncol = 101) t. = seq(0, 7, length.out = 101)      for (bi in 1:N)     {         #print(bi)     MOD = REC[[bi]]     # upack the pars          j      = MOD[2]     z      = MOD[3:(j+2)]     mu     = MOD[(j+3):(2*j + 2)]     sigma  = MOD[(2*j + 3):(3*j + 2)]               #print(z)          # get the probs      z.star = c(z, 0)     prob = 1:j     for (i in 1:j)         {         prob[i] = z.star[i]-z.star[i+1]         }          #print(prob)          PDF.hat  =  prob[1]*(dnorm(t., mean = mu[1], sd=sigma[1]))/( 1- pnorm(0,mean = mu[1], sd=sigma[1])  )          if(j >= 2)
{
for ( i in 2:(j))
{
temp =  prob[i]*(dnorm(t., mean = mu[i], sd=sigma[i]))/( 1- pnorm(0,mean = mu[i], sd=sigma[i])  )
PDF.hat  = PDF.hat + temp
}
}
if (min(PDF.hat) < 0 ) {print("problem pdf")}     #print("-----------")     #print(prob[1])     CDF.hat  =  prob[1]*(pnorm(t., mean = mu[1], sd=sigma[1]) -   pnorm(0,mean = mu[1], sd=sigma[1]))/( 1- pnorm(0,mean = mu[1], sd=sigma[1])  )          if(j >= 2)
{
for ( i in 2:(j))
{
temp =  prob[i]*(pnorm(t., mean = mu[i], sd=sigma[i]) - pnorm(0,mean = mu[i], sd=sigma[i]))/( 1- pnorm(0,mean = mu[i], sd=sigma[i])  )
CDF.hat  = CDF.hat + temp
}
}
if (max(CDF.hat) > 1 ) {print("problem cdf too big")}
if (max(CDF.hat) > 1 ) {print(prob)}

if (max(CDF.hat) > 1 ) {print(CDF.hat)}
if (min(CDF.hat) < 0 ) {print("problem cdf")}

S.hat = 1 - CDF.hat
if (min(S.hat) < 0 ) {print("problem s")}

h.hat = PDF.hat/S.hat

hist.mat[bi,] =  h.hat

}

plot(t., t., type= "n", xlab = "time", ylab = "hazard function", ylim=c(0,4 ), xlim= c(0, 5))

ncolor = 21
layers = matrix(0, nrow=ncolor, ncol=101)
for (i in 1:101)
{
q = quantile(hist.mat[,i], probs=seq(0, 1, length.out=ncolor))
layers[,i] = q
}
gcol = c("white", rev(heat.colors((ncolor-1)/2) ))
#       1            2           34        4  3 21

for (i in 1:((ncolor-1)/2))
{
PY = c(layers[i,], rev(layers[ncolor+1-i,]))
Px   = c(t., rev(t.))
polygon(Px, PY, col=gcol[i], border=F)
}
lines(t., layers[((ncolor-1)/2)+1,], col=gcol[((ncolor-1)/2)+1], lwd=2)

abline(h = 1)

#print(layers)

}

samps = g.rjMCMC()
g.Haz()