Estimating Passive Churn or Lifetime Conversion

Hi All

This week I have been thinking about an idea that seems to have enormous potential but that I have not yet been able to make work with real data. For churn analysis the basic question is what percentage of users will never return. For conversion (making an in-app purchase) the question is what percentage will ever make a purchase. In both cases it is common to create a window of time, say 7 or 14 days, and then ask the simpler questions of “what percentage of users will not return within 7 days?” or “what percentage of users will have converted by 14 days after install?”, and to model this with a probit or logit regression. On this blog I have been advocating for the use of survival analysis to instead estimate the time until return or the time from install until conversion. While both methods can provide useful information, both fail to answer the question “given forever, what percentage will have done x?”. To address this I have been considering what happens with cumulative survival functions S(t) that do not have the property that \lim_{t\to\infty} S(t) = 0. For example, in the time-to-conversion case \lim_{t\to\infty} S(t) = 1-P, where P is the probability that a user will convert given forever. In the time-to-return case, P is the probability that a user will ever return.

Before getting into the challenges with this idea, I first want to talk about what has been working: I have been able to derive a parametric likelihood function for lifetime conversion or passive churn data. To get the likelihood [L(t| {\bf m})] of an observation t_i with a survival function that does not asymptote to zero there are 3 steps. The first is to define the survival function, the second is to use the survival function to derive a hazard function [h(t)], and the third is to “build” the likelihood from the survival function, hazard function, and censoring information (d_i).

Step 1:
If \hat{S}(t) is a conventional survival function (i.e., one that asymptotes to zero) then S(t) = P\hat{S}(t) + (1-P) is a survival function that asymptotes to (1-P). Here by survival function I mean a function that is positive, real, and monotonically decreasing. What is the underlying distribution of S(t)? It has a point mass of probability (1-P) at \infty; otherwise it is the distribution [\hat{f}(t)] that underlies \hat{S}(t), scaled by P.
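As a quick numeric sketch (in Python rather than R, with an exponential \hat{S}(t) and an illustrative P = 0.3), the modified survival function levels off at 1 - P instead of decaying to zero:

```python
import math

P = 0.3          # probability a user ever converts (illustrative value)
rate = 1.0 / 30  # rate of the exponential conventional survival function

def S_hat(t):
    # conventional survival function: asymptotes to zero
    return math.exp(-rate * t)

def S(t):
    # modified survival function: asymptotes to (1 - P)
    return P * S_hat(t) + (1 - P)

print(S(0))       # 1.0: everyone "survives" at t = 0
print(S(10_000))  # ~0.7: the never-converting point mass remains
```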

Step 2:
To derive the hazard function recall that h(t) = -S'(t)/S(t). Thus  h(t) = P \hat{f}(t)/ [P\hat{S}(t) + (1-P)].
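Writing out the differentiation (the point mass at \infty contributes nothing at any finite t, so only the \hat{S} term varies):

```latex
S'(t) = P\,\hat{S}'(t) = -P\,\hat{f}(t)
\qquad\Longrightarrow\qquad
h(t) = -\frac{S'(t)}{S(t)} = \frac{P\,\hat{f}(t)}{P\,\hat{S}(t) + (1-P)}
```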

Step 3:
To get the likelihood for observed and right-censored data, again recall (with all these recollections, is this blog really needed?) that the likelihood contribution of an observed event is f(t) = h(t) S(t) and that of a right-censored event is S(t). Thus if d_i is 1 if and only if the i-th observation has an observed event time, and is zero otherwise, the likelihood can be written as L({\bf t}, {\bf d} | {\bf m}) = \prod_i h(t_i)^{d_i} S(t_i).
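As a sketch of step 3 (in Python rather than R, and assuming an exponential \hat{S}(t) purely for illustration), the log of the likelihood above becomes a sum over observations:

```python
import math

def log_likelihood(times, events, P, rate):
    """Log-likelihood for the cure-fraction model with exponential
    S_hat(t) = exp(-rate * t); events[i] is 1 if observed, 0 if censored."""
    ll = 0.0
    for t, d in zip(times, events):
        S_hat = math.exp(-rate * t)
        f_hat = rate * S_hat
        S = P * S_hat + (1 - P)              # survival that asymptotes to 1 - P
        h = P * f_hat / S                    # hazard from step 2
        ll += d * math.log(h) + math.log(S)  # log of h(t)^d * S(t)
    return ll
```

Setting d = 0 for an observation drops the hazard term, leaving only the survival (censoring) contribution; with P = 1 the model reduces to the plain exponential likelihood.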

What has not worked very well so far is picking a parametric distribution for \hat{f}(t). I have tried the usual suspects (the exponential, Weibull, and gamma distributions); however, those have not adequately described my data. To estimate the model parameters ({\bf m}; I know it is sloppy to only define these now) I have used MCMC. It may be that I need a better way of describing the prior distribution of {\bf m} (currently I am using bounded uniforms) and that this is disrupting my estimates, though in that case I am surprised that the MLE point estimate is so bad. In any event, stay tuned as I improve this process.
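For what it's worth, on simulated data the MLE recovers the parameters well. Here is a minimal sketch (Python with scipy, maximum likelihood rather than MCMC, exponential \hat{f}(t), and made-up true values P = 0.6, rate = 0.1):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# simulate: 60% of users eventually convert (exponential times); censor at t = 60
P_true, rate_true, censor = 0.6, 0.1, 60.0
n = 5_000
converts = rng.random(n) < P_true
t_event = np.where(converts, rng.exponential(1 / rate_true, n), np.inf)
d = (t_event <= censor).astype(float)   # 1 = observed, 0 = right censored
t = np.minimum(t_event, censor)

def neg_log_lik(params):
    P, rate = params
    S_hat = np.exp(-rate * t)
    S = P * S_hat + (1 - P)             # survival with a cure fraction
    h = P * rate * S_hat / S            # hazard from step 2
    return -np.sum(d * np.log(h) + np.log(S))

mle = minimize(neg_log_lik, x0=[0.5, 0.05],
               bounds=[(1e-4, 1 - 1e-4), (1e-6, None)])
print(mle.x)  # estimates of (P, rate), near (0.6, 0.1)
```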



A short note on risk-adjusted survival functions

Hi all

First, no, this post does not mean that I have finished writing my textbook. In fact, truth be told, I have made very little progress since my last post. I will point out that I have given four talks on data science in that time, so 1-1 tie?

Secondly, I want to mention that my former colleague Alexandre Macmillan has also started a blog on business analytics. Alex is less mathy than me, but he is extremely insightful when considering user behavior; if he applies to your company, just hire him.

Now onto risk adjusted survival functions…

As noted before, survival analysis is great for modeling both churn and conversion, and Cox proportional hazards (Cox PH) is a great way to do survival analysis. However, Cox PH models the relative hazards of members of a population, while the common way to plot (and think about?) survival data is the empirical survival function. This post addresses how to integrate these two methods.

Before continuing I should note that R does this auto-magically with the “survfit” function:

library(survival)  # provides Surv, coxph, and survfit

# make some simulated data
d.g1 = rexp(1000, rate = 1/720)
C.g1 = rev(seq(1,365, length.out = 1000 ))
E.g1 = rep(1,1000)
E.g1[d.g1 >= C.g1] = 0
d.g1[d.g1 >= C.g1] = C.g1[d.g1 >= C.g1]

d.g2 = rexp(1000, rate = 1/180)
C.g2 = rev(seq(1,365, length.out = 1000 ))
E.g2 = rep(1,1000)
E.g2[d.g2 >= C.g2] = 0
d.g2[d.g2 >= C.g2] = C.g2[d.g2 >= C.g2]

d.g3 = rweibull(1000, shape =1/2, scale = 180)
C.g3 = rev(seq(1,365, length.out = 1000 ))
E.g3 = rep(1,1000)
E.g3[d.g3 >= C.g3] = 0
d.g3[d.g3 >= C.g3] = C.g3[d.g3 >= C.g3]

event.time = c(d.g1, d.g2, d.g3)
event.result = c(E.g1, E.g2, E.g3)
Group = c(rep(1, 1000), rep(2, 1000), rep(3, 1000))

# fit a Cox model and plot the result with a "predicted" survival curve
mod = coxph(Surv(time=event.time , event=event.result )~factor(Group))
km = survfit(Surv(time=event.time , event=event.result )~factor(Group))
plot(km, col=c("red", "blue", "green"), lwd=3, 
	 mark.time=FALSE, mark=3,   
	 xlim=c(0, 365),
	 xlab = "time (days)", ylab = "S(d)")
lines(survfit(mod, newdata = data.frame(Group = 1:3)), 
	   col=c("red", "blue", "green"), lty=2,	 
	   mark.time=FALSE )

Figure 1: a Kaplan Meier Curve for simulated data (solid lines) with a Cox predicted (risk adjusted) survival function (dashed lines). Colors indicate the data cohort.

Figure 1 shows the result of the R code. Note that a Cox model would not be appropriate in this case, as the proportional hazards assumption is not met; i.e., the KM lines cross!

So how does R calculate the risk-adjusted survival function from the Cox model? To answer this question recall the definition of a hazard function: h(x) = f(x)/[1-F(x)]. Also recall that the Cox model describes the hazard function as \exp({\bf m}^t {\bf z}) h_0(x), where h_0(x) is the baseline hazard. Thus what is needed is some function g(y), applied to the baseline CDF F_0(x), such that

\frac{d}{dx} g\big(F_0(x)\big) \Big/ \Big[1 - g\big(F_0(x)\big)\Big] = \exp({\bf m}^t {\bf z})\, h_0(x)

is true; i.e., the hazard of the adjusted distribution g(F_0(x)) is the Cox hazard.

All we have to do is find g(y) and use it to calculate the risk-adjusted survival function from the empirical survival function. But how to find g(y)? Well, like a lot of math, it is much easier to know the answer and work backwards!

In this case I tried a few functions and could not get any of them to work. Then I had the idea to assume F(x) = 1- \exp(-\lambda x), i.e., that it is exponential. Using this it became clear that g(y) = 1-(1-y)^{\exp({\bf m}^t {\bf z})}. It then turns out that this definition of g(y) works for all F(x)!
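Writing the exponential case out (with c = \exp({\bf m}^t {\bf z}) for brevity): scaling the exponential hazard \lambda by c gives

```latex
F_0(x) = 1 - e^{-\lambda x}, \qquad
F(x) = 1 - e^{-c\lambda x}
     = 1 - \left(e^{-\lambda x}\right)^{c}
     = 1 - \left[1 - F_0(x)\right]^{c},
```

so g(y) = 1 - (1-y)^{c}.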

1 - g\big(F_0(x)\big) = \big[1-F_0(x)\big]^{\exp({\bf m}^t {\bf z})}
\quad\Longrightarrow\quad
h(x) = -\frac{d}{dx}\log\Big[\big(1-F_0(x)\big)^{\exp({\bf m}^t {\bf z})}\Big] = \exp({\bf m}^t {\bf z})\, h_0(x)

Thus the risk-adjusted survival function is \hat{S}(x) = S(x)^{\exp({\bf m}^t {\bf z})}.
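A quick numeric sanity check (Python, with a Weibull baseline echoing the simulation above and an arbitrary value for \exp({\bf m}^t {\bf z})) confirms that raising the baseline survival function to the power \exp({\bf m}^t {\bf z}) scales the hazard by exactly that factor:

```python
import math

def S0(x, shape=0.5, scale=180.0):
    # baseline Weibull survival function (parameters are illustrative)
    return math.exp(-((x / scale) ** shape))

def hazard(S, x, eps=1e-5):
    # numerical hazard: h(x) = -S'(x) / S(x), via a central difference
    return -(S(x + eps) - S(x - eps)) / (2 * eps) / S(x)

c = math.exp(0.8)  # exp(m^t z) for some arbitrary covariates

def S_adj(x):
    # risk-adjusted survival function
    return S0(x) ** c

# the hazard of S0(x)^c is c times the baseline hazard
print(hazard(S_adj, 50.0) / hazard(S0, 50.0))
```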

That’s it for now.