Hi All
This week I have been thinking about an idea that seems to have enormous potential but that I have not yet been able to make work with real data. For churn analysis the basic question is what percentage of users will never return. For conversion (making an in-app purchase) the question is what percentage will ever make a purchase. In both cases it is common to create a window of time, say 7 or 14 days, and then ask the simpler questions of “what percentage of users will not return within 7 days?” or “what percentage of users will have converted by 14 days after install?”, and then to model this with a probit or logit regression. On this blog I have been advocating for the use of survival analysis to instead estimate the time until return or the time from install until conversion. While both methods can provide useful information, they both fail to answer the question of “given forever, what percentage will have x”. To address this I have been considering what happens with cumulative survival functions that do not have the property that $\lim_{t \to \infty} S(t) = 0$. For example, in the time to conversion case $\lim_{t \to \infty} S(t) = 1 - \theta$, where $\theta$ is the probability that a user will convert given forever. In the time to return example, $\theta$ is the probability that a user will ever return.
Before getting into the challenges with this idea, first I want to talk about what has been working: I have been able to derive a parametric likelihood function for lifetime conversion or passive churn data. To get the likelihood $L$ of an observation $t_i$ with a survival function that does not asymptote to zero there are three steps. The first is to define the survival function $S(t)$, the second is to use the survival function to derive a hazard function $h(t)$, and the third is to “build” the likelihood from the survival function, the hazard function, and the censoring information ($\delta_i$).
Step 1:
If $S_0(t)$ is a conventional survival function (i.e., one that asymptotes to zero), then $S(t) = (1 - \theta) + \theta S_0(t)$ is a survival function that asymptotes to $1 - \theta$. Here by survival function I mean a function that is positive, real, and monotonically decreasing. If we want to think about the underlying distribution of $S(t)$, what is it? It has a point mass of probability $1 - \theta$ at $t = \infty$; otherwise it is the scaled density $\theta f_0(t)$, where $f_0(t)$ is the density that underlies $S_0(t)$.
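As a concrete sketch, suppose $S_0$ is exponential with rate $\lambda$ (an assumption purely for illustration, not a claim about which base distribution fits best):

\[
S_0(t) = e^{-\lambda t}, \qquad S(t) = (1 - \theta) + \theta e^{-\lambda t}, \qquad \lim_{t \to \infty} S(t) = 1 - \theta .
\]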
Step 2:
To derive the hazard function recall that $h(t) = \frac{f(t)}{S(t)}$, and note that here $f(t) = -S'(t) = \theta f_0(t)$. Thus

\[
h(t) = \frac{\theta f_0(t)}{(1 - \theta) + \theta S_0(t)} .
\]
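Continuing the exponential sketch from Step 1 (again, only an assumed form), $f_0(t) = \lambda e^{-\lambda t}$ and the hazard becomes

\[
h(t) = \frac{\theta \lambda e^{-\lambda t}}{(1 - \theta) + \theta e^{-\lambda t}} ,
\]

which starts at $\theta \lambda$ and decays to zero as $t \to \infty$, which is what you would expect when a fraction $1 - \theta$ of users will never experience the event.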
Step 3:
To get the likelihood for observed and right censored data, again recall (with all these recollections, is this blog really needed?) that the probability of an observed event is $f(t_i) = h(t_i)\,S(t_i)$ and the probability of a right censored event is $S(t_i)$. Thus if $\delta_i$ is 1 if and only if observation $i$ has an observed event time, and is zero otherwise, then the likelihood can be written as

\[
L = \prod_i h(t_i)^{\delta_i}\, S(t_i) .
\]
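To make Step 3 concrete, here is a minimal sketch of the log-likelihood in Python, again assuming an exponential $S_0(t) = e^{-\lambda t}$; the function and array names (log_likelihood, t, delta) are just mine for illustration.

```python
import numpy as np

def log_likelihood(params, t, delta):
    """Log-likelihood of the defective survival model with an exponential base.

    params: (theta, lam) with theta = P(event ever happens), lam = exponential rate.
    t:      event or censoring time for each user.
    delta:  1 if the event was observed at t, 0 if right censored at t.
    """
    theta, lam = params
    s0 = np.exp(-lam * t)               # conventional survival function S0(t)
    s = (1.0 - theta) + theta * s0      # defective survival function S(t)
    h = theta * lam * s0 / s            # hazard h(t) = theta * f0(t) / S(t)
    return np.sum(delta * np.log(h) + np.log(s))
```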
What has not worked very well so far is picking a parametric distribution for $S_0(t)$. I have tried the usual suspects (the exponential, Weibull, and gamma distributions), however those have not adequately described my data. To estimate the model parameters ($\theta$ and the parameters of $S_0$; I know it is sloppy to only define these now) I have used MCMC. It may be the case that I need to find a better way of describing the prior distributions of the parameters (currently I am using bounded uniforms) and that is disrupting my estimates, though in that case I am surprised that the MLE point estimate is so bad. In any event, stay tuned as I improve this process.
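For reference, the MLE point estimate I compare against comes from something like the following sketch, which reuses the log_likelihood function above and the same exponential assumption; the bounds and starting values are arbitrary.

```python
from scipy.optimize import minimize

def fit_mle(t, delta):
    """Maximum likelihood point estimate of (theta, lam) under the exponential sketch."""
    neg_ll = lambda params: -log_likelihood(params, t, delta)
    result = minimize(
        neg_ll,
        x0=np.array([0.5, 0.1]),                    # arbitrary starting guess
        bounds=[(1e-6, 1 - 1e-6), (1e-6, None)],    # theta in (0, 1), lam > 0
        method="L-BFGS-B",
    )
    return result.x  # (theta_hat, lam_hat)
```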
Cheers