Proportional Hazards (unpacking equation 12)

Hi All

This week I am looking at proportional hazards regression. In particular, I am trying to clarify the conditional likelihood formula.

Before diving into that, please note that proportional hazards regression, also called Cox proportional hazards (CoxPH), was first presented in 1972 in “Regression models and life-tables”, which has been cited ~46,000 times! So Dr. Cox’s contribution with this one paper (he has a lot of other great papers as well) really is extraordinary.

That said, I did not find it the clearest of papers; in particular, equation 12 took me some time to understand. Figure 1 shows an excerpt of the paper.

Figure 1: an excerpt of Cox’s 1972 “Regression models and life-tables”

The point that Cox is getting at is: if the hazard function (\lambda(t) in his notation and h(t) in ours) is completely unknown but the order of failures is known, can inference still be conducted? The answer is, of course, yes. Cox proposes the conditional likelihood function; that is, the probability of the order of failures, or, to put it another way, the likelihood conditioned on the observed times of failure. Thus Cox’s equation 12 is, apparently, the probability that the i’th phone call ended given that exactly one phone call ended at that time. This post explores his assertion.

So to start: in CoxPH the hazard function is not assumed to have any particular form; however, it is assumed that all phone calls have the same form of hazard function, differing only in a parametrically defined proportion; i.e., h_{(i)}(t) = \exp({\bf z}_{(i)} {\bf m}) h_0(t) is the hazard function for the i’th ordered observation.

The \exp({\bf z}_{(i)}{\bf m}) term gives the hazard of the i’th observation relative to the base hazard h_0(t). This is the “proportional” in proportional hazards.
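To make the proportionality concrete, here is a small Python sketch. The baseline hazard, covariates, and coefficients are all made-up numbers, purely for illustration; the point is that the ratio of any two observations’ hazards does not depend on t.

```python
import numpy as np

# Hypothetical baseline hazard; CoxPH never needs its actual form.
def h0(t):
    return 0.5 + 0.1 * t

m = np.array([0.8, -0.3])   # illustrative regression coefficients
z1 = np.array([1.0, 2.0])   # covariates for one phone call
z2 = np.array([0.0, 1.0])   # covariates for another

def hazard(z, t):
    # h_(i)(t) = exp(z . m) * h0(t)
    return np.exp(z @ m) * h0(t)

# The hazard ratio between the two calls is the same at every time:
print(hazard(z1, 1.0) / hazard(z2, 1.0))   # exp((z1 - z2) . m)
print(hazard(z1, 9.0) / hazard(z2, 9.0))   # same value
```

The time-varying part h_0(t) cancels in the ratio, which is exactly why the covariate effects can be estimated without ever specifying it.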

Returning to equation 12 and translating it into this blog’s notation, we get

\frac{\exp({\bf z}_{(i)} {\bf m})}{\sum_{k \in R(t_{(i)})} \exp({\bf z}_{(k)} {\bf m})}.

This is the proportional term divided by the sum of the proportional terms for the risk set [R(t)]. The risk set is the set of observations with event times greater than or equal to that of the i’th observation. By multiplying the top and bottom of the fraction by the baseline hazard h_0(t_{(i)}), it becomes clear that this is the hazard of the ending phone call divided by the total hazard of all phone calls still active immediately before time t_{(i)}.

\frac{\exp({\bf z}_{(i)} {\bf m})\, h_0(t_{(i)})}{\sum_{k \in R(t_{(i)})} \exp({\bf z}_{(k)} {\bf m})\, h_0(t_{(i)})} = \frac{h_{(i)}(t_{(i)})}{\sum_{k \in R(t_{(i)})} h_{(k)}(t_{(i)})}
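As a quick numerical sketch (the covariates and coefficient below are invented for illustration), the equation-12 term can be computed directly. Note that h_0 has cancelled, and that the terms over every member of the risk set sum to one, as a probability over “which call ends next” should:

```python
import numpy as np

m = np.array([0.5])                      # illustrative coefficient
Z = np.array([[1.0], [0.0], [2.0]])      # illustrative covariates, one row per call

def eq12_term(i, risk_set, Z, m):
    """exp(z_(i) . m) / sum over the risk set of exp(z_(k) . m)."""
    rel = np.exp(Z @ m).ravel()          # relative hazards; h0 has cancelled
    return rel[i] / rel[risk_set].sum()

risk_set = np.array([0, 1, 2])           # all three calls still active
terms = [eq12_term(i, risk_set, Z, m) for i in risk_set]
print(terms)       # three probabilities
print(sum(terms))  # 1.0 (up to floating point)
```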

Intuitively, this should make sense: the probability of phone call i ending, given that exactly one of the phone calls ended, is the hazard of the i’th phone call divided by the total hazard of all active phone calls. But sadly intuition is not really good enough. So, continuing to peel back some of these layers, start from the definition of a hazard function:

h_{(k)}(t) = \lim_{\delta \to 0} P[t \le T_{(k)} < t + \delta \mid T_{(k)} \ge t] / \delta.

Since the \delta’s cancel between numerator and denominator, the ratio of hazards can be rewritten as a ratio of conditional probabilities,

\frac{h_{(i)}(t_{(i)})}{\sum_{k \in R(t_{(i)})} h_{(k)}(t_{(i)})} = \frac{P[\text{obs. } i \text{ has event time } t_{(i)} \mid T_{(i)} \ge t_{(i)}]}{\sum_{k \in R(t_{(i)})} P[\text{obs. } k \text{ has event time } t_{(i)} \mid T_{(k)} \ge t_{(i)}]},

using Bayes’ rule [P(A|B) = P(A \cap B)/P(B)]. Given the assumption that there is exactly one event at time t_{(i)} (this assumption deals with the conditioning terms: every observation in the risk set is known to have survived to t_{(i)}), the probabilities in the fraction can be interpreted as

\frac{P[\text{obs. } i \text{ has event time } t_{(i)}]}{\sum_{k \in R(t_{(i)})} P[\text{obs. } k \text{ has event time } t_{(i)}]}.

Making the additional assumption that the event times are independent of each other, the denominator can be interpreted as the probability that there is one event at time t_{(i)}, which gives the further simplification

\frac{P[\text{obs. } i \text{ has event time } t_{(i)}]}{P[\text{one event at time } t_{(i)}]}.

Finally, using Bayes’ rule again, the fraction can be written as a single conditional probability:

P[\text{obs. } i \text{ has event time } t_{(i)} \mid \text{one event at time } t_{(i)}].

So CoxPH is correct and 46,000 papers don’t need an errata! But, as I said, I would have liked a little more expansion in the original paper.
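The conditional-probability claim can also be checked numerically. If the hazards are constant (i.e., exponential event times — an assumption made here only to get a closed form; CoxPH itself assumes no such thing), the probability that a given call is the first to end should equal its hazard divided by the total hazard. A quick Monte Carlo sketch with invented rates:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.5, 1.0, 2.5])      # constant hazards, chosen arbitrarily

# Draw event times for all three calls, many times over.
n = 200_000
times = rng.exponential(1.0 / lam, size=(n, 3))
first_to_end = times.argmin(axis=1)

empirical = (first_to_end == 2).mean()
predicted = lam[2] / lam.sum()       # hazard of call 2 / total hazard = 0.625
print(empirical, predicted)          # the two agree closely
```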

Basically that is everything I wanted to get through; however, I like to talk too much / over-think things. In particular, consider the assumption that there is only one event at each time. In the proof this leads to the probability of an event happening at the observed time being the sum of the individual probabilities; however, in situations where there could be more than one event at a given time (likely due to sloppy data collection), that is clearly not true (Cox and the reviewers discuss this in the paper a bit). The probability that at least one observation has event time t_{(i)} is then 1 - \prod(1 - P[\text{obs. } k \text{ has event time } t_{(i)}]). While it is difficult to make general observations about this probability, it must be larger than the assumed probability. In addition, the numerator would then be not just the probability that observation i has event time t_{(i)}, but instead the probability that all the observations that end at t_{(i)} have event time t_{(i)}. This probability must be smaller. So it does look like ties would change the conditional likelihood a lot.
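A toy numeric check of how these quantities order under independence (the per-call probabilities below are invented): the exactly-one-event probability sits below the at-least-one-event probability, which in turn sits below the plain sum \sum_k p_k used when ties are assumed away.

```python
import numpy as np

# Invented per-call probabilities of an event in a small window around t_(i).
p = np.array([0.10, 0.05, 0.20])

sum_p = p.sum()                          # denominator when ties are assumed away
at_least_one = 1 - np.prod(1 - p)        # P(>= 1 event), assuming independence
exactly_one = sum(
    p[k] * np.prod(np.delete(1 - p, k))  # call k ends, the others do not
    for k in range(len(p))
)                                        # P(exactly 1 event), independence

print(exactly_one, at_least_one, sum_p)  # 0.283  0.316  0.35 (up to floating point)
```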

Tune in next time for exciting Bayesian Business Intelligence (I promise next week there will actually be a Bayesian component to this)!

