A Bayesian Proportional-Hazards Model in Survival Analysis

Stanley Sawyer — Washington University — August 24, 2004

1. Introduction. Suppose that a sample of n individuals has possibly censored survival times

$$ Y_1 \le Y_2 \le \cdots \le Y_n \qquad (1.1) $$

Let δi = 1 if the ith time Yi is an observed death and δi = 0 if it was a right-censored event: That
is, the individual was alive at time Yi , but was last seen at that time. If Ti (1 ≤ i ≤ n) are the true
survival or failure times, then Yi = Ti if δi = 1 and Yi < Ti if δi = 0, in which case the true failure
time Ti is unknown.
We also assume d-dimensional covariate vectors X1 , X2 , . . . , Xn for the n individuals in (1.1).
The components of Xi might be age, income status, etc. The basic data for (1.1) is the set of triples
(Yi , δi , Xi ) for 1 ≤ i ≤ n. The most important statistical questions are connected with estimating
the effect of the covariates Xi on the true survival times Ti .
Let
$$ \tilde Y_1 < \tilde Y_2 < \cdots < \tilde Y_m \qquad (1.2) $$

be the distinct survival times in (1.1). At each time Y = Ỹ_j, let d_j be the number of observed deaths and a_j the number of censored events. Then n = Σ_{j=1}^m (d_j + a_j) is the total sample size and n_obs = Σ_{j=1}^m d_j = Σ_{i=1}^n δ_i is the total number of observed deaths. The number of distinct observed death times is r = Σ_{j=1}^m I_{[d_j > 0]} ≤ m.
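These bookkeeping quantities are easy to tabulate. The following Python sketch (not from the original paper; the toy arrays y and delta are hypothetical) computes d_j, a_j, n_obs, and r from the pairs (Y_i, δ_i):

import numpy as np

# Hypothetical toy data: survival times Y_i and death indicators delta_i.
y = np.array([2.0, 2.0, 3.5, 3.5, 3.5, 5.0, 7.1])
delta = np.array([1, 0, 1, 1, 0, 0, 1])

y_tilde = np.unique(y)  # the distinct times of (1.2)
d = np.array([np.sum((y == t) & (delta == 1)) for t in y_tilde])  # deaths d_j
a = np.array([np.sum((y == t) & (delta == 0)) for t in y_tilde])  # censored a_j

n = int(np.sum(d + a))      # total sample size
n_obs = int(np.sum(d))      # total number of observed deaths
r = int(np.sum(d > 0))      # number of distinct observed death times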
The basic statistical model that we describe below is essentially due to Kalbfleisch (1978). See
Clayton (1991) and Ibrahim et al. (2001) for additional discussion and details, and Lee and Wang
(2003) for an introduction to survival analysis. The model described below is nonparametric in flavor,
but still allows tied survival-time data to be handled in a natural way. The likelihood formula that
we derive below for tied data appears to be new. Previous work on this model has mostly assumed
survival times (1.1) that are either without ties or else with grouped survival times (Kalbfleisch 1978,
Ibrahim et al. 2001).

2. A Survival Model. Let Y be the true lifetime of a random individual with covariates X. By
definition, the survival function is

$$ S_X(t) = P_X(Y > t) = \exp\big(-H_X(t)\big) = \exp\Big(-\int_0^t h_X(dy)\Big) \qquad (2.1) $$

where HX (t) is a right-continuous increasing function with HX (0) = 0 and hX (dy) is the related
Lebesgue-Stieltjes measure. The function HX (t) is one form of the cumulative hazard function and
hX (dy) the instantaneous hazard measure (or hazard rate). The Proportional Hazards assumption is

$$ h_X(dy) = e^{\beta X}\, h(dy) \quad\text{so that}\quad H_X(t) = e^{\beta X} H(t) \qquad (2.2) $$

for some d-dimensional vector of parameters β, where βX in (2.2) is the dot product. One of the
purposes of the model is to estimate β from the data and to test each component of β to find out
whether that component of X has a statistically significant effect on the survival times Y . A secondary
goal is to estimate the baseline hazard density h(dy), which would allow us to estimate the expected
survival time distribution SX (t) for an individual whose covariates are X, even if X is not among
the covariate vectors Xi in the data.

In principle, the likelihood of the data (Yi , δi , Xi ) in (1.1) is


  
$$ L = \Bigg(\prod_{[\delta_i = 0]} P_{X_i}(Y > Y_i)\Bigg) \Bigg(\prod_{[\delta_i = 1]} P_{X_i}(Y = Y_i)\Bigg) \qquad (2.3) $$

where PX (Y = Yi ) is with respect to some natural measure on the real line. Many inferential
methods in statistics are based on finding the parameters that are the most likely for known data, in
the sense of those parameters that have the largest value of L.
To derive an explicit formula for (2.3), choose numbers Δ_j > 0 such that Ỹ_j + Δ_j < Ỹ_{j+1} − Δ_{j+1} for all j, and define the binned likelihood

$$ L_\Delta = \prod_{i=1}^n \begin{cases} P_{X_i}\big(Y > \tilde Y_j + \Delta_j\big) & \text{if } \delta_i = 0 \\[4pt] P_{X_i}\big(\tilde Y_j - \Delta_j < Y \le \tilde Y_j + \Delta_j\big) & \text{if } \delta_i = 1 \end{cases} \qquad (2.4) $$

By definition, the true lifetime Ti > Ỹ_j for censored individuals with Yi = Ỹ_j, so that (2.4) is the appropriate probability if the Δ_j > 0 are sufficiently small. The likelihood (2.4) should be asymptotically proportional to (2.3) in the limit as Δ_j → 0.
We can write (2.4) in terms of the survival function S_X(t) in (2.1) as

$$ L_\Delta = \prod_{i=1}^n \begin{cases} S_{X_i}\big(\tilde Y_j + \Delta_j\big) & \text{if } \delta_i = 0 \\[4pt] S_{X_i}\big(\tilde Y_j - \Delta_j\big) - S_{X_i}\big(\tilde Y_j + \Delta_j\big) & \text{if } \delta_i = 1 \end{cases} $$

$$ = \prod_{i=1}^n \exp\Big(-\int_0^{\tilde Y_j - \Delta_j} h_{X_i}(dy)\Big) \begin{cases} \exp\Big(-\int_{\tilde Y_j - \Delta_j}^{\tilde Y_j + \Delta_j} h_{X_i}(dy)\Big) & \text{if } \delta_i = 0 \\[4pt] 1 - \exp\Big(-\int_{\tilde Y_j - \Delta_j}^{\tilde Y_j + \Delta_j} h_{X_i}(dy)\Big) & \text{if } \delta_i = 1 \end{cases} $$

Define

$$ Z_j = \int_{\tilde Y_{j-1} + \Delta_{j-1}}^{\tilde Y_j - \Delta_j} h(dy) \quad\text{and}\quad Z_{j0} = \int_{\tilde Y_j - \Delta_j}^{\tilde Y_j + \Delta_j} h(dy) \qquad (2.5) $$

Then

$$ \int_0^{\tilde Y_j - \Delta_j} h(dy) = Z_j + \sum_{k=1}^{j-1}\big(Z_k + Z_{k0}\big) = \sum_{k=1}^{j} Z_k + \sum_{k=1}^{j-1} Z_{k0} $$

so that

$$ \sum_{i=1}^n \int_0^{\tilde Y_j - \Delta_j} h_{X_i}(dy) = \sum_{i=1}^n e^{\beta X_i} \int_0^{\tilde Y_j - \Delta_j} h(dy) $$

$$ = \sum_{j=1}^m \Bigg(\sum_{[Y_i = \tilde Y_j]} e^{\beta X_i}\Bigg) \Bigg(\sum_{k=1}^{j} Z_k + \sum_{k=1}^{j-1} Z_{k0}\Bigg) $$

$$ = \sum_{k=1}^m Z_k \Bigg(\sum_{j=k}^m \sum_{[Y_i = \tilde Y_j]} e^{\beta X_i}\Bigg) + \sum_{k=1}^m Z_{k0} \Bigg(\sum_{j=k+1}^m \sum_{[Y_i = \tilde Y_j]} e^{\beta X_i}\Bigg) $$

$$ = \sum_{j=1}^m \Big(Z_j R_j(\beta) + Z_{j0} R_{j+1}(\beta)\Big) \qquad (2.6) $$

In (2.6), R_j(β) is the risk sum

$$ R_j(\beta) = \sum_{k=j}^m \sum_{[Y_i = \tilde Y_k]} e^{\beta X_i} = \sum_{[Y_i \ge \tilde Y_j]} e^{\beta X_i} \qquad (2.7) $$

corresponding to the individuals who are at risk immediately before time Ỹ_j. We can then write the binned likelihood (2.4) as

$$ L_\Delta = \exp\Big(-\sum_{j=1}^m \big(Z_j R_j(\beta) + Z_{j0} R_{j+1}(\beta)\big)\Big) \prod_{j=1}^m \prod_{[Y_i = \tilde Y_j,\ \delta_i = 0]} \exp\big(-Z_{j0}\, e^{\beta X_i}\big) \times \prod_{j=1}^m \prod_{[Y_i = \tilde Y_j,\ \delta_i = 1]} \Big(1 - \exp\big(-Z_{j0}\, e^{\beta X_i}\big)\Big) $$

$$ = \exp\Big(-\sum_{j=1}^m \big(Z_j R_j(\beta) + Z_{j0} R_{j0}(\beta) - S_j(Z_{j0}, \beta)\big)\Big) \qquad (2.8) $$

where

$$ S_j(Z_{j0}, \beta) = \sum_{[Y_i = \tilde Y_j,\ \delta_i = 1]} \log\Big(1 - \exp\big(-Z_{j0}\, e^{\beta X_i}\big)\Big) \qquad (2.9) $$

is a sum over the observed deaths at times Yi = Ỹ_j and

$$ R_{j0}(\beta) = \sum_{[Y_i = \tilde Y_j,\ \delta_i = 0]} e^{\beta X_i} + \sum_{[Y_i > \tilde Y_j]} e^{\beta X_i} \qquad (2.10) $$

is the risk sum for individuals who are at risk immediately after time Ỹ_j.
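For later use, the risk sums (2.7) and (2.10) can be computed directly from the triples (Yi, δi, Xi). A minimal Python sketch, assuming NumPy arrays y and delta as above and an n × d covariate matrix X (the function name risk_sums is ours, not the paper's):

import numpy as np

def risk_sums(y, delta, X, beta):
    # Risk sums R_j(beta) of (2.7) and R_j0(beta) of (2.10) at each
    # distinct survival time, in increasing order of the Y~_j.
    y_tilde = np.unique(y)
    w = np.exp(X @ beta)  # e^{beta X_i} for each individual
    R = np.array([w[y >= t].sum() for t in y_tilde])       # at risk just before Y~_j
    R0 = np.array([w[(y > t) | ((y == t) & (delta == 0))].sum()
                   for t in y_tilde])                      # at risk just after Y~_j
    return R, R0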
3. A Gamma-process Prior for H(t) = ∫_0^t h(dy). A useful way to estimate properties of the baseline hazard density h(dy) is to assume a parametric model for H(t) = ∫_0^t h(dy) and then estimate the parameters involved. A useful parametric probability distribution for the set of increasing functions H(t) for t ≥ 0 is the gamma process Z(t). This is a stochastic process with independent increments whose increments have the gamma distribution

$$ Z(t) - Z(s) \approx G\Big(\theta\big(\alpha(t) - \alpha(s)\big),\ \lambda\Big) \qquad (3.1) $$

where α(t) is some strictly-increasing function that is continuously differentiable for t > 0. In (3.1), Z ≈ G(θ, λ) means that Z is a random variable with the gamma probability density

$$ \frac{\lambda^\theta}{\Gamma(\theta)}\, x^{\theta - 1} e^{-\lambda x} \quad\text{for } 0 \le x < \infty $$

Examples of α(t) in (3.1) would be α(t) = t or α(t) = t^σ for some σ > 0. By (3.1),

$$ E\big(Z(t) - Z(s)\big) = \theta\big(\alpha(t) - \alpha(s)\big)/\lambda = \mu\big(\alpha(t) - \alpha(s)\big) \quad\text{and} $$

$$ \mathrm{Var}\big(Z(t) - Z(s)\big) = \theta\big(\alpha(t) - \alpha(s)\big)/\lambda^2 = \mu\big(\alpha(t) - \alpha(s)\big)/\lambda \qquad (3.2) $$

for μ = θ/λ.
If α(t) = t, then E(Z(t)) = μt in (3.2), so that α(t) = t corresponds to “noisy exponential” baseline survival times. Similarly, if α(t) = t^σ, then E(Z(t)) = μt^σ, corresponding to “noisy Weibull” survival distributions. The function α(t) is assumed fixed, and θ and λ are parameters to be estimated. Given μ = θ/λ, 1/λ determines the variance of H(t) = Z(t) about E(H(t)) = μα(t). Often θ, or both θ and λ, are given preassigned values to improve estimation.
The sample paths of the gamma process Z(t) are, with probability one, strictly-increasing
purely-discontinuous functions of t, although the probability that any preassigned value of t is a
jump is zero. This has the modeling advantage that tied survival-time values can occur with positive
probability, even though the survival times themselves (not conditioned on the path Z(t)) have a
continuous distribution, which means that any preassigned survival time has probability zero of
being attained.
For any process Z(t) with independent increments, the differences Z_j, Z_{j0} in (2.5) are independent random variables. By (3.1), the Z_j, Z_{j0} are independent random variables with gamma distributions

$$ Z_j \approx G\big(\theta W_j^\Delta,\ \lambda\big) \quad\text{where}\quad W_j^\Delta = \alpha(Y_j - \Delta_j) - \alpha(Y_{j-1} + \Delta_{j-1}) $$

$$ Z_{j0} \approx G\big(\theta W_{j0}^\Delta,\ \lambda\big) \quad\text{where}\quad W_{j0}^\Delta = \alpha(Y_j + \Delta_j) - \alpha(Y_j - \Delta_j) \qquad (3.3) $$

where we write Y_j = Ỹ_j for the Ỹ_j in (1.2) for ease of notation.
If the Z_j, Z_{j0} are considered parameters or “hidden variables” in the data (Yi, δi, Xi) with the probability distribution (3.3), then the parameters θ, λ in (3.1) are considered hyperparameters. In a Bayesian framework, the hyperparameters themselves are given probability (or prior) distributions. In this case, we assume gamma prior distributions θ, λ ≈ G(ε, ε) for some small ε > 0 (ε = 0.001 is the most common choice) and an uninformative normal prior for each component β_a of β ∈ R^d, specifically that the prior distributions of the β_a are independent normal with mean zero and standard deviation 1/ε (Ibrahim et al. 2001). However, improper uniform priors for λ and the β_a would work just as well in this case.
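For intuition, a discretized sample path of the gamma-process prior (3.1) can be drawn by accumulating independent gamma increments. A short Python sketch (the grid and parameter values below are illustrative only):

import numpy as np

rng = np.random.default_rng(0)

def gamma_process_path(theta, lam, alpha, grid):
    # Z(t) on the grid: independent increments with
    # Z(t) - Z(s) ~ G(theta*(alpha(t) - alpha(s)), lam) as in (3.1);
    # numpy's gamma is parameterized by shape and scale = 1/rate.
    increments = rng.gamma(shape=theta * np.diff(alpha(grid)), scale=1.0 / lam)
    return np.concatenate([[0.0], np.cumsum(increments)])

grid = np.linspace(0.0, 10.0, 1001)
Z = gamma_process_path(theta=2.0, lam=1.0, alpha=lambda t: t, grid=grid)  # "noisy exponential" case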

4. The Full Likelihood L. Under these conditions, the full binned likelihood of the data, including
the prior distributions for Zj , Zj0 and θ, λ, β, corresponding to (2.8) is
$$ L_\Delta = \frac{\epsilon^\epsilon}{\Gamma(\epsilon)}\,\theta^{\epsilon-1} e^{-\epsilon\theta}\;\frac{\epsilon^\epsilon}{\Gamma(\epsilon)}\,\lambda^{\epsilon-1} e^{-\epsilon\lambda}\;\prod_{a=1}^{d} \frac{\epsilon}{\sqrt{2\pi}}\, \exp\big(-\epsilon^2 \beta_a^2/2\big)\ \times\ \prod_{j=1}^m \frac{\lambda^{\theta W_j^\Delta}}{\Gamma(\theta W_j^\Delta)}\, Z_j^{\theta W_j^\Delta - 1}\, e^{-\lambda Z_j}\, e^{-Z_j R_j(\beta)} \qquad (4.1) $$

$$ \times\ \prod_{j=1}^m \frac{\lambda^{\theta W_{j0}^\Delta}}{\Gamma(\theta W_{j0}^\Delta)}\, Z_{j0}^{\theta W_{j0}^\Delta - 1}\, e^{-\lambda Z_{j0}}\, e^{-Z_{j0} R_{j0}(\beta)} \prod_{[Y_i = \tilde Y_j,\ \delta_i = 1]} \Big(1 - \exp\big(-Z_{j0}\, e^{\beta X_i}\big)\Big) $$

As each Δ_j → 0,

$$ W_j^\Delta \to W_j = \alpha(Y_j) - \alpha(Y_{j-1}) > 0 \quad\text{and}\quad W_{j0}^\Delta \to 0 \qquad (4.2) $$

for Y_j = Ỹ_j as before. The expressions in the first line of (4.1) vary continuously as W_j^Δ → W_j > 0. As W_{j0}^Δ → 0, the j-th factor in the second line in (4.1) is asymptotic to C(Z_{j0}) Γ(θW_{j0}^Δ)^{-1} ∼ C(Z_{j0}) θ α′(Ỹ_j) Δ_j for Z_{j0} > 0 and C(Z_{j0}) > 0. There are two cases for the asymptotic behavior of the j-th factor in the second line in (4.1):
A Bayesian Proportional-Hazards Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

If d_j = 0, the j-th factor has a delta-function singularity at Z_{j0} = 0 as Δ_j → 0 and L_Δ does not need to be rescaled. In this case, the factors in (4.1) with Z_{j0} disappear in the limit as Δ_j → 0 (with Z_{j0} = 0).
If d_j ≥ 1, the function C(Z_{j0}) is a bounded and continuous function of Z_{j0} for Z_{j0} ≥ 0, and the j-th factor in (4.1) is asymptotic to C(Z_{j0}) θ α′(Y_j) Δ_j as Δ_j → 0.
Thus, ignoring constants that depend on Δ_j for d_j > 0, the limit of L_Δ in (4.1) as max_j Δ_j → 0 is the limiting full likelihood

$$ L = C\,\theta^{r+\epsilon-1} e^{-\epsilon\theta}\,\lambda^{\epsilon-1} e^{-\epsilon\lambda}\,\exp\Big(-\epsilon^2 \sum_{a=1}^{d} \beta_a^2/2\Big) $$

$$ \times\ \prod_{j=1}^m \frac{\lambda^{\theta W_j}}{\Gamma(\theta W_j)}\, Z_j^{\theta W_j - 1} \exp\Big(-Z_j\big(\lambda + R_j(\beta)\big)\Big) \qquad (4.3) $$

$$ \times\ \prod_{[d_j \ge 1]} \exp\Big(-Z_{j0}\big(\lambda + R_{j0}(\beta)\big)\Big)\;\frac{\prod_{[Y_i = \tilde Y_j,\ \delta_i = 1]} \Big(1 - \exp\big(-Z_{j0}\, e^{\beta X_i}\big)\Big)}{Z_{j0}} $$

In (4.3), C depends on ε and the α′(Y_j), and r is the number of distinct times Y_j = Ỹ_j with d_j ≥ 1. As mentioned earlier, inferences about which parameter values are relatively more likely are based on finding relatively larger values of L in (4.3) for the data (Yi, δi, Xi).

5. Estimating Parameters Using the Likelihood L. We estimate the parameters and hidden vari-
ables (θ, λ, Zj , Zj0 , β) in (4.3) by using Markov Chain Monte Carlo methods (Metropolis et al. 1953,
Hastings 1970, Gilks et al. 1996).
Specifically, we define a Markov Chain Qn that takes its values in the space of possible parameter
vectors (θ, λ, Zj , Zj0 , β) and which has a stationary or asymptotic distribution that is proportional
to (4.3). This means that Qn spends most of its time where the likelihood (4.3) is the largest. Mean
or median values of components or functions of components of Qn can be used to provide estimates
of the parameters affecting the true survival times Ti .
The Markov chain Qn proceeds by changing or updating each of the components of the vector
(θ, λ, Zj , Zj0 , β) in turn in a way that depends on the conditional probability distribution of that
parameter value given the data and all the other parameters. We carry out these parameter changes
or updates in the following way:
Updating θ : Ignoring multiplicative constants and also ignoring factors in (4.3) that do not depend
on θ, the conditional density of θ given the data and the other parameters is
$$ \theta^{r+\epsilon-1} e^{-\epsilon\theta}\, \lambda^{\theta W} \prod_{j=1}^m \frac{Z_j^{\theta W_j}}{\Gamma(\theta W_j)} \quad\text{where}\quad W = \sum_{j=1}^m W_j \qquad (5.1) $$

for W_j in (4.2). The density (5.1) is asymptotic to Cθ^{r+m+ε−1} as θ → 0 and decays faster than exponentially at infinity, and can be updated efficiently by one step of a Metropolis random walk (Metropolis et al. 1953).
Alternatively, the density (5.1) is a log-concave function of θ, so that θ can be updated by a “Gibbs
sampler” step that samples directly from the distribution (5.1) using one of the adaptive-rejection
methods of Gilks and Wild (1992) or Gilks (1992). (See also Gilks et al. 1995.)
In general, a function f(θ) is called log-concave if (d/dθ)² log f(θ) < 0 for all θ or, more generally, if (d/dθ) log f(θ) is decreasing in θ. The log-concavity of (5.1) follows from the identity

$$ \frac{d^2}{d\theta^2} \log \Gamma(\theta) = \mathrm{Var}\big(\log G(\theta, 1)\big) > 0 \qquad (5.2) $$

where G(θ, 1) represents a gamma-distributed random variable (as in (3.1)). (Exercise: Prove (5.2).)
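A random-walk Metropolis update for θ needs only the log of the conditional density (5.1). A Python sketch, assuming current values of r, ε, λ and arrays W, Z holding the W_j and Z_j from the sampler state (the step size is a tuning constant, not from the paper):

import numpy as np
from scipy.special import gammaln

def log_cond_theta(theta, r, eps, lam, W, Z):
    # log of (5.1), up to an additive constant
    if theta <= 0.0:
        return -np.inf
    return ((r + eps - 1.0) * np.log(theta) - eps * theta
            + theta * W.sum() * np.log(lam)
            + np.sum(theta * W * np.log(Z) - gammaln(theta * W)))

def update_theta(theta, step, rng, *args):
    # one step of a Metropolis random walk (Metropolis et al. 1953)
    prop = theta + step * rng.standard_normal()
    log_acc = log_cond_theta(prop, *args) - log_cond_theta(theta, *args)
    return prop if np.log(rng.uniform()) < log_acc else theta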

Updating λ : Ignoring multiplicative constants and factors in (4.3) that do not depend on λ, the
conditional density of λ given the data and the other parameters is
$$ \lambda^{\theta W + \epsilon - 1} \exp\Big(-\lambda\Big(\epsilon + \sum_{j=1}^m \big(Z_j + Z_{j0}\big)\Big)\Big) \qquad (5.3) $$

where Z_{j0} = 0 if d_j = 0. This can be updated by a Gibbs sampler step by sampling from the gamma distribution

$$ \lambda \approx G\Big(\epsilon + \theta W,\ \epsilon + \sum_{j=1}^m \big(Z_j + Z_{j0}\big)\Big) $$

See Fishman (1995) for algorithms for generating gamma-distributed random variates. Two other
good references for statistical computation and for scientific computing in general are Devroye (1986)
and Press et al. (1992).
Updating Zj : Ignoring multiplicative constants and factors that do not depend on Zj , the conditional
density of Zj given the other parameters is
$$ Z_j^{\theta W_j - 1}\, e^{-Z_j (\lambda + R_j(\beta))} \qquad (5.4) $$

Thus Z_j can be updated by sampling from the gamma distribution

$$ Z_j \approx G\big(\theta W_j,\ \lambda + R_j(\beta)\big) $$
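The two preceding Gibbs updates are single gamma draws. A Python sketch with illustrative placeholder values (NumPy's gamma takes scale = 1/rate, so the rate parameters above become reciprocals here):

import numpy as np

rng = np.random.default_rng(1)
eps, theta = 0.001, 1.0                    # hyperparameters as in Section 3
W = np.array([0.5, 1.0, 0.8])              # W_j = alpha(Y_j) - alpha(Y_{j-1})
Z = np.array([0.3, 0.4, 0.2])              # current Z_j
Z0 = np.array([0.1, 0.0, 0.2])             # current Z_j0, zero where d_j = 0
R = np.array([5.0, 3.2, 1.1])              # risk sums R_j(beta) from (2.7)

# lambda ~ G(eps + theta*W, eps + sum(Z_j + Z_j0)): the Gibbs draw for (5.3)
lam = rng.gamma(shape=eps + theta * W.sum(),
                scale=1.0 / (eps + np.sum(Z + Z0)))

# Z_j ~ G(theta*W_j, lam + R_j(beta)), all j at once: the Gibbs draw for (5.4)
Z = rng.gamma(shape=theta * W, scale=1.0 / (lam + R))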

Updating Z_{j0} for d_j ≥ 1 : Ignoring multiplicative constants and factors that do not depend on Z_{j0}, the conditional density of Z_{j0} given the other parameters is

$$ e^{-Z_{j0}(\lambda + R_{j0}(\beta))}\;\frac{\prod_{[Y_i = \tilde Y_j,\ \delta_i = 1]} \Big(1 - \exp\big(-Z_{j0}\, e^{\beta X_i}\big)\Big)}{Z_{j0}} \qquad (5.5) $$

The density (5.5) is normalizable in Z_{j0} and can be updated by one step of a Metropolis random walk. Unfortunately, the density (5.5) is not log-concave in Z_{j0} because of the factor of Z_{j0} in the denominator.
Alternatively, a more general sampling technique can be used for (5.5) that does not require log
concavity (Gilks et al. 1995). This method, called “Metropolis-within-Gibbs” sampling, is equivalent
to independence Metropolis-Hastings sampling (Gilks et al. 1996) using, as the proposal distribution,
an approximation of the density (5.5) based on the method of Gilks (1992). If the density that is being
approximated is log concave, the method reduces to the adaptive-rejection method of Gilks (1992).
Technically speaking, the term “Metropolis-within-Gibbs” is not quite correct, since independence sampling is not Metropolis sampling in the original sense. Metropolis et al. (1953) only described proposal distributions that are one step of a symmetric Markov chain. Independence sampling is contained in a generalization of Metropolis et al. (1953) due to Hastings (1970). Sampling schemes of the latter type are usually called “Metropolis-Hastings” sampling.
Independence samplers can have extremely bad convergence properties if the proposal distribu-
tion is less singular or less heavy-tailed than the distribution being approximated (Gilks et al. 1996).
In that case, a Metropolis random walk can be used instead.
Large values of βXi can cause numerical underflows and overflows in the exponentials and risk sums in (5.5), but do not cause arbitrarily large values of the likelihood (5.5). Depending on the compiler, computer programs may have to be adjusted to avoid crashes when evaluating exp(−Z_{j0} e^{βX_i}) in (5.5) if βXi is large and positive. The term exp(−Z_{j0} e^{βX_i}) in (5.5) can be set explicitly equal to zero in programming code if Z_{j0} > 0 and βXi > 500, and equal to one if βXi < −500. Most modern computers replace exponential underflows (that is, positive values smaller than the program can handle) by zero without a program warning or crash. If numerical underflows in exponentials can also cause program crashes, program adjustments may also have to be made when Z_{j0} e^{βX_i} by itself is large.
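One hedge against both failure modes is to work with log t = βX_i + log Z_{j0} and branch on its size. A Python sketch of such a guarded evaluation of log(1 − exp(−Z_{j0} e^{βX_i})); the cutoff ±30 is an illustrative choice, not from the paper:

import numpy as np

def log1m_exp_neg(log_t):
    # stable log(1 - e^{-t}) given log_t = log(t) = beta.X_i + log(Z_j0)
    log_t = np.atleast_1d(np.asarray(log_t, dtype=float))
    out = np.empty_like(log_t)
    big = log_t > 30.0       # t huge: e^{-t} underflows, log(1 - e^{-t}) ~ 0
    small = log_t < -30.0    # t tiny: log(1 - e^{-t}) ~ log(t)
    mid = ~(big | small)
    out[big] = 0.0
    out[small] = log_t[small]
    # -expm1(-t) = 1 - e^{-t}, computed without cancellation for moderate t
    out[mid] = np.log(-np.expm1(-np.exp(log_t[mid])))
    return out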
Updating β : Ignoring multiplicative constants and factors that do not depend on β, the conditional density of β_a in (4.3) given the data and other parameters is

$$ \exp\Big(-\sum_{j=1}^m \Big(Z_j R_j(\beta) + Z_{j0} R_{j0}(\beta) - S_j(Z_{j0}, \beta)\Big) - \tfrac{1}{2}\epsilon^2 \beta_a^2\Big) \qquad (5.6) $$

where

$$ S_j(Z_{j0}, \beta) = \sum_{[Y_i = \tilde Y_j,\ \delta_i = 1]} \log\Big(1 - \exp\big(-Z_{j0}\, e^{\beta X_i}\big)\Big) $$

is a sum over the observed deaths at times Yi = Ỹ_j, as in (2.9). If there are no observed deaths at time Yi = Ỹ_j, then S_j(Z_{j0}, β) = 0 and Z_{j0} = 0, and the last two terms in the sums in (5.6) do not appear.
Barring linear dependencies among the sample covariates, the conditional likelihood (5.6) is normalizable in each component β_a, so that each β_a can be updated efficiently by one step of a Metropolis random walk.
Alternatively, the density (5.6) is log-concave in βa , so that Gibbs sampler updates can be
made using the adaptive rejection methods of Gilks and Wild (1992) or Gilks (1992). Gilks has
programming code in C on a Web site for carrying out Metropolis-within-Gibbs sampling that reduces
to Gilks (1992) if a parameter is set. This C code can be used for non-Metropolis updates of θ, Zj0 ,
and β.
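A random-walk Metropolis sweep over the components β_a, as an alternative to the adaptive-rejection updates, needs only the log of (5.6). A Python sketch reusing risk_sums() and log1m_exp_neg() from the earlier sketches (the step size is again a tuning constant):

import numpy as np

def log_cond_beta(beta, y, delta, X, Z, Z0, eps):
    # log of (5.6) up to an additive constant, with S_j evaluated stably
    y_tilde = np.unique(y)
    R, R0 = risk_sums(y, delta, X, beta)
    total = -np.sum(Z * R + Z0 * R0) - 0.5 * eps**2 * np.sum(beta**2)
    for j, t in enumerate(y_tilde):
        if Z0[j] > 0.0:                    # S_j = 0 when d_j = 0
            dead = (y == t) & (delta == 1)
            total += np.sum(log1m_exp_neg(X[dead] @ beta + np.log(Z0[j])))
    return total

def update_beta(beta, a, step, rng, *args):
    # one Metropolis random-walk step on the single component beta_a
    prop = beta.copy()
    prop[a] += step * rng.standard_normal()
    log_acc = log_cond_beta(prop, *args) - log_cond_beta(beta, *args)
    return prop if np.log(rng.uniform()) < log_acc else beta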

6. A Likelihood For (θ, λ, β) . The advantage of the Markov Chain Monte Carlo (MCMC) procedure of the previous section is that it also gives us information about the conditional distribution of the baseline cumulative hazards

$$ Z_j \approx H(Y_j-) - H(Y_{j-1}) \quad\text{and}\quad Z_{j0} \approx dH(Y_j) = h(dY_j) $$

given the observed data. If we are primarily interested in the parameters (θ, λ, β) and not in the baseline hazard density h(dy), the Z_j, Z_{j0} can be integrated out of the likelihood (4.3) to obtain a marginal likelihood that depends only on (θ, λ, β).
Evaluating the integrals ∫ L(Z_j) dZ_j in (4.3) in succession yields
$$ L = C\,\lambda^{\epsilon-1} e^{-\epsilon\lambda}\, \theta^{r}\, \exp\Big(-\epsilon^2 \sum_{a=1}^{d} \beta_a^2/2\Big) \prod_{j=1}^m \Big(\frac{\lambda}{\lambda + R_j(\beta)}\Big)^{\theta W_j} \qquad (6.1) $$

$$ \times\ \prod_{[d_j \ge 1]} \exp\Big(-Z_{j0}\big(\lambda + R_{j0}(\beta)\big)\Big)\;\frac{\prod_{[Y_i = \tilde Y_j,\ \delta_i = 1]} \Big(1 - \exp\big(-Z_{j0}\, e^{\beta X_i}\big)\Big)}{Z_{j0}} $$

While λ no longer has a simple gamma update, the parameter θ now has a gamma update, specifically

$$ \theta \approx G\Big(r + 1,\ \sum_{j=1}^m W_j \log\big((\lambda + R_j(\beta))/\lambda\big)\Big) \qquad (6.2) $$

The parameters Z_{j0} can be integrated out by using the identity

$$ \int_0^\infty \frac{e^{-at} - e^{-bt}}{t}\, dt = \int_0^\infty \!\! \int_a^b e^{-\theta t}\, d\theta\, dt = \int_a^b \frac{d\theta}{\theta} = \log\frac{b}{a} \qquad (6.3) $$

for b > a > 0. Thus if d_j = 1, the j-th factor in the second line of (6.1) integrates to

$$ \log\Bigg(\frac{\lambda + R_{j0}(\beta) + e^{\beta X_i}}{\lambda + R_{j0}(\beta)}\Bigg) = \log\Bigg(\frac{\lambda + R_j(\beta)}{\lambda + R_{j0}(\beta)}\Bigg) $$

In particular, if d_j ≤ 1 for all j, so that there are no ties among the observed death times, then evaluating the integrals ∫ L(Z_{j0}) dZ_{j0} for d_j = 1 in (6.1) leads to the more compact form

$$ L = L(\theta, \lambda, \beta) = C\,\lambda^{\epsilon-1} e^{-\epsilon\lambda}\, \theta^{r} \prod_{j=1}^m \Bigg(\Big(\frac{\lambda}{\lambda + R_j(\beta)}\Big)^{\theta W_j} \log\Big(\frac{\lambda + R_j(\beta)}{\lambda + R_{j0}(\beta)}\Big)\Bigg) \qquad (6.4) $$

ignoring the prior terms in β. If d_j = 0, then R_{j0}(β) = R_j(β) and the logarithmic factor does not appear. Analogous expressions can be found for d_j ≥ 2 by expanding the last product in (6.1) into a linear combination of differences of exponentials and applying (6.3).
The likelihood (6.4) no longer has information about the baseline hazards Zj , Zj0 , although the
conditional density of Zj , Zj0 is given by (5.4) and (5.5) if β is known precisely. See Kalbfleisch
(1978) for a different derivation if dj ≤ 1 for all j.
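For the untied case, the marginal likelihood (6.4) is cheap to evaluate on the log scale. A Python sketch, reusing risk_sums() from the Section 2 sketch and ignoring the constant C and the β prior as in (6.4); the function and argument names are ours:

import numpy as np

def log_marginal_likelihood(theta, lam, beta, y, delta, X, alpha, eps):
    # log of (6.4), assuming no tied death times (d_j <= 1 for all j)
    y_tilde = np.unique(y)
    W = np.diff(np.concatenate([[0.0], alpha(y_tilde)]))   # W_j of (4.2)
    R, R0 = risk_sums(y, delta, X, beta)
    d = np.array([np.sum((y == t) & (delta == 1)) for t in y_tilde])
    out = (eps - 1.0) * np.log(lam) - eps * lam + np.sum(d > 0) * np.log(theta)
    out += np.sum(theta * W * np.log(lam / (lam + R)))
    dead = d > 0                                           # the logarithmic factors
    out += np.sum(np.log(np.log((lam + R[dead]) / (lam + R0[dead]))))
    return out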

7. The Posterior Distribution of the Hazard Function H(t). (In Bayesian terminology, “poste-
rior” means “conditional on the observed data for a given prior”.)
For any j and any partition (Y_{j−1}, Y_j) = ∪_{a=1}^{A_j} (Y_{j,a−1}, Y_{ja}) of (Y_{j−1}, Y_j), define Z_j = H(Y_j) − H(Y_{j−1}) = Σ_{a=1}^{A_j} Z_{ja} for Z_{ja} = H(Y_{ja}−) − H(Y_{j,a−1}). The same argument as in (2.5) to (4.3) shows that the posterior distribution (4.3) is still valid with Z_{ja} in place of Z_j, with of course d_{ja} = 0 unless there is an actual observed death at Y_{ja}. This implies that the posterior distribution of the random variables Z_{ja} is that of independent gamma-distributed random variables with distributions

$$ Z_{ja} \approx G\Big(\theta\big(\alpha(Y_{ja}) - \alpha(Y_{j,a-1})\big),\ \lambda + R_j(\beta)\Big) \qquad (7.1) $$

This in turn implies that, for each j, the posterior distribution of the process Z(t) − Z(Y_{j−1}) = H(t) − H(Y_{j−1}) for Y_{j−1} < t < Y_j is that of a gamma process on (Y_{j−1}, Y_j) with scale parameter λ_j = λ + R_j(β), with jumps Z(Ỹ_j+) − Z(Ỹ_j−) ≈ Z_{j0} in the posterior distribution of Z(t) at the observed death times Ỹ_j. As before, Z_{j0} = 0 if d_j = 0. If d_j > 0, Z_{j0} has the density (5.5). (See also Kalbfleisch 1978 and Clayton 1991.)
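Given one set of MCMC values (θ, λ, β, Z_{j0}), a posterior draw of H(t) on a grid follows directly from this description: gamma increments with rate λ + R_j(β) between death times, plus the jump Z_{j0} at each death time. A Python sketch; it assumes the grid is fine relative to the death times and, as a simplification, reuses the last interval's rate beyond the last death time:

import numpy as np

rng = np.random.default_rng(2)

def posterior_H_path(theta, lam, alpha, y_tilde, R, Z0, grid):
    # one posterior draw of H(t) at the grid points, per (7.1)
    grid = np.asarray(grid, dtype=float)
    j = np.minimum(np.searchsorted(y_tilde, grid[1:]), len(y_tilde) - 1)
    rate = lam + R[j]                        # posterior rate lambda_j on each piece
    shape = theta * np.diff(alpha(grid))     # shape of each increment
    incr = rng.gamma(shape=shape, scale=1.0 / rate)
    for k, t in enumerate(y_tilde):          # add the jump Z_k0 where t is crossed
        crossed = (grid[:-1] < t) & (grid[1:] >= t)
        incr[crossed] += Z0[k]
    return np.concatenate([[0.0], np.cumsum(incr)])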

8. Simulating Data for the Model in Sections 1–3. We can simulate survival data (Yi, δi, Xi) for the model (2.1)–(2.2)–(3.1) as follows:
First, choose a sample size n, the number of covariates d, and, for 1 ≤ i ≤ n, covariates Xi ∈ R^d. As in most regression models, these are assumed to be deterministic and are arbitrary. Choose arbitrary parameter values θ, λ > 0 and risk parameters β ∈ R^d. Also, choose a strictly-increasing continuously-differentiable function α(t) with α(0) = 0, for example α(t) = t.
The first goal is to define failure times Yi satisfying (2.1)–(2.2)–(3.1), that is,

$$ P(Y_i > t) = \exp\big(-H_{X_i}(t)\big) = \exp\big(-e^{\beta X_i} H(t)\big), \qquad t \ge 0 \qquad (8.1) $$

where H(t) = Z(t) is a realization of the gamma process

$$ Z(t) \approx G\big(\theta\alpha(t),\ \lambda\big) \approx (1/\lambda)\, G\big(\theta\alpha(t),\ 1\big) \qquad (8.2) $$

The final step will be to modify the construction so that some of the observations Yi can be censored.
The sample paths of Z(t) are right-continuous with jumps in every time interval (t_1, t_2) with 0 ≤ t_1 < t_2. This implies

$$ P(Y_i > t) = P\big(Z(Y_i) > Z(t)\big) = \exp\big(-e^{\beta X_i} Z(t)\big) $$

so that

$$ P\big(Z(Y_i) > s\big) = \exp\big(-e^{\beta X_i}\, s\big) \qquad (8.3) $$

whenever s = Z(t).
whenever s = Z(t). This suggests that Z(Yi ) might have an exponential distribution with mean
e−βXi , but this is not correct. In fact, given Z(t), the values of Z(Yi ) are restricted to the range
of Z(t), which is the complement of an open dense set of real numbers since Z(t) is increasing
with jumps in every open interval. This means that if the random variable Z(Yi ) has a probability
distribution with a density g(s), then g(s) = 0 on an open dense set of real numbers s. Thus Z(Yi )
cannot have a probability distribution with a continuous density.
If the variables Z(Yi ) were exponentially distributed, then we could simulate Yi ≈ Z −1 (Zi )
where Zi ≈ Z(Yi ) had a known distribution. However, we can do essentially the same even though
the Z(Yi ) are not exponentially distributed.
Let Z_i be independent exponentially distributed random variables with mean e^{−βX_i}, as incorrectly suggested for Z(Y_i) by (8.3). The Z_i can be simulated as

$$ Z_i \approx e^{-\beta X_i}\big(-\log(U_i)\big) $$

where the U_i are independent uniform random variables on 0 ≤ U_i ≤ 1. Define

$$ Y_i = \min\{\, t : Z(t) \ge Z_i \,\} \qquad (8.4) $$

Then Y_i ≤ t_2 if and only if Z(t_2) ≥ Z_i, so that

$$ P(Y_i > t) = P\big(Z(t) < Z_i\big) = \exp\big(-e^{\beta X_i} Z(t)\big) \qquad (8.5) $$

which is exactly (8.1). It follows from (8.3) and (8.5) that P(Z(Y_i) ≤ s) = P(Z_i ≤ s) whenever s is a value attained by Z(t), but Z(Y_i) and Z_i have different probability distributions.
To simulate Y_i from (8.4), we need an approximate sample path of Z(t). Define independent gamma-distributed random variables

$$ Q_j \approx G\big(\theta\Delta(j, m),\ 1\big) \quad\text{for } 1 \le j \le mT \qquad (8.6) $$

where Δ(j, m) = α(j/m) − α((j−1)/m) and m and T are large. In particular, Δ(j, m) = 1/m if α(t) = t. In general, by (8.2) and (8.6),

$$ Z(k/m) \approx G\big(\theta\alpha(k/m),\ \lambda\big) \approx (1/\lambda)\, G\big(\theta\alpha(k/m),\ 1\big) \approx (1/\lambda) \sum_{j=1}^k Q_j $$

Thus we can simulate Y_i in (8.4) by

$$ Y_i = \min\Big\{\, k/m : (1/\lambda)\sum_{j=1}^k Q_j \ge Z_i \,\Big\} = \frac{1}{m}\min\Big\{\, k : \sum_{j=1}^k Q_j \ge \lambda Z_i \,\Big\} $$

or, equivalently, by

$$ Y_i = \frac{1}{m}\min\Big\{\, k : \sum_{j=1}^k Q_j \ge \tilde Z_i \,\Big\} \quad\text{where}\quad \tilde Z_i \approx \lambda\, e^{-\beta X_i}\big(-\log(U_i)\big) \approx \lambda Z_i \qquad (8.7) $$

To include censoring, we define censoring times

$$ Y_i^c = \frac{1}{m}\min\Big\{\, k : \sum_{j=1}^k Q_j \ge \tilde Z_i^c \,\Big\} \quad\text{for}\quad \tilde Z_i^c \approx \mu\, e^{-\beta X_i}\big(-\log(U_i)\big) $$

in the same way for some constant μ > 0. Define δ_i = 1 (that is, the true failure time Y_i = T_i is observed) if Y_i < Y_i^c, and δ_i = 0 (that is, Y_i < T_i and Y_i is censored) if Y_i^c < Y_i. The last observed times (observed failure or censoring times) are

$$ Y_i^o = \min\{Y_i, Y_i^c\} = \frac{1}{m}\min\Big\{\, k : \sum_{j=1}^k Q_j \ge \tilde Z_i^o \,\Big\}, \qquad \tilde Z_i^o = \min\{\tilde Z_i, \tilde Z_i^c\} \qquad (8.8) $$

In general, if X_1 and X_2 are independent exponentials with E(X_1) = μ_1 and E(X_2) = μ_2, then X_3 = min{X_1, X_2} is exponential with E(X_3) = μ_1μ_2/(μ_1 + μ_2) and P(X_1 < X_2) = μ_2/(μ_1 + μ_2). Moreover, X_3 and the event {X_1 < X_2} are independent. (Exercise: Prove these three statements.)
This implies that the triple (Y_i^o, δ_i, X_i) for Y_i^o in (8.8) satisfies the conditions of the model (2.1)–(2.2)–(3.1) with λ replaced by λμ/(λ + μ). Moreover, the variables δ_i = I[Z̃_i < Z̃_i^c] are independent with P(δ_i = 0) = P(Z̃_i^c < Z̃_i) = λ/(λ + μ), and the δ_i are independent of the Z̃_i^o.
This means that if we choose θ, λ and 0 < q < 1 and define

$$ Y_i = \frac{1}{m}\min\Big\{\, k : \sum_{j=1}^k Q_j > \tilde Z_i \,\Big\} \quad\text{for}\quad \tilde Z_i \approx \lambda(1 - q)\, e^{-\beta X_i}\big(-\log(U_i)\big) $$

and then, for each i, independently of the value of Y_i, call Y_i censored (δ_i = 0) with probability q and observed (δ_i = 1) with probability 1 − q, then the (Y_i, δ_i, X_i) satisfy the conditions of Sections 1–3 with the original value of λ. (Exercise: Prove this. Note that if μ = ((1 − q)/q)λ, then λμ/(λ + μ) = (1 − q)λ and λ/(λ + μ) = q.)
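The whole construction fits in a few lines. A Python sketch of this simplified q-censoring simulator, per (8.6)–(8.7) with α(t) = t; the covariates here are arbitrary standard normals, and m and T are discretization constants that should be taken large:

import numpy as np

rng = np.random.default_rng(3)

def simulate_data(n, d, theta, lam, beta, q, m=1000, T=50):
    # simulate (Y_i, delta_i, X_i) per Section 8 with alpha(t) = t
    X = rng.standard_normal((n, d))
    Q = rng.gamma(shape=theta / m, scale=1.0, size=m * T)   # Q_j of (8.6)
    cumQ = np.cumsum(Q)
    # Z~_i = lam*(1-q)*e^{-beta.X_i}*(-log U_i): exponential with that mean
    Ztilde = lam * (1.0 - q) * np.exp(-(X @ beta)) * rng.exponential(size=n)
    # Y_i = (1/m) min{ k : Q_1 + ... + Q_k >= Z~_i }, as in (8.7); T must be
    # large enough that cumQ[-1] exceeds every Z~_i
    Y = (np.searchsorted(cumQ, Ztilde) + 1) / m
    delta = (rng.uniform(size=n) > q).astype(int)           # censored w.p. q
    return Y, delta, X

Y, delta, X = simulate_data(n=200, d=2, theta=2.0, lam=1.0,
                            beta=np.array([0.5, -0.5]), q=0.3)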

References.
1. Clayton, David G. (1991) A Monte Carlo method for Bayesian inference in frailty models.
Biometrics 47, 467–485.
2. Devroye, L. (1986) Non-uniform random variate generation. Springer-Verlag, New York.
3. Fishman, George S. (1995) Monte Carlo: Concepts, algorithms, and applications. Springer series
in operations research, Springer Verlag.
4. Gilks, W.R. (1992) Derivative-free adaptive rejection sampling for Gibbs sampling. In Bayesian
Statistics 4, (eds J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith), 641–649. Oxford
University Press.

5. Gilks, W.R., N.G. Best, and K.K.C. Tan (1995) Adaptive rejection Metropolis sampling within
Gibbs sampling. Appl. Statist. 44, 455–472.
6. Gilks, W.R., S. Richardson, and D. J. Spiegelhalter (1996) Markov chain Monte Carlo in practice.
Chapman & Hall/CRC, Boca Raton.
7. Gilks, W.R. and P. Wild (1992) Adaptive rejection sampling for Gibbs sampling. Appl. Statist.
41, 337–348.
8. Hastings, W. K. (1970) Monte Carlo sampling methods using Markov chains and their applica-
tions. Biometrika 57, 97–109.
9. Ibrahim, J.G., M.-H. Chen, and D. Sinha (2001) Bayesian survival analysis. Springer-Verlag,
New York.
10. Kalbfleisch, J.D. (1978) Non-parametric Bayesian analysis of survival time data. J. R. Statist.
Soc. B 40, 214–221.
11. Lee, Elisa, and J.W. Wang (2003) Statistical methods for survival data analysis, 3rd edition.
John Wiley & Sons.
12. Metropolis, N., A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller (1953) Equations
of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092.
13. Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992) Numerical recipes in C:
the art of scientific computing, 2nd edition. Cambridge University Press, Cambridge, England.
