A semiparametric random effects model for multivariate competing risks data

Share Embed


Descrição do Produto

A semiparametric random effects model for multivariate competing risks data

Thomas H. Scheike Yanqing Sun Mei-Jie Zhang Tina Kold Jensen

Research Report 09/2

Department of Biostatistics University of Copenhagen

A Semiparametric Random Effects Model for Multivariate Competing Risks Data Thomas H. Scheike Department of Biostatistics, University of Copenhagen Øster Farimagsgade 5, DK-1014, Denmark [email protected]

Yanqing Sun Department of Mathematics and Statistics The University of North Carolina at Charlotte 9201 University City Boulevard Charlotte, NC 28223, USA. [email protected]

Mei-Jie Zhang Division of Biostatistics, Medical College of Wisconsin 8701 Watertown Plank Road, Milwaukee 53226, U.S.A. [email protected]

Tina Kold Jensen Institute of Public Health, University of Southern Denmark Winslowparken 17, DK-5000 Odense C, Denmark [email protected] Summary: We propose a semiparametric random effects model for multivariate competing risks data when the failures of a particular type are of interest. Under this model, the marginal cumulative incidence functions follow a generalized semiparametric additive model. The associations between the cause-specific failure times can be studied through the dependence parameters of copula functions which are allowed to depend on the cluster level covariates. A cross-odds ratio type measure is proposed to describe the associations between cause-specific failure times and its relationship to the dependence parameters are explored. We develop a simple two-stage strategy, the marginal models are estimated in the first stage and the dependence parameters are estimated in the second stage. The large sample properties of the proposed estimators are derived. The proposed procedures are applied to the Danish twin data to model the cumulative incidence functions of the ages to natural menopause and to investigate genetic associations in the ages to natural menopause between the twins of same family. Running title: Random effects model for competing risks data. Key words: Asymptotics, binomial modelling, copula functions, cross-odds ratio, cumulative incidence functions, estimation equations, inverse-censoring probability weighting, two-stage estimation, twin data.

1

1.

Introduction

This work is motivated by a data set from the Danish twin registry (e.g. Skytthe et al., 2003) on the timing of natural menopause among female Danish twins. The menopause is the state when a woman stops menstruation. It can occur naturally or surgically. Naturally menopausal women have a gradual decline in estrogen levels over a two to ten year period between the ages of 40 - 60. A woman is considered to have reached menopause when menstrual periods have completely stopped for 12 months. The occurrence of natural menopause is not recorded for all women because of two main types of intervention, namely hormonal treatment for menopause or surgical menopause. The surgical menopause happens when women have their uterus, cervix, and ovaries removed before they reach natural menopause. Different types of menopause, natural or surgical, may be associated with different levels of post-menopause symptoms. Our focus here is to understand whether there is a genetic connection among female twins of same family in the times to natural menopause. Surgical and hormonal treatment before natural menopause are both considered as competing causes for menopause. Let T be the time to failure such as menopause, and ² the cause of failure. Assuming that the cause ² = 1 is the primary cause of interest, and ² = 2 for other competing causes. The competing risks failure times in this case can be represented by {Tki , ²ki }, for i = 1, . . . , nk and k = 1, . . . , K, where K is the number of cluster and nk is the number of subject within the kth clusters. The clusters are assumed to be independent but the competing risks failure times with clusters are generally correlated. In a competing risks set-up, the probability of failure due to cause 1 conditional on covariates x can be described by the conditional cumulative incidence function P1 (t; x) = P (T ≤ t, ² = 1|x), also known as the crude incidence. Other often used measure for describing survival experiences is cause-specific hazard function due to cause 1 defined as λ1 (t; x) = f1 (t; x)/S(t; x), where f1 (t; x) is the derivative of P1 (t; x) with respect to t and S(t; x) = P (T > t|x) is the conditional overall survival function. Much of the early work for analyzing competing risks data has involved modelling the cause-specific hazard functions of the different failure types, see Kalbfleisch & Prentice (1980). The cause-specific hazard functions can often be estimated by the methods applicable to survival data when there is only a single cause of failure. Let λ2 (t; x) be the cause-specific hazard due to causes other than 1. Hence, the estimation of the cumulative incidence function P1 (t; R tx) can be obtained by the plug-in estimators of λj (t; x), j = 1, 2, to the formula P1 (t; x) = 0 λ1 (s; x)S(s; x) ds, R s− where S(s; x) = exp[− 0 {λ1 (u; x) + λ2 (u; x)} du]. It is clear that the effect of a covariate on the cause-specific hazard function of a particular cause can be very different from the effect of the covariate on the corresponding cumulative incidence function. The cause-specific hazard λ1 (t; x) is the instantaneous rate of failure due to cause 1 in the presence of all of the competing risks. However, often the main scientific interest is the instantaneous rate of failure of cause 1 in the absence of any other competing risks. The cause-specific hazard based analysis is very important and useful. But one needs to be careful in its interpretation. The cumulative incidence function P1 (t; x), on the other hand, measures the absolute causespecific risk. This measure is intuitively appealing and easy to interpret. It provides the 2

facts based summaries for decision making. Fine & Gray (1999) proposed a proportional model to directly model the cumulative incidence function. Scheike et al. (2008) studied a general model for the cumulative incidence function. The model postulates that h{P1 (t; x)} = g{η(t), γ, x},

(1)

where h is a known link function, g is a known regression function, and η(t) and γ are unknown regression functions and regression parameters. Model (1) includes the semiparametric multiplicative model cloglog{P1 (t; x, z)} = η(t)T x + γ T z

(2)

and the semiparametric additive model − log{1 − P1 (t; x, z)} = η(t)T x + (γ T z)t,

(3)

where x is (p + 1)-dimensional covariates with first element to be 1 for all subjects and z is q-dimensional covariates. These flexible models allow some covariates, x, having timevarying effects and other covariates, z, having constant effects. The multiplicative model contains the Fine & Gray (1999) proportional model as a special case. The goodness-of-fit procedures for models (2) and (3) have also been proposed by Scheike & Zhang (2008). There have been a slow progress in developing statistical methods for correlated causespecific failure times in the competing risks setting. Recently, Bandeen-Roche & Liang (2002) introduced a conditional cause-specific cross hazard ratio association measure for multivariate cause-specific failure times. An alternative measure was proposed by Cheng & Fine (2008). Cheng et al. (2007) studied nonparametric estimations of bivariate cause-specific hazard functions and bivariate cumulative incidence functions. They also proposed two association measures in terms of the bivariate cause-specific hazard functions and bivariate cumulative incidence functions. All the existing association measures require specifications of joint cause-specific intensities for all cause combinations. Let (xki , z ki ) be the covariates associated with the multivariate competing risks failure times {Tki , ²ki }, for i = 1, . . . , nk and k = 1, . . . , K. The marginal cumulative incidence function conditional on the covariate (xki , z ki ) is P1 (t; xki , z ki ) = P (Tki ≤ t, ²ki = 1|xki , z ki ). A simple extension of the additive model (3) for multivariate competing risks data is to introduce random effects to the model © ª − log{1 − P1 (t; xki , z ki )} = θk η(t)T xki + (γ T z ki )t , where θk , k = 1, .., K, are independent gamma distributed random effects with mean 1 and variance α. The marginal cumulative incidence functions then become © ª¤¤ £ £ P1 (t; xki , z ki ) = Eθk 1 − exp −θk η(t)T xki + (γ T z ki )t ¡ ¢ = 1 − Ψα η(t)T xki + (γ T z ki )t 3

where Ψα (x) = E{exp(−θk x)} is the Laplace transform of a gamma distribution with mean 1 and variance α. Similar extension was considered by Katsahian et al. (2006) in the context of the multiplicative model (2). The drawbacks to this approach are that the frailty parameter α can be identified solely from the marginal mean models and that the marginal models depend on the distribution of the subject specific effects. In this paper, we suggest an alternative approach to avoid this problem by working on the random effects model (4) in Section 2. Under this model, the marginal cumulative incidence functions follow the additive model (3) that is independent of the frailty parameters and frailties account only for correlations between competing risks failure times within clusters. Similar procedures can be worked out under the general marginal model (1). The proposed method resembles the two-stage approach for multivariate failure times with a single cause of failure where the marginal distributions are fixed and the correlation is described through a single dependence parameter, see (Oakes, 1989; Clayton, 1978; Glidden, 2000). The proposed model allows for different degrees of association among for different types of clusters. For example, the associations between the times to natural menopause for monozygotic twins may be different from dizygotic twins. The paper is structured as follows. In Section 2 we present a random effects model and develop the inverse probability of censoring weighted (IPCW) estimation procedures for the marginal and dependent parameters of the model. We also introduce the concept of cross-odd ratio to measure the degree of association between two cause-specific failure times within the same cluster. The asymptotic distributions of the estimators are derived, which can be used to develop confidence intervals/bands and make inferences. Section 3 contains a worked example based on the Danish twin cohort data where the marginal probabilities for the times to natural menopause are estimated as well the associations between the twins. Finally, A discussion is given in Section 4. The detailed formula for the variance estimations and additional technical details are given in the Appendix A and B. 2.

Estimation procedures with multivariate competing risks data

Let Tki and Cki be the event time and right censoring time for the ith subject within cluster k = 1, . . . , K, respectively, and let ²ki ∈ {1, 2} denote the failure type. Let Nki (t) = I(Tki ≤ t, ²ki = 1) be the counting process associated with cause 1 and let ∆ki = I(Tki ≤ Cki ) be the indicator for uncensored failure time. We observe K independent identically distributed (i.i.d.) replications of {(Teki , e ²ki , X ki , Z ki ), i = 1, . . . , nk } where Teki = min(Tki , Cki ), e ²ki = ²ki ∆ki , X ki = (1, Xki,1 , . . . , Xki,p )T , and Z i = (Zki,1 , . . . , Zki,q )T . Let [0, τ ] be the time period from which data are collected. We assume that (Tki , ²ki ) are independent of Cki given covariates of (X ki , Z ki ). 2.·1 Model specification Let θk be independently distributed random variables. Assume that the subjects within clusters are independent conditional on θk and (X ki , Z ki ). Let P1∗ (t; xki , z ki ) = P (Tki ≤ 4

t, ²ki = 1|θk , xki , z ki ) be the cluster-specific conditional cumulative incidence function given θk and (X ki , Z ki ) = (xki , z ki ). We consider the following random effects model: £ ¡ © ª¢¤ T T P1∗ (t; xki , z ki , θk )} = 1 − exp −θk Ψ−1 , (4) νk exp −η(t) xki − (γ z ki )t where Ψνk (t) = Eθk {exp(−θk t)|xki , z ki } is the Laplace transformation of the random effects θk conditional on (xki , z ki ) and νk are the dependence parameters. For example, we can take Ψνk (t) as the Laplace transformation of gamma distributions with mean 1 and variance νk . Here we allow νk to depend on some cluster level covariates Qk that is a subset of (X ki , Z ki ). This generalization allows for different degrees of association among for different types of clusters. For example, the associations between the times to natural menopause for monozygotic twins may be different from dizygotic twins. Here, we consider the following simple regression model νk = αT Qk (5) where α is a r-dimensional vector of parameters and Qk is a regression design. Under (4), the marginal cumulative incidence follows the generalized semiparametric additive model: − log{1 − P1 (t; xki , z ki )} = η(t)T xki + (γ T z ki )t.

(6)

The joint “survival function” of the cause-specific failure times E{(1 − I(Tki ≤ t, ²ki = 1)(1 − I(Tkj ≤ s, ²kj = 1))|xki , z ki , xkj , z kj } ¡ ¢ −1 = Ψνk Ψ−1 νk (1 − P1 (t; xki , z ki ))+Ψνk (1 − P1 (s; xkj , z kj ))

(7)

−1 has an Archimedean copula function representation. The Ck (u, v) = Ψνk (Ψ−1 νk (u)+Ψνk (v)) is −1 an Archimedean copula function with Ψνk (u) as its generator. Various frailty distributions for θk can be used to describe correlations within members of each cluster. When the conditional distribution of θk given (X ki , Z ki ) follows the gamma distribution with mean 1 and variance of νk , Ck (u, v) = (u−νk + v −νk − 1)−1/νk is the Clayton copula, see Clayton (1978). Other Archimedean copula families include Frank (1979) and Gumbel (1960) which are generated by the Laplace transforms of the log series distribution and positive stable distribution. The statistical analysis of the correlated failure times with a single cause through copula functions in the absence of covariates have been studied by Clayton (1978), Oakes (1989), Genest & MacKay (1995), and Shih & Louis (1995). Glidden (2000) studied multivariate failure time regressions by specifying the joint survival function through a multivariate Clayton copula function with a single dependence parameter, where the marginal hazards follow a stratified Cox model. The correlation between multivariate failure times were measured through an association parameter of the Clayton copula. The proposed model (4) extends the joint modelling of multivariate failure times to multivariate cause-specific failure times in competing risks settings. When the failure times due to one particular cause is of interest, the model (4) does not involve distributions of failure

5

times for the other causes. This avoids the problems caused by the misspecifications of the nuisance cause-specific hazard functions. The model (4) inherits the features of multivariate failure times modelling through copula functions. The marginal models are specified independently of the associations between cause-specific failure times that are measured through the parameters in a copula family. However, the Kendall’s tau like interpretation of dependence parameters is no longer applicable since cumulative incidence functions are in general not distribution functions. Instead, we propose a cross-odds ratio type measure to describe the association between cause-specific failure times. 2.·2 IPCW estimation of marginal model Note that the random effects model (4) implies the semiparametric additive model (6) for the marginal cumulative incidence functions. This suggests that the model (4) can be estimated through a two-stage procedure where the parameters of the marginal models are estimated in the first stage and the dependence parameter is estimated in the second stage. The parameters of the marginal models can be estimated by slightly modifying the procedure of Scheike et al. (2008). Let Gc (t|X ki , Z ki ) = P {Cki > t|X ki , Z ki }. Under the conditional independence Cki and (Tki , ²ki ) given the covariates, We have ¯ ¯ ½ ¾ ¾ Z t ½ ¯ ∆ki Nki (t) ∆ki dNki (s) ¯¯ ¯ E X ki , Z ki X ki , Z ki = E Gc (Tki |X ki , Z ki ) ¯ Gc (s|X ki , Z ki ) ¯ 0 Z t dP1 (s; X ki , Z ki ) = P1 (t; X ki , Z ki ). = 0

We consider estimating equations based on the inverse probability of censoring weighted response ∆ki Nki (t)/Gc (Tki |X ki , Z ki ), whose mean equals the marginal cumulative incidence function. In practice the censoring distribution Gc (t; xki , z ki ) is unknown, it can be estimated by a Kaplan-Meier estimator or by a predicted value for each individual based on a regression model to improve the efficiency, see Scheike et al. (2008). For simplicity, we use Kaplan-Meier bc (t). In the regression case, Gc (t; xki , z ki ) will be estimated estimator and denote it by G ˜ based on the data {(Tki , ∆ki , X ki , Z ki ), i = 1, . . . , nk , k = 1, . . . , K}. (k) Let P 1 (t, η(t), γ) be the nk × 1 vector of P1 (t; xki , z ki ) for i = 1, . . . , nk , R(k) (t) be the ˆ c (Tki ), D (k) (t, η(t), γ) and D (k) (t, η(t), γ) nk × 1 vector of adjusted responses ∆ki Nki (t)/G η γ be the nk × (p + 1) and nk × q matrixes with the ith rows equal to D η ,ki (t, η(t), γ) = ∂P1 (t; xki , z ki )/∂η(t) and D γ ,ki (t, η(t), γ) = ∂P1 (t; xki , z ki )/∂γ, respectively. Let W (k) (t) = diag(wki (t)) where wki (t) are the weights. Following Scheike et al. (2008), the marginal regression functions η(t) and regression parameters γ can be estimated based on the following estimation equations: U η (t, η(t), γ) = D Tη (t, η(t), γ)W (t) {R(t) − P 1 (t, η(t), γ)} = 0 Z τ U γ (τ, η(·), γ) = D Tγ (t, η(t), γ)W (t) {R(t) − P 1 (t, η(t), γ)} dt = 0 0

6

(8) (9)

where (1)

(K)

(1)

(K)

P 1 (t, η(t), γ) = [{P 1 (t, η(t), γ)}T , . . . , {P 1 (t, η(t), γ)}T ]T D η (t, η(t), γ) = [{D η (t, η(t), γ)}T , . . . , {D η (t, η(t), γ)}T ]T (1)

(K)

D γ (t, η(t), γ) = [{D γ (t, η(t), γ}T , . . . , {D γ (t, η(t), γ)}T ]T R(t) = [{R(1) (t)}T , . . . , {R(K) (t)}T ]T and W (t) = diag(W (k) (t)). For simplicity we consider un-weighted estimation, thus taking W (t) as an identity matrix. Scheike et al. (2008) proposed an iterative procedure to solve these score equations simultaneously. The iterative equations for the estimators are given in (23) and (24) in Appendix A. These estimation equations can solved using standard software for GLM-models, which also lead to conservative estimation for the standard errors. The procedure is also implemented in the timereg package for R. Suppose that there exists a constant N0 such that√nk < N0 for all√k. Under regularity conditions, we show in Appendix A that as K → ∞, K(b γ − γ) and K(b η (t) − η(t)) are asymptotically equivalent to the following i.i.d. decompositions √

K I −1 γ

nk K X X

W γ ,ki (τ )



and

nk K X © ª−1 X K I η (t) W η ,ki (t),

k=1 i=1

k=1 i=1

respectively, Pnk where the formula Pnk for W γ ,ki (τ ) and W η ,ki (t) are given in Appendix A. The sums W η ,ki (t) are i.i.d. distributed with asymptotically mean i=1 W γ ,ki (τ ) and i=1 √ γ − γ) converges in distribution to a mean zero normal zero for k = 1, . . . , K. Thus K(b random vector by a simple application of the central limit theorem. By an application of functional central limit theorem for empirical processes (cf. Lemma 1 of Sun & Wu (2005)), √ K(b η (t) − η(t)) converges weakly to a mean zero Gaussian process on t ∈ [0, τ ]. bγ, W c γ ,ki (τ ), I b η (t) and W c η ,ki (t) be the estimators of their population counter Let I parts by the√ plug-in estimates √ of all needed quantities; see Appendix A. The asymptotic γ − γ) and K{b η (t) − η(t)} can be estimated by variance of K(ˆ  (n )⊗2  K k X X b γ = KI b −1  c γ ,ki (τ ) b −1 , I Σ W (10) γ γ k=1

i=1

 (n )⊗2  K k o−1 X n o−1 X b b c b   Ση (t) = K I η (t) W η ,ki (t) I η (t) n

(11)

i=1

k=1

where a⊗2 = aaT for a column vector a. These estimators can be used to construct confidence intervals for γ and η(t). The Gaussian multiplier resampling method can be used to construct simultaneous confidence intervals for η(t) over a fixed time interval or make inference about the regression 7

function. We note that, as an application of Lemma 1 of Sun & Wu (2005), converges weakly to the same mean zero Gaussian limit processes as √



K(b η (t) − η(t))

n

nk K X o−1 X c η ,ki (t), b K I η (t) Gk W k=1 i=1

given the observed data where G1 , .., GK are i.i.d. standard normal random variables. Hence b (t) can be constructed by repeated a (1 − α) × 100% asymptotic confidence band for η resampling of G1 , .., GK (Scheike & Zhang (2008)). 2.·3 Estimation of dependence parameters To estimate the dependence parameters α, we consider the cross product of the inverse probability of censoring weighted responses within clusters. Let Ckij (t) =

∆ki Nki (t) ∆kj Nkj (t) Gc (Tki ) Gc (Tkj )

(12)

where i, j are two subjects within cluster k. It follows that, for i 6= j, ckij (t, α, η, γ) = E{Ckij (t)|X ki , Z ki , X kj , Z kj } = E [E {Ckij (t)|θk , X ki , Z ki , X kj , Z kj } |X ki , Z ki , X kj , Z kj ] −1 = E{[1−exp{−νk Ψ−1 νk (exp{−Λki (t)})}][1−exp{−θk Ψνk (exp{−Λkj (t)})}]|X ki , Z ki , X kj , Z kj } ¢ ¡ −1 (13) = 1−exp{−Λki (t)}−exp{−Λkj (t)}+Ψνk Ψνk (exp {−Λki (t)})+Ψ−1 νk (exp {−Λkj (t)}) where νk = αT Qk under (5), Λki (t) = η(t)T X ki + (γ T Z ki ) t and similarly for j. bc (t) in (12) to Let Hk denote the index set for the kth cluster. Replacing Gc (t) by G bkij (t), we consider the following estimating equations for α: obtain C Z

τ

bc ) = b, γ b, G Uα (α, η 0

K X

X

n

o b b, γ b ) Ckij (t) − ckij (t, α, η b, γ b ) dt = 0, (14) Dα,kij (t, α, η

k=1 i,j∈Hk , i6=j

b, γ b ) = ∂ckij (t, α, η b, γ b )/∂α, and η b (t), γ b are the estimators of η(t), γ from where Dα,kij (t, α, η the marginal model. Let α b be the estimator of α solving (14). Then the cluster-specific dependence parameter is given by νbk = α bT Qk . The probability of both subjects within the kth cluster experiencing the event of interest by time t can be estimated by plugging-in the empirical counterparts of (13), denoted as b ckij (t). By Taylor approximation, µ ¶−1 √ −1 ∂Uα (α, η, γ, Gc ) bc ) + op (1). b, γ b, G K −1/2 Uα (α, η (15) K(b α − α) = − K ∂α

8

We show in Appendix B that



K(b α − α) has the following i.i.d. approximation: ( K ) X √ © ª −1 K(b α − α) = K −1/2 K −1 Iα (α, η, γ) Wα,k (α, η, γ, Gc ) + op (1),

(16)

k=1

where the formula for Iα (α, η, γ) and Wα,k (α, η, γ, Gc ) are given in (37) and (39) in Appendix B. Their explicit expressions under gamma √ random effects model are given in (39), (42) and (44) in the Appendix B. It follows that K(b α − α) is asymptotically normal with mean 0 and a variance that can be estimated by # "K o2 Xn −1 b α = K(Ibα ) cα,k (Ibα )−1 , (17) Σ W k=1

cα,k are the empirical counterparts of Iα (α, η, γ) and Wα,k (α, η, γ, Gc ), rewhere Ibα and W spectively. 2.·4 Cross-odds ratio There seems no simple direct extension of the association measures for multivariate failure times to the competing risks model. The popular Kendall’s tau like interpretation of dependence parameter is no longer applicable to multivariate cause-specific failure times since cumulative incidence functions are in general not distribution functions. Bandeen-Roche & Liang (2002) introduced a conditional cause-specific cross hazard ratio association measure for multivariate cause-specific failure times. An nonparametric estimator of this measure is proposed by Bandeen-Roche & Ning (2008). An alternative measure was proposed by Cheng & Fine (2008) and Cheng et al. (2007). They proposed two association measures in terms of the bivariate cause-specific hazard functions and bivariate cumulative incidence functions. All the existing association measures require specifications of joint cause-specific intensities for all cause combinations. We introduce a cross-odds ratio to describe the dependence between cause-specific failure times. Consider the case of bivariate failure times (Tki , Tkj ) with (²ki , ²kj ) as the indicator for cause 1. We define the following association measure for the failure times due to cause 1 between individual i and j in the same cluster k: πi|j (t) =

P (Tki ≤ t, ²ki = 1|Tkj ≤ t, ²kj = 1)/P ({Tki ≤ t, ²ki = 1}c |Tkj ≤ t, ²kj = 1) P (Tki ≤ t, ²ki = 1)/P ({Tki ≤ t, ²ki = 1}c )

(18)

where {A}c means the complement event to A. This measure of association resembles odds ratio and it allows to depend on time. It measures the odds of the occurrence of (Tki ≤ t, ²ki = 1) conditional on (Tkj ≤ t, ²kj = 1) relative to its unconditional odds of occurrence. If the events (Tki ≤ t, ²ki = 1) and (Tkj ≤ t, ²kj = 1) are independent, then πi|j (t) = 1. If they are positively associated (meaning P (Tki ≤ t, ²ki = 1|Tkj ≤ t, ²kj = 1) > P (Tki ≤ t, ²ki = 1)), then πi|j (t) > 1; otherwise πi|j (t) < 1. If P (Tki ≤ t, ²ki = 1|Tkj ≤ t, ²kj = 1) = 1, thus 9

perfectly positively correlated, then πi|j (t) = ∞. If P (Tki ≤ t, ²ki = 1|Tkj ≤ t, ²kj = 1) = 0, then πi|j (t) = 0. Further, this measure is symmetric about the individuals i and j provided that the two have same marginal subprobabilities, i.e., P (Tki ≤ t, ²ki = 1) = P (Tkj ≤ t, ²kj = 1). We note that the following symmetric measure of association can also be used and it can be expressed in terms of πi|j (t): P (Tki ≤ t, ²ki = 1; Tkj ≤ t, ²kj = 1) − P (Tki ≤ t, ²ki = 1)P (Tkj ≤ t, ²kj = 1) P (Tki ≤ t, ²ki = 1; Tkj ≤ t, ²kj = 1) ½ ¾ 1 c = P ({Tki ≤ t, ²ki = 1} ) 1 − . πi|j (t)

(19)

In the presence of covariates (xki , z ki ) and (xkj , z kj ) for the ith and jth subjects, the above measure is considered to be conditional on these covariates, namely πi|j (t|xki , z ki , xkj , z kj ), and all probabilities in (18) will be the conditional probabilities given (xki , z ki , xkj , z kj ). For our model specified in (4), πi|j (t|xki , z ki , xkj , z kj ) −1 1 − exp(−Λki (t)) − exp(−Λkj (t)) + Ψνk (Ψ−1 νk (exp(−Λki (t))) + Ψνk (exp(−Λkj (t)))) = −1 exp(−Λki (t)) − Ψνk (Ψ−1 νk (exp(−Λki (t))) + Ψνk (exp(−Λkj (t)))) ¾ ½ exp(−Λki (t)) (20) 1 − exp(−Λki (t)) where Λki (t) = η(t)T xki + (γ T z ki ) t. When the random effects θk follow a gamma distribution with mean 1 and variance νk , −νk Ψνk (x) = (1 + νk x)−1/νk , for x > −1/νk and its inverse function is Ψ−1 − 1)/νk for νk (y) = (y y > 0. The cross-odds ratio is closely related to the dependence parameter of the Clayton Archimedean copula family. The cross-odds ratio πi|j can be easily estimated by a plug-in estimator. Suppose that νk ≡ α does not depend on k, i.e., the within subjects associations are same for all clusters. Recall that P1 (t; xki , z ki ) = 1 − exp(−Λki (t)). A special case of (20) is when all the covariates are cluster level covariates, i.e., (xki , z ki ) = (xkj , z kj ), so that P1 (t; xki , z ki ) = P1 (t; xkj , z kj ). In this case, the cross-odds ratio πi|j (t|xki , z ki , xkj , z kj ) depends only on P1 (t; xki , z ki ) and α. The plots of πi|j (t|xki , z ki , xkj , z kj ) against on α at different values of P1 (t; xki , z ki ) are shown in Figure 1 under the Clayton copula. Insert Figure 1 about here We can show that when (xki , z ki ) = (xkj , z kj ), πi|j (t|xki , z ki , xkj , z kj ) = 2 for α = 1. The association modeled by the Clayton copula is always positive. A larger value of α indicates stronger positive association in terms of times to cause-specific failure times within clusters. Other copula functions may be used to model negative as well as positive associations. 10

More generally, it may also be of interest to consider the generalized cross-odds ratio πi|j (s, t) =

P (Tki ≤ t, ²ki = 1|Tkj ≤ s, ²kj = 1)/P ({Tki ≤ t, ²ki = 1}c |Tkj ≤ s, ²kj = 1) . P (Tki ≤ t, ²ki = 1)/P ({Tki ≤ t, ²ki = 1}c )

3.

Danish twin data

We identified 503 pairs of female twins born between 1931 and 1952 through the Danish Twin Registry. The twins are established as monozygotic or dizygotic from the registry data. There are 269 monozygotic twin pairs and 234 dizygotic twin pairs identified (Skytthe et al., 2003). Before they reach natural menopause, some women may experience surgical menopause as a result of removal of ovaries or uterus and others may be treated with hormones that also affect their timing of menopause. Surgical menopause and hormonal treatment are therefore competing risks for menopause. Of 1006 twins, 416 twins experienced natural menopause, 57 twins had surgical related menopause, 183 twins underwent hormonal treatments before natural menopause, 246 twins had censored event times and there were 104 twins with unknown failure types/censoring status which are ignored for this study. Researchers are interested in evaluating the timing of women experiencing their natural menopause. Defining the event time to be the age at menopause, we want to estimate the cumulative incidence function of the ages to natural menopause and check wether there is a genetic connection among female twins of same family in their ages to natural menopause. We first estimated the marginal cumulative incidence of natural menopause excluding the cohort structure. Let xki = {1, Iki (dizygotic twin)}T , where Iki (dizygotic twin) is 1 if the ith twin from the kth family is dizygotic for i = 1, 2 and k = 1, . . . , 503. Under model (4) without covariate z ki , the marginal cumulative incidence is − log{1 − P1 (t; xki )} = η(t)T xki . Under this model, we obtain the nonparametric estimates of the distribution of the ages to natural menopause separately for the monozygotic and dizygotic twins. The estimates and their 95% confidence intervals and 95% confidence bands based on Guassian multiplier resampling technique are shown in Figures 2(a) and 2(b). We note that the difference between monozygotic and dizygotic twins in their age distributions to natural menopause is minor. Insert Figure 2 around here To examine whether the cumulative incidence of the ages to natural menopause may change over the different time cohorts, we estimate the cumulative incidence function of natural menopause by adjusting for time cohort effects. This is done by partitioning the data into 4 cohorts of almost equal size according to birthdate: the early cohort [1931, 1936] 11

has 120 twin pairs, the [1937, 1942] cohort has 123 twin pairs, the [1943, 1947] cohort has 121 twin pairs, and last cohort [1948, 1952] has 139 twin pairs. Let Iki {cohort[1937,1942]} be the cohort indicator of whether the twins in the kth family are born between 1937 and 1942. In addition to xki defined above, we introduce covariates z ki to the random effects model (4), where z ki = {Iki {cohort[1937,1942]}, Iki {cohort[1943,1947]}, Iki {cohort[1948,1952]}}T . The marginal cumulative incidence functions then follow the semiparametric additive model: − log{1 − P1 (t; xki , z ki )} = η(t)T xki + (γ T z ki ) t. This model assumes separate nonparametric baseline functions for monozygotic and dizygotic twins and constant time cohort effects. The goodness-of-fit procedures of Scheike & Zhang (2008) indicated that the time cohorts can be well described as having constant effects. Figure 3 plots the estimated cumulative incidence functions of natural menopause for early time cohort [1931,1936] and late time cohort [1948,1952] by the monozygotic and dizygotic twins, respectively. There is a clear trend over time that the later cohort experiencing their b . It shows that the natural menopause later. Table 1 presents the estimated cohort effects γ late cohort is significantly different from early cohort. Insert Figure 3 and Table 1 around here We now examine wether there exist genetic associations in the ages to natural menopause between the twins of same family under the random effects model (4). To allow for different degrees of associations for different types of twins, we let νk = α0 + α1 xk under (5), where xk is 0 for monozygotic twins and 1 for dizygotic twins. Hence the dependence parameter is ρ1 = α0 for monozygotic twins and ρ2 = α0 + α1 dizygotic twins. Using the estimation procedure presented in Section 2.·3, we have ρˆ1 = 2.31, s.d.(ˆ ρ1 ) = 0.72, and ρˆ2 = 0.94, s.d.(ˆ ρ2 ) = 0.22. Note that ρ1 = 0 indicates that there is no genetic association in the ages to natural menopause between monozygotic twins and ρ2 = 0 shows no association between dizygotic twins. Hence, a genetic association exists in the ages to natural menopause with p−value = 0.0007 for monozygotic twins and p−value < 0.0001 for dizygotic twins. The estimates ρˆ1 and ρˆ2 also show that the association between the monozygotic twins is much stronger than the dizygotic twins. To formally validate this, we estimate the variance d ρ1 − ρˆ2 ) = 0.1764, which yields a p−value = 0.0006 for testing ρ1 = ρ2 . Var(ˆ Note that the joint probability of both twins experiencing natural menopause by age t depends on the degree of the association between the twins. This probability is ckij (t) given in (13) under the random effects model (4) where we assume the ages to natural menopause are dependent between the twins. For monozygotic twins and early time cohort, this joint probability is plotted in Figure 4(a) against age (dark dashed line). The random effects θk in model (4) represents family-to-family variability and induces dependence between twins. A large value of θk is associated with higher dependence between the twins in family k. Let θk,.95 and θk,.05 be the 95% percentile and the 5% percentile of the gamma distribution with mean 1 and variance νk , respectively. The joint probabilities of both twins experiencing natural menopause by age t for a strongly correlated twin pair at θk = θk,.95 and a weakly 12

correlated twin pair at θk = θk,.05 can be expressed as P1∗ (t; xki , z ki , θk,.95 )P1∗ (t; xkj , z kj , θk,.95 ) and P1∗ (t; xki , z ki , θk,.05 )P1∗ (t; xkj , z kj , θk,.05 ), respectively. For monozygotic twins and early time cohort, the plot of this joint probability against age is given in the dashed line on the top of Figure 4(a) for strongly correlated twins and at the bottom for weakly correlated twins. The plot of the joint probability of both twins experiencing natural menopause by age t, P1 (t; xki , z ki )P1 (t; xkj , z kj ), under the marginal model (6) and the presumed independence between twins is also given in Figure 4(a) (solid thin line). Similar plots for dizygotic twins are given in Figure 4(b). Figure 4 shows that the joint probabilities of both twins experiencing natural menopause under random effects model are larger than those under the independence assumption for both monozygotic and dizygotic twins. The differences are more apparent for the monozygotic twins than for the dizygotic twins. Other quantities such as the conditional probabilities of the second twin experiencing natural menopause given that the first twin has already experienced her natural menopause can also be calculated and plotted. Insert Figure 4 around here Finally, we calculate the cross-odds ratio to show how the onset of the natural menopause for one twin affects the probability of the onset of the natural menopause for the other twin. The cross-odds ratios for monozygotic twins are floating around 3.4 (the solid line in Figure 5), which is consistent with the plots in Figure 1 for the estimate ρˆ1 = 2.31. The plot shows that, given that one twin has experienced natural menopause, the odds of the second twin experiencing natural menopause by the same age more than triples the unconditional odds of natural menopause for the second twin. For the dizygotic twins, we find a cross-odds ratio that lies around 1.9. This suggests that, if one twin experiences natural menopause, then the odds of natural menopause for the second twin by the same age doubles the unconditional odds of natural menopause for the second twin. Insert Figure 5 around here

4.

Discussion

We proposed a semiparametric random effects model for multivariate competing risks data when the failures of a particular type are of interest. Under this model, the marginal cumulative incidence functions follow a generalized semiparametric additive model where the effects of some covariates are unspecified functions of time and the effects of other covariates are constant in time. The joint “survival function” of the cause-specific failure times has an Archimedean copula function representation. Hence, the associations between the causespecific failure times can be studied through the dependence parameters of copula functions which are allowed to depend on the cluster level covariates. In the Danish twin data example, the associations in the ages to natural menopause between the twins of same family are allowed to be different for different types of twins, monozygotic or dizygotic. We developed a simple two-stage strategy, the marginal models are estimated in the first stage and the 13

dependence parameters are estimated in the second stage. The large sample properties of the proposed estimators are derived. The simultaneous as well as pointwise confidence intervals of the regression functions can be constructed. The proposed model (4) extends the joint modelling of multivariate failure times to multivariate cause-specific failure times in competing risks settings. It inherits the features of multivariate failure times modelling through copula functions. However, the Kendall’s tau like interpretation of dependence parameters is no longer applicable since cumulative incidence functions are in general not distribution functions. Instead, we proposed a crossodds ratio type measure to describe the associations between cause-specific failure times. When the random effects follow a gamma distribution with mean 1 and variance of α, we explored the relationship between the cross-odds ratio and the dependence parameter in Figure 1. This relationship does not change much for different marginal cumulative incidence probabilities. For the Danish twin data example, we estimated the marginal cumulative incidence probabilities of the ages to natural menopause and found that there is little difference between monozygotic twins and dizygotic twins. However, the data suggests that women experience their natural menopause later in the later cohort than in the earlier cohort. There is a clear association in the ages to natural menopause between the twins of same family. This association is stronger for the monozygotic twins than for the dizygotic twins. The cross-odds ratio is around 2 for dizygotic twins and is 3.5 for monozygotic twins. The random effects model (4) are estimated through a two-stage approach where the marginal parameters are estimated separately from the dependence parameters. This procedure is simple to implement and the asymptotic properties of the estimators are easy to derive. However, additional efficiency can be gained by jointly estimating both the marginal parameters and the dependence parameters and by using all the joint observations of the inverse probability of censoring weighted responses for the dependence parameters. Alternatively, a starting point could also be to use the likelihood score of the dependence parameter in a two-stage procedure. This issue deserves further attention and investigation. Acknowledgments The research was partially supported by National Cancer Institute grant 2 R01 CA54706-10. Thomas Scheike was also supported by a grant from the Danish Research Council on ”Point process modelling and statistical inference”. The research of Yanqing Sun was partially supported by NSF grant DMS-0604576 and NIH grant 5 RO1 AI054165-06.

Appendix A b and η b (t). An iterative procedure for the estimators γ Scheike et al. (2008) proposed to simultaneously solve estimation equations (8) and (9). To solve these score equations simultaneously, Taylor expanding P 1 (t, η(t), γ) around the 14

b (m) ) gives current value of the estimates (b η (m) (t), γ b (m) (t), γ b (m) ) P 1 (t, η(t), γ) ≈P P 1 (t, η

n o b (m) (t), γ b (m) ) η(t) − η b (m) (t) D η (t, η n o (m) (m) (m) b (t), γ b ) γ−γ b D γ (t, η .

+ +

(m) b (m) (t), γ b (m) ), Replace it into the score equations (8) and (9), and denote D η (t) = D η (t, η (m) b (m) (t), γ b (m) ), P (m) b (m) (t), γ b (m) ), we have approximated score D γ (t) = D γ (t, η 1 (t) = P 1 (t, η equations h n o n oi (m) (m) (m) (m) b (m) = 0 (21) η (m) (t) −D γ (t) γ − γ D η (t)T W (t) R(t)−P 1 (t)−D η (t) η(t)−b Z τ h n o n oi (m) (m) (m) (m) (m) (m) T b D γ (t) W (t) R(t)−P 1 (t)−D η (t) η(t)−b η (t) −D γ (t) γ − γ dt = 0 (22) 0

Solving equations (21) and (22) for γ and η(t), the (m + 1)th iterative estimators for γ and η(t) are given by n o−1 (m) b (m+1) = γ b (m) + I (m) γ Bγ γ

(23)

b (m+1) (t) = η b (m) (t) η

n o (m) (m) (m) (m) (m) (m) +{I η (t)}−1 {D η (t)}T W (t) R(t)−P 1 (t)−D γ {I γ }−1 B γ

(24)

where Z (m)



(m)



τ

= Z0 τ = 0

(m)

(m)

D γ (t)T W (t)H (m) (t)D γ (t)dt (m)

T

D γ (t) W (t)H (m)

(m)

n (t) R(t) −

(m)

o

(m) P 1 (t)

dt

(m)

H (m) (t) = I − D η (t){I η (t)}−1 D η (t)T W (t) (m)

(m)

(m)

I η (t) = D η (t)T W (t)D η (t). On the asymptotic distributions of



K(ˆ γ − γ) and



K{b η (t) − η(t)}. √ √ Here we give brief derivation for the variance estimation of K(ˆ γ − γ) and K{b η (t) − η(t)}. First, by applying the Taylor approximation similar to (23), √ √ K(b γ − γ) = K I −1 (25) γ B γ + op (1)

15

where I γ , B γ , D η (t), D γ (t), H(t), R(t), P 1 (t, η(t), γ) are all evaluated at the true value of {η(t), γ}. Let ½ ¾ XZ τ © ª ∆ki Nki (t) e Bγ = D γ ,ki (t) − K(t)D η ,ki (t) wki (t) − P1 (t, xki , z ki ) dt G c (Tki ) 0 k,i XZ τ = ζ γ ,ki (t) dt k,i

∆γ =

0

XZ k,i

τ 0

bc (Tki ) © ª Gc (Tki ) − G D γ ,ki (t) − K(t)D η ,ki (t) wki (t)∆ki Nki (t) dt bc (Tki )Gc (Tki ) G

K(t) = D Tγ (t)W (t)D η (t){I η (t)}−1 . Then

e γ + ∆γ . Bγ = B

Since K nl Z τ bc (Tki ) − Gc (Tki ) dMljC (s) G −I(Tki ≤ t) X X I(s ≤ Tki ) ∆ki Nki (t) ≈p ∆ki Nki (t) , bc (Tki )Gc (Tki ) Gc (Tki ) Y• (s) G 0 l=1 j=1

Rs P P where Y• (s) = k,i Yki (s) = k,i I(Teki ≥ s) and MljC (s) = I(Telj ≤ s, ∆lj = 0) − 0 Ylj (u) d(− log Gc (u)) is the martingale associated with the censoring time. The ∆γ can be approximated by © ª XZ τZ τ X D γ ,ki (t) − K(t)D η ,ki (t) wki (t)∆ki Nki (t) dMljC (s) ∆γ ≈p I(s ≤ Tki ≤ t)dt Gc (Tki ) Y• (s) l,j 0 0 k,i ¾ X Z τ ½Z τ q γ (s, t) C = dMki (s) dt Y• (s) 0 0 k,i XZ τ ψ γ ,ki (t)dt, = k,i

0

where © ª X D γ ,lj (t) − K(t)D η ,lj wlj (t)∆lj Nlj (t) q γ (s, t) = I(s ≤ Tlj ≤ t). Gc (Tlj ) l,j Thus, √

K (b γ − γ) ≈p



−1 K Iγ

X k,i

16

W γ,ki (τ )

(26)

where W γ,ki (τ ) =

XZ k,i

τ

n

0

o ζ γ ,ki (t) + ψ γ ,ki (t) dt.

(27)

b γ and W c γ,ki (τ ) be estimators of I γ and W γ,ki (τ ) by plugging in the estimators η b (t), γ b Let I √ C c (t). Then the asymptotic variance of K(b γ − γ) can be consistently estimated by and M ki  (n )⊗2  K k X X b γ = KI b −1  c γ,ki (τ ) b −1 . I Σ W (28) γ γ k=1

i=1

From a Taylor expansion similar to (24), we have √ √ © ª−1 X K {b η (t) − η(t)} ≈p K I η (t) W η,ki (t),

(29)

k,i

where

n o W η,ki (t) = ζ η ,ki (t) + ψ η ,ki (t) − D Tη (t)W (t)D γ (t)I −1 W (τ ) γ,ki γ ½ ¾ ∆ki Nki (t) ζ η ,ki (t) = D η ,ki (t)wki (t) − P1 (t, xki , z ki ) Gc (Tki ) ) Z τ (X C D η ,lj (t)wlj (t)∆lj Nlj (t) dMki (s) ψ η ,i (t) = I(s ≤ Tlj ≤ t) . G Y c (Tlj ) • (s) 0 l,j

(30)

The I η (t) and W η,ki (t) can be consistently estimated by the plug in estimators, denoted as √ b η (t) and W c η,ki (t), and the variance of K{b I η (t) − η(t)} can be estimated by  (n )⊗2  K k n o−1 n o−1 X X c η,ki (t) b η (t) b η (t) = K I b η (t)  I  W . (31) Σ k=1

i=1

Appendix B On the asymptotic distribution of



K(b α − α).

√ To establish the asymptotic distribution for K(b α − α), we consider the decomposition Z τX K X bc ) = b, γ b, G b, γ b ) {Ckij (t) − ckij (t, α, η, γ)} dt Uα (α, η Dα,kij (t, α, η 0

Z

k=1 i,j∈Hk , i6=j τ

+ 0

Z 0

X

n o bkij (t) − Ckij (t) dt b, γ b) C Dα,kij (t, α, η

(32)

k=1 i,j∈Hk , i6=j τ



K X

K X

X

b, γ b ) {ckij (t, α, η b, γ b ) − ckij (t, α, η, γ)} dt. Dα,kij (t, νk , η

k=1 i,j∈Hk , i6=j

17

Note that K X

X

bkij (t) − Ckij (t)} b, γ b ){C Dα,kij (t, α, η

k=1 i,j∈Hk , i6=j

=

K X

X

bkij (t) − Ckij (t)} + op (K 1/2 ) Dα,kij (t, α, η, γ){C

k=1 i,j∈Hk , i6=j

=−

K X

X

Dα,kij (t, νk , η, γ)∆ki Nki (t)∆kj Nkj (t)

k=1 i,j∈Hk , i6=j

) b b Gc (Tkj ) − Gc (Tkj ) Gc (Tki ) − Gc (Tki ) + + op (K 1/2 ) b b b Gc (Tki )Gc (Tki )Gc (Tkj ) Gc (Tkj )Gc (Tkj )Gc (Tki ) ) ¾( b ½ K X X ∆ki Nki (t)∆kj Nkj (t) Gc (Tki ) − Gc (Tki ) = −2 Dα,kij (t, α, η, γ) + op (K 1/2 ). bc (Tki ) Gc (Tki )Gc (Tkj ) G k=1 i,j∈Hk , i6=j (

Since nl Z τ K X C X bc (Tki ) − Gc (Tki ) G dMlm (s) I(s ≤ Tki ) + op (K −1 ), = −1 bc (Tki ) Y (s) • G 0 l=1 m=1

bc ) in (32) equals b, γ b, G the second term of Uα (α, η Z

τ

K X

X

nl Z K X X

τ

C dMlm (s) dt + op (K 1/2 ) Y (s) • 0 k=1 i,j∈H , i6=j l=1 m=1 0 k ) ( nl Z τ Z τ K K X C X X X dMlm (s) Dα,kij (t, α, η, γ)Ckij (t)I(s ≤ Tki ) =2 dt + op (K 1/2 ) Y (s) • 0 k=1 i,j∈H , i6=j l=1 m=1 0

2

=

nl K X X l=1 m=1

Z

τ 0

where qα (s, t) = 2

Dα,kij (t, α, η, γ)Ckij (t)

I(s ≤ Tki )

k

Z

τ 0

C qα (s, t)dMlm (s) dt + op (K 1/2 ),

nP

K k=1

P

o D (t, α, η, γ)C (t)I(s ≤ T ) /Y• (s). α,kij kij ki i,j∈Hk , i6=j

18

(33)

By the first order Taylor expansion at the true values (η(t), γ), K X

X

b, γ b ) {ckij (t, α, η b, γ b ) − ckij (t, α, η, γ)} Dα,kij (t, α, η

k=1 i,j∈Hk , i6=j

¶ ∂ckij T ∂ckij T = Dα,kij (t, α, η, γ) xki + xkj (b η (t) − η(t)) ∂Λ ∂Λ ki (t) kj (t) k=1 i,j∈Hk , i6=j µ ¶ K X X ∂ckij T ∂ckij T + Dα,kij (t, α, η, γ) z ki + z kj (b γ − γ)t + op (K 1/2 ) ∂Λki (t) ∂Λkj (t) k=1 i,j∈H , i6=j K X

µ

X

k

= Kq x (t, α, η, γ)(b η (t) − η(t)) + Kq z (t, α, η, γ)(b γ − γ) + op (K 1/2 ),

(34)

where ¶ ∂ckij T ∂ckij T q x (t, α, η, γ) = K Dα,kij (t, α, η, γ) xki + xkj ∂Λ ∂Λ ki (t) kj (t) k=1 i,j∈Hk , i6=j µ ¶ K X X ∂ckij T ∂ckij T −1 Dα,kij (t, α, η, γ) q z (t, α, η, γ) = K z + z t. ∂Λki (t) ki ∂Λkj (t) kj k=1 i,j∈H , i6=j −1

K X

µ

X

k

By the equations (26), (29), (32), (33) and (34), K X X Z τ −1/2 −1/2 bc ) = K b, γ b, G Dα,kij (t, α, η, γ) {Ckij (t) − ckij (t, νk , η, γ)} dt K Uα (α, η 0

k=1 i,j∈Hk , i6=j

+ K

−1/2

− K

nk Z τZ K X X

τ

k=1 i=1 0 0 nk Z τ K X X −1/2

− K −1/2

k=1 i=1 0 nk Z τ K X X k=1 i=1

0

C qα (s, t)dMik (s) dt

(35)

q x (t, α, η, γ)(I η (t)/n)−1 W η ,ki (t) dt

q z (t, α, η, γ)(I γ /n)−1 W γ ,ki (t) dt + op (1),

where I η (t), W η ,ki (t), I γ and W γ ,ki (t) are given in Appendix A. By the Taylor approximation, µ ¶−1 √ −1 ∂Uα (α, η, γ, Gc ) bc ) + op (1). b, γ b, G K(b α − α) = − K K −1/2 Uα (α, η ∂α

(36)

From the estimation function for α given in (14), −K

−1 ∂Uα (α, η, γ, Gc )

∂α

=K

−1

K X

X

k=1 i,j∈Hk , i6=j

≡ K −1 Iα (α, η, γ) + op (1).

Z

τ 0

µ

∂ckij (t, α, η, γ) ∂α

¶⊗2 dt + op (1) (37)

19

Hence √

K(b α − α) = K

−1/2

{Iα (α, η, γ)/n}

−1

K X

Wα,k (α, η, γ, Gc ) + op (1),

(38)

k=1

where

Z

X

Wα,k (α, η, γ, Gc ) =

τ

Dα,kij (t, νk , η, γ) {Ckij (t) − ckij (t, α, η, γ)} dt

i,j∈Hk , i6=j nk Z τ X

0

Z

τ

+

i=1

− −

0

nk Z X

0 τ

i=1 0 nk Z τ X i=1

0

C (s) dt qα (s, t)dMik

(39)

q x (t, α, η, γ)(I η (t)/n)−1 W η ,ki (t)dt q z (t, α, η, γ)(I γ /n)−1 W γ ,ki (t)dt.

√ It follows from the central limit theorem that K(b α − α) is asymptotically normal with mean 0 and a variance that can be estimated by K ³ ´2 ³ ´−1 ³ ´−1 X c b b Wα,k Ibα , Σα = K Iα k=1

cα,k are the empirical counterparts of Iα (α, η, γ) and Wα,k (α, η, γ, Gc ), rewhere Ibα and W cC (t) for M C (t). spectively, by plugging in the estimates of α, η(t), γ and by replacing M ki ki Explicit expressions for Iα and Wα,k under gamma random effects. Under the gamma random effects distribution, the explicit formula for ckij (t, α, η, γ) and ∂ckij (t, α, η, γ)/∂α can be worked out. We first look at some properties of the Laplace transformation Ψνk (x) of a Gamma distribution with mean 1 and variance νk with νk = αT Qk . For convenience, we also denoted Ψ(νk , x) for Ψνk (x) as the function of both νk and x. It is easy to see that Ψνk (x) = (1 + νk x)−1/νk ,

for x > −1/νk .

−νk Its inverse function is Ψ−1 − 1)/νk , for y > 0. We have the following νk (y) = (y

∂Ψνk (x) ∂νk ∂Ψνk (x) ∂x ∂Ψ−1 νk (y) ∂νk ∂Ψ−1 νk (y) ∂y

£ ¤ = νk −2 (1 + νk x)−1/νk log(1 + νk x) + (1 + νk x)−1 − 1 = −(1 + νk x)−1/νk −1 = −νk −2 y −νk [log(y νk ) − y νk + 1] = −y −νk −1 . 20

(40)

−1 Let y1 = exp(−Λki (t)), y2 = exp(−Λkj (t)), and θνk = Ψ−1 νk (y1 ) + Ψνk (y2 ). Then by (13),

ckij (t, α, η, γ) = 1 − y1 − y2 + Ψ(νk , θνk ) and

dckij (t, α, η, γ) ∂Ψ(νk , θνk ) ∂Ψ(νk , θνk ) dθνk = + . dνk ∂νk ∂θνk dνk

(41)

Since 1 + νk θνk = y1−νk + y2−νk − 1, we have ckij (t, α, η, γ) = 1 − y1 − y2 + (y1−νk + y2−νk − 1)−1/νk = 1 − exp(−Λki (t)) − exp(−Λkj (t)) + (exp(νk Λki (t)) + exp(νk Λkj (t)) − 1)−1/νk . (42) By the expressions in (40) and (41), it follows that ∂Ψ(νk , θνk ) = νk −2 (y1−νk + y2−νk − 1)−1/νk {log(y1−νk + y2−νk − 1) + (y1−νk + y2−νk − 1)−1 − 1} ∂νk ∂Ψ(νk , θνk ) = −(y1−νk + y2−νk − 1)−1/νk −1 ∂θνk dθνk = −νk −2 {y1−νk log(y1νk ) + y1−νk − 1 + y2−νk log(y2νk ) + y2−νk − 1}, dνk and dckij (t, α, η, γ) = νk −2 (y1−νk + y2−νk − 1)−1/νk −1 (43) dνk {(y1−νk + y2−νk − 1) log(y1−νk + y2−νk − 1) + y1−νk log(y1νk ) + y2−νk log(y2νk )}. By (37) and (43), we have −K −1

∂Uα (α, η, γ, Gc ) ∂α K X X Z τ

= K −1

k=1 i,j∈H , i6=j

0

νk−4 (exp(νk Λki (t)) + exp(νk Λkj (t)) − 1)−2/νk −2

k ½ [exp(νk Λki (t)) + exp(νk Λkj (t)) − 1] log[exp(νk Λki (t)) + exp(νk Λkj (t)) − 1] ¾2 (44) −νk Λki (t) exp(νk Λki (t)) − νk Λkj (t) exp(νk Λkj (t)) Q⊗2 k dt + op (1).

The explicit expressions of Iα (α, η, γ) and Wα,k (α, η, γ, Gc ) under gamma random effects model can then obtained using (39), (42) and (44).

21

References Bandeen-Roche, K. & Liang, K.-Y. (2002). Modelling multivariate failure time associations in the presence of a competing risk. Biometrika 89, 299–314. Bandeen-Roche, K. & Ning, J. (2008). Nonparametric estimation of bivaiate failure time associations in the presence of a competing risk. Biometrika 95, 221–232. Cheng, Y. & Fine, J. (2008). Nonparametric estimation of cause-specific cross hazard ratio with bivariate competing risks data. Biometrika 95, 233–240. Cheng, Y., Fine, J. P. & Kosorok, M. R. (2007). Nonparametric association analysis of bivariate competing-risks data. J. Amer. Statist. Assoc. 102, 1407–1415. Clayton, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 65, 141–151. Fine, J. P. & Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. J. Amer. Statist. Assoc. 94, 496–509. Frank, M. J. (1979). On the simultaneous associativity of f(x, y) and x + y - f(x, y). Aequationes Mathematicae , 194–226. Genest, C. & MacKay, R. J. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82, 543–552. Glidden, D. (2000). A two-stage estimator of the dependence parameter for the ClaytonOakes model. Lifetime Data Anal. 6, 141–156. Gumbel, E. J. (1960). Bivariate exponential distributions. J. Amer. Statist. Assoc. , 698–707. Kalbfleisch, J. D. & Prentice, R. L. (1980). The statistical analysis of failure time data. Wiley, New York . Katsahian, S., Resche-Rigon, M., Chevret, S. & Porcher, R. (2006). Analysing multicentre competing risks data with a mxed proportinal hazards model for the subdistribution. Statist. Med. 25, 4267–4278. Oakes, D. (1989). Bivariate survival models induced by frailties. J. Amer. Statist. Assoc. 84, 487–493. Scheike, T. H. & Zhang, M.-J. (2008). Flexible competing risks regression modelling and goodness-of-fit. Lifetime Data Anal. early online, DOI:10.1007/s10985–008–9094–0.

22

Scheike, T. H., Zhang, M.-J. & Gerds, T. (2008). Predicting cumulative incidence probability by direct binomial regression. Biometrika 95, 205–20. Shih, J. H. & Louis, T. A. (1995). Inference on association parameter in copula models for bivariate survival data. Biometrics 51, 1384–1399. Skytthe, A., Pedersen, N., Kaprio, J., Stazi, M., Hjelmborg, J., Iachine, I., Vaupel, J. & Christensen, K. (2003). Longevity studies in genomeutwin. Twin. Res. 6, 448–454. Sun, Y. & Wu, H. (2005). Semiparametric time-varying coefficients regression model for longitudinal data. Scandinavian Journal of Statistics 32, 21 – 47.

23

Table 1: The estimated marginal time cohort effects γ b for the cumulative incidence of natural menopause. Variable γˆ × 1000 I{cohort[1937,1942]} −1.88 I{cohort[1943,1947]} −5.12 I{cohort[1948,1952]} −8.76

24

σ ˆγ × 1000 1.02 0.96 0.85

3.0

π(α, p1)

2.5

2.0

p1=.1 p1=.2 p1=.4 p1=.6 p1=.8 p1=.9

1.5

1.0 0.0

0.5

1.0

1.5

2.0

α

Figure 1: Cross-odds ratio as a function of dependence parameter for different values of cumulative incidence.

25

40

45

50 Age (years) (a)

55

Monozygotic twins

60

Probability of natural menopause

40

45

50 Age (years) (b)

55

Dyzygotic twins

60

Figure 2: Estimated marginal cumulative incidence functions of natural menopause for monozygotic twins (a) and dizygotic twins (b) with 95% pointwise confidence intervals (dashed lines) and 95% confidence bands (dotted lines).

Probability of natural menopause

1.0

0.8

0.6

0.2

0.0

0.4

1.0 0.8 0.6 0.4 0.2 0.0

26

40

45

50 Age (years) (a)

55

Monozygotic twins

60

Probability of natural menopause

40

45

50 Age (years) (b)

55

Dyzygotic twins

60

Figure 3: Estimated marginal cumulative incidence function of natural menopause for monozygotic twins (a) and dizygotic twins (b) for early and late cohorts with 95% pointwise confidence intervals (dashed lines) and 95% confidence bands (dotted lines). The confidence intervals/bands on the top are for early cohort and at the bottom for late cohort.

Probability of natural menopause

1.0

0.8

0.6

0.2

0.0

0.4

1.0 0.8 0.6 0.4 0.2 0.0

27

40

45

50 Age (years) (a)

Monozygotic

55

60

Probability of double menopause

40

45

50 Age (years) (b)

Dizygotic

55

60

Figure 4: Figure (a) shows the estimated probability of both twins experienced natural menopause by a given age for monozygotic twins and early time cohort (dark dashed line), the estimated probabilities under independence (solid thin line), for strongly correlated (top dashed line) and weakly correlated twins (bottom dashed line). Figures (b) shows similar plots for dizygotic twins.

Probability of double menopause

1.0

0.8

0.6

0.2

0.0

0.4

1.0 0.8 0.6 0.4 0.2 0.0

28

2.5 1.0

1.5

2.0

cross−odds ratio

3.0

3.5

4.0

Cross−odds ratio

40

45

50

55

60

Age (years)

Figure 5: Estimated cross-odds ratio for incidence of natural menopause for monozygotic twins (solid line), and dizygotic twins (dashed line).

29

Research Reports available from Department of Biostatistics http://www.pubhealth.ku.dk/bs/publikationer ________________________________________________________________________________

Department of Biostatistics University of Copenhagen Øster Farimagsgade 5 P.O. Box 2099 1014 Copenhagen K Denmark

08/1

Siersma, V. & Kreiner, S. Monte Carlo Evaluation of Model Search in Graphical Models for Ordinal Data.

08/2

Andersen, P.K. & Pohar Perme, M. Inference for outcome probabilities in multi-state models.

08/03 Scheike, T.H. & Zhang, M. Flexible competing risks regression modelling and goodness-offit. 08/04 Graw, F., Gerds. T.A. & Schumacher, M. On pseudo-values for regression analysis in competing risks models. 08/05 Gerds, T.A., Qvist, V., Strub, J.R. & Keiding, N. Survival analysis in dental research: The typical observational patterns. 08/06 Carstensen. B. Limits of agreement: How to use the regression of differences on averages. 08/07 Holst, K.K., Larsen, P.K., Alkjær, T., Nielsen, S.F., Petersen, J.H., Lynnerup, N., Simonsen, E.B. & Scheike, T.H. Modelling gait data using a functional data analysis approach. 08/08 Martinussen, T. & Scheike, T.H.. Covariate selection for the semiparametric additive risk model. 08/09 Aalen, O.O., Andersen, P.K., Borgan, Ø., Gill, R.D., Keiding, N. Application of martingales in survival analysis. 08/10 Martinussen, T. & Scheike, T.H. The Aalen additive hazards model with high-dimensional regressors. 08/11 Cortese, G., Scheike, T.H., Martinussen, T. Flexible survival regression modelling. 08/12 Andersen, P.K., Pohar Perme, M. Pseudo-observations in survival analysis. 08/13 Frydman, H., Gerds, T., Groen, R., Keiding, N. Non-parametric estimation in an “illnessdeath” model when the transition times are interval-censored and one transition is not observed.

08/14 Didelez, V., Kreiner, S., Keiding, N. On the use of graphical models for inference under outcome dependent sampling. 09/1

Scheike, T.H., Martinussen, T., Silver, J. Estimating haplotype effects for survival data.

09/2

Sheike, T.H., Sun, Y., Zhang, M., Jensen, T.K. A semiparametric random effects model for multivariate competing risks data.

Lihat lebih banyak...

Comentários

Copyright © 2017 DADOSPDF Inc.