Semiparametric spatio-temporal frailty modeling

Share Embed


Descrição do Produto

ENVIRONMETRICS Environmetrics 2003; 14: 523–535 (DOI: 10.1002/env.613)

Semiparametric spatio-temporal frailty modeling Sudipto Banerjee and Bradley P. Carlin* Division of Biostatistics, School of Public Health, University of Minnesota, Mayo Mail Code 303, Minneapolis, Minnesota 55455, U.S.A.

SUMMARY Recent developments in GIS have encouraged health science databases to incorporate geographical information about the subjects under study. Such databases have in turn generated interest among statisticians to develop and analyze models that account for spatial clustering and variation. In this article we develop a semiparametric (Cox) hierarchical Bayesian frailty model for capturing spatio-temporal heterogeneity in the survival patterns of women diagnosed with breast cancer in Iowa. In the absence of appropriate surrogates for standards of treatments and health care for cancer patients in the different counties, epidemiologists and health-care professionals are interested in discerning spatial patterns in survival from breast cancer that might be present among the counties. In addition, it is naturally of interest to see if the counties show discernible temporal trends over the years. The SEER (Surveillance Epidemiology and End Results) database from the National Cancer Institute (NCI) provides data on a cohort of breast cancer patients observed progressively through time, spanning 26 years. We implement our hierarchical spatio-temporal models on data extracted from the database for the 99 counties of Iowa. Our results suggest log-relative hazards that are generally flat initially, but steadily decrease for more recent years. Copyright # 2003 John Wiley & Sons, Ltd. key words:

proportional hazards; Cox model; Markov chain Monte Carlo; random effects

1. INTRODUCTION Probabilistic models and statistical analysis for technological and medical survival data have garnered enormous attention among biostatisticians (see, for example, Cox and Oakes, 1984). In the medical setting, data are usually obtained for a sample of individuals, and consist of either an observed survival time or a censoring time, and a set of explanatory variables or risk factors. In addition, such data are often grouped into strata (clusters), such as clinical sites, geographic regions, and so on. In this setting, a hierarchical modeling approach using stratum-specific frailties is often appropriate (see, for example, Carlin and Hodges, 1999). In this article we develop a Bayesian hierarchical model to analyze time series of such spatially clustered survival data. Together with the survival times of a large cohort of women diagnosed with breast cancer in the 99 counties of the state of Iowa, the dataset also includes individual-specific

*Correspondence to: Bradley P. Carlin, Division of Biostatistics, School of Public Health, University of Minnesota, Mayo Mail Code 303, Minneapolis, Minnesota 55455, U.S.A. Contract/grant sponsor: NSF/EPA; contract/grant number: SES 99-78238. Contract/grant sponsor: NiEHS; contract/grant number: R01-ES07750.

Copyright # 2003 John Wiley & Sons, Ltd.

Received 15 December 2001 Accepted 14 September 2002

524

S. BANERJEE AND B. P. CARLIN

covariates, such as age at diagnosis, information on multiple cancers, race, extent of the disease when diagnosed, and so on. Our main analytic objective is the development and implementation of hierarchical spatio-temporal survival models to explain the pattern of breast cancer using important covariates, while accounting for spatially correlated differences in the hazards among the counties and for possible space–time interactions. Survival time is treated as our response variable, where only patients whose cause of death can be attributed to metastasis from the breast cancer are considered to have failed. All other patients, including both those who survived the study period and those who dropped out or died of other causes, are considered censored. Our data was extracted from the SEER (Surveillance, Epidemiology and End Results) database maintained by the National Cancer Institute. Our data comprises information on 15 375 women diagnosed with breast cancer in Iowa beginning in 1973, with enrollment and follow-up continued through the end of 1998. Of these women, 11 912 were observed to die of breast cancer. Together with the survival time and the individual-specific covariates, each woman’s county of residence is also provided. This offers the possibility of using spatio-temporal models to discern spatial patterns in breast cancer survival, and how these patterns evolve over the study period. Clustered survival data are often handled using frailty models. The frailties are random effects (one for each cluster) that are traditionally assumed to be independent. To illustrate in the case of a single diagnosis year, suppose tij is the time to death for subject j in region i, where i ¼ 1; 2; . . . ; I and j ¼ 1; 2; . . . ; ni , and let xij denote the vector of individual-specific covariates. Then the usual assumption of proportional hazards hðtij ; xij Þ enables the models of the form hðtij ; xij Þ ¼ h0i ðtij Þwi expðT xij Þ ¼ h0i ðtij Þexpð T xij þ Wi Þ

ð1Þ

where h0i denotes the county specific baseline hazard and Wi ¼ log wi is the region-specific frailty term. For ease of notation, let t ¼ ftij g; x ¼ fxij g; W ¼ fWi g, and c ¼ fij g, where ij is a death indicator (0 if alive, 1 if dead). The likelihood for our model is then given by Lð; W; t; x; Þ /

ni I Y Y    fh0i ðtij ; xij Þgij exp H0i ðtij Þexp bT xij þ Wi i¼1 j¼1

where H0i ðtÞ ¼

ðt

h0i ðuÞdu

0

the integrated baseline hazard. Prior distributions pðWjÞ, pðÞ, and pðbÞ complete the hierarchical Bayesian model specification. In this article we build upon the above framework to incorporate possibly spatially correlated frailties. In addition, we account for the fact that the patients are followed through time and we would like to discern spatio-temporal effects as well. The remainder of our article is thus organized as follows. In Section 2, we lay out our proposed spatio-temporal hierarchical survival model, including the distribution of the spatio-temporal frailty parameters. Section 3 then describes methods for implementing a Bayesian version of the Cox semiparametric baseline hazard h0 ðtij Þ. Section 4 provides the description and associated analysis of our Iowa breast cancer dataset. Finally, in Section 5 we summarize and offer suggestions for further research in this area.

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

SEMIPARAMETRIC SPATIO-TEMPORAL FRAILTY MODELING

525

2. SPATIO-TEMPORAL HIERARCHICAL SURVIVAL MODELS In the standard parametric clustered survival setting, the frailty parameters Wi are assumed to be exchangable Gaussian or log-gamma variables. While modeling frailties has become quite popular, incorporation of spatial effects into survival models have only just begun to attract attention. In a recent paper, Banerjee et al. (2003) have investigated parametric frailty modeling for spatially correlated survival data. In the present context, we begin by extending the framework of the preceding section to incorporate temporal dependence. Here we have tijk as the response (time to death) for the jth subject residing in the ith county who was diagnosed in the kth year, while the individual-specific vector of covariates in now denoted by xijk, for i ¼ 1; 2; . . . ; I; k ¼ 1; . . . ; K and j ¼ 1; 2; . . . ; nik . Furthermore, the spatial random effects Wi in the preceding section are now modified to Wik , to represent spatio-temporal frailties corresponding to the ith county for the kth diagnosis year. Our spatial frailty specification in Equation (1) now becomes hðtijk ; xijk Þ ¼ h0i ðtijk ÞexpðbT xijk þ Wik Þ

ð2Þ

In situations where we think there is spatial correlation among regions, our random effects distribution for the kth year, pðWk jk Þ may be chosen to reflect this spatial structure. A popular prior for such purposes is the conditionally autoregressive model (Besag et al., 1991), which we abbreviate CAR(k ). Models of this type usually incorporate information about the adjacency of regions rather than any type of continuous distance metric. The most common form of this prior (Bernardinelli and Montomoli, 1992) has a joint distribution proportional to " # " # I k X k X I=2 I=2 2 k exp  ðWik  Wi0 k Þ mi Wik ðWik  W ik Þ ð3Þ / k exp  2 i adj i0 2 i¼1 where i adj i0 denotes that regions i and i0 are adjacent, W ik is the average for the kth year of the set of Wi0 k , the number of these adjacencies. Recent work (Knorr-Held, 2002; Hodges et al., 2003) suggests that ðI  1Þ=2 may be more appropriate than I=2 as the exponent for k in (3); fortunately this turns out to make little difference in our case (as we have I ¼ 99), so we proceed with the latter, more traditional choice. Several authors (e.g. Sun et al., 1999) show how to ensure propriety for the CAR prior through incorporation of an extra parameter. However, the common CAR prior shown in (3) is a member of the class of pairwise difference priors (Besag et al., 1995), which are identified only up to an additive constant. This typically makes any intercept term in  unidentifiable. Since one of our analytic objectives is to discern the spatio-temporal pattern in the frailties we avoid including an intercept term in the covariates. A consequence of our prior specification is that Wik jWði0 6¼iÞk  NðW ik ; 1=ðk mi ÞÞ Note that we can account for temporal correlation in the frailties by assuming that the k are themselves identically distributed from a common hyperprior (Waller et al., 1997). A gamma prior (usually vague but proper) is often selected here, since this is particularly convenient for Markov chain Monte Carlo (MCMC) implementation using the Gibbs sampler (Gelfand and Smith, 1990). A flat prior for  is typically chosen, since this still admits a proper posterior distribution. Adaptive rejection

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

526

S. BANERJEE AND B. P. CARLIN

(Gilks and Wild, 1992) or Metropolis–Hastings (Metropolis et al., 1953) sampling are usually required to update the Wk and b parameters in a hybrid Gibbs sampler (Carlin and Louis, 2000, Sections 5.4.4 and 7.8.2). An alternative is to employ a geostatistical model for pðWk jk Þ. Geostatistical approaches use geographical location of a strata (e.g. in terms of latitude and longitude). The frailties are now seen as partial realizations of a Gaussian process whose dispersion structure is specified through a valid correlation function (e.g. exponential, Gaussian, Matern). These models have been discussed and implemented by Banerjee et al. (2003) for spatial frailty models. With our dataset there were no significant differences in the analysis, yet the geostatistical frailty models take much longer to produce because of matrix inversions and determinant computations required in the Gibbs updates. As a result, in this article we restrict our attention to the CAR model. We remark that it would certainly be possible to include both spatial and nonspatial frailties, as is now common practice in spatial lattice modeling (see, for example, Waller et al., 1997). In our case, this would mean supplementing our spatial frailties Wik with a collection of nonspatial frailties, say iid Vik  Nð0; 1=k Þ. One problem with this approach is that the frailties now become identified only by the prior, and so the proper choice of priors for k s and k s (or h) becomes problematic (see, for example, Eberly and Carlin, 2000). Another problem is the resultant decrease in algorithm performance wrought by the addition of so many additional, weakly identified parameters. However, we pursued the notion and managed to obtain appropriate MCMC convergence in Section 4. We therefore may summarize our full hierarchical model as follows: Lð; W; t; x; Þ /

nik K Y I Y Y    fh0i ðtijk ; xijk Þgijk exp H0i ðtijk Þexp bT xijk þ Wik þ Vik k¼1 i¼1 j¼1

where pðWk jk Þ  CARðk Þ and pðVk jk Þ  NI ð0; k IÞ for k ¼ 1; 2; . . . ; K; and k  Gða; bÞ; k  Gðc; dÞ:

3. SEMIPARAMETRIC SPATIAL PROPORTIONAL HAZARDS MODELING Banerjee et al. (2003) assumed a parametric formulation for the baseline hazard using the Weibull distribution. In this framework, they incorporated spatial structure in the frailties. In particular, they imposed CAR and geostatistical (i.e. kriging-style, using the lattitude and longitude of the county centroids) priors on the Wi . While parametric random effects models are already quite flexible, many practitioners prefer the additional richness of the nonparametric baseline hazard offered by the celebrated Cox model (Cox and Oakes, 1984). In a recent paper, Banerjee and Carlin (2002) have demonstrated that this additional richness can easily be implemented with spatial frailties, and can produce better fit than its parametric counterparts. Since we continue to assume a proportional hazard of the form Equation (1), in which the covariate effects are still modeled parametrically, such models are often referred to as semiparametric. Within the Bayesian framework, several authors (e.g. Carlin et al., 1993) have proposed using the partial likelihood to obtain a posterior distribution for the treatment effect. However, the Cox model does not allow fully hierarchical modeling of stratumspecific baseline hazards (with stratum-specific frailties) because the baseline hazard is implicit in the partial likelihood computation. In this article, we therefore consider a nonparametric approach to modeling the baseline hazard in Cox regression.

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

SEMIPARAMETRIC SPATIO-TEMPORAL FRAILTY MODELING

527

Banerjee and Carlin (2002) considered an implementation of the Cox model using the counting process formulation of Clayton (1991, 1994). However, this approach is somewhat opaque methodologically, yet produces results very similar to those of a simpler semiparametric approach due to Gelfand and Mallick (1995). This latter approach flexibly models the integrated baseline hazard as a mixture of monotone functions. In particular, these authors use a simple transformation to map the integrated baseline hazard onto the interval [0, 1], and subsequently use a result of Diaconis and Ylvisaker (1985) to approximate this function by a weighted mixture of incomplete beta functions. Implementation issues are discussed in detail by Gelfand and Mallick (1995) and also by Carlin and Hodges (1999) for stratum-specific baseline hazards. The likelihood and Bayesian hierarchical setup remain exactly as above. Thus, letting h0i ðtÞ be the baseline hazard in the ith region and H0i ðtÞ be the corresponding integrated baseline hazard, we define J0i ðtÞ ¼ a0 H0i ðtÞ=fa0 H0i ðtÞ þ b0 g which conveniently takes values in [0, 1]. We discuss below the choice of a0 and b0 but note that this is not as much a modeling issue as a computational one; we only require appropriate coverage of the interval [0, 1]. We model J0i ðtÞ as a mixture of Betaðrl ; sl Þ cdfs, for l ¼ 1; . . . ; m. The rl and sl are chosen so that the beta cdfs have evenly spaced means (see below for our choices of m and the ðrl ; sl Þ), centered around ~ J0 ðtÞ, a suitable function transforming the timescale to [0, 1]. We thus have J0i ðtÞ ¼

m X

  vil IB ~J 0 ðtÞ; rl ; sl

l¼1

P where m l¼1 vil ¼ 1 for all i, and IBð; a; bÞ denotes the incomplete beta function (i.e. the cdf of a Betaða; bÞ distribution). Since any distribution function on [0, 1] can be approximated arbitrarily well by a finite mixture of beta cdfs, the same is true for J0i, an increasing function that maps [0, 1] onto itself. Thus, working backwards, we find the following expression for the cumulative hazard in terms of the above parameters:   P ~ b0 m l¼1 vil IB J 0 ðtÞ; rl ; sl   H0i ðtÞ ¼  P ~ a0 1  m l¼1 vil IB J 0 ðtÞ; rl ; sl Taking derivatives, we have for the hazard function   P ~ b0 @t@ ~J 0 ðtÞ m l¼1 vil Beta J 0 ðtÞ; rl ; sl h0i ðtÞ ¼   2 P ~ a0 1  m l¼1 vil IB J 0 ðtÞ; rl ; sl Typically m, the number of mixands of the Beta cdfs, is fixed, as are the fðrl ; sl Þgm l¼1 , so chosen that the resulting beta densities cover the interval [0, 1]. In fact, we fixed m ¼ 5; frl g ¼ ð1; 2; 3; 4; 5Þ and fsl g ¼ ð5; 4; 3; 2; 1Þ to produce five evenly spaced Beta cdfs. Turning to the choice of a0 and b0 , we note that it is intuitive to specify ~ J 0 ðtÞ to represent a plausible central function around which the J0i are ~ 0 ðtÞ, to specify ~J 0 ðtÞ, then we get distributed. Thus, if we consider a cumulative hazard function, say H ~ ~ ~ J 0 ðtÞ ¼ a0 H 0 ðtÞ=ða0 H 0 ðtÞ þ b0 Þ. An exponential distribution specification with mean 1= sets ~ 0 ðtÞ ¼ t. In cases where the regression includes an intercept,  may be taken as equal to one. H

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

528

S. BANERJEE AND B. P. CARLIN

However, as we have already mentioned in Section 2, in typical spatio-temporal applications the CAR prior on the frailties precludes inclusion of an intercept. Here  is treated as a specified hyperparameter. A small value (say, 0.1 or 0.01) may be assigned to , corresponding to a large variance (100 or 10 000) for the exponential distribution, thereby keeping our prior confidence about ~J 0 ðtÞ rather vague. Hyperpriors may be assigned to  but in cases like ours the data does not inform much about variability in , resulting in poor MCMC convergence for this parameter. Therefore we fixed  at 0.1. In our scenario, since the survival times ranged between 1 and 312 (survival time in months), we found that choosing a0 ¼ 50 (so that a0  ¼ 5) and b0 ¼ 100 then led to values for ~J 0 ðtÞ that largely cover the interval [0, 1]. The likelihood is thus a function of the regression coefficients b, the stratum-specific weight vectors vi ¼ ðvi1 ; . . . ; vim ÞT , and the random frailties fWik g and fVik g. It is natural to model the vi as draws from a Dirichletð1 ; . . . ; m Þ distribution, where for simplicity we often take 1 ¼    ¼ m ¼ . A hyperprior may be assigned for  but again the data typically do not contain enough information for  and so we treat  as a specified hyperparameter equal to 0.50. The precise choice hyperparameters like  and  are typically determined by preliminary analyses and MCMC convergence assessments under different values.

4. ANALYSIS OF IOWA BREAST CANCER DATA We work with a cohort of 15 375 women in Iowa, who were diagnosed with breast cancer starting in 1973 and have been undergoing treatment and progressively monitored since. Only those who have been identified as having died from metastasis of cancerous nodes in the breast are considered to have failed, while the rest (including those who might have died from metastasis of other types of cancer, or from other causes of death) are considered censored. By the end of 1998, 11 912 of the patients had died of breast cancer while the remaining were censored, either because they survived until the end of the study period, dropped out of the study, or died of causes other than breast cancer. For each individual, the dataset records the time in months (1 to 312) that the patient survived, and her county of residence at diagnosis. Several individual-level covariates are also available, including race (black, white, or other), age at diagnosis, number of primaries (i.e. the number of other types of cancer diagnosed for this patient), and the stage of the disease (local, regional, or distant). 4.1. Results for the full model We begin by summarizing our results for the spatio-temporal frailty model described near the end of Section 2 above, i.e. the full model having both spatial frailties Wik and nonspatial frailties Vik . We chose vague G(0.01, 0.01) hyperpriors for the k and k (having mean 1 but variance 100) in order to allow maximum flexibility in the partitioning of the frailties into spatial and nonspatial components. Best et al. (1999) suggest that an even vaguer prior for the k (say, a G(0.001, 0.001)) may lead to better prior ‘balance’ between the spatial and nonspatial random effects, but there is controversy on this point and so we do not pursue it here. While overly diffuse priors (as measured, for example, in Weiss, 1996) may result in weak identifiability of these parameters, their posteriors remain proper, and the impact of these priors on the posterior for the well-identified subset of parameters (including b and the logrelative hazards themselves) should be minimal (Daniels and Kass, 1999; Eberly and Carlin, 2000). Indeed, we feel that a more serious error would be to impose an overly informative prior that emerges as in conflict with the data.

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

529

SEMIPARAMETRIC SPATIO-TEMPORAL FRAILTY MODELING

Table 1.

Posterior summaries for the spatio-temporal frailty model

Covariate Age at diagnosis Number of primaries Race (white ¼ 0) Black Other Stage (local ¼ 0) Regional Distant

2.5%

50%

97.5%

0.0135 0.43

0.0148 0.40

0.14 2.25

0.21 0.30

0.53 0.97

0.30 1.45

0.34 1.51

0.38 1.58

0.0163 0.36

Our algorithm ran five initially overdispersed parallel MCMC chains for 30 000 iterations each, where convergence was monitored using sample autocorrelations within the chains, cross-correlations between the parameters, plots of the sample traces, and Gelman and Rubin (1992) diagnostics. These tools suggested discarding the first 20 000 iterations from each chain as pre-convergence burn-in. Retaining every 10th of the remaining 5  10 000 ¼ 50 000 iterations yielded a final sample of size 5000 for posterior analysis. The size of our dataset precluded running our models in WINBUGS, so the computations were carried out in Cþþ, using R for posterior summarization and ArcView for creating choropleth maps. Table 1 provides 2.5, 50, and 97.5 posterior percentiles for the main effects (components of b) in our model. All of the predictors except those having to do with race are significant at the 0.05 level. Since the reference group for the stage variable is local, we see that women with regional and distant (metastasized) diagnoses have higher and much higher hazard of death, respectively; the posterior median hazard rate increases by a factor of e1:51 ¼ 4:53 for the latter group. Higher age at diagnosis also increases the hazard, but a larger number of primaries (the number of other types of cancer a patient is be suffering from) actually leads to a lower hazard, presumably due to the competing risk of dying from one of these other cancers. Figures 1–3 map the posterior medians of the frailties Wik þ Vik for three representative years: 1973, 1986, and 1998. Note the scales are different across the three maps, to facilitate visual identification of the spatial pattern in each year. We see that these patterns are fairly similar, with each year having clusters of counties with lower median frailties in the north- and south-central parts of the state, and also clusters of counties with higher median frailties in the central, northeastern and southeastern parts of the state.

Figure 1. Fitted spatio-temporal frailties, Iowa counties, 1973

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

530

S. BANERJEE AND B. P. CARLIN

Figure 2. Fitted spatio-temporal frailties, Iowa counties, 1986

Figure 3. Fitted spatio-temporal frailties, Iowa counties, 1998

The scales of the three maps suggest an overall decreasing pattern in the frailties over time. Figure 4 clarifies this pattern by showing boxplots of the posterior medians of the Wik over time (recall that our full model does not have year-specific intercepts; the average of the Wik for year k plays this role). We see an essentially horizontal trend during roughly the first half of our observation period, followed by a decreasing trend that seems to be accelerating. Overall the total decrease in median log hazard is about

Figure 4. Boxplots of posterior medians for the spatial frailties Wik over the Iowa counties for each year, k ¼ 1973; . . . ; 1998

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

SEMIPARAMETRIC SPATIO-TEMPORAL FRAILTY MODELING

531

Figure 5. Boxplots of posterior medians for the nonspatial frailties Vik over the Iowa counties for each year, k ¼ 1973; . . . ;1998

0.7 units, or about a 50 per cent reduction in hazard over the observation period. An cancer epidemiologist would likely be unsurprised by this decline, since it coincides with the recent rise in the use of mammography by American women. Similar to Figure 4, Figure 5 gives the posterior medians for the nonspatial frailties, Vik . The contrast with Figure 4 is stark: the nonspatial frailties are much smaller in absolute value, and are centered fairly close to 0 for all years. This suggests that the Vik may not really be necessary in our model—an issue we revisit in the next subsection. Finally, the similarity of the boxplots across years in Figures 4 and 5 (apart from the decreasing level in the former), as well as the similarity of the maps in Figures 1–3, suggest that the spatial structures in the data are fairly similar across diagnosis years. This is confirmed by Figure 6, which summarizes the posterior distributions of the spatial frailty conditional standard deviation terms (the pffiffiffiffiffi pffiffiffiffi 1= k , shown in gray) and the nonspatial frailty standard deviation terms (the 1= k , shown in white). Since these parameters indicate the magnitude of the spatial and nonspatial frailties, the horizontal trend over time indicates that the amount of local (spatial) and statewide (nonspatial)

pffiffiffi Figure 6. Boxplots of posterior samples for the spatial frailty conditional standard deviation terms, 1= k ; and the nonspatial pffiffiffi frailty conditional standard deviation terms, 1=  k ; over the Iowa counties for each year, k ¼ 1973; . . . ; 1998

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

532

S. BANERJEE AND B. P. CARLIN

similarity in the covariate-adjusted death rates is constant over time. This again motivates a model simplification, which we consider in the next subsection.

4.2. Bayesian model choice and checking Since our use of a flat prior for b precludes the use of Bayes factors for model choice, we turn instead to Deviance Information Criterion (DIC), an extension of the Akaike Information Criterion (AIC) for hierarchical models. This criterion is based on the posterior distribution of the deviance statistic, DðhÞ ¼ 2 log f ðyjhÞ þ 2 log hðyÞ

ð4Þ

where f ðyjhÞ is the likelihood function for the observed data vector y given the parameter vector h, and hðyÞ is some standardizing function of the data alone (which thus has no impact on model selection). In this approach, the fit of a model is summarized by the posterior expectation of the deviance,  ¼ Ejy ½D, while the complexity of a model is captured by the effective number of parameters, pD . D Spiegelhalter et al. (2002) show that a reasonable definition of pD is   DðhÞ pD ¼ Ejy ½D  DðEjy ½hÞ ¼ D i.e. the expected deviance minus the deviance evaluated at the posterior expectations. Typically, this ‘effective’ parameter total pD will be less than the actual total number of parameters in the model, due to the borrowing of strength across random effects (in our case, the Wik and Vik ). The DIC is then defined analogously to the AIC as the expected deviance plus the effective number of parameters, i.e.  þ pD DIC ¼ D  indicate good fit while small values of pD indicate a parsimonious model, Since small values of D small values of the sum (DIC) indicate preferred models. Note that DIC may be readily calculated during an MCMC run by monitoring both h and DðhÞ, and at the end of the run simply taking the sample mean of the simulated values of D, minus the plug-in estimate of the deviance using the sample means of the simulated values of h. However, as with AIC and other penalized likelihood criteria, DIC is not intended for identification of the ‘correct’ model, but rather merely as a method of comparing a collection of alternative formulations (all of which may be incorrect). Note also that DIC is scale-free; the choice of standardizing function hðyÞ in Equation (4) is arbitrary. Thus, values of DIC have no intrinsic meaning; only differences in DIC across models are meaningful. The first six lines of Table 2 provide pD and DIC values for our full model and several simplications thereof. Note that the full model (sixth line) is estimated to have only just over 150 effective parameters, a substantial reduction (recall there are 2  99  26 ¼ 5148 random frailty parameters alone). Removing the spatial frailties Wik from the log-relative hazard has little impact on pD but substantial negative impact on the DIC score. By contrast, removing the nonspatial frailties Vik reduces (i.e. improves) both pD and DIC, consistent with our findings in the previous subsection. Further simplifying the model to having a single set of spatial frailties Wi that do not vary with time (but now also reinserting year-specific intercepts k ) has little effect on pD but does improve DIC a bit more (though this improvement appears only slightly larger than the order of Monte Carlo error in our

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

533

SEMIPARAMETRIC SPATIO-TEMPORAL FRAILTY MODELING

Table 2.

DIC and effective number of parameters pD for the competing models

Baseline hazard Semiparametric Semiparametric Semiparametric Semiparametric Semiparametric Semiparametric Weibull Weibull Weibull

Log-relative hazard mixture mixture mixture mixture mixture mixture

T

b xijk bT xijk þ k bT xijk þ k þ Wi bT xijk þ Wik bT xijk þ Vik T b xijk þ Wik þ Vik bT xijk þ k þ Wi bT xijk þ Wik bT xijk þ Wik þ Vik

pD 6.17 33.16 80.02 81.13 149.45 151.62 79.22 80.75 141.67

DIC 780 743 187 208 732 280 221 239 315

calculations). Even more drastic simplifications (eliminating the Wi , and perhaps even the k ) lead to further drops in pD , but at the cost of unacceptably large increases in DIC. Thus, overall our countylevel breast cancer survival data seem to have strong spatial structure that is still unaccounted for by the covariates in Table 1, but structure which is fairly similar for all diagnosis years. The last three lines of Table 2 reconsider the best three log-relative hazard models above, but where we now replace the semiparametric mixture baseline hazard with a Weibull hazard having regioni 1 specific baseline hazards h0i ðtijk ; i Þ ¼ i tijk (note the spatial frailties play the role of the second parameter customarily associated with the Weibull model). These fully parametric models offer small advantages in terms of parsimony (smaller pD ), but these gains are apparently more than outweighed by a corresponding degradation in fit (much larger DIC score). Finally, thinking of the space–time separable model (third line of Table 2) as our ‘best’ model, we attempt to validate its fit using two approaches. First, Figure 7 plots the observed Kaplan–Meier curve (i) along with the Kaplan–Meier curve (iv) based on a sample from the posterior predictive distribution under the space–time separable model. Also shown is a similar posterior predictive curve (iii) arising from the nonseparable but spatial only model (fourth line of Table 2), and a final posterior predictive

Figure 7. Kaplan–Meier plots for model checking: (i) based on the data itself; (ii) from posterior predictive samples, space–time separable Weibull model; (iii) from posterior predictive samples, nonseparable semiparametric mixture model; (iv) from posterior predictive samples, separable semiparametric mixture model (with 95% pointwise credible limits shown as dashed lines)

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

534

S. BANERJEE AND B. P. CARLIN

curve (ii) arising from the separable Weibull model (seventh line of Table 2). The agreement of all three semiparametric models is excellent, confirming the quality of fit afforded by the Cox approach of Section 3. In particular, the data-based (i) curve is almost entirely contained within the 95 per cent equal-tail pointwise credible bands for the separable semi-parametric model (shown as dashed lines, iv). However, the Weibull (ii) curve is noticeably discrepant, suggesting poorer fit. Second, wePcomputed a Bayesian p-value (Gelman et al., 1995, p. 169) based on the goodness-of-fit P statistic D ¼ i j ½gðtijk Þ  Eðgðtijk ÞjhÞ2 =Varðgðtijk ÞjhÞ. This is computed as the empirical proportion of D* replicates, each arising from a posterior predictive sample, that exceed the actual observed value, Dobs . Taking g equal to the identity function and the hazard function, the values obtained (0.38 and 0.44, respectively) also indicate acceptable fit on both scales.

5. CONCLUDING REMARKS AND FUTURE DIRECTIONS In this article we have described a semiparametric Bayesian approach to frailty modeling for spatiotemporally correlated survival data and illustrated it for a challenging yet publicly available data set (from the NCI’s SEER program). As with many previous MCMC-based Bayesian approaches, the resulting full posterior summaries enabled easy and accurate summary of the results in both traditional graphical and GIS formats. The DIC statistic also helped clarify the lack of significant spatio-temporal interaction for our particular data set. The GIS maps of the smoothed spatial frailties can be used to identify counties to target for further intervention (say, more aggressive mammogram use) or to investigate for spatially varying covariates still missing from the main effects portion of our model. While not justified for our data, future work in this area might contemplate extensions of our full model, such as one including both spatial main effects Ui and spatio-temporal interactions Wik . Future work also looks toward extending our approach to multiple cancers (say, breast, colorectal, and lung), which may be naturally expected to be correlated. Here a multivariate CAR (MCAR) distribution for the spatial frailty terms may be most sensible; see, for example, Mardia (1988), Gelfand and Vounatsou (2002), and Carlin and Banerjee (2002). We might also attempt to incorporate spatial correlation into multiple or repeated endpoint studies. We hope to report on these and other related topics in future articles.

ACKNOWLEDGEMENTS

The work of both authors was supported in part by NSF/EPA grant SES 99-78238, while that of the second author was also supported in part by NIEHS grant R01-ES07750. The authors are grateful to Prof. Aaron Folsom for crucial assistance in understanding the SEER database.

REFERENCES Banerjee S, Carlin BP. 2002. Spatial semiparametric proportional hazards models for analyzing infant mortality rates in Minnesota counties. In Case Studies in Bayesian Statistics, Volume VI, Gatsonis C, et al. (eds). Springer-Verlag: New York; 137–151. Banerjee S, Wall MM, Carlin BP. 2003. Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics 4: 123–142. Bernardinelli L, Montomoli C. 1992. Empirical Bayes versus fully Bayesian analysis of geographical variation in disease risk. Statistics in Medicine 11: 983–1007.

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

SEMIPARAMETRIC SPATIO-TEMPORAL FRAILTY MODELING

535

Besag J, York JC, Mollie´ A. 1991. Bayesian image restoration, with two applications in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics 43: 1–59. Besag J, Green P, Higdon D, Mengersen K. 1995. Bayesian computation and stochastic systems (with discussion). Statistical Science 10: 3–66. Best NG, Arnold RA, Thomas A, Waller LA, Conlon EM. 1999. Bayesian models for spatially correlated disease and exposure data (with discussion). In Bayesian Statistics 6, Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds). Oxford University Press: Oxford; 131–156. Carlin BP, Banerjee S. 2002. Hierarchical multivariate CAR models for spatio-temporally correlated survival data (with discussion). In Bayesian Statistics 7, Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M (eds). Oxford University Press: Oxford, to be published. Carlin BP, Hodges JS. 1999. Hierarchical proportional hazards regression models for highly stratified data. Biometrics 55: 1162– 1170. Carlin BP, Louis TA. 2000. Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed. Chapman & Hall/CRC Press: Boca Raton, FL. Carlin BP, Chaloner K, Church T, Louis TA, Matts JP. 1993. Bayesian approaches for monitoring clinical trials with an application to toxoplasmic encephalitis prophylaxis. The Statistician 42: 355–367. Clayton D. 1991. A Monte Carlo method for Bayesian inference in frailty models. Biometrics 47: 467–485. Clayton D. 1994. Some approaches to the analysis of recurrent event data. Statistics in Medical Research 3: 244–262. Cox DR, Oakes D. 1984. Analysis of Survival Data. Chapman & Hall: London. Daniels MJ, Kass RE. 1999. Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. Journal of the American Statistical Association 94: 1254–1263. Diaconis P, Ylvisaker D. 1985. Quantifying prior opinion. In Bayesian Statistics 2, Bernardo JM, et al. (eds). North Holland: Amsterdam; 133–156. Eberly LE, Carlin BP. 2000. Identifiability and convergence issues for Markov chain Monte Carlo fitting of spatial models. Statistics in Medicine 19: 2279–2294. Gelfand AE, Mallick BK. 1995. Bayesian analysis of proportional hazards models built from monotone functions. Biometrics 51: 843–852. Gelfand AE, Smith AFM. 1990. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85: 398–409. Gelfand AE, Vounatsou P. 2002. Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics, to be published. Gelman A, Rubin DB. 1992. Inference from iterative simulation using multiple sequences (with discussion). Statistical Science 7: 457–511. Gelman A, Carlin J, Stern H, Rubin DB. 1995. Bayesian Data Analysis. Chapman & Hall/CRC Press: Boca Raton, FL. Gilks WR, Wild P. 1992. Adaptive rejection sampling for Gibbs sampling. Journal of the Royal Statistical Society, Series C (Applied Statistics) 41: 337–348. Hodges JS, Carlin BP, Fan Q. 2003. On the precision of the conditionally autoregressive prior in spatial models. Biometrics, to be published. Knorr-Held L. 2002. Some remarks on Gaussian Markov random field models for disease mapping. In Highly Structured Stochastic Systems, Hjort N, Green P, Richardson S (eds). Oxford University Press: Oxford, to be published. Mardia KV. 1988. Multi-dimensional multivariate Gaussian Markov random fields with application to image processing. Journal of Multivariate Analysis 24: 265–284. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087–1091. Spiegelhalter DJ, Best N, Carlin BP, van der Linde A. 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B 64: 583–639. Sun D, Tsutakawa RK, Speckman PL. 1999. Posterior distribution of hierarchical models using CAR(1) distributions. Biometrika 86: 341–350. Waller LA, Carlin BP, Xia H, Gelfand AE. 1997. Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association 92: 607–617. Weiss RE. 1996. Bayesian model checking with applications to hierarchical models. Technical Report, Department of Biostatistics, UCLA School of Public Health. Available online at http://rem.ph.ucla.edu/  rob/papers/index.html.

Copyright # 2003 John Wiley & Sons, Ltd.

Environmetrics 2003; 14: 523–535

Lihat lebih banyak...

Comentários

Copyright © 2017 DADOSPDF Inc.