Empirical Bayes Age-Period-Cohort Analysis of Retrospective Incidence Data

June 23, 2017 | Autor: Claus Holst | Categoria: Statistics, Age-period-cohort models
Share Embed


Descrição do Produto

# Board of the Foundation of the Scandinavian Journal of Statistics 2000. Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA Vol 27: 415±432, 2000

Empirical Bayes Age±Period±Cohort Analysis of Retrospective Incidence Data YOSIHIKO OGATA and KOICHI KATSURA Institute of Statistical Mathematics, Tokyo

NIELS KEIDING and CLAUS HOLST University of Copenhagen

ANDERS GREEN University of Aarhus ABSTRACT. We analyse the (age, time)-speci®c incidence of diabetes based on retrospective data obtained from a prevalent cohort only including survivors to a particular date. From underlying point processes with intensities corresponding to the (age, time)-speci®c incidence rates the observed point pattern is assumed to be generated by an independent thinning process with parameters (assumed known) depending on population density and survival probability to the sampling date. A Bayesian procedure is carried out for the optimal adjustment and comparison of isotropic and anisotropic smoothing priors for the intensity functions, as well as for the decomposition of the intensity on the (time, age) Lexis diagram into the three factors of age, period and cohort.

Key words: ABIC, age±period±cohort decomposition, anisotropic smoothness prior, B-spline, detection rate, diabetes incidence, integrated likelihood, intensity function, Lexis diagram, random deletion, spatial Poisson process

1. Introduction Let á(a, t) denote the incidence of some chronic disease at age a and time t (proportional to the probability that a healthy individual at (a, t) will become ill in the next in®nitesimal time interval). A central task in chronic disease epidemiology is to decompose á(a, t) into effects explained by age a, calendar time (``period'') t and birth time (``cohort'') c ˆ t ÿ a. However, because of the linear dependence t ˆ c ‡ a this decomposition is not unique without further prior information, as extensively discussed in the chronic disease epidemiology literature, e.g. Osmond & Gardner (1982), Clayton & Schif¯ers (1987) and Robertson & Boyle (1998a, b). One way out of the identi®ability problem is to impose smoothness constraints on the age, period, and cohort effects, and this may be put into a useful formal framework by adopting a Bayesian approach. Indeed, Bayesian methods for this problem using Markov Chain Monte Carlo (MCMC) ideas were studied by Berzuini et al. (1993), Berzuini & Clayton (1994) and Besag et al. (1995). The main purpose of this paper is to adopt an empirical Bayes approach for this problem, using Akaike's (1980) Bayesian Information Criterion (ABIC) for model selection as well as estimation of the parameters of the prior distributions (the hyperparameters). This approach is completely empirical, and has the advantage over the model of Berzuini et al. that it includes smoothness constraints in the linear parameters. The approach can be viewed as a generalization of Nakamura's (1982, 1986) study of a logistic model for a cohort table of survey data. The model deriving from smoothed age, period, and cohort effects is contrasted to an

416

Y. Ogata et al.

Scand J Statist 27

unrestricted smoothing (using similar principles) of á(a, t), our non-parametric background model. While the latter might suf®ce for descriptive purposes, the age±period±cohort decomposition is important in the substantive epidemiological applications; its ®t is conveniently gauged by comparison with the background model. We follow Brillinger (1986) and Keiding (1991) in generating a bivariate Poisson process on the Lexis diagram of disease incidences from an assumption of a (non-homogeneous) Poisson process of births. In this way the one-dimensional time in the de®nition of disease incidence á(a, t) (age and time always move at the same speed for any one person!) becomes transformed into the two freely varying coordinates (a, t) for the bivariate Poisson process. Of®cial statistical data on survivors are brought in to approximate the density of survivors. Ogata (1996) compared and discussed, in the context of spatial statistical models, the empirical Bayes approach adopted in the present paper and the full MCMC approach. We apply the methods to retrospective data on diabetes incidence (see section 2 for details) earlier documented by Keiding et al. (1989) and Keiding (1990). In these earlier analyses a descriptive approach was taken and the results were reported as horizontal, vertical and diagonal cross-sections of a smoothed (time, age)-intensity surface over the Lexis diagram. McKeague & Utikal (1990) gave a mathematical-statistical treatment of this kind of bivariate hazard smoothing. An important methodological complication of the retrospective study derives from the fact that the time and age of diabetes onset were recorded only for diabetics surviving until 1 July 1973. Keiding et al. noted heuristically that an adequate correction for this decreased detection rate corresponds to individually weighting each individual's incidence contribution by an estimate of the reciprocal of its survival probability from diabetes onset until 1 July 1973. It turns out that this kind of individual weighting corresponds to the so-called Horvitz±Thompson estimator in sampling theory, cf. Horvitz & Thompson (1952) or Cochran (1983). The idea seems to have been introduced into survival analysis by Koul et al. (1981) and was used independently by Robins & Rotnitzky (1992), see also Whittemore & Halpern (1997). In this paper, we propose decompositions of the incidence rate into effects of age, period and cohort by taking the retrospective nature of the data into consideration and by assuming smoothness of the incidence rates. The Horvitz±Thompson approach to the retrospective data is in principle the same as that by Keiding et al. (1989), who however only gave a brief heuristic explanation. 2. Data Green et al. (1981) traced 1499 insulin-dependent diabetic patients in Fyn County (population approximately 450,000), Denmark, as of 1 July 1973. The case registration rate was close to 100%. Among these, Keiding et al. (1989) studied the 710 (410 males and 300 females) who had their disease diagnosed at age at most 30 years and after 1 January 1933. For each of these patients the time of disease onset (as certi®ed by a doctor) was ascertained from patient records, so that a retrospective (age, time)-pattern of diabetes incidence is available over the rectangle [0, 30 years] 3 [January 1933, June 1973]. Each circle in Fig. 1(a) and (b) is centred at the observed onset age and time (ai , ti ) of a diabetic (the sizes of the circles are explained later). Because of the retrospective data collection, patients who died before 1 July 1973 are not included, which must be taken into account when estimating diabetes incidence. Keiding et al. (1989) estimated the (age, time)-dependent detection rate (i.e. the probability of survival until 1 July 1973) 0 < î(a, t) < 1 using an (independently) estimated survival distribution of diabetic patients. A generalized Cox regression model (Andersen et al., 1985) was used and estimates # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

Scand J Statist 27

Empirical Bayes age-period-cohort analysis

417

Fig. 1. Age±time-speci®c onset of disease for (a) male and (b) female diabetic patients alive in Fyn County on 1 July 1973. The area of the circle is inversely proportional to the survival probability until that date. The populations at risk are about 3000 and 3100 individuals/year2 on average for the males and females, respectively.

quoted along with population census data approximating the age 3 calendar time density of (disease-free) survivors in Fyn County. The contour maps in Fig. 2(a) and (b) show these estimated detection probabilities î(a, t) for males and females, respectively. We see little difference of the patterns between males and

Fig. 2. (Age, period)-contour map of the estimated detection probability ^ î(a, t) of an onset of (a) male and (b) female diabetes at (a, t). The range of the detection probability is from about 20% to 100%, and the contour interval is 5%. The populations at risk are about 3000 and 3100 individuals/year2 on average for the males and females, respectively. # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

418

Y. Ogata et al.

Scand J Statist 27

females. These rates primarily depend on the calendar years but some cohort effect is also seen in the earlier part of the period. The area of a circle in Fig. 1(a) and (b) with centre (a, t) is inversely proportional to the detection probability î(a, t), which is relevant for the weighting of the corresponding observation as indicated above and further elaborated below. 3. The basic estimation problem in the Lexis diagram Let á(a, t) be the intensity of disease occurrence at age a and calendar time tÐwe shall call it the incidenceÐde®ned by the probability that a healthy individual of age a at calendar time t contracts the disease during the time interval [t, t ‡ h) (and therefore age interval [a, a ‡ h)) is á(a, t)h. Note that although the intensity function á(a, t) depends on the two variables a and t it is still an ordinary intensity in one-dimensional time. Following Brillinger (1986) and Keiding (1991) assume now that births happen according to a nonhomogeneous Poisson process with intensity â(t) at calendar time t and that the death intensity at age a and time t is ì(a, t). It then follows from Brillinger's analysis that the point process of disease onset (a, t) in the Lexis diagram is a bivariate Poisson point process with intensity function ã(a, t)á(a, t), where  …a  ã(a, t) ˆ â(t ÿ a)exp ÿ [ì( y, t ÿ a ‡ y) ‡ á( y, t ÿ a ‡ y)] dy : 0

The interpretation of ã(a, t) is the density of (disease-free) survivors. We do not observe this point process because each patient survives until observation with probability î(a, t) as explained in section 2 above, so that the observed point process has intensity ë(a, t) ˆ ã(a, t)á(a, t)î(a, t):

(1)

In the following analysis we shall assume ã(a, t), the density of disease-free survivors, known, in practice estimated by the age 3 calendar time histogram approximation from of®cial statistics, and we shall also assume î(a, t) known as explained by Keiding et al. (1989) and brie¯y surveyed above. Sometimes we shall refer to ç(a, t) ˆ ã(a, t)î(a, t) as the modulator. It generalizes (by not necessarily being bounded above by 1) the detection rate function used in earlier examples of occurrence data with a non-negligible number of missing events due to incomplete observation, where the detection rate of events is not the same over the de®ned domain. This complication can be handled if additional data and models are available to estimate the detection rate function of the underlying events. For instance, Ogata & Katsura (1993) analysed the change of detection rate of earthquakes in time and space based on the earthquakes' magnitude together with an empirical magnitude frequency model. For the present case, estimates of survival î(a, t) for individual patients were available, which Keiding et al. (1989) combined with census data to obtain an estimate of the modulator ç(a, t). In order to estimate the underlying rate of diabetic onset, Keiding et al. (1989) performed a smoothing using a bivariate Epanechnikov kernel k(a, t) ˆ (1 ÿ a2 ÿ t 2 )=(2ð) for a2 ‡ t 2 , 1 taking the distinct detection rates into consideration. That is, from the observed onset event (ai , ti ), i ˆ 1, 2, . . ., N , the intensity was estimated by   N 1 X a ÿ a i t ÿ ti ^ (a, t) ˆ 2 á , wi : k ó ó ó iˆ1 with the weights wi ˆ ç(ai , ti )ÿ1 and the bandwidth ó ˆ 3 years. The smoothed incidence rates were given by Keiding et al. (1989). Preliminary work by the ®rst author, published # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

Empirical Bayes age-period-cohort analysis

Scand J Statist 27

419

by Keiding (1990) showed the surface estimated from the same data but by a different smoothing procedure which multiplies the two independently smoothed B-spline surfaces; one for the intensity of point patterns of observed events assuming a non-homogeneous Poisson process, the other interpolating the weights fwi g. In this paper we use a penalized likelihood approach, with the likelihood derived on the assumptions that we observe a set of retrospective incidence data f(ai , ti ) 2 [A0 , A1 ] 3 [T0 , T1 ]; i ˆ 1, 2, . . ., N g, that the modulator ç(a, t) is assumed known and that the intensity á(a, t) is parameterized by a vector è. Then, the log-likelihood function is written by … A1 … T1 N X log L(è) ˆ log ë(ai , ti ; è) ÿ ë(a, t; è) dt da A0

iˆ1

ˆ

N X

T0

logfá(ai , ti ; è):ç(ai , ti )g ÿ

… A1 … T1

iˆ1

A0

T0

á(a, t; è):ç(a, t) da dt:

(2)

Here, the 2-dimensional integral is computed by a numerical approximation.

4. Age±period±cohort modelling: introduction One of our main concerns is the decomposition of the intensity function into three risk factors of age, period and cohort. For this purpose we consider a log-linear model for the intensity á(a, t) of the underlying events such that log á(a, t) ˆ ì ‡ f A (a) ‡ f T (t) ‡ f C (t ÿ a),

(3)

where ì is a constant, and f A (a), f T (t), and f C (c) are continuous functions with the constraints … A1 … T1 … C1 f A (a) da ˆ f T (t) dt ˆ f C (c) dc ˆ 0: (4) A0

T0

C0

Clayton & Schif¯ers (1987), commenting on earlier works by Osmond & Gardner (1982), discussed the dif®culty in the decomposition of the incidence rate into three factors stressing the non-identi®ability of the linear factors. Typically, suppose a solution f A , f T and f C in (3) and (4) for a given log á(a, t). Then, for any constant ø,   A0 ‡ A1  f A (a) ˆ f A (a) ‡ ø a ÿ 2   T0 ‡ T1 f T (t) ˆ f T (t) ÿ ø t ÿ 2   C0 ‡ C 1 f C (c) ˆ f C (c) ‡ ø c ÿ (5) 2 is also the solution in (3) and (4) which does not change the value of the log-likelihood in (2). This identi®ability problem will not be solved without further restrictions among the three factors. A trivial strong restriction is that one of the three components is set to be the zero function by external evidence. One of the weak restrictions is the use of a smoothness prior for the functions. Berzuini et al. (1993) used a second order autoregressive prior, although this will only be useful for the decomposition in the case where the linear factors are removed from the surface log á(a, t) in (3) as indicated by Clayton & Schif¯ers (1987). Nakamura (1982, 1986) # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

420

Y. Ogata et al.

Scand J Statist 27

used ®rst-order autoregressive priors. We shall combine these approaches and return to a discussion in section 7 after having speci®ed the models and the estimation methods. 5. Spline models We implement the age±period±cohort models by postulating spline decompositions of f A (a), f T (t) and f C (c) in (3), and in addition, we postulate a background (non-parametric) model by bivariate spline modelling of log á(a, t). The latter serves as reference for capturing possible higher-order interactions not included by the age±period±cohort model. 5.1. Age±period±cohort model Let [A0 , A1 ], [T0 , T1 ] and [C0 , C1 ] ˆ [T0 ÿ A1 ‡ A0 , T1 ] be the intervals on which the variables of age, period and cohort are de®ned, respectively, and suppose that each interval is equidistantly divided into M A , M T and M C subintervals, respectively, to de®ne cubic Bspline functions in the following manner. For a given location x 2 [U , V ] from each of the above intervals, the ith subinterval which includes x is identi®ed in such a way that x ˆ U ‡ (i ÿ 1)d ‡ rx for some 0 < rx < d where d ˆ (V ÿ U )=M. Then, the functions f in (3) are each expanded by the spline bases in such a way that f (x) ˆ f è (x) ˆ

3 X

èi‡ k B4ÿ k (rx =d)

(6)

kˆ0

with coef®cients è ˆ fèi g with i ˆ 1, 2, . . ., M ‡ 3. Here, the bases Bk (:) with k ˆ 1, 2, 3, 4, are de®ned for r 2 [0, 1] by B1 (r) ˆ r 3 =6 B2 (r) ˆ (ÿ3r 3 ‡ 3r 2 ‡ 3r ‡ 1)=6 B3 (r) ˆ (3r 3 ÿ 6r 2 ‡ 4)=6 B4 (r) ˆ (ÿr 3 ‡ 3r 2 ÿ 3r ‡ 1)=6

(7)

according to de Boor (1978), Inoue (1986) and Ogata & Katsura (1988), for instance. Thus, taking the zero sum constraints in (4) into consideration, the spline coef®cients è A ˆ (è1A , . . ., è AM A ‡2 ), è T ˆ (è1T , . . ., è TM T ‡2 ) and è C ˆ (è1C , . . ., è CM C ‡2 ) for the factors of age, period and cohort, respectively, are to be estimated. 5.2 Background model Later on, in place of the model in (3) for the age±period±cohort decomposition, we also consider a 2-dimensional function j(:, :) such that j(a, t) ˆ log á(a, t):

(8)

This should be ¯exible enough to examine whether or not there exists any effect of higher order interaction of the factors or anything else other than the decomposed model in (3). For the parameterization the bilinear B-spline function is de®ned as follows. Consider the rectangular domain [A0 , A1 ] 3 [T0 , T1 ] which is equally divided into M A 3 M T subrectangles. For a given location (a, t) 2 [A0 , A1 ] 3 [T0 , T1 ], the (i, j)th cell which includes (a, t) is identi®ed in such a way that a ˆ A0 ‡ (i ÿ 1)d A ‡ ra and t ˆ T0 ‡ ( j ÿ 1)d T ‡ rt for # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

Empirical Bayes age-period-cohort analysis

Scand J Statist 27

421

some 0 < ra < d A and 0 < rt < d T where d A ˆ (A1 ÿ A0 )=M A and d T ˆ (T1 ÿ T0 )=M T . Then, j(a, t) is expanded by the spline bases in such a way that j(a, t) ˆ jè (a, t) ˆ

3 X 3 X

èi‡ k, j‡ l B4ÿ k (ra =d A )B4ÿ l (rt =d T )

(9)

kˆ0 lˆ0

with coef®cients è ˆ fèi, j g with i ˆ 1, 2, . . ., M A ‡ 3 and j ˆ 1, 2, . . ., M T ‡ 3. For both models, the intensity function for the observed events in (1) is eventually log-linearly parameterized. The log-likelihood function in (2) is concave with respect to the parameters because the Hessian matrix is everywhere negative-de®nite in the parameter space (see Ogata, 1978). Thus, at most one maximum of the log-likelihood function could have been expected unless the Hessian matrix of the log-likelihood function were degenerated. In fact, this is not true owing to the redundancy of the parameters relative to the number of data events and owing to the non-identi®ability of the decomposition (discussed in section 4), so that we have to make further reasonable restrictions to apply a Bayesian procedure described below. The number of parameters used for the intensity in (3) is M A ‡ M T ‡ M C ‡ 7, taking the restriction in (4) into consideration, and the j-function in (9) has (M A ‡ 3)(M T ‡ 3) parameters to represent a 2dimensional spline surface. The number of subintervals M A , M T and M C for the respective factors should be suf®ciently large to represent the local change suf®ciently well. Without constraints the maximum likelihood method will usually provide a rapidly ¯uctuating spline curve or surface. Therefore, we need to consider some restrictions among the parameters, which leads us to Bayesian smoothing methods.

6. Bayesian smoothing methods Consider ®rst the case of ®tting the background model in (8) with the 2-dimensional spline function in (9) to the observed onset events shown in Fig. 1(a) and (b). In principle, for suf®ciently large number of the spline knots M A 3 M T , the log-likelihood increases until the intensity surface becomes spiky. However, such a surface cannot be accepted as the optimal one. Therefore, we impose the smoothness penalties  2 … T1 … A1  2 @j @j Ö1 (èjr1 , r3 ) ˆ r1 ‡r3 da dt, @a @t T0 A0 Ö2 (èjr2 , r4 , r5 ) ˆ

 2  2  2 @2j @2j @2j r2 ‡2r4 ‡r5 da dt @a2 @a@ t @ t2 A0

… T1 … A1 T0

(10)

to measure the roughness of the function, where r ˆ (r1 , . . ., r5 ) are tuning weights in the penalties for the restriction among parameters è in (9); we call the weights hyperparameters. The isotropic smoothing (de Boor, 1978; Inoue, 1986; Ogata & Katsura, 1988) is adjusted by a reduced number of the hyper-parameters in (10) such that r1 ˆ r3 and r2 ˆ r4 ˆ r5 under which Ö1 and Ö2 above are rotation invariant in (a, t). On the other hand, when Ogata et al. (1991) made a Bayesian imaging by inverting the earthquake data, anisotropic smoothing in a certain region of the lithosphere actually better ®tted than the isotropic one in three dimensions. In any case, for a set of hyper-parameters r ˆ (r1 , . . ., r5 ), maximization of the penalized log-likelihood Q(è; r) ˆ log L(è) ÿ penalty(èjr r),

(11)

where penalty(èjr r) ˆ Ö1 (èjr1 , r3 ) ‡ Ö2 (èjr2 , r4 , r5 ), # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

(12)

422

Y. Ogata et al.

Scand J Statist 27

(Good & Gaskins 1971) is carried out. In practice, we use the standard quasi-Newton optimization procedure for the maximization in which the occasional use of the Newton method makes the convergence very fast (see Ogata & Katsura, 1988, 1993; Ogata et al., 1991); the former uses the gradient vector of the function Q, while the latter needs the Hessian matrix (i.e. 2nd order derivative) in addition. When the age±period±cohort model (3) consisting of the three spline functions is ®tted by maximizing the log likelihood in (2), a similar rapid ¯uctuation is expected. Thus a measure of the local variation of curves can be given by the standard roughness penalties, or smoothness constraints, of the functions f A ˆ f A (a; è A ), f T ˆ f T (t; è T ), and f C ˆ f C (c; è C ) such that  2 2 … A1  @ fA @2 f A F1 (è A jr1 , r2 ) ˆ r1 ‡ r2 da @a @a2 A0 F2 (è T jr3 , r4 ) ˆ

… T1 T0

 2  2 @ fT @2 f T r3 ‡ r4 dt, @t @ t2

and F3 (è C jr5 , r6 ) ˆ

… C1 C0

 r5

@ fC @c



2 ‡ r6

@2 f C @c2

2 dc,

(13)

where the non-negative constants r1 , . . ., r6 are hyper-parameters which adjust the strength of the penalties. Thus, for given values of the hyper-parameters r ˆ (r1 , . . ., r6 ), the estimates of the parameters è are obtained so as to maximize the penalized log-likelihood Q(èjr r) ˆ log L(è) ÿ penalty(èjr r),

(14)

where penalty(èjr r) ˆ F1 (è A jr1 , r2 ) ‡ F2 (è T jr3 , r4 ) ‡ F3 (è C jr5 , r6 ),

(15)

for è ˆ (è A , è T , è C ). Now, we have to resolve two con¯icting aims in curve or surface ®tting in (11) and (14), respectively, which are to produce a good ®t to the data but to avoid too much rapid local variation. This is a crucial problem of the optimum tuning of the hyper-parameters for the provided set of penalties above. For this purpose, we are led to a Bayesian interpretation of the penalized log-likelihood in which the prior distribution is obtained from the penalty function prior(èjr r) ˆ „

expfÿpenalty(èjr r)g expfÿpenalty(èjr r)g dè

although there is some complication to avoid the improper prior (see Ogata & Katsura, 1988, 1993; Ogata et al., 1991). Similarly, to de®ne posterior distribution, we need the normalizing factor … … L (r r) ˆ    L(è):prior(èjr r) dè (16) which may be called the likelihood of a Bayesian model, sometimes called the integrated likelihood or the marginal likelihood. The maximization of the function or its logarithm with respect to the hyper-parameter r was suggested by Good (1965) and the maximizing ^ is called the type II maximum likelihood estimate which we adopt for hyper-parameter r the optimal tuning of the penalized log-likelihood. In practice, we need the Gaussian approximation of the posterior function to implement the integration in (16) with respect to # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

Empirical Bayes age-period-cohort analysis

Scand J Statist 27

423

the parameters è which is usually very high dimensional (see Ogata & Katsura, 1988, 1993; Ogata et al., 1991; Ogata, 1996, and also appendix, for technical details). Thus far, we have shown the various models with different likelihoods or penalty functions for the smoothing. To compare the goodness-of-®t of these models Akaike's Bayesian Information Criterion (ABIC; Akaike, 1980), ABIC ˆ (ÿ2)max logL (r r) ‡ 2:dim(r r) r

(17)

is useful, where ``max'' in (17) is taken with respective to the hyper-parameters. Given different Bayesian models for the same data, the one with a smaller ABIC is considered to be a better ®t to the data. The last term of ABIC is provided for the consistent use with the Akaike's Information Criterion (AIC; Akaike, 1974) which is used for the comparison of ordinary maximum likelihood models. ^ A ), f T (t; è ^ T ) and f C (c; è ^ C ) in (3) at each It is useful to get the estimation errors of f A (a; è age a, time t and cohort c. We expect that the joint error distribution of the parameters at ^ ˆ (è ^ A, è ^T , è ^ C ) is approximately multivariate normal N (0, H ÿ1 ), where H ÿ1 ˆ (h p, p9 ) is the è inverse of the Hessian matrix H ˆ (h p, p9 ) ˆ (ÿ@ 2 Q=@è p @è9p9 ) of the penalized log-likelihood function. Since f A (a), for instance, is given by the linear combination with respect to the vector ^ A , its error distribution is approximately normal according to the standard large sample theory, è and the covariance function of the error spline process is given by C(a; a9) ˆ

3 X 3 X

h i‡ k,i9‡ k9 B4ÿ k (ra =d A )B4ÿ k9 (ra9 =d A9 ),

(18)

kˆ0 k9ˆ0

where a ˆ (i ÿ 1)d A ‡ ra and a9 ˆ (i9 ÿ 1)d A9 ‡ ra9 for some 0 < ra < d A and 0 < ra9 < d A9 where the bases fBk (:); k ˆ 1, 2, 3, 4g have been provided in (7). Therefore, the ^ A ) at age a is provided by standard errors of f A (a; è p å(a) ˆ C(a; a): (19) Similarly, the errors å(t) and å(c) for f T and f C, respectively, are obtained from the corresponding factors of the inverse Hessian matrix.

7. Age±period±cohort models: discussion We now return to a discussion of the implementation of the smoothness penalties required to obtain identi®ability in the age±period±cohort model, as introduced in section 4 above. Note ®rst that the second order autoregressive prior of Berzuini et al. (1993) corresponds to the second order differentials in the penalties in (13), while the ®rst order autoregressive priors of Nakamura (1982, 1986) correspond to the ®rst order differentials in the penalties in (13). In fact, if we substitute the above functions in (5) to the penalty (15) with (13), we have penalty(èjr r) ˆ R1 (èjr1 , r3 , r5 ) ‡ R2 (èjr2 , r4 , r6 ) where R1 (èjr1 , r3 , r5 ) ˆ r1

… A1  A0

2 2 2 … T1  … C1  df A df T df C ‡ ø da ‡ r3 ÿ ø dt ‡ r5 ‡ ø dc da dt dc T0 C0 (20)

and # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

424

Y. Ogata et al.

R2 (èjr2 , r4 , r6 ) ˆ r2

Scand J Statist 27

… A1  A0

d2 f A da2

2 da ‡ r4

… T1  T0

d2 f T dt 2

2 dt ‡ r6

… C1  C0

d2 f C dc2

2 dc:

Here R1 (èjr1 , r3 , r5 ) is a quadratic function with respect to ø, the unique minimum of which is attained at ø0 ˆ

ÿr1 f f A (A1 ) ÿ f A (A0 )g ‡ r3 f f T (T1 ) ÿ f T (T0 )g ‡ r5 f f C (C1 ) ÿ f C (C0 )g r1 (A1 ÿ A0 ) ‡ r3 (T1 ÿ T0 ) ‡ r5 (C1 ÿ C0 )

(21)

for given r1 , r3 and r5 . On the other hand, R2 (èjr2 , r4 , r6 ) includes no ø term, which means that the second order penalties in (13) cannot discriminate the decomposed functions in (5) as mentioned above. Further details regarding the effect of the ABIC in identifying the age±period±cohort model were recently given by Nakamura (1996) for his logit model for standard cohort tables. A similar discussion is possible here, most conveniently in terms of the quadratic approximation of the penalized log-likelihood developed in the appendix. For the details, see Ogata et al. (1999). In summary, the procedure of maximizing Ë(r r) provides a consistent way for selecting among the competing models, and the Akaike's Bayesian Information Criterion (ABIC) in (17) trades off ®t to the data and model complexity as the AIC does (Akaike, 1974). That is to say, as shown above, the model with less components and less variability of the three components measured by (13) is preferred as far as the ®t of log á(a, t) ˆ ì ‡ f A (a) ‡ f T (t) ‡ f C (t ÿ a) to the data is same.

8. Results For the age±period±cohort decomposition model in (3), we set M A ˆ 12, M T ˆ 16 and M C ˆ 28 to de®ne spline functions on respective intervals. If all the three factors are used, we have six hyper-parameters in the penalty in (15) with (13) to adjust the smoothness of period, age and cohort factors, respectively. However, sometimes we may need only one or two of the factors, for which fewer hyper-parameters are used. Thus, selection of the factors is reduced to the comparison of the ABIC values of the corresponding models. For instance, when we assume that no cohort effect exists in (3), the ABIC value is calculated with four hyper-parameters corresponding to the period and age factors. In this way, for both the male and female data, all possible combinations of the factors are examined to compute their ABIC values and corresponding estimated value of hyper-parameters which maximize the likelihood for each case. The results are listed in Table 1. For males the smallest ABIC is attained for the (age, period) model for which the estimated functions ^f A (a) and ^f P ( p) are displayed in Fig. 3(a). It is seen that there is a rather sharp increase in incidence from birth, levelling off at puberty and possibly slightly descending, and then increasing rather sharply in the late 20s. The total range of ^f A (a) is [ÿ1, 0:5] corresponding to a range (1, e 1:5 )  (1, 4:5) in age-speci®c relative risk from age 0 to age 30. The period effect ^f P ( p) rises smoothly from 1933 to 1973, mostly in the middle of the period, with a possible tendency of a slight decrease toward the end. Its range is about [ÿ0:3, 0:4], corresponding to an overall increase of e 0:7  2 in diabetes incidence, mainly during the period 1940±1960. The slightly less optimal (age, cohort)-model is displayed in Fig. 3(b); it is seen that the age pattern is very similar while the estimated cohort effect ^f C (c) is very close to linear with a slope # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

Empirical Bayes age-period-cohort analysis

Scand J Statist 27

425

Table 1. Model comparison by ABIC Models Males A‡P‡C A‡P A ‡C P‡C A P C const. Females A‡P‡C A‡P A ‡C P‡C A P C const.

^ ÿ2 log Ë

2(#P)

ABIC

Difference from minimum

5686.7 5686.7 5689.0 5696.3 5704.4 5716.3 5730.5 5730.5

12 8 8 8 4 4 4 0

5698.7 5694.7 5697.0 5704.3 5708.4 5720.3 5734.5 5730.5

4.0 0.0 2.3 9.6 13.7 25.6 39.8 35.8

4380.9 4381.0 4381.0 4405.6 4381.9 4405.6 4405.5 4405.5

12 8 8 8 4 4 4 0

4392.9 4389.0 4389.0 4413.6 4385.9 4409.6 4409.5 4405.5

7.0 3.1 3.1 27.7 0.0 23.7 23.6 19.6

Fig. 3. Estimated age, period and cohort effects f A (a), f P ( p) and f C (c) for males. The baseline yearly incidence rate exp(^ ì) ˆ 11:4 cases per 105 person years. (a): age-period model, (b): age-cohort model. Thin real lines show 1 standard error bounds.

of about 0.025, corresponding to an increased risk of about 2.5% for each new yearly generation, or a factor of 2.7 over 40 years of birth cohorts. For the females the optimal model according to ABIC is the age (only) model, and ^f A (a) is shown in Fig. 4(a). The general age-speci®c incidence picture starts, as was the case for the males, with a steep increase from birth. However there is now indication of a pre-pubertal # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

426

Y. Ogata et al.

Scand J Statist 27

Fig. 4. Estimated age, period and cohort effects f A (a), f P ( p) and f C (c) for females. The baseline yearly incidence rate exp(^ ì) ˆ 9:14 cases per 105 person years. (a): age model, (b): age-period model. Thin real lines show 1 standard error bounds.

trough before the pubertal peak, and a decreasing incidence in the late 20s. The total range may be somewhat larger than for the males. The slightly less optimal (age, period) model in Fig. 4(b) shows essentially the same age pattern and a weakly increasing period effect with a total estimated range of about 0.2, or a range in (1, e 0:2 )  (1, 1:2) in relative risk over the period considered. There may be higher order interaction of the factors or anything else other than the loglinearly decomposed risks in (3). The ®rst step to explore this possibility is to create the composed surface in (3) based on the estimated factors in the above ®gures. Figure 5(a) and (b) ^ (a, t) ˆ expf^ì ‡ f^A (a) ‡ f^T (t) ‡ f^c (t ÿ a)g for the shows the contour map of the surface á cases of male and female, respectively. Then, for comparison with the composed ®gures, the alternative (background model) estimate of á(a, t) using the two dimensional spline function is carried out. We set M A ˆ 12 and M T ˆ 16 for the bilinear B-spline function to estimate the j-function in (9). Table 2 shows the hyperparameters corresponding to freely varying r i (anisotropic ®t), yielding ABIC values about 11 and 6 lower than the best age±period±cohort ®ts for males and females, respectively. Figure 6(a) and (b) shows the contour map of the estimated surface ~ (a, t) ˆ expf~ á j(a, t)g which is the solution to the maximum penalized log-likelihood in (11) with the optimally adjusted weights r i in Table 2. It is seen that the ``non-parametric'' ®t of Fig. 6(a) and (b) is very close to that of the (reduced) age±period±cohort models in Fig. 5(a) and (b), supporting the ®t of the above model. Note that the isotropic ®t (rotation invariant in the (age, period)-Lexis diagram) would have required the restrictions r1 ˆ r3 , r2 ˆ r4 ˆ r5 , which seem far from satis®ed in Table 2. The resulting ®t (not shown here) indeed had ABIC values of 5705.6 and 4409.7 for males and females, rather larger than the corresponding ABIC values for the non-isotropic ®t. This con®rms the expectation that the rotation invariance of the penalty function is rather less natural in the (age, period) Lexis diagram here than in earlier applications (e.g. Ogata & Katsura, 1993) to the distributions of earthquakes across a two-dimensional geographical plane. # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

Empirical Bayes age-period-cohort analysis

Scand J Statist 27

427

Fig. 5. Contour map and birds-eye-view of (age, period)-intensity surface composed of the sum of the estimated factors of age, period and cohort shown in Fig 3(a) and 4(a); (a) male and (b) female. The intensity surfaces vary from about 0.4 to 26.0 cases per 105 persons year2 and from 2.0 to 16.0 cases per 105 persons year2 for the males' and females' populations at risk, respectively. Contour interval is 2.0 cases per 105 persons year2.

Table 2. Hyperparameter estimates with ABIC, anisotropic smoothing Males Females

r1

r2

r3

r4

r5

ABIC

3.4 3 10ÿ1 9.0 3 10ÿ2

1.6 3 103 3.0 3 101

3.4 3 100 1.8 3 101

2.6 3 109 1.1 3 107

7.7 3 103 2.5 3 103

5683.4 4379.2

9. Comments on the results as analysed In the present example, the age±period±cohort model (Fig. 5(a), (b)) seemed to ®t nicely when compared to a general model with anisotropic smoothing constraints (Fig. 6(a), (b)). Also, the results were well compatible with those of the earlier, purely descriptive analysis of Keiding et al. (1989). From the diabetological perspective the results con®rm in many respects previously described epidemiological features of insulin-dependent diabetes. Concerning the age effect, we have reidenti®ed the well-known peak in the incidence around puberty (e.g. Christau et al., 1979). The results suggest a decrease in the incidence towards the end of the age range 0±31 years for females, supporting other recent Danish analyses (Mùlbak et al., 1994) (Fig. 4(a), (b)). For males, however, the trend seems to be modestly increasing with age (Fig. 3(a), (b)). This contrast cannot possibly be explained by admixture of male patients with other types of diabetes than the insulin-dependent type since detailed medical investigations of samples of the patients indicate that insulin-treated diabetes with onset before age 30 years may to a very high extent be considered representative of clinically de®ned insulin-dependent diabetes (Green & Hougaard, 1983). # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

428

Y. Ogata et al.

Scand J Statist 27

Fig. 6. Contour map and birds-eye-view of (age, period)-spline intensity surface directly estimated by the Bayesian smoothing using anisotropic penalty functions; (a) male and (b) female. The intensity surfaces vary from about 4.0 to 28.0 cases per 105 persons year2 and from 2.0 to 18.0 cases per 105 persons year2 for the males' and females' populations at risk, respectively. Contour interval is 2.0 cases per 105 persons year2.

As regards temporal effects, the present approach downplays cohort effects, thus cautiously supporting results from recent studies in Sweden (NystroÈm et al., 1990) and Finland (Tuomilehto et al., 1991). This leaves the increasing temporal trend to be explained by period effects. For males, this is described as a steadily increasing incidence since the 1930s, with a possible modest decline towards the end of the observation period (Fig. 3(a)). This result is well compatible with independent analyses of the diabetes incidence in Danish male conscript birth cohorts 1949±1964, observed during the age interval 0±19 years (Green & Andersen, 1983; Green et al., 1992). In contrast, no or only a slight period effect was identi®ed for females (Fig. 4(a) and (b)). This apparent difference in calendar time trends between the male and female incidences is not supported from recent, independent descriptive studies in Denmark. Thus, for both male and female patients, with childhood onset diabetes, the incidence level in parts of Denmark for the years 1989±1990 was some 40% higher (Green et al., 1992) than estimated for the same populations in the beginning of the 1970s (Christau et al., 1979). Also, as compared with the prevalence of insulin-treated diabetes with onset before age 30 years as of 1 July, 1973 in the present study region (Green et al., 1981), a new and comparable study in 1987 found increased prevalence for both male and female patients (Eshùj et al., 1994). Accordingly, there is strong evidence for an increasing incidence for male as well as female patients in Denmark. These estimated trends must be interpreted with due reservation to the underlying assumptions. Thus, the assumed historic mortality effects may not be truly representative for the conditions previously prevailing in the study region. Also, our analytical approach assumes observations in a closed population so that differential net-migrations among patients over the borders delineating the study region may introduce bias. Keiding et al. (1989) gave some # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

Scand J Statist 27

Empirical Bayes age-period-cohort analysis

429

general discussion of the validity of the rather restrictive approach, notably the in¯uence of migration. We refer to that discussion here. In summary, we have presented a formal analytical approach that, under certain assumptions, makes it possible to investigate age, cohort and period effects retrospectively from crosssectional data when age at onset is recorded for all prevalent cases. The potential applicability of our approach relates in particular to diseases for which it is dif®cult to obtain reliable incidence data. Some dermatologic conditions like psoriasis, atopic dermatitis, and nickel allergy, together with neurological diseases like multiple sclerosis, belong to this group. Unless the mortality in the patients may be considered similar to that in non-patients, information or assumptions regarding disease mortality forms an integral part of this approach. References Akaike, H. (1974). A new look at the statistical model identi®cation. IEEE Trans. Automat. Control 19, 716±723. Akaike, H. (1980). Likelihood and Bayes procedure. In Bayesian Statistics, (eds J. M. Bernard, M. H. De Groot, D. U. Lindley & A. F. M. Smith), University Press, Valencia, Spain. Akaike, H., Ozaki, T., Ishiguro, M., Ogata, Y., Kitagawa, G., Tamura, Y. H., Arahata, E., Katsura, K. & Tamura, Y. (1984). Time series and control program package, TIMSAC-84, Institute of Statistical Mathematics, Tokyo. Andersen, P. K., Borch-Johnsen K., Deckert, T., Green, A., Hougaard, P., Keiding, N. & Kreiner, S. (1985). A Cox regression model for relative mortality and its application to diabetes mellitus survival data. Biometrics 41, 921±932. Besag, J., Green, P., Higdon, D. & Mengersen, K. (1995). Bayesian computation and stochastic systems. Statist. Sci. 10, 3±66. Berzuini, C. & Clayton, D. (1994). Bayesian analysis of survival on multiple time scales. Statist. Med. 13, 823±838. Berzuini, C., Clayton, D. & Bernardinelli, L. (1993). Bayesian inference on the Lexis diagram. Bull. Internat. Statist. Inst. 55, (1), 149±165 with discussion (4), 42±43. Brillinger, D. R. (1986). The natural variability of vital rates and associated statistics (with discussion). Biometrics 42, 693±734. Christau, B., Kromann, H., Christy, M., Andersen, O. O. & Nerup, J. (1979). Incidence of insulin-dependent diabetes mellitus (0±29 years at onset) in Denmark. Acta Med. Scand. Suppl. 624, 54±60. Clayton, D. & Schif¯ers, E. (1987). Models for temporal variation in cancer rates. II: age±period±cohort models. Statist. Med. 6, 469±481. Cochran, W.G. (1983). Horvitz±Thompson estimator. Encyclopedia of Statistical Sciences 3, 665±668. de Boor, C. (1978). A practical guide to splines, Springer-Verlag, New York. Eshùj, O., Green, A., Borch-Johnsen, K., Feldt-Rasmussen, B. & Beck-Nielsen, H. (1994). Increased prevalence of insulin-treated diabetes mellitus in the Funen County, Denmark. J. Internat. Med. 235, 405±410. Fletcher, R. & Powell, M. J. D. (1963). A rapidly convergent descent method for minimization. Computer J. 6, 163±168. Good, I. J. (1965). The estimation of probabilities, MIT Press, Cambridge, MA. Good, I. J. & Gaskins, R. A. (1971). Non-parametric roughness penalties for probability densities. Biometrika 58, 255±277. Green, A. & Andersen, P. K. (1983). Epidemiological studies of diabetes mellitus in Denmark: 3. Clinical characteristics and incidence of diabetes among males aged 0 to 19 years. Diabetologia 25, 226±230. Green, A. & Hougaard, P. (1983). Epidemiological studies of diabetes mellitus in Denmark: 4. Clinical characteristics of insulin-treated diabetes. Diabetologia 25, 231±234. Green, A., Hauge, M., Holm, N. V. & Rasch, L. L. (1981). Epidemiological studies of diabetes mellitus in Denmark: II. a prevalence study based on insulin prescriptions. Diabetologia 20, 468±470. Green, A., Andersen, P. K., Svendsen, A. J. & Mortensen, K. (1992a). Increasing incidence of early onset Type 1 (insulin-dependent) diabetes mellitus: a study of Danish male birth cohorts. Diabetologia 35, 178±182. Green, A., Gale, E. A. M. & Patterson, C.C. for the EURODIAB ACE Study Group (1992b). Incidence of childhood-onset insulin-dependent diabetes mellitus: the EURODIAB ACE study. Lancet, 339, 905±909. # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

430

Y. Ogata et al.

Scand J Statist 27

Horvitz, D. G. & Thompson, D. J. (1952). A generalization of sampling without replacement from a ®nite universe. J. Amer. Statist. Assoc. 47, 663±685. Inoue, H. (1986). A least square smooth ®tting for irregularly spaced data: ®nite element approach using the cubic B-spline basis. Geophysics 51, 2051±2066. Ishiguro, M. & Sakamoto, Y. (1983). A Bayesian approach to binary response curve estimation. Ann. Inst. Statist. Math. 35, 115±137. Keiding, N. (1990). Statistical inference in the Lexis diagram. Phil. Trans. Roy. Soc. Lond. A 332, 487±509. Keiding, N. (1991). Age-speci®c incidence and prevalence: a statistical perspective (with discussion). J. Roy. Statist. Soc. Ser. A 154, 371±412. Keiding, N., Holst, C. & Green A. (1989). Retrospective estimation of diabetes incidence from information in a prevalent population and historical mortality. Amer. J. Epidemiol. 130, 588±600. Koul, H. L., Susarla, V. & Van Ryzin, J. (1981). Regression analysis with randomly right-censored data. Ann. Statist. 9, 1276±1288. Kowalik, J. and Osborne, M. R. (1968). Methods for unconstrained optimization problems. American Elsevier, New York. McKeague, I. W. & Utikal, K. J. (1990). Inference for a nonlinear counting process regression model. Ann. Statist. 18, 1172±1187. Mùlbak, A. G., Christau, B., Marner, B., Borch-Johnsen, K. & Nerup, J. (1994). Incidence of insulindependent diabetes in age groups over 30 years in Denmark. Diabetic Med. 11, 650±655. Murata, Y. (1993). Estimation of optimum average sur®cial density from gravity data: an objective Bayesian approach. J. Geophys. Res. 98, 12097±12109. Nakamura, T. (1982). A Bayesian cohort model for standard cohort table analysis. Proc. Inst. Statist. Math. 29, 77±97 (in Japanese). Nakamura, T. (1986). Bayesian cohort models for general cohort table analyses. Ann. Inst. Statist. Math. 38, 353±370. Nakamura, T. (1996). Re-examination of the identi®ability problem in cohort analysisÐa Bayesian cohort model (xv), Proceedings of the 24th annual meeting of the Behaviometric Society of Japan, 208±209 (in Japanese). NystroÈm, L., Dahlquist, G., Rewers, M. & Wall, S. (1990). The Swedish Childhood Diabetes Study. An analysis of the temporal variation in diabetes incidence 1978±1987. Int. J. Epidemiol. 19, 141±146. Ogata, Y. (1978). The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Ann. Inst. Statist. Math. 30, 243±261. Ogata, Y. (1996). Evaluation of spatial Bayesian modelsÐtwo computational methods. J. Statist. Plann. Inference 51, 1±18. Ogata, Y. & Katsura, K. (1986). Point-process models with linearly parameterized intensity for application to earthquake data. In Essays in time series and allied processes, papers in honor of E. J. Hannan (eds J. Gani & M. B. Priestley) J. Appl. Probab. 23A, 291±310. Ogata, Y. & Katsura, K. (1988). Likelihood analysis of spatial inhomogeneity for marked point patterns. Ann. Inst. Math. Statist. 40, 29±39. Ogata, Y. & Katsura, K. (1993). Analysis of temporal and spatial heterogeneity of magnitude frequency distribution inferred from earthquake catalogues. Geophys. J. Internat. 113, 727±738. Ogata, Y., Imoto, M. & Katsura, K. (1991). 3-D spatial variation of b-values of magnitude-frequency distribution beneath the Kanto District, Japan. Geophys. J. Internat. 104, 135±146. Ogata, Y., Katsura, K., Keiding, N., Holst, C. & Green, A. (1999). Age±period±cohort analysis of incidence from incompletely detected retrospective data, Research memorandum No. 723, The Institute of Statistical Mathematics, Tokyo. Osmond, C. & Gardner, M. J. (1982). Age, period and cohort models applied to cancer mortality rates. Statist. Med. 1, 245±259. Ramlau-Hansen, H. (1983). Smoothing counting process intensities by means of kernel functions. Ann. Statist. 11, 453±466. Robertson, C. & Boyle, P. (1998a). Age±period±cohort analysis of chronic disease rates. I: modelling approach. Statist. Med. 17, 1305±1323. Robertson, C. & Boyle, P. (1998b). Age±Period±Cohort analysis of chronic disease rates. II: graphical approaches. Statist. Med. 17, 1325±1339. Robins, J. M. and Rotnitzky, A. (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. AIDS epidemiology, methodological issues (eds N. P. Jewell, K. Dietz & V. T. Farewell) 297±331, BirkhaÈuser, Boston. Ê kerblom, Tuomilehto, J., Rewers, M., Reunanen, A., Lounamaa, P., Lounamaa, R., Tuomilehto-Wolf, E. & A # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

Empirical Bayes age-period-cohort analysis

Scand J Statist 27

431

H. K. (1991). Increasing trend in Type 1 (insulin-dependent) diabetes mellitus in childhood in Finland. Analysis of age, calendar time and birth cohort effects during 1965 to 1984. Diabetologia 34, 282±287. Whittemore, A. S. & Halpern, J. (1997). Multi-stage sampling in genetic epidemiology. Statist. Med. 16, 153±167. Received October 1998, in ®nal form September 1999 Yosihiko Ogata, The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-Ku, Tokyo 106, Japan

Appendix: a Gaussian approximation of the posterior From (6) with the zero sum constraints (4), the penalties F1 , F2 and F3 in (13) have quadratic forms with respect to the parameters è ˆ (è A , è T , è C ) with some positive de®nite matrices Ó A , Ó T and Ó C such that 1 1 F1 (è A jr1 , r2 ) ˆ è A (r1 SA ‡ r2 UA )èTA ˆ è A Ó A èTA 2 2 1 1 F2 (è T jr3 , r4 ) ˆ è T (r3 ST ‡ r4 UT )èTT ˆ è T ÓèTT 2 2 1 1 F3 (è C jr5 , r6 ) ˆ è C (r5 SC ‡ r6 UC )èTC ˆ è C Ó C èTC 2 2

(22)

where ``T'' means the transpose of the (row) vectors. On the other hand, the exponential of the negative sum ÿ(F1 ‡ F2 ‡ F3 ) in (15) is considered to be proportional to the prior distribution prior(èjr r) ˆ ð A (è A jr1 , r2 )ð T (è T jr3 , r4 )ð C (è C jr5 , r6 )

(23)

characterized by the weights, or the hyper-parameters, r ˆ (r1 , . . ., r6 ). Hence, the prior ð in (23) is a multivariate Gaussian distribution such that (det Ó A )1=2 ð A (è A jr1 , r2 ) ˆ p exp(ÿè A Ó A èTA =2) 2ð M A ‡2 (det Ó T )1=2 ð T (è T jr3 , r4 ) ˆ p exp(ÿè T Ó T èTT =2) 2ð M T ‡2 and (det Ó C )1=2 ð C (è C jr5 , r6 ) ˆ p exp(ÿè C Ó C èTC =2): 2ð M C ‡2

(24)

Since the log-likelihood in (2) is certainly based on non-Gaussian distribution, we cannot obtain the analytic solution of the integral in (16) unlike the linear models given in Akaike (1980). Nevertheless, since the prior ð in (16) is Gaussian from (24), we can apply the Gaussian approximation of the posterior (e.g. Ishiguro & Sakamoto, 1983). That is to say, the logarithm of the integrand in (16) Q0 (èjr r) ˆ logfL(è)prior(èjr r)g, ˆ Q(èjr r) ‡

1 1 log det(Ó A Ó T Ó C ) ÿ (M A ‡ M T ‡ M C ‡ 6)log 2ð, 2 2

(25)

where Q is the penalized log-likelihood in (14), is approximated by the quadratic form ^ H Q (èjr ^ r) ‡ 1 (è ÿ è) ^ r)(è-è) ^ T, Q0 (èjr r)  Q0 (èjr 2 # Board of the Foundation of the Scandinavian Journal of Statistics 2000.

(26)

432

Y. Ogata et al.

Scand J Statist 27

^ is the vector which maximizes Q0 for ®xed r ˆ (r1 , r2 , . . ., r6 ), and H Q (èjr ^ r) is where è ^ the Hessian matrix (i.e. negative second derivatives) of the penalized log-likelihood at è such that   2 ^ ^ r) ˆ diag(Ó A , Ó T , Ó C ) ‡ ÿ@ log L(è) H Q (èjr (27) @è@èT where diag(:, :, :) is the block diagonal matrix. Using the quadratic approximation in (26) together with the equation in (27), we have an approximated log likelihood of the Bayesian model written by Ë(r r) ˆ log L (r r) ^ r) ÿ  Q0 (èjr ^ r) ‡ ˆ Q(èjr

1 1 logfdet H Q g ‡ (M A ‡ M T ‡ M C ‡ 6)log 2ð 2 2

1 1 log det(Ó A Ó T Ó C ) ÿ logfdet H Qr g: 2 2

(28)

It should be noted that the both maximizations of log L in (28) and the penalized loglikelihood Q in (14) with (15) have to be carried out by the non-linear optimization procedure (see Ogata & Katsura, 1988; Ogata et al., 1991). We use the quasi-Newton type method such as the Davidon±Fletcher±Powell's (e.g. Fletcher & Powell, 1963; Akaike et al., 1984) to maximize Q in (14) with (4) with respect to the vector è of parameters for ®xed r ˆ (r1 , . . ., r6 ). For maximizing log L in (28) with respect to r, the direct search method such as the simplex optimization method is used (see Kowalik & Osborne, 1968, and also Murata, 1993, for instance). The two optimizations are repeated by turns until the latter optimization converges. In ^ r) for maximizing Q in (14) with respect to è we found that the use of the exact Hessian H(èjr the standard Newton's procedure in a suitable stage makes the convergence very rapid in spite of the large dimensionality of è. This supports that the approximation (26) for (25) is good enough.

# Board of the Foundation of the Scandinavian Journal of Statistics 2000.

Lihat lebih banyak...

Comentários

Copyright © 2017 DADOSPDF Inc.