Semi-parametric estimation of age-time specific infection incidence from serial prevalence data

Share Embed


Descrição do Produto

STATISTICS IN MEDICINE Statist. Med. 18, 307—320 (1999)

SEMI-PARAMETRIC ESTIMATION OF AGE—TIME SPECIFIC INFECTION INCIDENCE FROM SERIAL PREVALENCE DATA NICO NAGELKERKE  *, SIEM HEISTERKAMP, MARTIEN BORGDORFF, JAAP BROEKMANS AND HANS VAN HOUWELINGEN National Institute of Public Health and the Environment, Bilthoven, The Netherlands Department of Public Health, Erasmus University, Rotterdam, The Netherlands Department of Medical Microbiology, University of Manitoba, Winnipeg, Canada Royal Netherlands Tuberculosis Association, The Hague, The Netherlands Department of Medical Statistics, University of Leiden, Leiden, The Netherlands

SUMMARY Many infections cause lasting detectable immune responses, whose prevalence can be estimated from cross-sectional surveys. However, such surveys do not provide direct information on the incidence of infection. We address the issue of estimating age and time specific incidence from a series of prevalence surveys under the assumption that incidence changes exponentially with time, but make no assumption about the age specific incidence. We show that these assumptions lead to a proportional hazards model and estimate its parameters using semi-parametric maximum likelihood methods. The method is applied to tuberculin surveys in The Netherlands to explore age dependence of the risk of tuberculous infection in the presence of a strong secular decline in this risk. Copyright  1999 John Wiley & Sons, Ltd.

1. INTRODUCTION For many infectious pathogens, human infections often do not give rise to disease specific clinical symptoms, and thus escape diagnosis. However, immune responses such as antibodies are formed which can subsequently be detected by specific tests, such as Elisa tests or the Mantoux test (allergy to tuberculin). Thus, the stock of never-infected test negative individuals is depleted with age according to a catalytic (compartmental) model  f (a, t)"![*/*a#*/*t]S(a, t)"k(a, t)S(a, t) where S(a, t) is the fraction of the population still uninfected at age a and calendar time t (the survival function), f (a, t) is the incidence of (first) infection at age a and calendar time t, and k(a, t) is the incidence density, also called the ‘force of infection’, ‘(annual) risk of infection’ or ‘hazard rate’, the incidence among those still uninfected, at age a and calendar time t. * Correspondence to: N. J. D. Nagelkerke, National Institute of Public Health and the Environment (RIVM), Dept., IMA, Bag 86, P.O. Box 1, 3720 BA Bilthoven, The Netherlands. E-mail: [email protected]

CCC 0277—6715/99/030307—14$17.50 Copyright  1999 John Wiley & Sons, Ltd.

Received February 1997 Accepted March 1998

308

N. NAGELKERKE E¹ A¸.

Sometimes, the force of infection can be estimated directly. For HIV-1, p24 antigen can be detected earlier than the presence of antibodies. Thus, when the mean time lag between the two tests is known, a direct estimate of incidence can be obtained from the difference between the prevalence of p24 antigen and the prevalence of antibodies. This idea, while potentially applicable to other infections for which several diagnostic tests are available, has only been applied to study HIV incidence. For most infections, cohort studies of initially seronegative individuals are as yet the only direct method of observing incidence and thus of estimating the force of infection. Although such cohort studies are in many respects the best way to estimate the force of infection, such studies are costly and can rarely be carried out on representative samples of the population due to attrition and other complications. Cross-sectional seroprevalence surveys are generally much simpler and quicker to organize and carry out. However, estimates of the age specific force of infection and its temporal trend have to be recovered from age and time specific prevalence data. In such surveys, for each individual tested the only information obtained is whether he/she has already experienced an infection before the age (a) of testing or not. If the force of infection does not change with time (stationarity) then a single cross-sectional survey of individuals at different ages is sufficient to estimate age-specific infection rates. However, to evaluate control measures and monitor trends in force of infection, repeated cross-sectional population surveys should be used; cross-sectional samples of the population are tested for prevalence of immune responses at different times using the same sampling design. If samples include a single age(group), such as tuberculin surveys of army recruits, then the temporal trend in the force of infection can be monitored from the changes in prevalence of immune responses between successive surveys. However, if different ages are sampled, then age specific prevalences of immune responses can be estimated as well. From such data, one would ideally like to obtain both an estimate of the age specific force of infection at any given time and its secular trend. In the following we shall assume that immune responses, once they are formed, are detectable for the remainder of an individual’s life. We shall also assume that excess mortality of individuals testing positive is negligible. For many bacterial and viral pathogens this holds approximately true. Even for serious diseases like tuberculosis only a small proportion (approximately 10 per cent) of infections will ultimately lead to disease of which these days only a small percentage will succumb to the disease. However, for infections such as HIV this approximation will only hold true among recently infected individuals. The problem of recovering age specific incidence from prevalence data is by no means unique to epidemiology. For example, this problem commonly occurs in the analysis of demographic surveys, especially those carried out in developing countries, such as the Demographic and Health Surveys (DHS, Macro Internat. Inc., Calverton, MD, U.S.A). In such surveys it is, for example, recorded whether individuals have ever been married, without recording the age at first marriage. This is known as ‘current status’ data. Recovering the incidence density of age at first marriage from such data is similar to estimating incidence of infection from seroprevalence data. If surveys are repeated at different times (for example, DHS surveys are repeated every several years in participating countries), then time trends in the age at first marriage can be estimated. Current status data also arises in the context of tumourigenecity studies, in which groups of rodents who have been exposed to specific potential carcinogens are sacrificed at different ages to see if a tumour has developed. The time of onset of any tumour found (that is, the failure time) is of course unknown. When a reasonable prior idea about the functional dependence on age of the force of infection is available, fully parametric approaches to the analysis of current status data are probably the most powerful. They also come with standard likelihood asymptotic properties, making Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

AGE—TIME SPECIFIC INFECTION INCIDENCE

309

approximate significance tests easy to perform. Also, with parametric approaches other forms of interval censoring rarely pose major complications of analysis. Such data may arise, for example, in studies in which cohorts are tested periodically for the presence of antibodies. There, the censoring interval is the interval (t !, t #] between the last negative and the first positive test, G G leading to likelihoods of the form %+S(t !)!S(t #),. For many parametric families of survival G G functions, such likelihoods are relatively easy to calculate and maximize. Parametric methods also allow the analysis of interval censored data where the presence of antibody (for example, to HIV-1) is associated with significant excess mortality.\ However, in some situations knowledge of the functional form of the force of infection is either unknown, or may be the object of analysis. For example, the question whether the force of infection increases with age may be subject to debate. In such situations a non-parametric approach, if possible, may be supplementary to a parametric one. A data analyst may also favour a non-parametric approach when he is only interested in factors affecting the force of infection rather than the force of infection itself, and would thus prefer to treat its functional form as a ‘nuisance’ function. In theoretical analyses, in order to achieve greater generality, it may also be attractive not to specify a parametric form of the survival function. For example, Brunet and Struchiner derived expressions for the efficacy of interventions to prevent infection as a function of the age and time specific prevalence of infection, without specifying a parametric form for this prevalence, which may thus be estimated non-parametrically. Non-parametric methods for the analysis of current status data are based on only assigning a probability mass to points (ages) at which prevalence is measured. Unfortunately, treating these probability masses as additional parameters using a standard likelihood approach may result in non-decreasing estimates of the survival function. In the following we will first consider some available non-parametric methods for stationary situations. Then we will explore when and how temporal trends can be translated into covariable information in a proportional hazards model. Then, by generalizing non-parametric methods for stationary problems, we will develop a new algorithm for the semi-parametric estimation of covariate effects and hazard function in the proportional hazards model when current status data is available. 2. STATIONARITY For stationary (time homogeneous) situations, that is, situations in which the force of infection remains constant over time, the problem of estimating incidence from cross-sectional prevalence data has drawn considerable attention (Keiding, Diamond and MacDonald, Groeneboom and Lopuhaa). When observed prevalences increase monotonically with age, the problem of estimating the survival function is straightforward, as then 1 minus the prevalence at age a is the maximum likelihood estimate of the infection free survival function at age a, from which estimates of the hazard function can be derived. However, unless samples at each age are very large, a monotone increase with age of observed prevalences will only rarely occur. As the survival function is a monotonically decreasing function, one has to estimate the survival function under order restrictions. The solution to this estimation problem, the isotonic (regression) estimator, has been known for a long time.  Similar to the familiar Kaplan—Meier estimator for right censored survival data, the isotonic estimator estimates the survival function as a ‘step’ function with possible steps at ages a , 2 , a for which observations are available. Its asymptotic  L properties have been studied in detail rather recently by Groeneboom. Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

310

N. NAGELKERKE E¹ A¸.

Algorithms for calculating the isotonic estimator are based on the ‘pool adjacent violators’ principle; if adjacent observations violate the order restrictions then they are pooled (or ‘‘amalgamated’) until no more violations occur. In most situations it is reasonable to assume that the underlying hazard function is continuous and smooth. Yet the estimated survival function is a step function. Thus, some form of smoothing has to be applied to obtain acceptable estimates of the hazard function k(a). One possibility is to use Kernel smoothing



k(a)"!(1/b) K((a!u)/b) dSK (u)/SK (u!). where K is a kernel and b a bandwidth. A simple alternative is to fit a smooth curve (for example, a (monotone) smoothing spline or polynomial) through the cumulative hazard function !log(SK (a))"! ? dSK (u)/SK (u!) and calcu late analytically the derivative of the fitted curve. However, one has to verify that this fitting does not introduce new non-monotonicities. 3. NON-STATIONARITY The assumption of stationarity is often untenable. The force of infection of a pathogen may either increase or decrease with time under the influence of, for example, improved treatment, changes in the pathogen, or intrinsic epidemic cycles. The force of infection of tuberculosis in The Netherlands, for example, has been steadily declining by somewhat over 10 per cent per year since the Second World War and the assumption of stationarity would lead to a gross overestimation of (current) force of infection; a bias that would increase with age. Several methods have been developed to take such trends into account.\ Here we shall propose a method that is appropriate for situations in which the force of infection increases or declines (approximately) exponentially for prolonged periods of time. Such behaviour can be expected from infectious pathogens with slow intrinsic dynamics, such as tuberculosis. 3.1. Temporal effects as covariates Let k(a, t) be the age specific incidence rate (force of infection) at age a at (calendar) time t. We shall assume an exponential decline (or increase) in force of infection. Thus, for two calendar times t and t :   k(a, t )"exp(b(t !t ))k (a, t ) (1)     We want to explore how this exponential period effect translates into a cohort effect. For this, we define the cohort specific, age specific incidence rate j(a, b) as the age specific incidence rate experienced by a cohort born at calendar time (for example, year) b. A cohort born at calendar time b has age a at calendar time a#b. Thus, by definition of j (a, b) we have that j(a, b)"k(a, a#b). For two cohorts born at calendar time b and b , we have by virtue of (1):   j(a, b )"k(a, a#b )"k(a, a#b #(b !b ))      "k(a, b #a) exp (b(b !b ))"exp (b(b !b ))j(a, b )       Copyright  1999 John Wiley & Sons, Ltd.

(2)

Statist. Med. 18, 307—320 (1999)

AGE—TIME SPECIFIC INFECTION INCIDENCE

and thus, by substituting b "b and b "0 into (2)   j(a, b)"k(a, b) exp (ba)

311

(3)

Thus, if we are able to estimate b and j(a, b) we can also estimate k(a, b) by multiplying j(a, b) by exp(!ba). Note that we have assumed that there are no intrinsic cohort effects. Cohort effects are induced by confounding with period effects. For a discussion of the intricacies of age-period-cohort analysis, including tools such as the Lexis diagram, we refer the reader to the vast literature available on this subject.\ 3.2. The proportional hazards model From equation (2) we find that a proportional hazards model applies to the cohort and age specific incidence density (hazard rate), with cohort, that is, year of birth, acting as a covariable. The effect of this covariable could be estimated in conjunction with the effect of other covariables of interest, provided suitable estimators are available. If times of developing immune responses, for example, seroconversion, would either have been observed, or have been right censored, the problem of estimating b and j (a) would have been  reduced to a standard proportional hazards regression problem (Cox’s model) for which standard maximum partial likelihood estimators are available in many statistical packages. An attractive property of Cox’s model is that for right censored data the likelihood can be factored into different parts which allows the hazard function to be ‘factored out’ from the estimation of the covariate effects. It therefore does not require any specification of the functional shape of j ( ). Following  the estimation of the covariate effects, the hazard function j ( ) can be estimated in a highly non parametric fashion. Because of these properties Cox’s model is often called semi-parametric. Unfortunately, maximum partial likelihood cannot be used for current status data which are both left and right censored, and different methods for the semi-parametric inference for the proportional hazards regression model, which should retain much of the flexibility of Cox’s method, need to be developed for this type of data. Recently, Huang has explored such methods using a generalization of one of the standard algorithms for the isotonic estimator. Here we present a simpler algorithm and demonstrate that it yields the semi-parametric maximum likelihood estimators of b and S (t).  4. SEMI-PARAMETRIC ESTIMATION OF THE PROPORTIONAL HAZARDS MODEL FOR CURRENT STATUS DATA Let +(y , a , x ), be observed, with y a (0, 1) variable denoting whether individual i with covariate G G G G vector x has experienced the event of interest before a . G G Let j(a, x)"exp(bx)j (a) be the model relating the hazard function of an individual with  covariate vector x to the base hazard function j (a), and let S (a)"exp(! ? (j (u))du). Thus     S (a) is the base survival function.  Then the log-likelihood ¸ to be maximized is (4) ¸"&+exp (bx )y log (S (a ))#(1!y ) log (1!S (a )@VG),. G G  G G  G There are two (sets of ) parameters which have to be estimated, S (a) and b. In Appendix I we  show how to estimate b for given S (a) and how to estimate S (a) as a (monotonically decreasing)   Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

312

N. NAGELKERKE E¹ A¸.

step function for given b (that is, for given exp(bx )). Then, we estimate both b and S (a) by G  alternating between: (i) estimating S (a) for given b,  (ii) estimating b for given S (a),  and (re)iterate this procedure until convergence has occurred. The analogy with the EM algorithm suggests the name of MM algorithm (maximization-maximization) for this procedure. Since (conditional) maximum likelihood is used at each step, the overall likelihood will be non-decreasing at each iteration. As the iteration process may halt at a local maximum, resulting estimates have to be inspected before being accepted, for example, by comparing predicted prevalences with observed prevalences, looking for systematic deviations. 4.1. Estimating the covariance matrix of bK and SK (a)  Even for stationary situations, that is, without covariates, the asymptotics of isotonic estimators are extremely complex. To obtain unconditional (co)variance estimates of bK and SK (a) bootstrap ping can be used; B (bootstrap) samples of the same size as the observed sample are taken from the observed sample with replacement. To investigate sampling properties, such as standard errors, of samples from the underlying (population) distribution, bootstrap samples are treated as if they are samples from this underlying distribution instead of from an estimate of it. Relatively small bootstrap sample sizes B may suffice to estimate the standard error of the bK ’s and use these to construct approximate confidence intervals. However, the bootstrap distribution of SK (a) may  be highly asymmetric. Thus, bootstrap confidence intervals  are needed to construct reliable confidence intervals for SK (a). This often requires very large bootstrap samples and may thus be  highly computer intensive. An advantage of bootstrapping is that special features of the sample, such as stratified, multi-stage and cluster sampling can be taken into account. For example, if instead of simple random samples of individuals random samples of schools were taken, those schools can be resampled. Alternatively, ‘quick and dirty’ methods can be used. One such method is to calculate the covariance matrix of bK conditional on SK (a). This ‘naive’ procedure will generally lead to an  underestimate of the true standard errors. However, when the covariates have been properly centred, the estimators bK and SK (a) may not be strongly correlated and then the conditional  covariance matrix will be close to the unconditional one. An advantage of this method is that it is automatically available provided covariates have been centred to have zero mean. Another ‘quick and dirty’ method is to use the Fisher information matrix of the unconstrained maximum likelihood (in which SK (a) is not forced to be monotonic) as an estimator of the covariance matrix  of the constrained ML estimator. As the unconstrained ML estimator ignores the a priori information about monotonicity, this method may tend to overestimate standard errors. Also, unless data are heavily grouped or restricted to certain ages, calculations may involve very large matrix manipulations, making this method unattractive. 5. EXAMPLE: TUBERCULIN SURVEYS IN THE NETHERLANDS As an example, we applied our methods to tuberculosis (TB) epidemiology. To study the epidemiology of TB infection in The Netherlands, surveys of schoolchildren were carried out in the period 1966—1973. Children were tested by means of Tuberculin (PPD). Indurations Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

313

AGE—TIME SPECIFIC INFECTION INCIDENCE

Figure 1. TB infection-free survival curves, females and males, by age, The Netherlands, cohort of 1956

'"10 mm were considered evidence of previous infection with M. tuberculosis. The data are reproduced from Sutherland et al. (a reprint with minor amendments of an earlier publication) where further details can be found. The data: numbers PPD positive; numbers tested; age in years; year of birth and sex (0"female, 1"male), are presented in Appendix II. This data has also been used to develop and validate dynamical models of tuberculosis transmission. As information on PPD positivity (induration'"10 mm) was originally published as percentages with two digits after the decimal point, numbers on PPD positives may have a small error. As in some cities in The Netherlands, oral BCG vaccination was given to newborns during 1950 and 1951; these cohorts have been omitted from analysis. Otherwise, health policy in the Netherlands has been opposed to vaccination in order to retain tuberculin testing as a screening and monitoring instrument. A proportional hazards model was used with year of birth and sex as covariates. For purposes of centring, 1956)10 and 0)53 were subtracted from the recorded year of birth and sex, respectively. Figure 1 shows the estimated infection-free survival curve SK ( ), that is, the probability of being still uninfected at age a, for girls and boys born in 1956. Note that as a result of the large sample size, grouping of data points occurs in only two intervals: between the ages of 6 and 8 years, and between the ages of 9 and 11 years. A plot of observed versus predicted prevalences (not shown) showed good agreement, suggesting a global maximum of the likelihood was found. The age specific hazard curves for the years 1956 and 1961 were calculated using (3). The estimated cumulative hazard function of 1956 was smoothed using a monotonic smoothing cubic Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

314

N. NAGELKERKE E¹ A¸.

Figure 2. Annual risk of TB infection, females, by age, The Netherlands

spline (with two knots), using the SAS PROC TRANSREG procedure. The derivative of this spline, a smooth estimate of the hazard function of 1956, is presented in Figure 2. In addition, Figure 2 shows the hazard curve for 1961, calculated from the 1956 hazard curve under the proportional hazards assumption. The decrease in the force of infection over just a five year period is impressive. Note that the higher curve in Figure 2 pertains to individuals who are at age a during the year 1956, whereas Figure 1 pertains to individuals born in 1956 and are thus at age a during the year 1956 #a. As no observations are made before age 6, we estimated the hazard function from that age onward. However, note that there already appears to be a non-negligible prevalence of infection at age 6. Thus, there must have been a considerable force of TB infection before that age if we assume absence of infection at birth. As the force of infection appears to be near zero between the ages 6 and 10, and as we restrict estimation of hazards to ages over 6, this might easily be overlooked. Unfortunately, the way the cumulative hazard at age 6 is ‘spread out’ over the first 6 years of life is unknown. This ‘high’ prevalence among 6-year-old children is perhaps due to contacts with infectious relatives. Alternatively, it may be due to false-positive reactions; the tuberculin test is known to have imperfect specificity as a consequence of crossreactions with non-specific mycobacteria. After age 10 the hazard increases sharply with age, perhaps as a consequence of increasing social interaction with age. For the analysis of tuberculin surveys this apparent age dependency implies that adjustment for age in the comparison between surveys needs to be done very carefully. Comparison of summary annual risks of infections in different surveys, calculated on the assumption of an age independent infection risk, as is Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

315

AGE—TIME SPECIFIC INFECTION INCIDENCE

commonly done in such analyses, may be misleading if the age structure in the different surveys is very different. To estimate uncertainty in bK , 100 bootstrap runs were carried out. Although cluster (schools) sampling was used, a breakdown of information by cluster is no longer available, so instead data were analysed as if the data were from a simple random sample. Such ignoring of the design effect usually leads to some underestimation of the true standard errors. Estimates of b’s for year of birth and sex yielded estimates (SE bootstrap) of !0)119 (0)0036) and 0)089 (0)0098), corresponding to an annual rate of decline in the force of infection of 11)2 per cent (95 per cent CI:10)6—11)8 per cent), and a 9)3 per cent (95 per cent CI: 7)2—11)5 per cent) higher risk of infection for boys than for girls. No attempt was made to estimate uncertainty in SK (a) as 100 bootstrap samples may be  insufficient to reliably estimate bootstrap percentiles. The ‘naive’ conditional standard errors for bK were also calculated. Estimates of the conditional standard errors of the two coefficients are 0)0014 and 0)0094, respectively. As expected, the conditional estimators underestimate the errors, although for sex the conditional estimate is close to the bootstrap estimate. An ANSI C computer program which performs the proposed analysis is available from the corresponding (first) author. APPENDIX I Estimating b for given S0 (a) For given S (a) the problem of estimating b is straightforward. The conditional (on S (a)) ML   estimator of b can be obtained by equating the derivative with respect to b of the log-likelihood (4) to 0 and solving it for bK . This can be done using, for example, Newton—Raphson. As expressions for the derivatives involved are straightforward they do not need to be presented here. Minus the inverse of the matrix of second derivatives yields an estimate of the conditional (on S (a)) covariance matrix of bK .  Estimating S0 (a) for given b We propose the following recursive algorithm to estimate S (a) for given +exp (bx ),.  G Algorithm: Let observed ages be a , a , . . . . a .   L Let P (a , a ), (1)i)k)n) be the (scalar) function of all observations with a )a )a , which I G H I assumes as its value the P that maximizes & +exp (bx )y log (P)#(1!y ) log (1!P @VH ), H H H H where summation takes place over all observations (y , a , x ) with a )a )a . Thus, the function H H H G H I P(a , a ) is the maximum likelihood estimator of the survival function over the interval [a , a ] if G I G I we force it to be constant over that interval, and ignore all information outside it (and thus monotonicity constraints). The function SK ( ) is now calculated as follows:  1. let SK (0)"1; let k"n;  2. check whether P(a , a ))SK (a ); L I  L\I\ Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

316

N. NAGELKERKE E¹ A¸.

3. if so, then let SK (a )"2"SK (a )"P(a , a );  L  I L I 4. otherwise, let k"k!1 and go to 2. This recursive algorithm can easily be implemented using a computer language which allows recursive function calls, such as Pascal or C. This type of algorithm is also known as backtracking. Essentially the algorithm does not solve SK (a) explicitly but defines SK (a ) in terms of   L +SK (a ), (i'0), which when the algorithm is first ‘called’ are still unknown, and also have to be  L\G solved using a recursive call of the same algorithm. As SK (a "0)"1 is known the recursion   cannot become infinitely ‘deep’. Note, this recursive algorithm, while extremely easy to program, may be inefficient and might cause stack overflow problems when the recursion becomes very deep. In such cases we recommend adapting one of the non-recursive algorithms for isotonic regression to the present situation. For faster convergence of the iterative estimation procedure it is probably best to centre the x , G that is, to use deviations of x from their mean value. Note that even when the MM algorithm G converges to the global (and not a local) maximum of the likelihood, the estimator SK ( ) of S ( )   may not be consistent everywhere. For example, if S ( ) is strictly monotonical and observations  are only made at a fixed number n of ages a , . . . , a , but at each point the number of  L observations goes to infinity, then SK ( ) will only be consistent in the observed ages +a ,.  G Consistency in +a , follows from the facts that: (i) the unconstrained (that is, not constrained to G monotonicity) ML estimator is consistent; and (ii) that, as a corollary of ( i), the probability that any non-monotonicity occurs converges in probability to zero. Thus the constrained ML estimator converges to the unconstrained one and is thus also consistent. By contrast, SK ( )  asymptotically overestimates the true survival curve everywhere else as its value between a and I a equals a . I> I We will show that the recursive (backtracking) algorithm to calculate SK (a), discussed above,  maximizes the log-likelihood (4) for given +exp (bx ), among all step functions &c I [0, a ); c *0. G G G G This is a non-linear programming problem. From the Kuhn—Tucker necessary conditions for the solution to such a problem it follows that either: (i) c "0 that is, it assumes a value on the boundary of the feasible region; or G (ii) d¸/dc "0, that is, c is an unconstrained maximum if c is not on the boundary of the G G G feasible region. As the step heights +SK (v ), between successive ‘jumps’ or ‘vertices’ +v , of SK (a) are linear  G G  combinations of the +c ,, the +SK (v ), also have to be unconstrained maxima conditional on the G  G vertices +v ,. As ¸ only depends on SK (v ) through the observations v )a (v , SK (v ) is G  G G H >  G a constant that maximizes the part of the log-likelihood that depends on observations at +a "v )a (v ,. Thus, given the vertices, the (conditional) maximum likelihood is uniquely H G H G> determined. To determine which observations a are vertices, we first note that the more vertices H the larger the value of the likelihood, as more vertices make SK ( ) fit the data better. However, this  is subject to: (i) the monotonicity constraint on SK ( );  (ii) step heights having to be unconstrained ML estimates as a result of the Kuhn—Tucker necessary conditions. The latter condition implies that if a is a vertex, then the two sections of SK ( ) on the two H  subdomains 0)a (a (that is, ‘left’ of a ) and a*a (that is, ‘right’ of and including a ) are H H H H H Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

317

AGE—TIME SPECIFIC INFECTION INCIDENCE

unconstrained maximum likelihood estimates over the observations to the left and to the right of a (a recursive property!). H We start by considering the highest observation a and see whether we can make it a vertex. To L do this we have to calculate the maximum likelihood over observations to the left of a L (recursion!) and over a itself. If this does not give rise to inequality constraints we can make L a a vertex. If, however, it does give rise to an inequality constraint violation then a canot be L L a vertex. We then have to try to make a a vertex, and as a is not a vertex, thereby pooling L\ L a and a . This process continues until an observation can be made a vertex. L\ L APPENDIX II PPD# 76 64 36 19 1 3 143 185 144 164 7 4 71 77 68 89 6 6 37 27 53 64 25 35 11 5 120 111 92 100 167 127 82 64 254 289 238 250

N

AGE

BRTHYR

SEX

20047 20821 8769 9105 1087 1076 51379 54581 46561 49765 1307 1500 18865 22114 18593 21944 794 1040 3169 4151 12105 12161 9247 9018 1898 1741 17432 17136 23213 22886 38877 37415 15891 15070 41666 41345 44968 45519

6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 11 11 11 11

64 64 65 65 66 66 63 63 64 64 65 65 62 62 63 63 64 64 61 61 62 62 63 63 64 64 60 60 61 61 62 62 63 63 59 59 60 60

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

Copyright  1999 John Wiley & Sons, Ltd.

PPD# 220 239 110 102 252 237 270 303 255 248 209 202 317 336 157 188 112 128 45 71 777 861 773 867 682 751 594 620 361 420 56 72 43 47 24 19 1290 1468

N

AGE

BRTHYR

SEX

44028 45256 20407 20046 20898 19818 23967 22297 23884 23013 23322 21755 44464 46736 23843 27326 16348 18951 7925 9024 59357 55573 58154 56705 57816 55662 55538 54467 42950 46373 5198 7272 2596 3691 1026 1522 72109 76069

11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 14 14

61 61 62 62 54 54 55 55 56 56 57 57 58 58 59 59 60 60 61 61 53 53 54 54 55 55 56 56 57 57 58 58 59 59 60 60 52 52

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

Statist. Med. 18, 307—320 (1999)

318

N. NAGELKERKE E¹ A¸.

APPENDIX II. Continued PPD# 1078 1357 1016 1185 851 1046 416 520 1131 1660 992 1379 879 1316 369 499 877

N

AGE

BRTHYR

SEX

70005 74578 68243 72714 65500 69774 39630 45297 57683 72524 56704 69647 54636 68945 29762 38757 38992

14 14 14 14 14 14 14 14 15 15 15 15 15 15 15 15 16

53 53 54 54 55 55 56 56 52 52 53 53 54 54 55 55 52

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0

PPD# 1369 744 1211 301 555 938 1527 553 1014 186 421 586 1135 442 995 108 326

N

AGE

BRTHYR

SEX

56824 37433 55572 18499 30665 25435 40962 22242 39479 10184 20287 13961 26289 11940 24878 4504 12212

16 16 16 16 16 17 17 17 17 17 17 18 18 18 18 18 18

52 53 53 54 54 49 49 52 52 53 53 48 48 49 49 52 52

1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

ACKNOWLEDGEMENTS

The authors wish to thank two anonymous referees for their many valuable suggestions. REFERENCES 1. Muench, H. Catalytic Models in Epidemiology, Harvard University Press, Boston, 1959. 2. Griffiths, D. A. ‘A catalytic model of infection for measles’, Applied Statistics, 23, 330—339 (1974). 3. Brookmeyer, R. and Quinn, T. C. ‘Estimation of current human immunodeficiency virus incidence rates from a cross-sectional survey using early diagnostic tests’, American Journal of Epidemiology, 141(2), 166—173 (1995). 4. Styblo, K. ‘The relationship between the risk of tuberculous infection and the risk of developing tuberculosis’, Bulletin of the International ºnion against ¹uberculosis and ¸ung Disease, 60, 117—119 (1985). 5. Hoel, D. G. and Walburg, H. E. ‘Statistical analysis of survival experiments’, Journal of the National Cancer Institute, 49, 361—372 (1972). 6. Farrington, C. P. ‘Modelling forces of infection for measles, mumps and rubella’, Statistics in Medicine, 9, 953—967 (1990). 7. Cox, D. R. and Hinkley, D. V. ¹heoretical Statistics, Chapman and Hall, London, 1974. 8. Gregson, S., Donnelly, C .A., Parker, G. and Anderson, R. M. ‘Demographic approaches to the estimation of HIV-1 infection among adults from age-specific prevalence data in stable endemic conditions’, AIDS, 10, 1689—1697 (1996). 9. Batter, V., Matela, B., Nsuami, M., Manzila, T., Kamenga, M., Behets, F., Ryder, R. W., Heyward, W. L., Karon, J. M. and St Louis, M. E. ‘High HIV-1 incidence in young women masked by stable overall seroprevalence among childbearing women in Kinshasa, Zaire: estimating incidence from serial seroprevalence data’, AIDS, 8, 811—817 (1994). 10. Goubar, A. and Costagliola, D. ‘Estimate of HIV incidence in childbearing women in the Paris area, France’, AIDS, 11, (5), 699—701 (1997). 11. Grenfall, B. T. and Anderson, R. M. ‘The estimation of age-related rates of infection from case notifications and serological data’, Journal of Hygiene, 95, (2), 419—436 (1985). 12. Brunet, R. C. and Struchiner, C. J. ‘Rate estimation from prevalence information on a simple epidemiological model for health interventions’, ¹heoretical Population Biology, 50, 209—226 (1996). Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

AGE—TIME SPECIFIC INFECTION INCIDENCE

319

13. Finkelstein, D. M. ‘A proportional hazards model for interval-censored failure time data’, Biometrics, 42, 845—854 (1986). 14. Keiding, N. ‘Age specific incidence and prevalence: a statistical perspective’, Journal of the Royal Statistical Society, Series A, 54, 371—412 (1991). 15. Diamond, I. D. and McDonald, J. W. ‘Analysis of current-status data’, in Trussell, J., Hankinson, R. and Tilton, J. (eds), Demographic Applications of Event History Analysis, Oxford University Press, Oxford, 1991, Chapter 12. 16. Groeneboom, P and Lopuhaa, H. P. ‘Isotonic densities of monotone densities and distribution functions: basic facts’, Statistica Neerlandica, 47, 175—183 (1993). 17. Barlow, R. E., Bartholomew, D. J., Bremner, J. M. and Brunk, H. D. Statistical Inference under Order Restrictions, Wiley, New York, 1972. 18. Ayer, M., Brunk, H. D., Ewing, G. M., Reid, W. T. and Silverman, E. ‘An empirical distribution function for sampling with incomplete information’, Annals of Mathematical Statistics, 26, 641—647 (1955). 19. Grenander, U. ‘On the theory of mortality measurement II’, Skand. Akt. ¹idskr. 39, 126—153 (1956). 20. Groeneboom, P. ‘Asymptotics for incomplete censored observations’, Report 87-18, University of Amsterdam, The Netherlands, 1987. 21. Scott, D. W. Multivariate Density Estimation, Wiley, New York, 1992. 22. Smith, P. L. ‘Splines as a useful and convenient statistical tool’, American Statistician, 33, 57—62 (1979). 23. Nagelkerke, N. J. D., and Oosting, J. ‘The fitting of growth curves using constrained splines’, Computers and Biomedical Research, 16, 144—148 (1983). 24. Sutherland, I., Bleiker, M. A., Meijer, J. and Styblo, K. ‘The risk of tuberculosis infection in The Netherlands from 1967 to 1979’, ¹ubercle, 64, (4), 241—253 (1983). 25. Bleiker, M. A., Sutherland, I., Styblo, K., Ten Dam, H. G. and Misljenovic, O. ‘Guidelines for estimating the risk of tuberculous infection from tuberculin test results in a representative sample of children’, Bulletin of the International ºnion against ¹uberculosis and ¸ung Disease, 64, 7—12 (1989). 26. Walker, J., Nokes, D. J. and Jennings, R. ‘Longitudinal study of toxoplasma seroprevalence in South Yorkshire’, Epidemiology and Infection, 108, 99—106 (1992). 27. Ades, A. E. and Nokes, D. J. ‘Modeling age- and time-specific incidence from seroprevalence: toxoplasmosis’, American Journal of Epidemiology, 137, 1022—1034 (1993). 28. Marschner, I. C. ‘A method for assessing age-time disease incidence using serial prevalence data’, Biometrics, 53, 1384—1398 (1997). 29. Blower, S. M., McLean, A. R., Porco, T. C., Small, P. M., Hopewell, P. C., Sanchez, M. A. and Moss, A. R. ‘The intrinsic transmission dynamics of tuberculosis epidemics’, Nature Medicine, 1, 815—821 (1995). 30. Kupper, L. L., Janis, J. M., Karmous, A. and Greenberg, B. G. ‘Statistical age-period-cohort analysis: a review and critique’, Journal of Chronic Disease, 38, 811—830 (1985). 31. Holford, T. R. ‘An alternative approach to statistical age—period-cohort analysis’, Journal of Chronic Disease, 38, 831—840 (1985). 32. Keiding, N. ‘Statistical inference in the Lexis diagram’, Philosophical ¹ransactions of the Royal Society of ¸ondon, Series A, 332, 487—509 (1990). 33. Berzuini, C. and Clayton, D. ‘Bayesian analysis of survival on multiple time scales’, Statistics in Medicine, 13, 823—838 (1994). 34. Cox, D. R. ‘Regression models and life tables’, Journal of the Royal Statistical Society, Series B, 34, 187—220 (1972). 35. Huang, J. ‘Efficient estimation for the proportional hazards model with interval censoring’, Annals of Statistics, 24, 540—568 (1996). 36. Little, R. J. A., and Rubin, D. B. Statistical Analysis with Missing Data, Wiley, New York, 1987. 37. Efron, B. ¹he Jackknife, the Bootstrap and other Resampling Plans, SIAM monograph 38, CBNS-NSF, Philadelphia, 1982. 38. Efron, B. and Tibshirani, R. J. An Introduction to the Bootstrap, Chapman and Hall, London, 1993. 39. Davison, A. C. and Hinkley, D. V. Bootstrap Methods and their Application, Cambridge University Press, Cambridge, 1997. Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

320

N. NAGELKERKE E¹ A¸.

40. Sutherland, I., Bleiker, M. A., Meijer, J. and Styblo, K. ‘The risk of tuberculous infection in The Netherlands from 1967 to 1979’, in Selected Papers 22, Royal Netherlands Tuberculosis Asscociation, The Hague, 1984, pp. 75—91. 41. Garcia, A., Maccario, J and Richardson, S. ‘Modelling the annual risk of tuberculosis infection’, International Journal of Epidemiology 26, 190—203 (1997). 42. SAS Institute Inc. SAS/S¹A¹ ºser’s Guide, »ersion 6, Fourth Edition, »olume 2, SAS Institute Inc., Cary, NC, 1989. 43. Rieder, H. L. ‘Methodological issues in the estimation of the tuberculosis problem from tuberculin surveys’, ¹ubercle and ¸ung Disease, 76, (2), 114—121 (1995). 44. Sedgewick, R. Algorithms, Addison—Wesley, Reading, 1984. 45. Walsh, G. R. Methods of Optimization, Wiley, New York, 1977.

Copyright  1999 John Wiley & Sons, Ltd.

Statist. Med. 18, 307—320 (1999)

Lihat lebih banyak...

Comentários

Copyright © 2017 DADOSPDF Inc.