Do General Dimensions of Quality of Life Add Clinical Value to Symptom Data?

June 15, 2017 | Author: Mary Redman | Categories: Palliative Care, Quality of life, Multivariate Analysis, Humans, Male, Health Status

Do General Dimensions of Quality of Life Add Clinical Value to Symptom Data? Carol M. Moinpour, Gary W. Donaldson, Mary W. Redman

J Natl Cancer Inst Monogr 2007;37:31–8

In evaluating the effect of cancer treatment on the patient, we intuitively seek the patient’s perspective of global impacts on day-to-day life. The physician’s question, “how are you doing?” has the same motivation. The designers of clinical trials, however, often limit primary inquiry to physical aspects of illness and functioning. In part, this may reflect greater familiarity with reports based on traditional therapeutic outcomes, such as symptom status, compared with more global measures of health-related quality of life (HRQL) (1). However, comprehensive HRQL assessment (1–3) requires a broader standard of evidence. In this article, we show that the evidence contained in the broader HRQL assessment adds clinical value to the targeted symptom evaluation. Standard analyses of HRQL outcomes at different time points are limited in the types of questions that they can answer. For example, under certain assumptions, the sample mean difference in a randomized controlled trial can be interpreted as an estimate of the average causal effect of the treatment on the measured outcome (4–10). However, this summary may not always adequately address important clinical questions. A broader examination requires methods amenable to multivariate longitudinal outcomes and predictors, that allow for individual diversity in treatment response, and that account for measurement error in these responses. This article poses four important clinical questions and describes analytic methods to address these questions. We propose that if HRQL data gathered in clinical trials are analyzed using the proposed methods, such results can greatly extend our clinical understanding of the broader impact of treatment effects on patient HRQL. A patient’s perception of global HRQL (GHRQL) is a highly personal evaluation of overall functioning and well-being. GHRQL reflects a number of components, one of which might be the degree of pain experienced by the patient. 
In addition to the magnitude of the difference in treatment arm means, we can also ask questions about relationships among the various components of GHRQL. Simple regression analyses reveal the extent to which we can predict GHRQL based on knowing the patient’s reported pain intensity. However, we might also like to know if the relationship between pain intensity and GHRQL differs by treatment arm. By including time in the model, we can examine how broader HRQL outcomes may change over the course of treatment. That is, change in the residual component of GHRQL (i.e., “pain-free” GHRQL) may differ across treatment arms, reflecting other treatment impacts or side effects. Or, the treatment may induce greater residual GHRQL response variability across patients in one treatment arm than in another even if their mean trajectories are equal.

Questionnaires and surveys used in social science and health research should meet standard criteria for reliability to reduce measurement error in questionnaire scores and trial outcomes (11–13). However, because any observed score still contains measurement error obscuring reliable inference (14), we support the additional use of techniques presented in this article, which further reduce the impact of measurement error. Statistical techniques such as growth curve/latent trajectory modeling (LTM) (14–18) can, under certain assumptions, take advantage of repeated assessments to disentangle measurement and other error from true individual variation in GHRQL around the mean response.

Journal of the National Cancer Institute Monographs, No. 37, 2007

Affiliations of authors: Southwest Oncology Group Statistical Center, Fred Hutchinson Cancer Research Center, Seattle, WA (CMM, MWR); Pain Research Center, University of Utah, Salt Lake City, UT (GWD). Correspondence to: Carol M. Moinpour, PhD, Southwest Oncology Group Statistical Center, Fred Hutchinson Cancer Research Center/M3-C102, 1100 Fairview Ave North/Box 19024, Seattle, WA (e-mail: cmoinpou@fhcrc.org). See “Notes” following “References.” DOI: 10.1093/jncimonographs/lgm007 © The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: [email protected].


Downloaded from http://jncimono.oxfordjournals.org/ by guest on February 4, 2016

Since global health-related quality of life (GHRQL) reflects broad impacts of treatment, its assessment in an advanced-stage disease trial should add valuable clinical information beyond that of a targeted symptom. Using latent trajectory modeling that allowed for individual trends as well as overall relationships, we reanalyzed three repeated assessments of the present pain intensity from the McGill Pain Questionnaire and the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Core 30 (QLQ-C30) GHRQL score from a hormone-refractory prostate cancer trial. Within- and between-treatment differences not detected in the original S9916 report of pain palliation and GHRQL suggested substantial individual variation in GHRQL level and change after controlling for within-assessment pain. The treatment had a differential effect on the relationship between GHRQL and pain; we observed an approximately threefold stronger association of reported pain with GHRQL in the docetaxel plus estramustine (D + E) arm compared with the mitoxantrone plus prednisone (M + P) arm (P = .02). In addition, the treatment had an effect, on average, on the rate of change in GHRQL, after controlling for pain level. GHRQL for patients on the M + P arm tended to improve over the assessment period while GHRQL tended to deteriorate for patients on the D + E arm (P = .02). Important, interpretable effects and systematic individual variation in GHRQL remain after controlling statistically for the effects of pain, the targeted symptom, in this trial. In addition, identifying the rate at which a person’s GHRQL changes or responds to treatment provides clinically relevant information.

Although it is well known that the mean response does not reflect an individual patient’s response, this important caveat needs to be constantly reinforced, since the mean response difference is the only result usually cited in both academic journals and the popular media. In the following discussion, we highlight the importance of individual variability (19) in HRQL scores at single time points and over time. To use this approach successfully (as in any modeling approach) requires an explicit model of how individuals are systematically changing, including a system of equations to address baseline levels, population effects, and intra- and interindividual change. We propose the use of structured multivariate analyses to address a set of questions of interest to clinicians and to examine relationships among variables, rather than among single outcomes. These analyses allow the researcher to profit from the wealth of information present in longitudinal HRQL assessments. In this article, we support our proposal that an analysis that identifies and quantifies the systematic evidence revealed by multivariate analysis of HRQL and targeted symptom outcomes adds clinical value to mean differences in outcomes reported for clinical trials.

Clinical Questions

The following questions reflect key clinical issues and determine the definition and comparison of models. These questions were generated by considering the different ways by which a treatment can impact a patient’s HRQL. The outcome most familiar to researchers is the average symptom or HRQL response compared across treatment arms. But treatment effects may manifest in other types of evidence as well.

1. Does the strength of the association between GHRQL and pain differ by treatment arm?
2. Among patients with the same pain scores, do the average level and rate of change in GHRQL differ across the two treatment arms?
3. Do the treatments produce greater variability in how GHRQL changes in one arm compared with the other (i.e., is one less consistent, and hence, less predictable relative to the other)?
4. How large is the variability in true rate of change in GHRQL after controlling for pain?

To illustrate the LTM approach, we will use pain intensity and GHRQL variables from a Southwest Oncology Group (SWOG) trial in hormone-refractory prostate cancer (HRPC) (20–22). In HRPC, bone pain is the primary symptom experienced by patients (23). Therefore, pain intensity is the primary targeted symptom in our analysis (and was one of two primary HRQL outcomes in the SWOG trial). We investigate whether structured multivariate analyses incorporating GHRQL add value in addition to consideration of pain intensity alone. “Added value” is defined as the information gained by the enhanced interpretation of treatment effects on multivariate HRQL outcomes when both within-arm and within-individual variation are considered. This preliminary reanalysis of the SWOG trial provides a first step in supporting the value of measuring the full spectrum of HRQL outcomes.

Structured multivariate analyses can incorporate longitudinal measures of multiple HRQL domains and symptoms. “Structured” refers to a sequenced specification of two types of models. A measurement model links observed variables (indicators) to unobserved, latent variables (constructs). A structural model represents our hypotheses about the interrelationships among latent variables (constructs); that is, the structural model suggests causal relationships (direct and indirect) among these constructs (17,24).

Methods

Patients and Study Design

Both regimens, docetaxel plus estramustine (D + E) and mitoxantrone plus prednisone (M + P), are known to palliate pain in HRPC. In SWOG9916, the D + E arm was hypothesized to have greater clinical efficacy as well as equivalent or better palliation of disease-related symptoms, particularly pain. Men with stage D1 or D2 prostate cancer were randomized to one of the two regimens. Details of eligibility criteria, treatment regimens, and therapeutic results have been published (22). Petrylak et al. (22) reported significantly better clinical efficacy for three clinical endpoints for D + E versus M + P: overall survival advantage, median time to progression, and percentage of patients showing at least a 50% reduction in prostate-specific antigen.

Health-Related Quality of Life Preliminary Findings and Assessment Plan for Current Analysis

HRQL measures of interest are the present pain intensity (PPI) item from the McGill Pain Questionnaire (25) and the GHRQL item from the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Core 30 (26,27). These outcomes were prespecified as primary HRQL outcomes for the SWOG trial and were analyzed separately. [See Berry et al. (21) for a description of the full set of HRQL measures collected in the SWOG trial.] The McGill pain intensity (PI) score used for this analysis ranges from 0 to 10 and was collected at the beginning of each of eight cycles and then at 12 months after randomization. The GHRQL score ranges from 0 to 100 and was collected at the beginning of cycles 1, 4, and 8 and at 12 months after randomization.

Published Health-Related Quality of Life Results. Determination of pain palliation used data from eight potential cycles of treatment and at 12 months; GHRQL data were analyzed (as continuous data in a set of pattern mixture sensitivity analyses) from the four scheduled time points. Seven hundred twenty-three patients were eligible for the HRQL component of the study; 97% submitted baseline pain forms, and 96% submitted baseline QLQ-C30 forms. Published HRQL findings (20,21) indicated no conclusive evidence for a statistically significant difference in the GHRQL endpoint. We also found no statistically significant treatment arm difference for pain palliation (20,21). The pain palliation outcome was analyzed separately as a composite variable with three criteria: a 2-point or greater change in the PPI score, PPI score maintained for at least two consecutive cycles, and no increase in analgesic use (28). The proportion of pain responders was compared for the two treatment arms in Berry et al. (21), whereas the PI continuous score is used in the structured multivariate analyses reported here.

Structured Multivariate Analysis. The analysis reported in this article included both primary outcomes in the same analysis: the 0 to 10 PI score and the total GHRQL score for those time points at which both pain and GHRQL were collected (baseline, week 10 [cycle 4], and month 6 [cycle 8]). Higher scores reflect both worse pain and worse GHRQL in the current analysis. Data from the 12-month assessment are not included because there were substantial missing data and analysis of datasets with complicated missing data patterns is not the focus of this effort.

Statistical Methods

Below we apply LTM to the SWOG dataset and outline how the four questions noted above correspond to statistical models that are compared with an initial reference model.

Reference Model. Our strategy requires formulation of an initial reference model that is broad and flexible enough to accommodate restrictions and comparisons that test the criteria using the clinical trial data. Figure 1 portrays the relationship between GHRQL and pain at the three assessment occasions and is a partial representation of the reference model equation presented below. In this model, all parameters are allowed to vary across treatment arms; that is, each arm has a completely different set of parameter values. This permits the criteria associated with each model to be evaluated with systematic restrictions on the initial reference model across the treatment arms; this sequence is described below for models 1–3. The full model represents GHRQL (y) for patient i at assessment time t = 0,1,2 in population g as

$$
\begin{aligned}
y_{it}^{(g)} &= a_i^{(g)} + b_i^{(g)} t + \gamma^{(g)} X_{it}^{(g)} + \varepsilon_{it}^{(g)},\\
\left(a_i^{(g)}, b_i^{(g)}\right)' &\sim \mathrm{MVN}\!\left(\left(\alpha^{(g)}, \beta^{(g)}\right)', \Psi^{(g)}\right),\\
\varepsilon_{it}^{(g)} &\sim N\!\left(0, \sigma^{2(g)}\right),
\end{aligned}
$$

where MVN denotes the multivariate normal distribution and N indicates the normal distribution; other parameters are described below. There are two parallel models indicated by the superscript (g) in the equation, one for each treatment arm. In the reference model equation, “patient-specific” intercepts and slopes are represented as ai and bi, respectively, random variables that correspond to the latent I and S variables in Fig. 1. The intercepts and slopes define the linear trends in GHRQL with respect to time t. The linear trends may differ even among patients who share the same set of pain scores, as indicated by the variance parameters Ψa(g) and Ψb(g) in Table 1. The variance of the intercept variable (Ψa(g)) is the extent of true patient differences in initial GHRQL levels, and the variance of the slope variable (Ψb(g)) is the extent of true patient differences in rates of GHRQL change, both controlled for contemporaneous pain level. The mean intercept and slope may also differ across treatment arm, and these population means are denoted by α(g) and β(g), respectively. The latent intercept and slope variables are correlated, as shown in Fig. 1 by the curved arrow between I and S; a person with a higher baseline may tend to have either a higher (positive correlation) or lower (negative correlation) rate of change. In the model equation, Ψ(g) represents the covariance matrix of a and b; it incorporates the above correlation as well as variability in the patient-specific intercepts and slopes. Xit stands for measured pain intensity for individual i at time t. Gamma (γ(g)) is a regression parameter representing the average change in GHRQL for each unit increase in pain, within each time t. This relationship between GHRQL and pain is allowed to differ between treatment arms but is assumed to be constant over time. The ε(g) are random residuals with variance σ² (assumed constant across individuals and occasions) comprising errors of measurement (in GHRQL) and prediction (of GHRQL) from pain. The equalities assumed for regression parameters and error variances are simplifying restrictions not critical for this illustration; as with certain other simplifying assumptions not discussed here, they could be tested and relaxed if necessary.

Fig. 1. The figure shows the latent growth model consisting of two latent or unobserved variables, the intercept (I) and the slope (S). Patients can differ in baseline levels (I) and rates of change (S) in global health-related quality of life (GHRQL); the curved arrow indicates correlation between I and S, so that a person with a higher baseline level of GHRQL may have a higher (or lower) rate of growth. The ε are random errors with variance σ², representing measurement and other sources of error.

Clinical Questions and Models. Our four clinical questions can now be pursued by estimating a sequence of LTM models. For consistency of numbering, we define the reference model as model 0, against which other models are compared; the reference model has no parameter restrictions across treatment arms. Each model successively adds parameter restrictions across treatment arms to test the question of clinical interest. There are other model sequences that could be followed: for example, one can begin with the null model (full restrictions) and progressively relax restrictions on parameters. For each of the models below, the clinical question is rephrased as a statistical question.

Question 1/model 1. Does the strength of the association between pain and GHRQL differ by treatment arm? The clinical question of interest can be posed statistically as: does the regression coefficient γ differ across treatment arms? To answer this question, model 1 restricts the regression of GHRQL on pain to be equal for the two arms. A test of this question is provided by comparing model 1 (the restricted model) with model 0, the reference model. If model 1 is rejected, then we conclude that treatment had a differential effect on the relationship between GHRQL and pain as indicated by the regression of GHRQL on pain.

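Table 1. Nested models corresponding to text descriptions of models*

Parameter estimates by model and treatment arm. Fixed effects: γ(g), α(g), β(g). Random effects: Ψa(g), Ψb(g), Ψab(g), σ²(g). (=) marks an estimate constrained to be equal across the two arms (bold in the original table).

Model | Cross-group constraints | Arm | γ(g) | α(g) | β(g) | Ψa(g) | Ψb(g) | Ψab(g) | σ²(g) | χ² (df) | RMSEA
0 (reference model) | none | M+P | 1.07 | 35.76 | −1.10 | 287.50 | 25.06 | −45.83 | 198.48 | 15.75 (10) | .042
0 (reference model) | none | D+E | 3.02 | 27.39 | 1.27 | 256.11 | 50.58 | −32.19 | 178.24 | |
1 | γ | M+P | 1.90 (=) | 33.14 | −0.70 | 250.93 | 17.61 | −36.56 | 200.95 | 21.41 (11) | .054
1 | γ | D+E | 1.90 (=) | 30.94 | 0.74 | 278.95 | 50.48 | −30.26 | 181.07 | |
2 | α, β | M+P | 1.86 | 32.05 (=) | −0.06 (=) | 253.53 | 17.87 | −36.85 | 200.81 | 23.64 (12) | .055
2 | α, β | D+E | 1.96 | 32.05 (=) | −0.06 (=) | 280.09 | 51.94 | −33.11 | 180.35 | |
3 | Ψb | M+P | .96 | 36.11 | −1.19 | 306.53 | 37.35 (=) | −58.84 | 190.71 | 16.69 (11) | .040
3 | Ψb | D+E | 2.98 | 27.51 | 1.28 | 244.72 | 37.35 (=) | −20.97 | 187.47 | |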

Question 2/model 2. Among patients with the same pain scores, do the average level and rate of change in GHRQL differ across the two treatment arms? Do the mean GHRQL trajectories, controlled for pain intensity, differ by treatment arm? This question is tested by comparing model 0 with model 2, in which the mean slope and intercept are assumed equal across treatment arms. If model 2 is rejected in favor of model 0, then we conclude that treatment had an effect, on average, on the rate of change in GHRQL controlled for pain level. We would conclude that the components of broad GHRQL that do not depend on pain have different trend lines in the two treatment arms. Question 3/model 3. Do the treatments produce greater variability in how GHRQL changes in one arm compared with another (i.e., is one less consistent and, hence, less predictable relative to the other)? Does the variance in true rate of change in GHRQL, or slope, differ across treatment arms after controlling for pain? The question is examined by comparing model 0 with model 3, in which slope variances are assumed to be equal across treatment arms. If model 3 is rejected, then we would conclude that treatment affected the degree of clinical response heterogeneity. That is, broad GHRQL responses, conditional on pain, would be closer to the average across patients in one arm than another. Note that this is a very different question from the usual focus on whether the average treatment response differs by arm. Variability pertains to the diversity, rather than the average level, of treatment response. Descriptive question. How large is the variability in true rate of change in GHRQL after controlling for pain? Though descriptive, this represents the key clinical question of individual differences in treatment response. It is answered by evaluating the magnitude of the variance (or standard deviation) of the “slope” variable. 
Large variances imply diverse, heterogeneous responses, both across and within treatment arms. In this circumstance, it is risky to generalize the average treatment arm response to individual patients. The standard deviation of the slope quantifies the expected deviation (of a randomly drawn patient) from the population mean response, that is, the likely difference between a person’s slope and the population slope (14) for the treatment arm. For example, 32% of patients will have rates of change more than one standard deviation away from their population average slopes (the model assumes normality of the random effects) (14). This question need not involve a model comparison (though formal tests of zero variance for the slope, intercept, or both, are straightforward) but is usually answered by interpreting the magnitude of the slope variance in our final model.

Estimation considerations. Latent trajectory modeling growth curve analyses were conducted with Mplus (29) using maximum-likelihood estimation under the assumptions of multivariate normality for the random effects and a missing-at-random (30) missing data mechanism. Two nested models permit a likelihood ratio test (LRT) based on the chi-square statistic. Although the number of such possible comparisons is very large, we restrict consideration to the smallest set of models required to test the above clinical hypotheses. A significance threshold of 0.05 was used for all tests. We make a number of assumptions to simplify the presentation of the LTM approach (see comments below in the Discussion section): stability of GHRQL regression coefficients over time; linear change in GHRQL (coding of 0, 1, and 2 for the paths from slope to GHRQL in Fig. 1); missing-at-random mechanism for missing data; and inferred differences in starting points and rates of change in GHRQL are “true” individual differences in the psychometric tradition (12,14). Latent trajectory modeling methods and associated software packages (e.g., Mplus) (29) can handle more complicated models, including those with multiple outcomes and covariates, that may depart from these assumptions.
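As a reading aid rather than part of the published analysis, the reference model's assumptions imply a simple form for the mean and variance of GHRQL at each occasion, conditional on pain. The sketch below uses the symbols of the reference model and Table 1 and additionally assumes that the residual ε is independent of the random intercept and slope, which is the usual convention for such models but is not stated explicitly in the text.

```latex
\begin{aligned}
E\!\left[y_{it}^{(g)} \mid X_{it}^{(g)}\right]
  &= \alpha^{(g)} + \beta^{(g)} t + \gamma^{(g)} X_{it}^{(g)},\\
\operatorname{Var}\!\left[y_{it}^{(g)} \mid X_{it}^{(g)}\right]
  &= \Psi_a^{(g)} + 2t\,\Psi_{ab}^{(g)} + t^{2}\,\Psi_b^{(g)} + \sigma^{2(g)},
  \qquad t = 0, 1, 2.
\end{aligned}
```

The first three variance terms are systematic differences between patients; only the last term is measurement and prediction error, which is why the slope variance can be interpreted separately from noise.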

Results

Model results are presented in Table 1.

* RMSEA = root mean square error of approximation; (g) = treatment arm; γ(g) = regression of GHRQL on pain; α(g) = mean intercept; β(g) = mean slope; Ψa(g) = variance of individual intercepts; Ψb(g) = variance of individual slopes; Ψab(g) = covariance between slopes and intercepts. Estimates in bold reflect those constrained to be equal for the two treatment arms. Model 0, regression plus residual growth trajectories. Separate models for each treatment arm; Vs. 1, equal regressions of GHRQL on pain across arms (P = .02, rejected); Vs. 2, equal population mean trajectories (equal mean slopes and intercepts) across arms (P = .02, rejected); Vs. 3, equal slope variances across arms (P = .33, not rejected), our final model.

Model 1

For the comparison of models 0 and 1, larger regression coefficients correspond to greater relevance of pain to GHRQL; smaller regression coefficients imply that pain has a less central role in determining GHRQL. The reference model χ² of 15.75 (df = 10) was subtracted from the model 1 χ² of 21.41 (df = 11). This comparison (df = 1) indicated a significant difference in the relationship by treatment arm over the 6-month study period (P = .02). Therefore, we reject model 1 in favor of model 0, the less constrained model. The data suggested that, on average, a unit difference in pain has approximately three times the negative association with GHRQL in the D + E arm compared with the M + P arm (see Table 1, reference model [0], fixed effects, γ = 3.02 for D + E regression of GHRQL on pain estimates versus 1.07 for M + P).
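The chi-square difference test described above can be rechecked by hand. The following is a minimal sketch, not the authors' Mplus workflow: it takes the χ² and df values reported in Table 1 and uses closed-form chi-square tail probabilities that hold only for 1 and 2 degrees of freedom. The function names `chi2_sf` and `lrt` are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


def lrt(chi2_restricted, df_restricted, chi2_reference, df_reference):
    """Chi-square difference test of a restricted model against the reference."""
    diff = chi2_restricted - chi2_reference
    ddf = df_restricted - df_reference
    return diff, ddf, chi2_sf(diff, ddf)


# Fit statistics as reported in Table 1: (chi-square, df).
REFERENCE = (15.75, 10)  # model 0
RESTRICTED = {1: (21.41, 11), 2: (23.64, 12), 3: (16.69, 11)}

results = {m: lrt(c, d, *REFERENCE) for m, (c, d) in RESTRICTED.items()}
for m, (diff, ddf, p) in results.items():
    print(f"model {m} vs model 0: chi2 diff = {diff:.2f}, df = {ddf}, P = {p:.2f}")
```

Rounded to two decimals, the three P values agree with those given in the Table 1 footnote (.02, .02, and .33).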

Model 2

Clinical question 2 examined whether the average population baseline GHRQL values and rates of change, controlled for pain, were equal for the two arms. Model 2 constrained the mean intercept and slope parameters to be equal across treatment arms. The hypothesis of equality was rejected (LRT, P = .02) compared with model 0, which estimated the mean slope for the M + P arm as −1.10 GHRQL points per assessment (showing improvement on average) and the mean slope for the D + E arm as 1.27 GHRQL points per assessment (showing deterioration, on average); see Table 1.

Model 3

Clinical question 3 addressed whether or not the groups differed in individual response variability. Model 3 constrained the slope variances to be equal across treatment arms. The hypothesis of equality could not be rejected (P = .33), and model 3 was therefore preferred to model 0 on the grounds of parsimony (i.e., there was no need to estimate separate slope variance parameters for each arm), estimating a common slope variance of 37.35 within treatment arms (see Table 1). The magnitudes of the other coefficients changed little between model 0 and model 3, which we concluded was our final, best-fitting parsimonious model. (This “final” model in our sequence need not, of course, represent the definitive analysis of these data.) Although there was no strong evidence of different heterogeneity across treatment arms, it is important to note that the common intrinsic variation is nonetheless substantial, as discussed below.

Descriptive Question: Individual Response Variability. This question addressed the magnitude of the intrinsic variation in individual responses. Model 3, our final model, estimates the within-population variance (variation in the rate of change over time among patients within an arm) as 37.35. This is intrinsic variability in treatment response, independent of measurement error, representing systematic but unexplained differences in patient trajectories. The standard deviation (6.11 for model 3) provides a more natural metric for interpretation, since by assumption the patient slopes (and intercepts) are distributed (multivariately) normally and therefore the usual normal curve relationships apply. For example, model 3 implies that, in the M + P arm, 68% of patients have conditional (pain-adjusted) GHRQL slopes between −7.30 and +4.92 (−1.19 [mean slope] ± 6.11), while in the D + E arm, the 68% range is between −4.83 and +7.39 (1.28 [mean slope] ± 6.11) [e.g., see Singer and Willett (14)]. It is important to appreciate that, despite superficial similarity to confidence intervals, these ranges represent interperson population differences in responses that are independent of sample size (compared with uncertainty bounds for population averages, which depend on sample size and are very much smaller).

Figure 2 attempts to portray these relationships in context. Based on our final model, we estimated the GHRQL population distribution conditional on pain; using this distribution, we generated 100 individual trajectories for each treatment arm. Note that we did not use the observed data because these scores contain measurement and other sources of error. The dark lines are population average intercepts and slopes superimposed on each plot. Even though the population mean slopes differ significantly, the individual differences in within-arm trajectories overwhelm the population average difference. Figure 2 thus provides a visual guide to the clinical significance of the mean population effect, which is relatively unimpressive in the context of the large intrinsic individual variation.

A useful index for comparison can be based on the theoretical distribution of differences in slopes between randomly selected individuals from the same population (which has a standard deviation equal to the square root of two times the standard deviation for the slope, or 8.64). The expected difference between mean treatment trajectories of 2.47 for two populations is small compared with the range of differences between two people within a population. For example, 77% of pairs of randomly selected patients from the same population (treatment arm) would have slopes that differed by more than 2.47 points per visit. We would therefore conclude that the treatment arm comparison for GHRQL, although statistically significant, is probably not clinically important given a scale ranging from 0 to 100 (31).

Discussion

In this article, we explored the use of growth curve/LTM methods to examine treatment arm differences in HRQL outcomes measured over time. Although common sense suggests that GHRQL and pain overlap and that pain necessarily affects one’s perceived quality of life, it is reasonable to hypothesize that GHRQL also reflects a spectrum of clinical impact not driven by pain. Therefore, our key question was, “is there systematic variation in these leftover GHRQL scores that is large enough to be of clinical interest?” If important systematic changes in GHRQL remain, above and beyond what can be attributed to pain, then this systematic information may indicate the value added by the GHRQL measures. We determined that the relationship between pain and GHRQL differed for the two treatment arms. On average, a unit difference in pain had approximately three times the negative association with GHRQL for patients in the D + E arm compared with the M + P arm. The mean residual trend for GHRQL (i.e., pain-free GHRQL) was in the upward direction for the D + E arm whereas that for the M + P arm was in the downward direction. The average difference in population GHRQL slopes, controlled for pain, was 2.5 points per assessment and, although statistically significant, the difference is not likely to be clinically important on a 0–100 scale.
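The normal-curve arithmetic behind the slope ranges and the pairwise comparison reported in the Results can be retraced in a few lines. This is a sketch that assumes only the model 3 estimates in Table 1 (common slope variance 37.35, mean slopes −1.19 and 1.28) and the normality of the random slopes; the variable names are illustrative.

```python
import math

# Model 3 estimates from Table 1.
SLOPE_VARIANCE = 37.35
MEAN_SLOPE = {"M + P": -1.19, "D + E": 1.28}


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


sd_slope = math.sqrt(SLOPE_VARIANCE)  # about 6.11

# Mean slope plus or minus one SD: the range holding about 68% of patients.
ranges = {arm: (m - sd_slope, m + sd_slope) for arm, m in MEAN_SLOPE.items()}

# Share of patients more than one SD from their arm's mean slope (about 32%).
share_beyond_one_sd = 2.0 * (1.0 - normal_cdf(1.0))

# Difference in slope between two random patients from the same arm:
# normal with mean 0 and SD sqrt(2) * sd_slope (about 8.64).
sd_pair_diff = math.sqrt(2.0) * sd_slope
mean_gap = MEAN_SLOPE["D + E"] - MEAN_SLOPE["M + P"]  # 2.47
share_pairs_beyond_gap = 2.0 * (1.0 - normal_cdf(mean_gap / sd_pair_diff))

for arm, (low, high) in ranges.items():
    print(f"{arm}: 68% of slopes between {low:.2f} and {high:.2f}")
print(f"share beyond one SD: {share_beyond_one_sd:.3f}")
print(f"share of same-arm pairs differing by more than {mean_gap:.2f}: "
      f"{share_pairs_beyond_gap:.3f}")
```

The last quantity comes out at roughly 0.775, in line with the 77% quoted in the text.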

Raghavan (32) noted the challenges in measuring the HRQL of patients with prostate cancer, particularly HRPC. There is a lack of association between measures of HRQL (including symptoms such as pain) and clinical outcomes (e.g., response and prostate-specific antigen) in recent studies of HRPC (28,33), including the SWOG trial described above (22). Berry et al. (21) speculated that the similarity of pain palliation for the two HRPC regimens in S9916 might have been due to the anti-inflammatory attributes of prednisone, one of the two agents in the M + P arm, even when the D + E arm was showing better therapeutic efficacy. That is, prednisone may have mitigated the negative effect of increases in pain report on GHRQL for patients in the M + P arm. Using LTM, we did find that the rates of change in GHRQL differed significantly by treatment arm, although the difference was small. Below we suggest other variables important in understanding the effect of treatment for HRPC on patient HRQL. Raghavan (32) suggested that in addition to the added toxicities of chemotherapy, the impact of castration tends to be ignored and carries with it behavioral, emotional, and physical costs. Shahinian et al. and others (23,34,35) have described an androgen deprivation syndrome, consisting of a complex of adverse effects such as sexual dysfunction, depressive symptoms, fatigue, and cognitive and constitutional disorders for men treated with androgen deprivation treatment before the development of hormone insensitive disease.

However, Shahinian et al. and others (23,34,35) suggest that factors other than treatment may be associated with these adverse effects in hormone-sensitive patients: age, disease status, and presence of comorbid medical conditions. Similarly, Raghavan (32) notes that age and frailty-related conditions (36–38) may also account for the difficulty in understanding effects of chemotherapy for HRPC on patient HRQL. We plan to examine the moderating effect of the variables suggested by Shahinian et al. (35) and Raghavan (32) in future analyses of the impact of HRPC on patient HRQL.

Latent trajectory modeling methods have been used to identify latent subgroups within an earlier-stage prostate cancer population (39). Constructs such as androgen deprivation syndrome and frailty require additional research to document their usefulness in suggesting meaningful subgroups of patients that could be proposed a priori in the design of trials for HRPC. These definitions could enhance the clinical value of the LTM approach by predicting beforehand which individuals might respond better to treatment.

We saw that individual response variability did not differ significantly for the two arms (comparison of models 3 and 0). If there had been a statistically significant difference in variability, this could signal the need to examine a potential role for subgroups of patients sharing similar characteristics (e.g., the age and frailty constructs discussed above). It is also possible that other predictors might not fully explain the differential heterogeneity, with the remainder attributable to essential variation, or true individual differences. This seems likely, since individual-level variability was substantial and clinically salient, highlighting the importance of not attributing population average change to an individual patient. Within-population differences between two randomly selected individuals were typically large; the standard deviation of these differences, 8.64 points per visit, is close to the clinical significance criterion of Sloan et al. (31), a 10% shift in scores on a 0–100 scale.

There could, of course, be clinical contexts where the assumptions described in the Methods section would not be appropriate. For example, assuming stability of the regression of GHRQL on pain over time might not be realistic in a palliative care setting. It is possible that deterioration of the patient’s condition over time would result in variable coefficients for the regression of GHRQL on pain over time. Given clinical data and information from the literature, investigators might want to test this assumption’s impact on model fit and, if indicated, free the equality of regressions over time and allow for a covariate-by-time interaction. Singer and Willett [(14), Chapter 5] allow for such a model.

In general, the approach we have followed is just one of several alternative models for conceptualizing treatment response in a clinical trial setting (24). Our approach focused on growth, but an equally effective strategy would be to condition on baseline variables as covariates.

In summary, we have learned the usefulness of thinking about a person-specific slope (the rate at which a person’s GHRQL changes or responds to treatment) as a clinically relevant attribute. Controlling for pain, differences in true, individual baseline level and slope are large and clinically salient. In our example, the effect of pain on GHRQL differed by treatment arm; knowledge of this effect alerts physicians and nurses to the possible value of upfront supportive care interventions for one treatment versus another. Further advances in the state of HRQL assessment will undoubtedly bring proportionate benefits. Inclusion of more comprehensive measures of HRQL and analytic methods that examine relationships among HRQL outcomes can provide useful clinical information regarding treatment impacts that complements data from targeted symptom outcomes.

Fig. 2. The figure translates the relationships between the slope and intercept into a population simulation based on the assumed multivariate normal model. The slopes and intercepts reflect model values (not observed data) for 100 hypothetical patients randomly sampled from each arm. The dark lines are population average trends for the intercept and slope. The individual trends reflect real change for individual patients without measurement error. GHRQL = global health-related quality of life.

References
(1) Patrick DL, Erickson P. Assessing health-related quality of life for clinical decision-making. In: Walker SR, Rosser R, editors. Quality of life assessment: key issues in the 1990s. Dordrecht (The Netherlands): Kluwer Academic; 1993. p. 11–64.
(2) Cella DF, Tulsky DS. Measuring quality of life today: methodological aspects. Oncology 1990;4:29–38.
(3) Nayfield SG, Ganz PA, Moinpour CM, Cella DF, Hailey BJ. Report from a National Cancer Institute (USA) workshop on quality of life assessment in cancer clinical trials. Qual Life Res 1992;1:203–10.
(4) Holland PW. Statistics and causal inference. J Am Stat Assoc 1986;81:945–60.
(5) Pearl J. Causality: models, reasoning, and inference. Cambridge (UK): Cambridge University Press; 2000.
(6) Rosenbaum PR. From association to causation in observational studies. The role of tests of strongly ignorable treatment assignment. J Am Stat Assoc 1984;79:41–8.
(7) Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 1974;66:688–701.
(8) Rubin DB. Bayesian inference for causal effects. The role of randomization. Ann Stat 1978;6:34–58.
(9) Rubin DB. Statistics and causal inference: comment: which ifs have causal answers. J Am Stat Assoc 1986;81:961–2.
(10) Sobel ME. Causal inference in the social and behavioral sciences. In: Arminger G, Clogg CC, Sobel ME, editors. Handbook of statistical modeling for the social and behavioral sciences. New York: Plenum; 1995. p. 1–38.
(11) Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials 1991;12:142S–58S.
(12) Nunnally JC. Psychometric theory. New York: McGraw-Hill; 1978.
(13) Sprangers MAG, Moinpour CM, Moynihan TJ, Patrick DL, Revicki DA, and the Clinical Significance Consensus Meeting Group. Assessing meaningful change in quality of life over time: a users’ guide for clinicians. Mayo Clin Proc 2002;77:561–71.
(14) Singer J, Willett J. Applied longitudinal data analysis: modeling change and event occurrence. New York: Oxford University Press; 2003.
(15) Curran P, Hussong A. The use of latent trajectory models in psychopathology research. J Abnorm Psychol 2003;112:526–44.
(16) Diggle PJ, Heagerty PJ, Liang K-Y, Zeger SL. Analysis of longitudinal data. 2nd ed. New York: Oxford University Press; 2002.
(17) Kline R. Principles and practice of structural equation modeling. 2nd ed. New York: Guilford Press; 2005.
(18) Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics 1982;38:963–74.
(19) Donaldson GW, Moinpour CM. Individual differences in quality-of-life treatment response. Med Care 2002;40(Suppl):III39–53.
(20) Berry DL, Moinpour CM, Jiang C, Vinson LV, Lara PN, Lanier S, et al. Quality of life (QOL) and pain in advanced stage prostate cancer: impact of missing data on evaluating palliation in SWOG 9916. Proc ASCO 2004;23:400.
(21) Berry DL, Moinpour CM, Jiang CS, Ankerst DP, Petrylak DP, Vinson LV, et al. Quality of life and pain in advanced stage prostate cancer: results of a Southwest Oncology Group randomized trial comparing docetaxel and estramustine to mitoxantrone and prednisone. J Clin Oncol 2006;24:2828–35.
(22) Petrylak DP, Tangen CM, Hussain MH, Lara PN, Jones JA, Taplin ME, et al. Docetaxel and estramustine compared with mitoxantrone and prednisone for advanced refractory prostate cancer. N Engl J Med 2004;351:1513–20.
(23) Herr HW. Quality of life in prostate cancer patients. CA Cancer J Clin 1997;47:207–17.
(24) Donaldson G. General linear contrasts on latent variable means: structural equation hypothesis tests for multivariate clinical trials. Stat Med 2003;22:2893–917.
(25) Melzack R. The short-form McGill pain questionnaire. Pain 1987;30:191–7.
(26) Aaronson NK, Ahmedzai S, Bergman B, Bullinger M, Cull A, Duez NJ, et al. The European Organization for Research and Treatment of Cancer QLQ-C30: a quality of life instrument for use in international clinical trials in oncology. J Natl Cancer Inst 1993;85:365–73.
(27) Fayers PM, Aaronson N, Bjordal K, Groenvold M, Curran D, Bottomley A, et al. EORTC QLQ-C30 scoring manual. Brussels (Belgium): EORTC; 2001.
(28) Tannock IF, Osoba D, Stockler MR, Ernst DS, Neville AJ, Moore MJ, et al. Chemotherapy with mitoxantrone plus prednisone or prednisone alone for symptomatic hormone-resistant prostate cancer: a Canadian randomized trial with palliative end points. J Clin Oncol 1996;14:1756–64.
(29) Muthén L, Muthén B. Mplus user’s guide. 3rd ed. Los Angeles (CA): Muthén & Muthén; 1998–2005.
(30) Rubin DB. Inference and missing data. Biometrika 1976;63:581–92.
(31) Sloan JA, Loprinzi CL, Kuross SA, Miser AW, O’Fallon JR, Mahoney MR, et al. Randomized comparison of four tools measuring overall quality of life in patients with advanced cancer. J Clin Oncol 1998;16:3662–73.
(32) Raghavan D. Chemotherapy for prostate cancer: small steps or leaps and bounds? No huzzahs just yet! Br J Cancer 2004;91:1003–4.
(33) Tannock I, de Wit R, Berry W, Horti J, Pluzanska A, et al. Docetaxel plus prednisone or mitoxantrone plus prednisone for advanced prostate cancer. N Engl J Med 2004;351:1502–12.
(34) McDermed J, Strum S, Scholz M. The Androgen Deprivation Syndrome (ADS): the incidence and severity in prostate cancer (PC) patients (PTS) receiving hormone blockade (HB). J Clin Oncol 1998;17:316a.
(35) Shahinian VB, Kuo Y-F, Freeman JL, Goodwin JS. Risk of the “Androgen Deprivation Syndrome” in men receiving androgen deprivation for prostate cancer. Arch Intern Med 2006;166:465–71.
(36) Aapro MS. The frail are not always elderly. J Clin Oncol 2005;23:2121–2.
(37) Anderson J, Van Poppel H, Bellmunt J, Miller K, Droz J-P, Fitzpatrick JM. Chemotherapy for older patients with prostate cancer. BJU Int 2007;99:269–73.
(38) Fried LP, Tangen CM, Walston J, Newman AB, Hirsch C, Gottdiener J, et al. Frailty in older adults: evidence for a phenotype. J Gerontol A Biol Sci Med Sci 2001;56A:M146–56.
(39) Legler J, Davis W, Potosky A, Hoffman R. Latent variable modelling of recovery trajectories: sexual function following radical prostatectomy. Stat Med 2004;23:2875–93.

Notes This investigation was supported in part by the following Public Health Service Cooperative Agreement grant numbers awarded by the National Cancer Institute, Department of Health and Human Services: CA38926, CA32102, CA37135, CA25224, CA46441, CA37981, CA45808, CA27057, CA12644, CA68183, CA22433, CA35261, CA58861, CA20319, CA46113, CA58882, CA76447, CA04919, CA16385, CA35090, CA03096, CA67663, CA45450, CA35431, CA45807, CA58416, CA14028, CA45377, CA63845, CA42777, CA46136, CA11083, CA35119, CA58658, CA46282, CA76129, CA46368, CA35176, CA86780, CA46462, CA35192, CA35178, CA67575, CA63844, CA12213, CA74647, CA35128, CA35996, CA58686, CA13612, CA45461, CA58723, CA63848, CA35281, CA63850, CA76132, CA74811 and supported in part by Aventis. The authors would like to thank the patients who contributed HRQL data to S9916 and the Clinical Research Associates at SWOG institutions who monitored the submission of the HRQL forms. We recognize the contributions of Dr Donna L. Berry, the HRQL Study Coordinator for S9916, and Dr Daniel P. Petrylak, the therapeutic trial study coordinator. We also thank Dr David Buchanan for helpful comments during the preparation of this article as well as Journal of the National Cancer Institute reviewers.

Journal of the National Cancer Institute Monographs, No. 37, 2007
