
CANCER (CANCER CYTOPATHOLOGY)

A Practical Guide to Papanicolaou Smear Rescreens: How Many Slides Must Be Reevaluated to Make a Statistically Valid Assessment of Screening Performance?

Paul A. Krieger, M.D., M.I.A.C.,1 Theodora Cohen, Ph.D.,2 Sonya Naryshkin, M.D., F.I.A.C.3

1 Corporate Medical Group, Quest Diagnostics Inc., Teterboro, New Jersey.
2 Department of Biostatistics, College of American Pathologists, Northfield, Illinois.
3 Department of Pathology, Mercy Hospital, Janesville, Wisconsin.

BACKGROUND. The question of the minimum number of Papanicolaou (Pap) smear slides that must be rescreened to draw statistically valid conclusions regarding the accuracy of screening often is raised. No method for generating answers in varying laboratory circumstances has achieved widespread application; standard statistical sample size calculations may represent such a resource.

METHODS. A series of tables was constructed to display minimum required numbers of rescreens, with each table representing differing hypothetical laboratory circumstances. To use each table, assumptions must be specified in advance as to prevalence of abnormality, definition of error, baseline false-negative proportions (FNPs) of performance, and a degree of increase in FNPs that is considered a departure from baseline warranting concern, among others.

RESULTS. The authors constructed four sample tables displaying minimum numbers of slides that must be rescreened in differing specified laboratory scenarios. Depending on assumed conditions and predetermined levels of satisfactory and unsatisfactory accuracy, the range of numbers is very broad (38–10,000). One example representing likely conditions indicates that 1040 slides must be reexamined; in another scenario, a sample size of 300 is sufficient.

CONCLUSIONS. The minimum number of rescreened slides needed to draw statistically valid conclusions regarding Pap smear screening accuracy can be calculated using standard statistical methods. However, a number of assumptions must be detailed in advance. The authors offer this as a practical guide and a continuation of a general inquiry regarding Pap smear error rate measurement and display. The use of these tables raises at least as many questions as it answers, but still may represent a significant advance. Future efforts at further numeric characterization of aspects of Pap smear screening performance are warranted to enable rational decision making when performance is examined in the course of quality assurance, and during quality control and regulatory activities. [See editorial on pages 127–9, this issue.]

Cancer (Cancer Cytopathol) 1998;84:130–7. © 1998 American Cancer Society.

KEYWORDS: false-negative proportion, Papanicolaou smear, quality control, rescreening, sample size, sensitivity, error rate.

Address for reprints: Paul A. Krieger, M.D., Quest Diagnostics Inc., One Malcolm Avenue, Teterboro, NJ 07608. Received August 25, 1997; revision received December 4, 1997; accepted December 11, 1997. © 1998 American Cancer Society.

Of the quality measurement tools available in cervical cytology screening, random rescreening (RR) ("spot checking") of Papanicolaou (Pap) smears initially screened as "negative" or "within normal limits" is an important method of measuring individual or laboratory accuracy with respect to sensitivity to epithelial abnormalities or other detection tasks. This tool enables us to answer the basic question, "What is the likelihood that a cellular abnormality among slides in a laboratory's entire throughput is going to be unreported?" Several ways of collecting data to answer this question are in use
today, including rescreening a set percentage of slides directed by a random number list and the rescreen of a group or groups of consecutive cases as part of internal quality control processes or an external inspection. The numeric expression used to answer this question is termed the actual or estimated "false-negative proportion" (FNP),1 a simple ratio that is the number of actual or estimated false-negative results divided by the number of actual or estimated positive results. The question arises as to the minimum required number of Pap smear rescreen samples to be used during such exercises. This question has been studied previously.2–4 However, it deserves renewed attention in light of changed Pap smear reporting classifications and newer concepts of quality control, and in light of pressures we now face to describe Pap smear screening accuracy in timely, more quantitative, statistically significant terms. For example, studies by Nagy,1,4 Melamed,2 and Melamed and Flehinger3 examined required sample size indirectly or incompletely, providing only a starting point for judging statistical significance with respect to this parameter. The latter authors selected as a hypothetical scenario in their text a laboratory prevalence of 5 per 1000, suggesting that the threshold under study was high grade squamous intraepithelial lesions and above. They concluded, not surprisingly, that rescreening sample sizes of many thousands of cases are needed to conduct meaningful assessments of screening accuracy. In the screening environment of today, expectations to detect epithelial abnormalities are described as either "atypical squamous cells of undetermined significance/atypical glandular cells of undetermined significance (ASCUS/AGUS) and above" or "low grade squamous intraepithelial lesions and above" and have led to the new vocabulary of "narrow" and "broad" definitions of error5 to assist in describing models of performance or accuracy analysis. (Indeed, a choice of 5 per 1000 as a prevalence and error detection threshold during rescreening would require a still newer definition, perhaps a "narrow-narrow" definition of error.) Expectations during screening in the current climate also include accurate judgments regarding Pap smear specimen adequacy. It is reasonable, at least for study purposes or internal laboratory quality control self-analysis, to set error thresholds that recognize these expectations. Furthermore, the entirely new body of literature comparing a variety of recently developed mechanical and/or computerized devices with conventional Pap smear screening relies on these newer thresholds, and our quality measurement efforts should yield measurements that can be used in comparative fashion if the need arises.
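The FNP described above is a simple ratio of missed positives to all positives found. A minimal sketch, using hypothetical counts that are not taken from the article:

```python
def false_negative_proportion(false_negatives: int, true_positives: int) -> float:
    """FNP = false negatives / all positives (false negatives + true positives).

    Under the broad definition of error, a "positive" is any slide read
    as ASCUS/AGUS or above on rescreening.
    """
    positives = false_negatives + true_positives
    if positives == 0:
        raise ValueError("FNP is undefined when no positive slides are found")
    return false_negatives / positives

# Hypothetical rescreen: 52 positives found in total, 4 of which were
# initially reported as negative on first screening.
fnp = false_negative_proportion(false_negatives=4, true_positives=48)
print(f"FNP = {fnp:.1%}")  # 4 / 52 ≈ 7.7%
```

Note that the denominator counts positives, not all slides rescreened, which is why the required sample sizes later in the article scale inversely with prevalence.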


TABLE 1. Parameters to Be Predefined by User Before Consulting Tables 2–5

Prevalence of abnormality or condition studied in screening and rescreening population: 0.1%, 0.5%, 1–10%, and 20–26%
Baseline FNP (possibly "satisfactory" or "commonly observed"): 5%, 10%, 15%, 20%, 25%, and 30%
Increase from baseline FNP: 10% or 20%

FNP: false-negative proportion.

In addition, we are required by U.S. federal regulation to assess performance and to adjust accordingly the maximum permitted slide workload of each cytotechnologist at least twice each year. The data on which this is based should accurately and fairly represent performance of the detection task or tasks we wish to assess. In other words, are our numeric adjustments performed using data from samples large enough to draw statistically valid conclusions? In other situations, such as in quality assurance rescreens using blocks of consecutive cases, how many slides should be rescreened before drawing statistically valid conclusions on performance quality? The goal of this article is to present a statistical guide to selecting sample size, which attempts a direct answer to these and related questions. The authors' goals do not include examination of nonscreening factors that may contribute to Pap smear failures, such as failure by the smear taker to sample a lesion accurately or failure by clerical staff to transcribe results accurately. These are important concerns that should be monitored as part of overall laboratory quality assurance. We examined a group of hypothetical scenarios and assumptions that could characterize different Pap smear screening situations. At the outset, it is important to define key assumptions and to assign values to needed parameters (Table 1). However, it is equally important as an initial step to define different types of error, and for this discussion several types of failures to judge accurately or detect different conditions were examined. These include failure to detect abnormal epithelial cells, failure to detect infectious organisms, and failure to identify less than optimal or unsatisfactory samples. An important additional category requiring study that is not part of this discussion is false-positive diagnoses.

Defining "Error"

Pap smear error types and thresholds have received much attention recently. For the category of epithelial abnormalities, a CAP Q-Probe devoted to the retrospective rescreen of Pap smears gave both a "broad definition" and a "narrow definition" of error.5 In the tables that follow, prevalences are displayed that enable the use of either the broad definition or the narrow definition of error. Use of the broad definition in Pap smear screening and quality control means that the consistently applied threshold is ASCUS/AGUS and above; use of the narrow definition means that low grade squamous intraepithelial lesion and above is the threshold. It follows that calculations involving the key ratio, the false-negative proportion (FNP), or an estimate of the FNP (eFNP), require that the microscopic criteria used to judge failure to detect during a rescreen ("error"), which form the numerator, be the same as the microscopic criteria used to detect during initial screening ("prevalence"), which form the denominator (i.e., the ratio should be threshold- or definition-independent if errors occur at equal rates in each category).6 We acknowledge there is some controversy concerning the application of these definitions of error. Arguments in favor of using the narrow definition include: 1) it is difficult to assign consistently microscopic findings judged to be equivocal into a negative category versus a category of significant change falling short of definitive abnormality, and 2) during quality control, only changes that are judged to be unequivocally positive should be recorded. Conversely, there are arguments in favor of using the broad definition of error. If microscopic changes are judged as significantly abnormal but falling short of definitive, their detection during quality control should be recorded because: 1) patients with such findings are at significantly higher risk of harboring a significant lesion, 2) we frequently encounter difficulty distinguishing between abnormalities of uncertain significance and abnormalities of definitive types, and 3) these are the criteria we use for judging abnormality during the course of regular screening. If there is uncertainty as to which definition to apply, it also should be remembered that fewer slides need to be rescreened and that quality control data can be reported at shorter intervals if the broad definition is used. Furthermore, we believe that, based on many years of experience in our own laboratories, attained mainly in group settings at multiview microscopes, consensus in the application of microscopic criteria in the equivocal ASCUS/AGUS categories can be reached in the majority of cases.7 Therefore, the use of the broad definition is justified, at least within a given laboratory or by a specially dedicated group of rescreening personnel who can achieve consistency in the use of criteria in this category through excellent training and experience.

The use of other parameters in quality control rescreening is optional and may be the source of additional useful information regarding performance quality. For example, in the category of specimen adequacy judgments, Bethesda system criteria are extremely useful in defining correct assessments and thresholds for error. Thresholds of error in detecting infectious organisms are relatively easy to define, especially if consensus is attained consistently at multiview microscopes.

METHODS AND DISCUSSION

General Considerations

We have subdivided the overall problem into components. The requirement to establish hypothetical baselines and deviations from baseline performance levels must be satisfied. Theoretically, this would pose the question: "What constitutes an acceptable and an unacceptable standard of performance?" However, such levels are not known in practice for Pap smear screening, so surrogate levels must be hypothesized. Hypothetical scenarios are required to apply the relevant statistical tests. Limitations and weaknesses of this overall approach always must be kept in mind because there are several potential flaws, and one must judge the applicability of statistically based conclusions with the limitations of the method in mind. Therefore, to determine the number of slides to be rescreened, one first must establish a baseline error rate and an increase above this baseline that is considered significant. (We do not recommend that these values be chosen based on data from only a single laboratory for the following reasons: the number of cytotechnologists in a laboratory is generally <20, the number of errors is measured on a discrete scale and generally has a skewed, nonnormal distribution, and conclusions based on the mean and standard deviation of this type of data can be misleading.) In addition, the sensitivity and specificity of the statistical test must be specified in advance. In other words, the question we ask always must be stated in terms such as: "How many slides must be rescreened so that an observed positive difference of X% between the performance level being assessed and the baseline performance level will be statistically significant when the appropriate statistical test is performed?" The same format of posing the question would apply when assessing laboratory or individual performance. Again, the calculation can be performed only if the bases of comparison are defined in advance.
An additional requirement is to specify the prevalence of the abnormality or condition under study within the population group. If this is not available, an estimate based on the overall population encompassing the study population can be used. In practice, the abnormal rate for epithelial changes (ASCUS/AGUS and above) maintained monthly or quarterly as a matter of course in most laboratories provides an appropriate estimate. Tables 2–5 provide answers to this type of question under different scenarios of prevalence and increases from baseline to be detected. The performance level is measured using the FNP expressed as a percentage, defined as the number of false-negative results divided by the total number of positive results.6 Throughout this article we assume that the rescreening process is free of error or that biases that may over- or understate failures to detect or judge cancel each other out. If this is not the case,8 the calculated FNP always is smaller or larger than the FNP derived from an error-free rescreening process, and the type 2 error of the test is smaller or larger than assumed. However, estimating the rescreening FNP and adjusting the laboratory or individual FNP accordingly may increase the type 1 error, which is less acceptable. If a high rescreening error rate is suspected, an FNP based on triple screening may provide a better estimate of the FNP.

TABLE 2. Rescreening for a Single Parameter—Epithelial Abnormality (a)

"Acceptable" FNP ("unacceptable" FNP is 10% higher than the "acceptable" FNP)

Prevalence      5%       10%      15%      20%      25%      30%
0.1%        34,000   52,000   68,000   81,000   92,000  100,000
0.5%          6800   10,400   13,600   16,200   18,400   20,000
1%            3400     5200     6800     8100     9200   10,000
2%            1700     2600     3400     4050     4600     5000
3%            1133     1733     2267     2700     3067     3333
4%             850     1300     1700     2025     2300     2500
5%             680     1040     1360     1620     1840     2000
6%             567      867     1133     1350     1533     1667
7%             486      743      971     1157     1314     1429
8%             425      650      850     1013     1150     1250
9%             378      578      756      900     1022     1111
10%            340      520      680      810      920     1000

(a) The number of slides to be rescreened when testing hypotheses regarding the true FNP under various prevalence assumptions (the difference to be detected between the "acceptable" and "unacceptable" FNPs is held constant at 10%). FNP: false-negative proportion.

TABLE 3. Rescreening for Multiple Parameters—Epithelial Abnormalities, Specimen Adequacy, and Infectious Agents

"Acceptable" FNP ("unacceptable" FNP is 10% higher than the "acceptable" FNP)

Prevalence      5%      10%     15%     20%     25%     30%
20%            170     260     340     405     460     500
21%            162     248     324     386     438     476
22%            155     236     309     368     418     455
23%            148     226     296     352     400     435
24%            142     217     283     338     383     417
25%            136     208     272     324     368     400
26%            131     200     262     312     354     385

FNP: false-negative proportion.

Evaluating a Single Parameter: Detection of Epithelial Abnormalities

To use the tables for this purpose, an FNP that could be considered a baseline must be asserted. It is common to note in the cytopathology literature estimated or actual FNPs in a broad range of 5% to >20%. This range also has been reported anecdotally as part of different types of internal or external quality control studies, or reported as part of comparisons between conventional Pap smear screening and various technologic or nontechnologic enhancements.9 However, although the numeric range is a useful indicator, conclusions regarding relations to screening volume must be interpreted with caution, and in any event, any general correlation observed cannot be the basis for a judgment regarding an individual laboratory. The issue of broad versus narrow definition should be reexamined in this context once again. The choice of prevalence in consulting the tables indicates the choice of threshold that has been or will be used during the course of the rescreening exercise. If the prevalence chosen represents ASCUS/AGUS and above, then the broad definition of error must be used during the course of the rescreen to arrive at the numerator of the ratio. In generating eFNPs based on RR in the authors' laboratories, the broad definition was used.6

TABLE 4. Rescreening for a Single Parameter—Epithelial Abnormality, Assessing a Larger Departure from Baseline Conditions than in Table 2

"Acceptable" FNP ("unacceptable" FNP is 20% higher than the "acceptable" FNP)

Prevalence      5%      10%     15%     20%     25%     30%
0.1%        10,000  15,000  18,000  21,000  24,000  25,000
0.5%          2000    3000    3600    4200    4800    5000
1%            1000    1500    1800    2100    2400    2500
2%             500     750     900    1050    1200    1250
3%             333     500     600     700     800     833
4%             250     375     450     525     600     625
5%             200     300     360     420     480     500
6%             167     250     300     350     400     417
7%             143     214     257     300     343     357
8%             125     188     225     263     300     313
9%             111     167     200     233     267     278
10%            100     150     180     210     240     250

FNP: false-negative proportion.

TABLE 5. Rescreening for Multiple Parameters—Epithelial Abnormalities, Specimen Adequacy, and Infectious Agents, Assessing a Larger Departure from Baseline Conditions than in Table 3

"Acceptable" FNP ("unacceptable" FNP is 20% higher than the "acceptable" FNP)

Prevalence      5%     10%     15%     20%     25%     30%
20%             50      75      90     105     120     125
21%             48      71      86     100     114     119
22%             45      68      82      95     109     114
23%             43      65      78      91     104     109
24%             42      63      75      88     100     104
25%             40      60      72      84      96     100
26%             38      58      69      81      92      96

FNP: false-negative proportion.

Hypothetical Laboratory Scenarios Using Different Departures from Baseline Sensitivity

Here are the specifics of a hypothetical situation and a suggested use of the tables. ABC Laboratory has a Pap smear volume of 40,000 slides per year and a prevalence rate of ASCUS/AGUS and above of 5%. You are asked to determine whether this laboratory has an excessive error rate. It has been suggested that FNPs of 5% or somewhat higher for epithelial changes currently are the best attainable in Pap smear laboratories6 and that 5–10% is excellent performance.10 For this example, we designate a statistical "baseline" FNP of 10% as acceptable performance. We also propose for this example that an FNP of >20% (a departure of >10% from baseline) in this category represents poor performance. We suppose that the FNP for this laboratory based on the rescreen of 350 negative results was 22.2%. A test of significance8 is performed to assess whether the observed difference of 12.2% (22.2% − 10%) is statistically significant. The P value of this test is 0.097. Therefore, we conclude that the FNP is not statistically significantly higher than the "acceptable" standard of 10%. Continuing, if the question is asked, "What is the minimum number of negative slides that must be reevaluated to very reliably detect a difference from a baseline of 10% that amounts to 10% or greater (i.e., an FNP of 20%), given a prevalence of positive slides of 5%?", Table 2 shows that the minimum number of negative results to be rescreened is 1040, and therefore a sample size of 350 cases was not large enough. This can be shown by performing the same statistical test when the FNP based on the rescreen of 1040 slides is 22.2%. The P value is now 0.002, and we conclude that the FNP is statistically significantly higher than the "acceptable" or baseline standard. Again, to conduct an assessment of the size of a slide sample, a level of performance departing from baseline that represents a sought-for level of departure must be specified. Table 4 can be consulted if a larger departure from baseline is chosen for study. If one selects arbitrarily (but not unreasonably) a difference of 20 FNP percentage points above baseline as the level of departure that is of interest, and then specifies the prevalence that applies, then the appropriate cell in the table can be found.
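The article cites Fleiss8 for its test of significance but does not show the computation. As a rough illustration under assumed counts (at 5% prevalence, roughly 18 positives among 350 rescreens, 4 of them missed, giving the 22.2% FNP), an exact one-sided binomial test reproduces a P value close to the article's 0.097; the article's exact test and rounding may differ:

```python
from math import comb

def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): one-sided upper-tail probability."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 350-slide sample (assumed counts): ~18 positives, 4 missed (FNP 4/18 = 22.2%),
# tested against the baseline FNP of 10% under the null hypothesis.
p_small = binom_sf(4, 18, 0.10)
print(f"350-slide sample:  P = {p_small:.3f}")   # ≈ 0.098, not significant

# 1040-slide sample: ~52 positives; the same observed FNP of 22.2%
# corresponds to roughly 12 missed slides.
p_large = binom_sf(12, 52, 0.10)
print(f"1040-slide sample: P = {p_large:.3f}")   # well below 0.05
```

The same observed FNP thus moves from nonsignificant to significant purely because the larger sample contains more positive slides.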

Another Hypothetical Scenario

In another hypothetical scenario using these assumptions, a cytotechnologist or a laboratory may have been found to have an FNP of 30% on random rescreening of 300 negative results. Reviewing the background arithmetic, the cytotechnologist or laboratory FNP was calculated based on a 5% RR of negative results during a 6-month period (e.g., 6000 slides × 5% = 300). Was the sample of 300 rescreens, collected over 6 months of 5% RR, large enough? The answer, after considering that the prevalence of abnormality was 5% and locating the appropriate cell in Table 4 representing the difference sought and the prevalence of abnormality, is "yes." The finding that an FNP of 30% is statistically significantly different from the baseline of 10% indeed was justified, having been based on rescreening a sufficiently large sample of 300 slides.
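The arithmetic of this scenario, together with the Table 4 lookup, can be sketched as follows (the dictionary transcribes the 10%-baseline column of Table 4):

```python
# Scenario from the text: 6,000 negatives screened in 6 months,
# with a 5% random rescreen fraction.
slides_screened = 6_000
rr_fraction = 0.05
sample = int(slides_screened * rr_fraction)
print(sample)  # 300

# Table 4 ("acceptable" FNP 10%, departure of 20 percentage points):
# minimum rescreen sample by prevalence, from the 10% column.
TABLE4_10PCT = {0.001: 15_000, 0.005: 3_000, 0.01: 1_500, 0.02: 750,
                0.03: 500, 0.04: 375, 0.05: 300, 0.06: 250, 0.07: 214,
                0.08: 188, 0.09: 167, 0.10: 150}

prevalence = 0.05
print(sample >= TABLE4_10PCT[prevalence])  # True: 300 slides suffice
```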

Evaluating Several Screening Parameters to Increase the Usefulness of RR

Usually, rechecking of Pap smear slides is conducted to search for false-negative results with respect to epithelial abnormalities. However, RR also can be used to measure accuracy with respect to more than one detection parameter, to yield data of greater utility from the same exercise, and the tables shown can be used to help guide this. Use of the tables in this fashion assumes that screening accuracy for each of the parameters is an independent phenomenon. To our knowledge, this never has been studied. In addition, the different detection tasks have different clinical significances. However, the authors believe that this uncertainty should not prevent undertaking of this potentially valuable inquiry. Therefore, we considered a hypothetical RR exercise in which three parameters were being measured, namely, epithelial findings, infectious agent findings, and specimen adequacy status. To conduct this exercise, assume that the prevalence of slides meeting these conditions as abnormal or "departures" ranges between 20% and 26% and that misjudging any of these three characteristics is considered an error. Using the previous definition of the FNP as a guide, in this scenario a "positive" slide is defined as a slide with an epithelial abnormality and/or an infectious agent and/or "Satisfactory but limited . . ." (SBL) or "Unsatisfactory" status, and an error is defined as misjudgment of any of these characteristics by the screener. The numerator of the hypothetical FNP (which actually should be called something like a "departure proportion") would be the number of slides found to have an error in any of the three categories. Consult Table 5. Assume again that the data are collected in the course of a 5% RR, and this time the prevalence of the three conditions (epithelial abnormalities, specimen adequacy of SBL plus "unsatisfactory," and infectious agents) added together is 20%. Table 5 shows that rescreening 300 slides is more than enough (in fact, a total of 125 slides would be enough) to judge reliably that a finding 20 FNP percentage points higher than baseline is statistically significant. Table 3 shows sample sizes for findings that are 10 FNP percentage points higher than "baseline."
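The multi-parameter bookkeeping can be sketched as follows; the individual prevalence rates below are assumptions chosen only so that they sum to the text's 20%, and the error tallies are hypothetical:

```python
# A slide is "positive" if it has an epithelial abnormality, an
# infectious agent, or an SBL/unsatisfactory adequacy status; the text
# simply adds the three prevalences to obtain the combined prevalence.
prev_epithelial = 0.05   # ASCUS/AGUS and above (assumed split)
prev_adequacy = 0.10     # SBL + unsatisfactory (assumed split)
prev_infectious = 0.05   # infectious agents (assumed split)
combined = prev_epithelial + prev_adequacy + prev_infectious
print(f"combined prevalence = {combined:.0%}")  # 20%

# "Departure proportion": slides misjudged in ANY of the three
# categories, over all positive slides found on rescreening.
errors, positives = 18, 60   # hypothetical rescreen tallies
print(f"departure proportion = {errors / positives:.0%}")  # 30%
```

Simple addition of prevalences is only an approximation: it ignores slides carrying more than one condition, which is consistent with the article's caveat that independence of the parameters has never been studied.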

RESULTS

The cells in Tables 2 and 3 show the minimum number of negative slides to be rescreened when testing hypotheses regarding an assumed laboratory FNP or "departure proportion" under various prevalence assumptions (left column). For Table 2, the difference to be detected between the FNP specified under the null hypothesis (the FNP specified to represent baseline or "satisfactory" performance) and the alternative hypothesis (an "unacceptable" FNP level) is held constant at 10%. This can be restated as follows. The table shows the minimum number of slides that must be rescreened to attain statistical significance for an observed FNP excess of 10% under the prevalence circumstances of a given laboratory (left column) and the baseline or "satisfactory" level of performance FNP (percentage picked from the row across the top). The null hypothesis is the error rate one considers baseline or "acceptable." The alternative hypothesis is the error rate considered at a departure or "unacceptable" level. Again, each of the numbers of slides displayed in the table's cells refers to detecting an increase of 10 FNP percentage points over a previously chosen FNP performance, namely, a figure picked from the row of percentages across the top (i.e., 5%, 10%, or as high as 30%). Table 3 has the same conditions as Table 2, with the exception that a false-negative outcome is defined as failure to detect cancerous and precancerous cells and/or failure to identify specimen adequacy and/or failure to identify infectious agents. The prevalence of the sum of any of these three conditions on the same slide is assumed to range between 20% and 26%. For Table 3, the difference to be detected between the FNP specified under the null hypothesis and the alternative hypothesis is held constant at 10%. Table 4 shows the number of slides to be rescreened when testing hypotheses regarding the true laboratory FNP under various prevalence assumptions. The difference to be detected between the FNP specified under the null hypothesis and the alternative hypothesis is held constant at 20%. Table 5 shows the same conditions as Table 4, but a false-negative outcome is defined as failure to detect abnormal epithelial cells and/or failure to identify correctly specimen adequacy conditions and/or failure to identify infectious agents. The prevalence of these three conditions in the group of slides is assumed to range between 20% and 26%. For Table 5, the difference to be detected between the FNP specified under the null hypothesis and the alternative hypothesis is held constant at 20%.

Table 6 shows the formula used to calculate the sample sizes listed in Tables 2–5. A type 1 error is the probability of erroneously concluding that an observed performance level is significantly worse than the baseline, and a type 2 error is the probability of erroneously concluding that a performance level is not statistically significantly worse than the baseline. The performance level is measured using the FNP expressed as a percentage, defined as the number of false-negatives divided by the total number of positive results.4 The formula for n was derived using standard statistical methods for making one-sided inferences regarding a single proportion (see reference 9, p. 13, section 3.2). The first term in the formula for n represents the number of positive slides (true-positives and false-negatives) in the sample of slides to be reviewed. This then is divided by the prevalence (or estimated prevalence) to obtain the total number of slides to be screened.

TABLE 6. Formula Used to Calculate the Number of Slides to Be Rescreened

n = [Z_β √(P_A(1 − P_A)) − Z_α √(P_0(1 − P_0))]² / [prevalence × (P_0 − P_A)²]

n: the number of slides to be rescreened; α: the type 1 error; β: the type 2 error; Z: the appropriate percentile of the standard normal distribution; P_0: "acceptable" performance; P_A: "unacceptable" performance.
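The Table 6 formula can be sketched in code. The article does not state which type 1 and type 2 error levels underlie Tables 2–5, so the percentiles are left as parameters; the value z ≈ 1.03 used in the example (roughly symmetric error rates near 15%) is inferred by matching one table cell and should be treated as an assumption, not a value stated in the article:

```python
from math import ceil, sqrt

def slides_to_rescreen(p0: float, pa: float, prevalence: float,
                       z_alpha: float, z_beta: float) -> int:
    """Table 6 sample size: positive slides needed, scaled up by prevalence.

    p0 = "acceptable" FNP, pa = "unacceptable" FNP. The article writes
    -Z_alpha with Z_alpha as the lower-tail percentile; using positive
    upper-tail percentiles here, the two terms add.
    """
    positives = (z_beta * sqrt(pa * (1 - pa)) +
                 z_alpha * sqrt(p0 * (1 - p0)))**2 / (p0 - pa)**2
    return ceil(positives / prevalence)

# Example: baseline FNP 10%, unacceptable FNP 20%, prevalence 5%.
# With z_alpha = z_beta = 1.03 this matches Table 2's 1040 slides.
print(slides_to_rescreen(p0=0.10, pa=0.20, prevalence=0.05,
                         z_alpha=1.03, z_beta=1.03))  # 1040
```

Other cells reproduce only approximately with these percentiles, presumably because of rounding in the published tables.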

CONCLUSIONS We have presented a relatively ‘‘user-friendly’’ set of statistical principles and guidelines that may help in assessing sample sizes for measuring Pap smear screening accuracy. If applied with care, they represent a significant step forward in total quality improvement for the cytology laboratory. With the help of guidelines such as these, we can be assured that greater fairness and validity will be built into Pap smear laboratory quality control and inspection procedures. Consideration of these principles carries with it responsibilities in several major areas. First, we should refine numeric measures of Pap smear screening accuracy to the greatest extent possible, and new methods such as those discussed earlier should be regarded as first steps to be improved on in the course of additional inquiry. A second responsibility we face is to develop criteria to use to decide when remedial action is warranted in the face of data collected in a more statistically consistent and proactive fashion. As a corollary to this, to the extent that RR performed contemporaneously with initial screening yields statistically significant data regarding quality, we will depend less on inferior or surrogate measures of performance quality such as pickup rates and retrospective rescreening. These are discussion points better reserved for additional inquiry and explication in the future.


REFERENCES

1. Nagy GK. False negative rate—a misnomer, misunderstood and misused. Acta Cytol 1997;41:778–80.
2. Melamed MR. Presidential address. Acta Cytol 1973;17:285–8.
3. Melamed MR, Flehinger BJ. Reevaluation of quality assurance in the clinical laboratory [editorial]. Acta Cytol 1992;36:461–5.
4. Nagy GK. Sample size calculations for rescreening cytologic smears. Acta Cytol 1996;40:501–5.
5. Jones BA. Rescreening in gynecologic cytology. Rescreening of 3762 previous cases for current high-grade squamous intraepithelial lesions and carcinoma—a College of American Pathologists Q-Probes study of 312 institutions. Arch Pathol Lab Med 1995;119:1097–103.
6. Krieger PA, Naryshkin S. Random rescreening of cytologic smears: a practical and effective component of quality assurance programs in both large and small cytology laboratories [guest editorial]. Acta Cytol 1994;38:291–8.
7. Naryshkin S. Moving forward on ASCUS, but not there yet. CAP Today 1996;10:32–6.
8. Fleiss J. Statistical methods for rates and proportions. 2nd edition. New York: John Wiley & Sons, 1981.
9. DeMay RM. Toward a practice standard. In: The art and science of cytopathology. Chicago: ASCP Press, 1996:142,146–7.
10. Naryshkin S. The false negative fraction for Papanicolaou smears. How often are "abnormal" smears not detected by a "standard" screening cytologist? Arch Pathol Lab Med 1997;121:270–2.
11. Renshaw A. Analysis of error in calculating the false-negative rate in the interpretation of cervicovaginal smears: the need to review abnormal cases. Cancer (Cancer Cytopathol) 1997;81:264–71.
