Telemedical Diagnosis of Retinopathy of Prematurity
Intraphysician Agreement between Ophthalmoscopic Examination and Image-Based Interpretation
Karen E. Scott, MD, MBA,1 David Y. Kim, MD,2 Lu Wang, MD, MS,2 Steven A. Kane, MD, PhD,2 Osode Coki, BSN,1 Justin Starren, MD, PhD,3 John T. Flynn, MD,2 Michael F. Chiang, MD2,3
Objective: To evaluate the intraphysician agreement between ophthalmoscopic examination and image-based telemedical interpretation for retinopathy of prematurity (ROP) diagnosis, when performed by the same expert physician grader.
Design: Prospective, nonrandomized, comparative study.
Participants: Sixty-seven consecutive premature infants who underwent ROP examination at a major university medical center whose parents consented for participation.
Methods: Infants underwent standard dilated ophthalmoscopy by one of two pediatric ophthalmologists, followed by retinal imaging with a commercially available wide-angle fundus camera by a trained neonatal nurse. Study examinations were performed at 31 to 33 weeks postmenstrual age (PMA) and/or 35 to 37 weeks PMA. Images were uploaded to a Web-based telemedicine system developed by the authors. After a 4- to 12-month period, telemedical interpretations were performed in which each physician graded images from infants upon whom he had initially performed ophthalmoscopic examinations. Diagnoses were classified using an ordinal scale: no ROP, mild ROP, type 2 prethreshold ROP, and treatment-requiring ROP.
Main Outcome Measures: Absolute intraphysician agreement and κ statistic between ophthalmoscopic examination and telemedical interpretation were calculated by eye. All intraphysician discrepancies were reviewed, and underlying causes were classified by eye as no ROP identified by ophthalmoscopic examination, no ROP identified by telemedical interpretation, discrepancy about presence of zone 1 ROP, discrepancy about presence of plus disease, or other discrepancy in classification of ROP stage.
Results: Absolute intraphysician agreement between ophthalmoscopic examination and telemedical interpretation was 86.3%. The κ statistic for intraphysician agreement between examinations ranged from 0.657 (substantial agreement) for diagnosis of treatment-requiring ROP to 0.854 (near-perfect agreement) for diagnosis of mild or worse ROP. Among 206 eye examinations (103 infant examinations), there were 28 (13.6%) intraphysician discrepancies in diagnosis, 8 of which resulted from uncertainty about presence of zone 1 disease and 4 from uncertainty about presence of plus disease.
Conclusions: Intraphysician agreement between ophthalmoscopic examination and telemedical interpretation for ROP was very high. Neither examination modality appeared to have a systematic tendency to overdiagnose or underdiagnose ROP. Diagnosis of zone 1 disease and plus disease were major sources of clinically significant discrepancies.
Ophthalmology 2008;115:1222–1228 © 2008 by the American Academy of Ophthalmology.
Originally received: May 8, 2007. Final revision: August 23, 2007. Accepted: September 11, 2007. Available online: May 3, 2008. Manuscript no. 2007-613. 1 Division of Neonatology, Columbia University College of Physicians and Surgeons, New York, New York. 2 Department of Ophthalmology, Columbia University College of Physicians and Surgeons, New York, New York. 3 Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons, New York, New York.
© 2008 by the American Academy of Ophthalmology Published by Elsevier Inc.
Presented at: American Academy of Ophthalmology Annual Meeting, November 2007, New Orleans, Louisiana. Supported by a Career Development Award from Research to Prevent Blindness, New York, New York (MFC), and the National Eye Institute, Bethesda, Maryland (grant no. EY13972 [MFC]). The authors have no commercial, proprietary, or financial interest in any of the products or companies described in the article. Correspondence to Michael F. Chiang, MD, Department of Ophthalmology, Columbia University College of Physicians and Surgeons, 635 West 165th Street, Box 92, New York, NY 10032. E-mail:
[email protected]. ISSN 0161-6420/08/$–see front matter doi:10.1016/j.ophtha.2007.09.006
Scott et al · Telemedical ROP Diagnosis: Ophthalmoscopic vs. Imaging Examination
Store-and-forward telemedicine is an emerging technology that may be used for diagnosis of disease over distance and time. Examination findings are captured and transmitted for subsequent evaluation by a remote expert. Telemedicine may be particularly useful in fields where images are routinely used as a major component of clinical diagnosis, such as ophthalmology, radiology, and dermatology.1–3 Although telemedicine has the potential to improve the delivery and accessibility of health care, its widespread adoption has been limited by concerns regarding cost effectiveness and the absence of sufficient evaluation data regarding the efficacy and reliability of individual systems.4,5
Retinopathy of prematurity (ROP) is a disease of abnormal retinal neovascularization in low-birth-weight infants. For several reasons, ROP may be ideal for the application of telemedicine strategies. First, considerable progress in understanding the risk factors, diagnosis, and treatment of ROP has been achieved in recent years, largely as a result of the multicenter Cryotherapy for ROP and Early Treatment for ROP studies.6,7 Second, ROP continues to be a principal cause of childhood blindness around the world.8,9 Third, the number of infants requiring ROP surveillance is increasing because the incidence of premature births is rising,10 and because recently revised guidelines have modified the gestational age cutoff for examination to avoid missing clinically significant disease.11 Fourth, current diagnostic techniques, which consist of dilated examination by an experienced ophthalmologist, are physiologically stressful for infants, time consuming for ophthalmologists, and logistically difficult for neonatology personnel.
A recent American Academy of Ophthalmology survey found that only half of retinal specialists and pediatric ophthalmologists are managing ROP, and that one fifth plan to stop in the near future because of concerns such as medicolegal liability and low reimbursements.12 Previous studies have demonstrated that remote interpretation of wide-angle digital retinal images may have adequate sensitivity and specificity to diagnose clinically significant ROP.13–18 These studies measured accuracy of telemedical ROP diagnosis by masked graders compared with a reference standard of dilated ophthalmoscopy. To our knowledge, all prior studies of this nature have utilized images captured by ophthalmologists or professional photographers. However, a practical telemedicine system requires that clinical data and retinal images be acquired by nonophthalmic personnel who are available at the point of care. Although ophthalmoscopic examination is considered the gold standard for ROP diagnosis and classification,11,19 it is not clear that it is inherently superior to an image-based telemedical interpretation by remote experts. Published research has shown that there may be significant practice variation among physicians, even when faced with identical clinical situations.20 –23 Image-based telemedical diagnosis may create opportunities for increasing accuracy and decreasing variability by standardizing the process of clinical data capture and interpretation. In fact, in other diseases such as diabetic retinopathy, remote interpretation of standard photographs by certified reading centers has been shown to be more accurate than ophthalmoscopy.24,25
Before implementation of large-scale telemedicine systems for ROP diagnosis, it will be essential to understand the diagnostic performance of image-based telemedical interpretation using photographs captured by nonophthalmic personnel, as well as the factors related to diagnostic discrepancy between ophthalmoscopy and telemedicine. This paper addresses these gaps in knowledge by analyzing agreement between ophthalmoscopic examination and masked telemedical interpretation performed by the same physician graders. Each study examination was performed by one of two experienced pediatric ophthalmologists, and all data were collected prospectively. This experimental design controls for variation among individual physicians by evaluating intraphysician discrepancy, which allows for the delineation of differences between ophthalmoscopic and telemedical approaches.
Materials and Methods
Infrastructure for Telemedical Interpretation
This study was approved by the Institutional Review Board at Columbia University Medical Center. A neonatal intensive care unit (NICU) nurse was trained to perform wide-angle digital retinal imaging using a commercially available device (RetCam-II; Clarity Medical Systems, Pleasanton, CA). Training included a 2-day initial instructional session with the device manufacturer, followed by 6 weekly sessions with the principal investigator (MFC) during regularly scheduled ophthalmoscopic examinations. At each session, approximately 3 infants were photographed, and findings were correlated with clinical examination results. A store-and-forward ROP telemedicine application was developed by the authors (LW, MFC), and included a secure database system (SQL 2005; Microsoft, Redmond, WA), an interface allowing the photographer to upload clinical data and images, and a Web-based interface for subsequent interpretation by an expert. This system was designed to represent an image-based telemedicine examination, and to simulate a real-world clinical scenario. Images from both eyes were displayed side by side, along with birth weight, gestational age, and postmenstrual age (PMA) at time of examination (Fig 1 [available at http://aaojournal.org]).
Ophthalmoscopic Examination and Retinal Imaging
Premature infants hospitalized in the Columbia University NICU from November 1, 2005 through October 31, 2006, were included in this prospective study if they met existing criteria for ROP examination11,26 and if their parents provided informed consent for retinal imaging and study participation. Infants were ineligible if they had structural ocular anomalies, had prior treatment for ROP with laser photocoagulation or other ocular surgery, or had been considered by their neonatologist to be unstable for examination because of poor general health.
Each study infant underwent 2 procedures that were sequentially performed under topical anesthesia at the NICU bedside. (1) Dilated ophthalmoscopic examination was performed according to standard protocols by one of two experienced pediatric ophthalmologists (SAK, MFC), depending on weekly NICU rounding schedules.11,26 Presence or absence of ROP was documented on standard clinical templates, based on the international classification of ROP.19,27 (2) Digital fundus photography was performed by the trained study nurse (OC), according to a specific protocol by which posterior, temporal, and nasal images were captured from each retina. Up to two additional images were captured from each eye, if the nurse judged that they would provide additional diagnostic information. Imaging was performed by the nurse alone, without advice or supervision from the examining ophthalmologist. No subjects were excluded because of poor image quality or inability to capture images.
Study infants were imaged during a maximum of two sessions, one at 31 to 33 weeks PMA and another at 35 to 37 weeks PMA. After each session, the best images were selected by the nurse and uploaded to the telemedicine system, together with the ophthalmoscopic examination data.
Ophthalmology Volume 115, Number 7, July 2008
Telemedical Interpretation
Two physician graders (SAK, MFC) performed telemedical interpretations on the same eyes that they had previously examined using ophthalmoscopy. To minimize the possibility that graders would remember data about specific patients, no identifiers were displayed, images were presented by the system in random order, and images were assigned to the appropriate physician grader by independent review of two authors (LW, OC). In addition, all telemedical interpretations were performed 4 to 12 months after ophthalmoscopic examinations.
Telemedical interpretations were graded on an ordinal scale based on established criteria from the Cryotherapy for ROP6 and Early Treatment for ROP7 studies:
(1) No ROP;
(2) Mild ROP, defined as ROP less than type 2 prethreshold disease;
(3) Type 2 prethreshold ROP (zone I, stage 1 or 2, without plus disease, or zone II, stage 3, without plus disease); and
(4) Treatment-requiring ROP, defined as type 1 prethreshold ROP (zone I, any stage, with plus disease; zone I, stage 3, without plus disease; or zone II, stage 2 or 3, with plus disease) or threshold ROP (≥5 contiguous or 8 noncontiguous clock-hours of stage 3 ROP in zone I or II, with plus disease).
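The ordinal scale above can be written as a small decision function. The following is a minimal sketch, not the authors' software: the function name `classify_rop`, its argument names, and the integer encoding of zone and stage are assumptions made for illustration. It covers only stages 0 to 3 and zones I to III, because the criteria quoted above do not address other stages, and it does not test threshold ROP separately because stage 3 with plus disease in zone I or II already satisfies the type 1 criteria.

```python
def classify_rop(zone: int, stage: int, plus: bool) -> str:
    """Map zone (1-3), stage (0-3), and plus disease to the four-level ordinal scale.

    Hypothetical helper for illustration; the encoding is an assumption.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("only stages 0-3 are covered by this sketch")
    if stage == 0:
        return "no ROP"
    if zone not in (1, 2, 3):
        raise ValueError("zone must be 1, 2, or 3")
    if zone == 1:
        # Zone I: any stage with plus, or stage 3 without plus, is type 1.
        if plus or stage == 3:
            return "treatment-requiring ROP"
        # Zone I, stage 1 or 2, without plus.
        return "type 2 prethreshold ROP"
    if zone == 2:
        # Zone II: stage 2 or 3 with plus is type 1.
        if plus and stage in (2, 3):
            return "treatment-requiring ROP"
        # Zone II, stage 3, without plus.
        if stage == 3:
            return "type 2 prethreshold ROP"
        return "mild ROP"
    # Zone III disease is less than type 2 prethreshold under the quoted criteria.
    return "mild ROP"


print(classify_rop(zone=1, stage=2, plus=False))
print(classify_rop(zone=2, stage=3, plus=True))
```

Keeping the rules in one function makes it easy to see why a disagreement about zone or plus disease alone can move an eye across the type 2 or treatment-requiring boundary.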
Data Analysis
Results from both examinations were analyzed using spreadsheet (Excel; Microsoft) and statistical software (SPSS 15.0; SPSS Inc., Chicago, IL). Absolute intragrader agreement between ophthalmoscopic examination and telemedical interpretation was calculated for each physician using the ordinal classification described. The unweighted κ statistic was used to measure chance-adjusted agreement for the presence of disease, based on an accepted scale: 0 to 0.20, slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, almost perfect agreement.28
Cases in which these 2 examinations resulted in intraphysician discrepancy were identified and reviewed by 2 independent examiners (DYK, KES). After consensus of 3 authors (DYK, KES, MFC), the reason for discrepancy was identified as one of the following: no ROP identified by ophthalmoscopic examination; no ROP identified by telemedical interpretation; discrepancy about presence of zone 1 disease; discrepancy about presence of plus disease; or other discrepancy in classification of ROP stage.
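As a brief aside on the chance-adjusted statistic described above: assuming it is the standard unweighted (Cohen) κ, it compares the observed proportion of agreement with the proportion expected by chance from the marginal totals. The symbols $p_o$, $p_e$, $n_{ij}$, and $N$ below, and the counts in the worked line, are introduced only for illustration and do not come from the study.

```latex
% n_{ij}: eyes graded category i by one examination and category j by the other
% n_{i\cdot}, n_{\cdot i}: row and column totals; N: total number of eyes
p_o = \frac{1}{N}\sum_{i} n_{ii}, \qquad
p_e = \frac{1}{N^{2}}\sum_{i} n_{i\cdot}\, n_{\cdot i}, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e}

% Hypothetical 2x2 example: 40 both positive, 5 + 5 discordant, 50 both negative (N = 100)
p_o = 0.90, \qquad
p_e = \frac{45 \cdot 45 + 55 \cdot 55}{100^{2}} = 0.505, \qquad
\kappa = \frac{0.90 - 0.505}{1 - 0.505} \approx 0.80
```

In this made-up example the raw agreement is 90%, but the chance-adjusted value of about 0.80 falls in the "substantial agreement" band of the scale quoted above.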
Results
Study Population
The study included 67 consecutive infants whose parents consented to participate. This represented 62.0% of the 108 eligible infants hospitalized in the Columbia NICU who met inclusion criteria during the study period. Of the 41 (38.0%) eligible infants who did not participate, 32 (29.6%) parents could not be contacted by study personnel within the study period, 7 (6.5%) parents declined to provide informed consent, and 2 (1.9%) were considered unstable by their neonatologists for imaging.
Among the 67 study infants, 21 (31.3%) received a single set of examinations at 31 to 33 weeks PMA, 10 (14.9%) received a single set at 35 to 37 weeks PMA, and 36 (53.7%) received examinations at both sessions. Both eyes of each subject were examined, for a total of 206 eye examinations (103 right eyes, 103 left eyes). Mean birth weight for all study infants was 912.4 g (range, 398–1440), and mean gestational age at birth was 26.7 weeks (range, 23–33). Among 134 eyes from the 67 study infants, the overall incidence of mild or worse ROP diagnosed at either the 31- to 33-week or 35- to 37-week sessions was 51.5% (69/134 eyes) by ophthalmoscopic examination, and 54.5% (73/134 eyes) by telemedical interpretation. Based on ophthalmoscopic examinations, the overall observed agreement between ordinal disease classifications of right and left eyes within the same patient was 91.3% (94/103 paired examinations).
Intraphysician Agreement between Ophthalmoscopy and Telemedicine
Agreement between ophthalmoscopic examination and telemedical interpretation by the same physician grader is summarized in Table 1. Physician 1 showed 86.7% agreement between the two examinations, and physician 2 showed 85.4% agreement. Overall, the two graders showed 86.3% intraphysician agreement between the two examination modalities. Ophthalmoscopic examination resulted in a more severe diagnosis than telemedical interpretation in 5.8% of eyes, whereas telemedical interpretation resulted in a more severe diagnosis in 7.8% of eyes. There was no statistically significant tendency by either grader to overdiagnose or underdiagnose ROP using either examination modality (P = 0.85 for physician 1 and P = 0.35 for physician 2; Wilcoxon signed-rank test). As displayed in Table 2, the κ statistic for intraphysician agreement between examinations ranged from 0.657 (substantial agreement) for diagnosis of treatment-requiring ROP to 0.854 (near-perfect agreement) for diagnosis of mild or worse ROP.
Intraphysician Discrepancies between Ophthalmoscopy and Telemedicine
Among the 206 eye examinations, there were a total of 28 (13.6%) discrepancies in diagnosis between ophthalmoscopic examination and telemedical interpretation by the same physician grader (Table 1). As summarized in Table 3 (available at http://aaojournal.org), 15 intraphysician discrepancies occurred between “no ROP” and “mild ROP” classifications; 9 occurred when telemedical interpretation found mild ROP but ophthalmoscopy found no ROP (Fig 2A), whereas 6 of these occurred when ophthalmoscopy found mild ROP but telemedicine found no ROP (Fig 2B). In the remaining 13 intraphysician discrepancies, either examination reported type 2 prethreshold or worse ROP. These occurred from discrepancies regarding presence of zone 1 disease (8 cases; Fig 3A–C [available at http://aaojournal.org]); presence of plus disease (4 cases; Fig 3D, E); and ROP staging (1 case; Fig 3F).
Discussion
This study found that, first, intraphysician agreement between ophthalmoscopic examination and telemedical interpretation was very high. Second, neither ophthalmoscopy nor telemedicine methods had a systematic tendency to
Table 1. Absolute Intraphysician Agreement between Ophthalmoscopic Examination and Telemedical Interpretation for Retinopathy of Prematurity (ROP) Diagnosis by Physician 1, Physician 2, and Physicians 1 and 2 Combined
Rows show the ophthalmoscopic diagnosis; columns show the telemedical diagnosis.

Physician 1
                       Telemedicine
Ophthalmoscopy         No ROP         Mild ROP       Type-2 prethreshold   Treatment-requiring   Total
No ROP                 63 (39.9%)*    7 (4.4%)       0 (0.0%)              0 (0.0%)              70 (44.3%)
Mild ROP               6 (3.8%)       59 (37.3%)*    4 (2.5%)              0 (0.0%)              69 (43.7%)
Type-2 prethreshold    0 (0.0%)       2 (1.3%)       11 (7.0%)*            0 (0.0%)              13 (8.2%)
Treatment-requiring    0 (0.0%)       0 (0.0%)       2 (1.3%)              4 (2.5%)*             6 (3.8%)
Total                  69 (43.7%)     68 (43.0%)     17 (10.8%)            4 (2.5%)              158

Physician 2
                       Telemedicine
Ophthalmoscopy         No ROP         Mild ROP       Type-2 prethreshold   Treatment-requiring   Total
No ROP                 38 (79.2%)†    2 (4.2%)       0 (0.0%)              0 (0.0%)              40 (83.3%)
Mild ROP               0 (0.0%)       3 (6.2%)†      1 (2.1%)              2 (4.2%)              6 (12.5%)
Type-2 prethreshold    0 (0.0%)       2 (4.2%)       0 (0.0%)†             0 (0.0%)              2 (4.2%)
Treatment-requiring    0 (0.0%)       0 (0.0%)       0 (0.0%)              0 (0.0%)†             0 (0.0%)
Total                  38 (79.2%)     7 (14.6%)      1 (2.1%)              2 (4.2%)              48

Physicians 1 and 2
                       Telemedicine
Ophthalmoscopy         No ROP         Mild ROP       Type-2 prethreshold   Treatment-requiring   Total
No ROP                 101 (49.0%)‡   9 (4.4%)       0 (0.0%)              0 (0.0%)              110 (53.4%)
Mild ROP               6 (2.9%)       62 (30.1%)‡    5 (2.4%)              2 (1.0%)              75 (36.4%)
Type-2 prethreshold    0 (0.0%)       4 (1.9%)       11 (5.3%)‡            0 (0.0%)              15 (7.3%)
Treatment-requiring    0 (0.0%)       0 (0.0%)       2 (1.0%)              4 (1.9%)‡             6 (2.9%)
Total                  107 (51.9%)    75 (36.4%)     18 (8.7%)             6 (2.9%)              206

Values are displayed as absolute number (percentage) of eyes. Diagnosis is classified as no ROP, mild ROP, type 2 prethreshold ROP, or treatment-requiring ROP.
*Overall agreement for physician 1 was 86.7%.
†Overall agreement for physician 2 was 85.4%.
‡Overall agreement (boldfaced cells) for physicians 1 and 2 combined was 86.3%.
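The summary figures reported for the combined graders can be recomputed from the counts in Table 1. The sketch below is an illustration rather than the authors' analysis (which used SPSS): it reads the combined table with ophthalmoscopy in rows and telemedicine in columns, an orientation inferred from the discrepancy counts given in the Results, and the names `COUNTS`, `absolute_agreement`, `severity_direction`, and `kappa_at_cutoff` are hypothetical. It assumes the chance-adjusted statistic is the unweighted Cohen κ computed after collapsing the four categories at each diagnostic cutoff.

```python
# Combined counts for physicians 1 and 2 (Table 1).
# Rows: ophthalmoscopy; columns: telemedicine (assumed orientation).
# Category order: no ROP, mild ROP, type 2 prethreshold, treatment-requiring.
COUNTS = [
    [101, 9, 0, 0],
    [6, 62, 5, 2],
    [0, 4, 11, 0],
    [0, 0, 2, 4],
]


def absolute_agreement(table):
    """Fraction of eyes given the same category by both examinations."""
    n = sum(map(sum, table))
    return sum(table[i][i] for i in range(len(table))) / n


def severity_direction(table):
    """Return (eyes graded worse by ophthalmoscopy, eyes graded worse by telemedicine)."""
    k = len(table)
    ophth_worse = sum(table[i][j] for i in range(k) for j in range(k) if i > j)
    tele_worse = sum(table[i][j] for i in range(k) for j in range(k) if j > i)
    return ophth_worse, tele_worse


def kappa_at_cutoff(table, cutoff):
    """Unweighted Cohen kappa for 'category index >= cutoff' versus 'below cutoff'."""
    k = len(table)
    n = sum(map(sum, table))
    both_pos = sum(table[i][j] for i in range(cutoff, k) for j in range(cutoff, k))
    both_neg = sum(table[i][j] for i in range(cutoff) for j in range(cutoff))
    row_pos = sum(sum(table[i]) for i in range(cutoff, k))
    col_pos = sum(table[i][j] for i in range(k) for j in range(cutoff, k))
    p_obs = (both_pos + both_neg) / n
    p_exp = (row_pos * col_pos + (n - row_pos) * (n - col_pos)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


print("agreement:", round(absolute_agreement(COUNTS), 3))
print("worse by ophthalmoscopy / telemedicine:", severity_direction(COUNTS))
for name, cutoff in [("mild or worse", 1), ("type 2 or worse", 2), ("treatment-requiring", 3)]:
    print(name, round(kappa_at_cutoff(COUNTS, cutoff), 3))
```

Under these assumptions the diagonal holds 178 of 206 eyes (leaving the 28 reported discrepancies), the off-diagonal split is 12 versus 16 eyes (5.8% and 7.8%), and the three κ values round to those listed in Table 2.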
overdiagnose or underdiagnose ROP compared with the other method. Third, many clinically significant diagnostic discrepancies were based on uncertainty about presence of zone 1 or plus disease. By controlling for diagnostic variation among different physician graders, the study design was intended to identify key differences between ophthalmoscopic examination and telemedical interpretation for ROP diagnosis.
The intraphysician agreement between ophthalmoscopic examination and telemedical interpretation in this study was very high, with a κ statistic ranging from 0.657 (substantial agreement) to 0.854 (near-perfect agreement). To put this into perspective, it is useful to compare with other published results. In our previous work involving telemedical ROP diagnosis, the κ was 0.38 to 0.81 for intergrader reliability between ophthalmologists, and 0.60 for intragrader reliability.17 Using the Early Treatment Diabetic Retinopathy Study 7-field reference standard images, the weighted κ between graders reviewing the same images for presence of retinal lesions was 0.41 to 0.80, depending on the particular type of lesion.29 In a study involving agreement in fluorescein angiogram interpretation for photodynamic therapy eligibility in age-related macular degeneration, the κ statistic was 0.44 to 0.89 for intraobserver reliability and 0.37 to 0.40 for interobserver reliability.30 In other medical domains, the κ for reliability of physical examination techniques performed on pediatric patients with abdominal pain by emergency room attending physicians, residents, and surgeons was −0.04 to 0.38.31 For interobserver agreement in radiographic diagnosis of upper lobe-predominant emphysema between expert radiologists and pulmonologists, the κ was shown to be 0.20 to 0.60.32 Taken together, these findings suggest that agreement between ROP ophthalmoscopic examination and telemedical interpretation modalities performed by the same physicians is comparable to or higher than agreement between ophthalmic or medical tests performed by different expert physicians, even when the latter examinations are considered well-accepted diagnostic methodologies. This conclusion supports the notion that a telemedicine strategy for ROP diagnosis, where images are captured by a trained NICU nurse, is technically feasible without sacrificing overall diagnostic accuracy.

Table 2. Unweighted κ Statistics for Intraphysician Agreement between Ophthalmoscopic Examination and Telemedical Interpretation for Retinopathy of Prematurity (ROP) Diagnosis by Two Physicians
ROP Diagnostic Cutoff                 κ (SE)
Mild or worse ROP                     0.854 (0.036)
Type 2 prethreshold or worse ROP      0.726 (0.078)
Treatment-requiring ROP               0.657 (0.161)
SE = standard error. κ (SE) values are shown for diagnostic cutoffs of mild or worse ROP, type 2 prethreshold or worse ROP, and treatment-requiring ROP.

Among the 206 eye examinations in this study, there were discrepancies in diagnosis by the same physician grader in 28 (13.6%) examinations (Table 3 [available at http://aaojournal.org]). Thirteen of these discrepancies were particularly significant, in which either examination diagnosed type 2 prethreshold or worse ROP. It is worthwhile to
Figure 2. Examples of eyes with intraphysician discrepancies between ophthalmoscopic examinations and telemedical interpretations, in which one modality detected disease but the other did not. A, Eye in which ophthalmoscopy diagnosed no retinopathy of prematurity (ROP), whereas telemedical interpretation diagnosed mild ROP. B, Eye in which ophthalmoscopy diagnosed mild ROP, whereas telemedical interpretation diagnosed no ROP.
review and analyze these discrepancies in detail.
The majority of clinically significant discrepancies resulted from uncertainty about presence of zone 1 ROP. In this study, 2 eyes had zone 1 disease diagnosed by ophthalmoscopy but not telemedicine (Fig 3A [available at http://aaojournal.org]), whereas 6 eyes had zone 1 ROP diagnosed by telemedicine but not on ophthalmoscopy (Fig 3B, C). According to the international classification for ROP, zone 1 is clearly defined as a circular region centered on the optic disc, with a radius equal to twice the distance between the optic disc and the center of the macula.19,27 Despite this precise definition, the relevant anatomic landmarks may be difficult to identify and distinguish during examination. This is an especially critical distinction because zone 1 eyes were found to have the poorest structural and visual prognosis in the Cryotherapy for ROP study.33 Furthermore, the Early Treatment for ROP study established that presence of disease in zone 1 is a key factor in distinguishing types 1 and 2 ROP.7 For these reasons, the number of discrepancies involving zone 1 disease in this study raises concerns about the accuracy and reproducibility of clinical ROP diagnosis.
Uncertainty about presence of plus disease was a second key source of clinically significant intraphysician discrepancy. In this study, 2 eyes had plus disease diagnosed by ophthalmoscopy but not telemedicine (Fig 3D), and 2 eyes had plus disease diagnosed by telemedicine but not ophthalmoscopy (Fig 3E).
Plus disease is defined as the presence of a minimum level of arteriolar tortuosity and venular dilation as compared with a standard published photograph.6,19,27 This definition is based on a photographic standard with subjective qualifiers, rather than quantifiable measurements, and we previously showed that the mean κ statistic of an expert compared with 21 other expert graders reviewing the same images for presence of plus disease was 0.19 to 0.66.22 Findings from the present study demonstrate that identification of plus disease by ophthalmoscopic examination and telemedical interpretation is not consistent, even when performed by the same physician grader. This finding supports the notion that ophthalmologists may have differing interpretations regarding the level of “dilation” and “tortuosity” that is sufficient for plus disease, and raises further concerns that the diagnosis of clinically significant ROP may be suboptimal in this respect.
To our knowledge, all previously published studies involving telemedicine for ROP diagnosis have compared the results of telemedical interpretation to a reference standard of ophthalmoscopic examination.13–18 However, the number and nature of clinically significant discrepancies identified in this study suggest that ophthalmoscopic examination may not necessarily be superior. In fact, it is conceivable that telemedicine could provide a more accurate and reliable diagnosis of plus disease because findings may be directly compared with the standard photographic definition.6 Improved diagnosis of zone 1 disease could also occur because the relevant anatomic landmarks may be precisely identified and measured on retinal images. Finally, we note that 9 of the 15 discrepancies between “no ROP” and “mild ROP” occurred because disease was identified by telemedical interpretation, but not by ophthalmoscopy (Table 3 [available at http://aaojournal.org]). Subsequent review of these images appeared to disclose photographic evidence of mild ROP (Fig 2A), despite the fact that no ROP was detected by ophthalmoscopic examination. Although these situations would be considered “false-positive” errors by telemedicine in traditional study designs using ophthalmoscopy as a reference standard, it is likely that many of them may actually represent “false-negative” errors by the ophthalmoscopic examination. False-positive errors may be associated with unnecessary eye examinations or laser photocoagulation, whereas false-negative errors may result in missed opportunities for treating a potentially blinding condition.
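The suggestion above that zone 1 landmarks could be measured directly on retinal images can be illustrated with a simple geometric check. This is only a sketch under stated assumptions: it takes the international classification's zone 1 to be a circle centered on the optic disc whose radius is twice the disc-to-macula distance, it treats the image as a flat plane (ignoring wide-angle lens distortion and retinal curvature), and the function name `in_zone_1` and the pixel coordinates are hypothetical.

```python
import math


def in_zone_1(point, disc_center, macula_center):
    """Return True if an image point lies within zone 1 (planar approximation).

    All arguments are (x, y) pixel coordinates on the same image.
    """
    # Assumed zone 1 radius: twice the optic disc to macula distance.
    radius = 2 * math.dist(disc_center, macula_center)
    return math.dist(point, disc_center) <= radius


# Hypothetical coordinates: disc at the origin, macula 100 px temporal to it.
disc = (0.0, 0.0)
macula = (100.0, 0.0)
print(in_zone_1((0.0, 150.0), disc, macula))
print(in_zone_1((250.0, 0.0), disc, macula))
```

A check of this kind depends entirely on how reliably the disc and macula centers can be marked on the image, which is the same landmark problem the discussion raises for bedside examination.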
Overall, these results may have important implications for clinical ROP care, as well as for the general design of research studies that are intended to evaluate the efficacy of new diagnostic technologies compared with clinical examination by experts.
Several limitations of this study should be noted. Both examinations on all study eyes were performed by the same physician grader, and imaging was always performed second. To minimize any systematic bias from this design, telemedical study interpretations were performed 4 to 12 months after ophthalmoscopic examinations, and no identifiable patient data were displayed with images.
Only two graders were included in this study, because these were the two physicians who performed weekly ophthalmoscopic examinations on all premature infants at the study center.11,26 It is interesting to note that there was a significant difference in diagnostic classification of ophthalmoscopic examinations as well as telemedical interpretations between the two physician graders (P<0.001 by chi-square test). For example, physician 1 found that 44.3% of eyes had “no ROP” by ophthalmoscopy, whereas physician 2 found that 83.3% of eyes had “no ROP” by ophthalmoscopy (Table 1). Among infants examined by physician 1 compared with physician 2, there were no statistically significant differences in mean birth weight (868 vs. 964 g; P = 0.176), mean gestational age at birth (26.3 vs. 27.2 weeks; P = 0.145), or mean PMA at examination (33.3 vs. 33.8 weeks; P = 0.315). However, it is conceivable that these differences may have been clinically significant. Another possible explanation is that physician graders may have had a systematic tendency to either overdiagnose or underdiagnose ROP disease.23 We emphasize that this project was not intended to compare inter-physician agreement of telemedical diagnosis; we have explored that issue in detail in a parallel study.34
Data analysis was performed by eye, despite the fact that ROP diagnoses in right and left eyes of the same patient are not independent. Because standard ophthalmoscopy is performed on both eyes together, images of both eyes were presented to physician graders side by side to best simulate a real-world scenario.
This minimizes bias favoring either examination, and allows for analysis of outcomes from both eyes of each infant. Standardization of image grading conditions was not performed. Parameters such as luminance, resolution, and contrast have been shown in the radiology literature to affect diagnosis,35 and the extent to which they may affect interpretation of images in telemedicine systems is not known. Telemedicine has potential to improve the quality, delivery, and accessibility of care for infants with ROP, and for patients with other ophthalmic and medical diseases. In this study, we show that agreement between ophthalmoscopy and telemedicine by the same physicians is substantial to almost perfect, and that it is higher than previously published agreements among multiple examiners who independently perform well-established diagnostic tests. In addition, we show that uncertainty about presence of zone 1 disease and plus disease are two key sources of clinically significant discrepancy between examination findings. These findings may have important implications for implementation of telemedicine systems, consistent classification of disease findings, and the design of future research studies involving telemedicine evaluation. Acknowledgments. The authors thank Dr Zhiliang Ying for his statistical assistance with this study.
References

1. Callahan CW, Malone F, Estroff D, Person DA. Effectiveness of an Internet-based store-and-forward telemedicine system for pediatric subspecialty consultation. Arch Pediatr Adolesc Med 2005;159:389–93.
2. Sable CA, Cummings SD, Pearson GD, et al. Impact of telemedicine on the practice of pediatric cardiology in community hospitals. Pediatrics 2002;109:E3. Available at: http://pediatrics.aappublications.org/cgi/content/full/109/1/e3. Accessed September 10, 2007.
3. Whited JD, Hall RP, Simel DL, et al. Reliability and accuracy of dermatologists’ clinic-based and digital image consultations. J Am Acad Dermatol 1999;41:693–702.
4. Field MJ, Grigsby J. Telemedicine and remote patient monitoring. JAMA 2002;288:423–5.
5. Whitten PS, Mair FS, Haycox A, et al. Systematic review of cost effectiveness studies of telemedicine interventions. BMJ 2002;324:1434–7.
6. Cryotherapy for Retinopathy of Prematurity Cooperative Group. Multicenter trial of cryotherapy for retinopathy of prematurity: preliminary results. Arch Ophthalmol 1988;106:471–9.
7. Early Treatment for Retinopathy of Prematurity Cooperative Group. Revised indications for the treatment of retinopathy of prematurity: results of the Early Treatment for Retinopathy of Prematurity Randomized Trial. Arch Ophthalmol 2003;121:1684–94.
8. Munoz B, West SK. Blindness and visual impairment in the Americas and the Caribbean. Br J Ophthalmol 2002;86:498–504.
9. Gilbert C, Foster A. Childhood blindness in the context of VISION 2020—the Right to Sight. Bull World Health Organ 2001;79:227–32.
10. National Center for Health Statistics. Hamilton BE, Martin JA, Ventura SJ. Births: preliminary data for 2005. Available at: www.cdc.gov/nchs/products/pubs/pubd/hestats/prelimbirths05/prelimbirths05.htm. Accessed April 13, 2007.
11. Section on Ophthalmology American Academy of Pediatrics, American Academy of Ophthalmology, American Association for Pediatric Ophthalmology and Strabismus. Screening examination of premature infants for retinopathy of prematurity. Pediatrics 2006;117:572–6.
12. Ophthalmologists warn of shortage in specialists who treat premature babies with blinding eye condition [press release]. San Francisco: American Academy of Ophthalmology; July 13, 2006. Available at: www.aao.org/newsroom/release/20060713.cfm. Accessed April 13, 2007.
13. Ells AL, Holmes JM, Astle WF, et al. Telemedicine approach to screening for severe retinopathy of prematurity. A pilot study. Ophthalmology 2003;110:2113–7.
14. Roth DB, Morales D, Feuer WJ, et al. Screening for retinopathy of prematurity employing the RetCam 120: sensitivity and specificity. Arch Ophthalmol 2001;119:268–72.
15. Wu C, Petersen RA, VanderVeen DK. RetCam imaging for retinopathy of prematurity screening. J AAPOS 2006;10:107–11.
16. Yen KG, Hess D, Burke B, et al. The optimum time to employ telephotoscreening to detect retinopathy of prematurity. Trans Am Ophthalmol Soc 2000;98:145–50.
17. Chiang MF, Keenan JD, Starren J, et al. Accuracy and reliability of remote retinopathy of prematurity diagnosis. Arch Ophthalmol 2006;124:322–7.
18. Chiang MF, Starren J, Du YE, et al. Remote image based retinopathy of prematurity diagnosis: a receiver operating characteristic analysis of accuracy. Br J Ophthalmol 2006;90:1292–6.
19. International Committee for the Classification of Retinopathy of Prematurity. The international classification of retinopathy of prematurity revisited. Arch Ophthalmol 2005;123:991–9.
20. Peabody JW, Luck J, Glassman P, et al. Comparison of vignettes, standardized patients, and chart abstraction: a prospective study of 3 methods for measuring quality. JAMA 2000;283:1715–22.
21. Veloski J, Tai S, Evans AS, Nash DB. Clinical vignette-based surveys: a tool for assessing physician practice variation. Am J Med Qual 2005;20:151–7.
22. Chiang MF, Jiang L, Gelman R, et al. Inter-expert agreement of plus disease diagnosis in retinopathy of prematurity. Arch Ophthalmol 2007;125:875–80.
23. Chiang MF, Gelman R, Jiang L, et al. Plus disease in retinopathy of prematurity: an analysis of diagnostic performance. Trans Am Ophthalmol Soc 2007;105:73–84.
24. Moss SE, Klein R, Kessler SD, Richie KA. Comparison between ophthalmoscopy and fundus photography in determining severity of diabetic retinopathy. Ophthalmology 1985;92:62–7.
25. Kinyoun JL, Martin DC, Fujimoto WY, Leonetti DL. Ophthalmoscopy versus fundus photographs for detecting and grading diabetic retinopathy. Invest Ophthalmol Vis Sci 1992;33:1888–93.
26. American Academy of Pediatrics, Section on Ophthalmology. Screening examination of premature infants for retinopathy of prematurity. Pediatrics 2001;108:809–11.
27. Committee for the Classification of Retinopathy of Prematurity. International classification of retinopathy of prematurity. Arch Ophthalmol 1984;102:1130–4.
28. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:157–74.
29. Early Treatment Diabetic Retinopathy Study Research Group. Grading diabetic retinopathy from stereoscopic color fundus photographs—an extension of the modified Airlie House classification. ETDRS report number 10. Ophthalmology 1991;98(suppl):786–806.
30. Holz FG, Jorzik J, Schutt F, et al. Agreement among ophthalmologists in evaluating fluorescein angiograms in patients with neovascular age-related macular degeneration for photodynamic therapy eligibility (FLAP-Study). Ophthalmology 2003;110:400–5.
31. Yen K, Karpas A, Pinkerton HJ, Gorelick MH. Interexaminer reliability in physical examination of pediatric patients with abdominal pain. Arch Pediatr Adolesc Med 2005;159:373–6.
32. Hersh CP, Washko GR, Jacobson FL, et al. Interobserver variability in the determination of upper lobe-predominant emphysema. Chest 2007;131:424–31.
33. Cryotherapy for Retinopathy of Prematurity Cooperative Group. Multicenter trial of cryotherapy for retinopathy of prematurity: natural history ROP: ocular outcome at 5 1/2 years in premature infants with birth weights less than 1251 g. Arch Ophthalmol 2002;120:595–9.
34. Chiang MF, Wang L, Busuioc M, et al. Telemedical retinopathy of prematurity diagnosis: accuracy, reliability, image quality. Arch Ophthalmol 2007;125:1531–8.
35. Herron JM, Bender TM, Campbell WL, et al. Effects of luminance and resolution on observer performance with chest radiographs. Radiology 2000;215:169–74.
Table 3. Reasons for Intraphysician Discrepancy between Ophthalmoscopic Examination and Telemedical Interpretation for Retinopathy of Prematurity (ROP) Diagnosis Based on Independent Review by Authors

Reason for Discrepancy                                      No. (%) of Eyes
Discrepancy in classification of disease stage
  No ROP identified by ophthalmoscopic examination          9 (32.1)
  No ROP identified by telemedical interpretation           6 (21.4)
  Discrepancy in classification between stages 2 and 3      1 (3.6)
Discrepancy about presence of zone 1 disease                8 (28.6)
Discrepancy about presence of plus disease                  4 (14.3)
Total                                                       28 (100)
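The percentages in Table 3 can be re-derived from the eye counts. The following is a minimal Python sketch, assuming only the counts as tabulated; the dictionary and variable names are illustrative and are not part of the study's analysis.

```python
# Eye counts per discrepancy reason, transcribed from Table 3.
# All names below are illustrative assumptions, not from the study.
discrepancy_counts = {
    "No ROP identified by ophthalmoscopic examination": 9,
    "No ROP identified by telemedical interpretation": 6,
    "Discrepancy in classification between stages 2 and 3": 1,
    "Discrepancy about presence of zone 1 disease": 8,
    "Discrepancy about presence of plus disease": 4,
}

# Total number of eyes with an intraphysician discrepancy.
total_discrepancies = sum(discrepancy_counts.values())

# Share of discrepant eyes attributed to each reason, to one decimal place.
percentages = {
    reason: round(100 * count / total_discrepancies, 1)
    for reason, count in discrepancy_counts.items()
}

for reason, count in discrepancy_counts.items():
    print(f"{reason}: {count} ({percentages[reason]})")
print(f"Total: {total_discrepancies}")
```

Each percentage uses the number of discrepant eyes as its denominator, not the number of eye examinations in the study.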
Ophthalmology Volume 115, Number 7, July 2008
Figure 1. Example of Web-based telemedicine system developed for image-based diagnosis. Gestational age at birth, weight at birth, and postmenstrual age at time of examination are shown. Images from posterior pole, temporal retina, and nasal retina of both eyes are displayed, along with up to two additional images from each eye if judged by nurse photographer to be useful for diagnosis. Physician graders review findings and provide clinical diagnosis.
Figure 3. Examples of eyes with intraphysician discrepancies between ophthalmoscopic examination and telemedical interpretation, in which either modality detected type 2 prethreshold or worse retinopathy of prematurity (ROP). A, Eye in which ophthalmoscopy diagnosed type 2 prethreshold ROP because of presence of zone 1 disease, whereas telemedicine diagnosed mild ROP. B, C, Nasal and temporal views of eye in which ophthalmoscopy diagnosed mild ROP, whereas telemedicine diagnosed type 2 prethreshold ROP because of presence of zone 1 disease. D, Eye in which ophthalmoscopy diagnosed treatment-requiring ROP because of presence of plus disease, whereas telemedicine diagnosed type 2 prethreshold ROP. E, Eye in which ophthalmoscopy diagnosed mild ROP, whereas telemedicine diagnosed treatment-requiring ROP because of presence of plus disease. F, Eye in which ophthalmoscopy diagnosed mild ROP, whereas telemedicine diagnosed type 2 prethreshold ROP because of presence of stage 3 disease.