
Proceedings of the 34th Hawaii International Conference on System Sciences - 2001

Data Mining of Administrative Claims Data for Pathology Services

Simon Hawkins, Graham J. Williams, Rohan A. Baxter, Peter Christen, Michael J. Fett, Markus Hegland, Fuchun Huang, Ole Nielsen, Tatiana Semenova, Andrew Smith

Cooperative Research Centre for Advanced Computational Systems (ACSys), GPO Box 664, Canberra ACT 2601, Australia.

Abstract Australia has a universal health insurance scheme called Medicare. Medicare payments for pathology services generate voluminous transaction data on patients, doctors and pathology laboratories. The Health Insurance Commission (HIC) currently uses predictive models to monitor compliance with regulatory requirements. The HIC commissioned a project to investigate the generation of new features from the data. These features were summarised, visualised and used as inputs for clustering and outlier detection methods. Some initial interpretations and insights into the pathology service industry are discussed. Further work is required for feature selection, training of predictive models with the new features and the evaluation of performance against the currently deployed models.

1 Introduction

Australia has a universal health insurance scheme called Medicare. Medicare payments for pathology services generate voluminous transaction data on patients, doctors and pathology laboratories. These payments are administered by a government agency called the Health Insurance Commission (HIC). The HIC’s charter is to make accurate and timely payments while maintaining confidentiality and privacy. The administrative claims transaction data potentially contain valuable information about the nature of the pathology services industry and its regulatory compliance. This data mining project was undertaken by the Advanced Computational Systems Cooperative Research Centre (ACSys), a third-party research consultancy contracted by the HIC. This paper reports on the project results as well as some project management issues arising from an out-sourced data mining project in the privacy-conscious health industry.

The business problem for the HIC is that of monitoring the compliance of pathology laboratories with the payment system’s regulatory framework. The HIC already has capabilities for training and deploying predictive models to aid in predicting levels of compliance. Given compliance levels for a sample of pathology laboratories with respect to a regulatory requirement, the HIC can use predictive modeling techniques, such as neural networks and decision trees, to predict the compliance of any pathology laboratory. The resulting predictive models can then be deployed to alert the HIC whenever a pathology laboratory is predicted to be above a risk threshold. These predictive models currently use a small number of features as inputs. These features typically include volumes of transactions, the types of pathology tests performed and the dollar value of tests performed each quarter for each pathology laboratory. They are statistics derived from the online transaction processing system.

For the current project, the HIC was interested in the question of what other features may be generated from the transaction data for the purposes of characterising pathology laboratory utilization patterns. In the context of the knowledge discovery in databases (KDD) process [4], this project focusses on the earlier steps of data preprocessing and feature generation (also called data transformation). In commercial data mining application areas, such as marketing, financial modelling and telecommunications, the feature generation task is vital for competitive success. However, the feature generation task has not been prominent in the literature [8]. In financial modelling, for example, the choice of the right features offers analysts a competitive advantage, which may explain the dearth of literature. The later steps of the KDD process (pattern searching using data mining techniques, model evaluation and model deployment) were not the primary focus of this project. One key reason for structuring the project this way was the HIC’s legislated privacy requirements: it cannot legally release identified data about pathology laboratories, doctors or patients to a third party such as ourselves.

We now briefly describe the HIC’s interest in this study. The HIC aims to better understand the structure of the pathology services industry and the behaviour of the industry players, using the claims transactions that it processes. This understanding may have a bearing on:
• Ways of identifying unnecessary, wasteful and excessive servicing and inappropriate practice (in extreme cases, fraudulent practice).
• Policy recommendations for controlling pathology costs whilst maintaining pathology services [13].
• Policy recommendations for improving health care service delivery.

The pathology services transactions provide data about health care consumers (patients), doctors providing services (general practitioners and specialists) and pathology laboratories. A transaction arises for each Medicare item (which may comprise a standard combination of two or more tests) ordered when a patient visits a doctor.
Except in rare cases, the doctor chooses the pathology laboratory that carries out the test. In most cases the pathology laboratory then makes a fee claim to the HIC for the pathology service provided. In other cases, the pathology laboratory invoices the patient directly, who then makes a claim to the HIC. For most tests, patients go to a collection centre run by a pathology laboratory for specimen collection. For some tests, doctors collect the specimen themselves and make a fee claim to the HIC for collecting the specimen. Pathology laboratories are not permitted to offer inducements to doctors to order large numbers of pathology tests from their laboratories. In rural and regional areas, doctors have a limited choice of pathology laboratory. This prior knowledge of the pathology services industry suggests that patterns in the relationships between doctors and pathology services are of primary interest for assessing any inappropriate practices and for understanding the pathology industry’s structure and behaviour. Patterns in test ordering for particular patients by doctors are also of interest in assessing health care service delivery quality.

The HIC has not yet completed its review of the results of this project. However, we illustrate our results with some initial interpretations and hypothetical policy implications. The results should be of general interest to health administrators possessing health service transaction data.

The remainder of the paper is organised as follows. Section 2 describes the data organisation and data transformations undertaken before features could be generated and visualised efficiently and flexibly. Section 3 describes the new feature sets that were generated, some data mining methods that utilise these features, and some visualisations of the feature sets. Section 4 describes additional features with time components that were investigated. Section 5 summarises the current and prospective insights gained from using the generated features. It also discusses whether the project has real benefits despite privacy requirements restricting the model evaluation and testing that could be done. Section 6 gives our conclusions.

Period            File Size   Transactions
Quarter 1, 1997   680MB        4,448,547
Quarter 2, 1997   704MB        4,529,848
Quarter 3, 1997   706MB        4,496,426
Quarter 4, 1997   700MB        4,423,777
Quarter 1, 1998   749MB        4,777,493
Quarter 2, 1998   730MB        4,640,190
Quarter 3, 1998   758MB        4,819,261
Quarter 4, 1998   733MB        4,623,801
Total             5.8GB       36,759,343

Table 1. Number of transactions by quarter

2 Data Organisation

In this section, we describe the available data, the data transformations and the data organisation used to enable the fast access required for the feature generation methods.

2.1 Data Types

The project data were Medicare Benefits Schedule Category 6 (Pathology Services) transactions for the State of New South Wales for the eight quarters of 1997 and 1998. Table 1 summarizes the dataset. Additional data on referring doctor attributes were also provided. Each transaction has 44 fields relating to four distinct entities: the pathology laboratory, which performs the pathology test; the doctor, who orders the test; the patient, for whom the test is ordered; and the transaction itself. Table 2 gives a summary of the transaction fields. The 36.8 million transactions covered 79 pathology laboratories, 20,314 doctors and 3,853,603 patients. The HIC is required by law to de-identify fields that could identify any individual entity. This was done by encrypting entity identifiers and postcodes. In lieu of unencrypted postcode location information, an RRMA field coded seven different types of geographic regions, including rural, metropolitan and city.

Entity                 Entity fields (meaning described in text where they arise)
Transaction            Test item number, date of service, date of processing, date of referral, date of lodgement, schedule fee for test, benefit paid, hospital indicator
Pathology Laboratory   Unique identifier, RRMA
Doctor                 Unique identifier, RRMA, specialty (GP or specialist)
Patient                Unique identifier, date of birth, gender, RRMA, home country, age

Table 2. Summary of transaction fields, grouped by the entity they describe

2.2 Data Transformation

The following pre-processing was performed:
• The five date fields were converted to day offsets from January 1, 1970 (the Unix epoch), so dates before January 1, 1970 have negative offsets. This simplified the calculation of time lags used in feature generation.
• Empty field values were replaced with a marker value.
• Since some pathology tests have different item numbers in different years, all test item numbers were mapped to those current at June 1999.

The preprocessing was done using the Perl scripting language [12], because we observed an order of magnitude difference in performance between Perl and Tcl [10]. This performance difference is important considering the quantity of data involved. For example, a single pass of the data took 72 hours using Tcl, whereas it took 3 hours using Perl, on our ten-processor (167MHz), 4.5 Gigabyte Sun Enterprise 4000 server. A number of passes over the data were required during the data transformation and data stratification process; a three-day wait for each would soon have taken up a significant proportion of the project time.
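The date conversion and empty-field handling described above can be illustrated with a minimal sketch. The project itself used Perl; Python is used here purely for illustration. The ISO input format, the function name `to_day_offset` and the marker value `MISSING` are assumptions, since the paper does not state the raw date format or the marker that was used.

```python
from datetime import date

EPOCH = date(1970, 1, 1)   # Unix epoch starting date
MISSING = -999_999         # hypothetical marker for empty fields (outside any valid offset)


def to_day_offset(field: str) -> int:
    """Convert a 'YYYY-MM-DD' string to days since the epoch.

    Dates before the epoch give negative offsets; empty fields give MISSING.
    The input format is an assumption made for this sketch.
    """
    field = field.strip()
    if not field:
        return MISSING
    return (date.fromisoformat(field) - EPOCH).days
```

With dates stored as integer offsets, a time lag is a plain subtraction, for example `to_day_offset(dos) - to_day_offset(dor)` for the service lag of section 4.2.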

2.3 Data Organisation and Access

The transactions were originally stored in a single large relational database table. One approach, and a current area of research in data mining, is to interface data mining methods with this relational database [6, 11]. SQL queries were used for ad hoc querying of the data throughout the project. However, our explorations of feature generation required fast access to one or more individual transaction columns, whereas a relational database provides fast access to individual transaction rows. Alternative approaches for fast column access include data-cubes [1] and sufficient statistic caching [9]. These approaches are efficient for specific data mining methods such as association rules or clustering, but are not efficient enough for intensive exploration of interesting features. We developed a column-binary-flat file approach that was efficient, yet flexible enough, for feature generation. This organisation of the data allows our feature generation programs to selectively access one or more columns in an efficient, flexible way.
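The paper does not specify the layout of the column-binary-flat files. As a rough sketch of the general idea, the following assumes one binary file of 64-bit integers per column, so that a feature generation program reads only the columns it needs. The function names, the `.bin` suffix and the integer-only encoding are all assumptions made for illustration.

```python
import os
from array import array


def write_columns(rows, names, directory):
    """Write each column of integer-valued rows to its own binary file."""
    columns = {name: array("q") for name in names}  # "q": signed 64-bit integers
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    for name, col in columns.items():
        with open(os.path.join(directory, name + ".bin"), "wb") as f:
            col.tofile(f)


def read_column(name, directory):
    """Read one column back without touching the files of other columns."""
    path = os.path.join(directory, name + ".bin")
    col = array("q")
    n_items = os.path.getsize(path) // col.itemsize
    with open(path, "rb") as f:
        col.fromfile(f, n_items)
    return col
```

A feature that needs only the date of referral and date of service, for instance, would call `read_column` twice and never read the remaining fields of each transaction.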

2.4 Data Stratification

We stratified the data into the four subsets shown in table 3.

Test subset                   Code   Description
Specialist, in hospital       sh     Test ordered by specialist doctor for patient in hospital.
Specialist, out of hospital   so     Test ordered by specialist doctor for patient out of hospital.
GP, in hospital               gh     Test ordered by General Practice (GP) doctor for patient in hospital.
GP, out of hospital           go     Test ordered by GP doctor for patient out of hospital.

Table 3. The four subsets of the stratified data

The motivation for the stratification was two-fold:
• The subsets were smaller and more manageable for data manipulations.
• It was expected a priori that ordering patterns for each subset would be distinct. This expectation was only partially borne out: we found that test ordering patterns for the go and so subsets did not differ significantly (see section 3.1).
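The stratification in table 3 amounts to a two-way split on the doctor's specialty and the hospital indicator. A minimal sketch follows; the dictionary keys `specialty` and `in_hospital` and their encodings are hypothetical, as the paper does not give the field encodings.

```python
def subset_code(specialty: str, in_hospital: bool) -> str:
    """Return the table 3 code: 's' or 'g' for the doctor, 'h' or 'o' for the setting."""
    doctor = "s" if specialty == "specialist" else "g"
    setting = "h" if in_hospital else "o"
    return doctor + setting


def stratify(transactions):
    """Split transactions (dicts with hypothetical keys) into the four subsets."""
    subsets = {"sh": [], "so": [], "gh": [], "go": []}
    for t in transactions:
        subsets[subset_code(t["specialty"], t["in_hospital"])].append(t)
    return subsets
```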

3 Features for Pathology Laboratory utilization patterns

The purpose of this project was to find new features that could provide a basis for predictive modeling of pathology laboratory utilization patterns for compliance monitoring with regulatory requirements. As mentioned in section 1, existing features include counts of columns in the transaction table, including volumes of transactions, volume of types of pathology tests performed and the total dollar value of the tests performed. In section 3.1 we examine the structure of the pathology laboratory market using clustering on relative proportions of types of pathology tests. The clusters found could be interpreted as ‘market niches’, summarising the relative test volumes from each market sector. Each pathology laboratory can be classified according to the market niche (or cluster) to which it belongs. Knowledge of market structure can have implications for health care financial policy and service delivery. In section 3.2 we identify pathology laboratories that are outliers with respect to various feature distributions.

3.1 Relative volume of tests in each subset

The features generated for input into the clustering algorithm were the proportions of tests provided by a particular pathology laboratory in each of the four subsets. The proportion of tests, rather than the absolute count of tests, is used so that pathology laboratory test volume does not affect the analyses. A k-means clustering method [5] was applied to the 79 pathology laboratories using the ‘relative proportion in each of the four subsets’ feature. The clustering method requires the number of clusters (k) to be given as an input parameter. Analyses were performed for each value from k = 2 to k = 10. Distinct groups arose with the k = 4 and k = 5 clustering solutions. We describe the k = 4 result because it is readily interpreted.

[Figure 1: four panels, Cluster 1 (17 laboratories), Cluster 2 (15 laboratories), Cluster 3 (25 laboratories) and Cluster 4 (22 laboratories); x-axis: subset (gh, go, sh, so); y-axis: relative volume of tests, 0 to 1.]

Figure 1. Laboratories clustered according to relative volume of tests for each doctor subset. For each cluster, the relative volume of tests in the four groups gh, go, sh and so is given.

The first two clusters in figure 1 contained laboratories that mainly processed pathology tests for specialists in and out of hospitals (sh and so), respectively. They contained 17 and 15 laboratories. Laboratories in cluster one processed about 65% of their tests for doctors in the sh category, and laboratories in cluster two did more than 60% of their tests for doctors in the so category. Cluster three covers the laboratories which almost exclusively did tests for doctors out of hospitals: 60% of their tests were for GPs out of hospitals and about 30% were for specialists out of hospitals. This is the largest cluster, with 25 laboratories. Cluster four contains the 22 laboratories which processed more than 85% of their tests for GPs out of hospitals (go). The clusters seem to identify market niches for the pathology laboratories. The market niches could be due to geographical factors, to marketing niche, to doctor preferences or to other factors we have not considered. There are possible policy implications from this insight into market structure. For example, there have been changes in the pathology laboratory market in recent years due to bankruptcies and mergers. It would be interesting to see if this activity is focussed within any of the identified niches or across them.

3.2 Outlier Pathology Laboratories

A simple but effective data mining method is to examine outliers. For univariate continuous features, an outlier can be defined as a value more than two (or another suitable number of) standard deviations from the mean. For univariate categorical features, an outlier can be defined as a relative proportion which differs from the mean by more than 10% (or other suitable value). We used many of the 44 transaction fields as features and then looked for outliers with respect to those features. We now give examples of features that were discriminatory (i.e. identified a small proportion of pathology laboratories as outliers) and the outlier laboratories that were found:
• The relative proportion of tests over the eleven pathology test categories. The test categories and overall average percentages are shown in table 4. Laboratory 9@6 had a percentage of Basic tests significantly higher than the other laboratories. Laboratories 999 and +9% had percentages of Chemical tests significantly higher than the other laboratories. These outlier results provide further insight into the structure of the pathology services market.
• Patient gender. The average male-female distribution of patients across all laboratories was 62% male and 38% female. The laboratories differing from this proportion and their outlier proportions are shown in table 5.
• Patient age. Laboratories with outliers in the patient age distribution are shown in table 6. Some explanations will be due to differences in patient catchment area. For example, a pathology laboratory whose primary market is in regional coastal areas, with a high proportion of retirees, will be expected to have a higher proportion of older patients.

Group Number   Group Name           Percentage
1              Haematology          16
2              Chemical             28
3              Microbiology         13
4              Immunology            1
5              Tissue                1
6              Cytology              4
7              Cytogenetics          1
8              Infertility           0
9              Basic                 0
10             Episode Initiation   36
11             Specimen Referred     0

Table 4. Percentage of tests in each test group for laboratories that have mainly out of hospital GPs

Laboratory Identifier   Proportion of Males to Females (female, male)
33$                     (0.54, 0.46)
335                     (0.70, 0.30)
%$,3@5,9$6,944          (0.67, 0.33)

Table 5. Laboratories with relative gender proportion outliers

We observe that pathology laboratory 335 performs relatively more tests for female patients and for older patients. More female patients can be explained by more older patients, as females live longer on average. More older patients could be explained by geographic location or market niche. Pathology laboratory 9@6 does more Basic tests and also has more patients in their late 30s than other laboratories. It may be that patients of this age group tend to have more Basic tests than other groups. The reason this pathology laboratory has more patients in their late 30s is once again possibly its geographic location or market niche.
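The two analyses of this section, clustering laboratories on their subset proportions (section 3.1) and flagging univariate outliers (section 3.2), can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run: it uses plain Lloyd iterations with a deterministic farthest-first initialisation rather than the Hartigan and Wong algorithm [5], it reads "differs from the mean by more than 10%" as ten percentage points, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def proportions(counts):
    """Turn per-subset test counts (gh, go, sh, so) into relative proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Lloyd's k-means with farthest-first initialisation (an assumption)."""
    centres = [points[0]]
    while len(centres) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centres[j] = [mean(col) for col in zip(*members)]
    return labels, centres


def sd_outliers(values, n_sd=2.0):
    """Keys whose continuous feature lies more than n_sd standard deviations from the mean."""
    data = list(values.values())
    mu, sd = mean(data), pstdev(data)
    return sorted(key for key, v in values.items() if abs(v - mu) > n_sd * sd)


def proportion_outliers(values, tolerance=0.10):
    """Keys whose proportion differs from the mean proportion by more than tolerance."""
    mu = mean(list(values.values()))
    return sorted(key for key, v in values.items() if abs(v - mu) > tolerance)
```

Running `kmeans` for each k from 2 to 10 on the 79 four-element proportion vectors would mirror the range of solutions explored above; `sd_outliers` and `proportion_outliers` would then be applied feature by feature, with one value per laboratory.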

Laboratory Identifier   More patients than average in:
%9                      late 40s
335                     late 70s
566                     mid 60s
9@6                     late 30s

Table 6. Laboratories with age distribution outliers

4 Features with a time component

Next, we describe some relevant temporal features and visualisations for characterising pathology laboratory behaviour. The pathology services market is dynamic. New pathology laboratories are formed (from mergers or ab initio), while others are disbanded. Doctors change their choice of pathology laboratories; some use only one laboratory, others use a combination, and some switch laboratories, add new ones or trial new ones. These trends, and the explanations behind them, are of interest to the HIC.

4.1 Doctors changing pathology laboratories

The HIC is interested in the ordering patterns of individual doctors as a function of the pathology laboratories used. Figures 2 and 3 show visualisations of the relationship two doctors have with pathology laboratories over time. The first panel for each doctor shows the time pattern of ordering of different tests by that doctor from different laboratories. Pathology laboratories have been mapped to integers between 1 and 79 on the y-axis of the first panel in figures 2 and 3. Each × stands for any number of tests that the doctor has ordered from a specific laboratory in a given week. The second panel shows the total number of tests per week (TpW) this doctor ordered from all laboratories. The combination of these two plots shows when, where and how many pathology tests a doctor ordered.

[Figure 2: two panels for Doctor %5%4; first panel: laboratory (0 to 80) against time; second panel: tests per week (0 to 500) against time; x-axis: Jan97 to Dec98.]

Figure 2. Pattern of laboratory use and tests per week for doctor %5%4

[Figure 3: two panels for Doctor 447; first panel: laboratory (0 to 80) against time; second panel: tests per week (0 to 80) against time; x-axis: Jan97 to Dec98.]

Figure 3. Pattern of laboratory use and tests per week for doctor 447

Doctor %5%4 in figure 2 started ordering tests from two new laboratories (at 18 and 55 on the y-axis) in July 1997, while simultaneously doubling the number of tests ordered from around 200 per week to 400 per week. It is interesting to consider what caused this change in behaviour. It could be that the doctor changed from working half-time to full-time, since 400 tests per week is about average for a full-time GP. Doctor 447 in figure 3 ceased ordering from two laboratories (those at 31 and 61 on the y-axis) in August 1998, and started ordering from a new one. During this transition, the doctor had a complete break from ordering (perhaps a holiday or relocation of practice) and then resumed ordering at previous test volumes.

Although there are over 20,000 doctors in the data, most of the ordering patterns between a doctor and pathology laboratories are relatively stable over time. Besides the two doctors shown, we have manually identified about fifty other interesting ordering patterns. Of course, it is desirable to automate this process, but the data mining algorithms in the literature we have reviewed do not currently handle multivariate data with a time component (while we have only visualised two of the fields over time, there are other time-varying fields of interest). An important area of current data mining research is the development of algorithms and visualisation techniques for time series analysis [7] and event sequence analysis [2].
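The data behind the two panels of figures 2 and 3 can be derived from three transaction fields. The sketch below assumes transactions are available as (doctor, laboratory, day offset) tuples and bins day offsets into seven-day weeks counted from the epoch; these choices, the function names, and in particular the naive `lab_changes` heuristic are assumptions for illustration. The interesting patterns reported above were identified manually, and this heuristic is not claimed to reproduce them.

```python
from collections import defaultdict


def weekly_pattern(transactions, doctor):
    """Return {week: {laboratory: number of tests}} for one doctor."""
    pattern = defaultdict(lambda: defaultdict(int))
    for doc, lab, day in transactions:
        if doc == doctor:
            pattern[day // 7][lab] += 1  # seven-day bins counted from the epoch
    return pattern


def panels(pattern):
    """First panel: laboratories used each week; second panel: tests per week (TpW)."""
    weeks = sorted(pattern)
    labs_used = [(w, sorted(pattern[w])) for w in weeks]
    tests_per_week = [(w, sum(pattern[w].values())) for w in weeks]
    return labs_used, tests_per_week


def lab_changes(pattern):
    """Naive heuristic: weeks in which the set of laboratories used differs
    from the previous active week, as (week, added, dropped) tuples."""
    changes = []
    previous = None
    for w in sorted(pattern):
        current = set(pattern[w])
        if previous is not None and current != previous:
            changes.append((w, sorted(current - previous), sorted(previous - current)))
        previous = current
    return changes
```

A week-by-week comparison like this would flag many doctors who merely alternate between laboratories, so any practical use would need smoothing over longer windows.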

4.2 Service Lags

A service lag is defined as the time interval between date of referral (DOR) and date of service (DOS) for a pathology test. A chronically-ill patient may have tests ordered for future periodic visits to a doctor. In these circumstances, it is convenient for the patient to have a specimen taken before the next visit, so that the doctor has the results available for consultation. However, the regulatory guidelines do not generally allow this to be done more than a year in advance. The summary of service lags in figure 4 reveals that most tests are performed within six months of the referral date. By comparison, just 0.18% of tests have a service lag of six months to one year and 0.04% have a service lag of more than one year. Two break points between the main time intervals fall approximately at 183 days (six months) and 365 days (one year). Although service lags of more than a year are infrequent, a closer examination of them indicated some data quality issues. It became apparent that the date of service and date of referral fields in these transactions had been subject to a high proportion of data entry errors. For example, 1997 is relatively commonly entered incorrectly as 1979. The significance of the six month break is explained by reference to the multiple test ordering rules in the Medicare pathology regulations: multiple tests can only be ordered up to six months ahead for seriously or chronically ill patients [3]. Outlier detection using the service lag feature reveals four laboratories with significantly longer moving-average service lags than usual.
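With dates stored as day offsets, the service lag and its distribution over the intervals discussed above reduce to a subtraction and a count. The following sketch assumes (DOR, DOS) pairs of day offsets; the bucket labels, the function name and the separate bucket for negative lags (a service date before the referral date, which would suggest a data entry error) are assumptions added for illustration.

```python
def lag_shares(pairs):
    """Share of tests per service-lag bucket, given (DOR, DOS) day-offset pairs."""
    counts = {"negative": 0, "0-183": 0, "184-365": 0, ">365": 0}
    total = 0
    for dor, dos in pairs:
        lag = dos - dor  # service lag in days
        total += 1
        if lag < 0:
            counts["negative"] += 1
        elif lag <= 183:
            counts["0-183"] += 1
        elif lag <= 365:
            counts["184-365"] += 1
        else:
            counts[">365"] += 1
    if total == 0:
        return {bucket: 0.0 for bucket in counts}
    return {bucket: n / total for bucket, n in counts.items()}
```

Computing the same shares per laboratory, rather than over all transactions, gives one value per laboratory that can be screened with the outlier rules of section 3.2.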

4.3 Patient Episodes

An episode is defined as the group of pathology tests ordered for a patient by the same doctor on the same day of consultation. Episode size is defined as the number of tests in an episode. Episode duration is defined as the maximum time between the date of referral for the episode tests and the date of service for an episode test. Various test-based features of episodes were examined. Laboratories that have more repeated tests in an episode than is typical were identified. Approximately 6.5 million episodes were initiated in 1997. For 99% of these episodes, the tests in the episode were performed by a single laboratory.

Using episodes as features introduces the complication of windowing effects. Episodes that were initiated before the beginning of 1997 continue into 1997, and episodes that start near the end of 1998 do not end until after the available data window. We excluded these incomplete episodes from the analysis.

Figure 5 presents the distribution of the number of tests for each episode size. Typically episodes of size 2k are much more frequent than episodes of size 2k + 1. The most frequent episodes have a size of 2 to 4 tests. Episodes of this size form the majority of episodes. There is a drop in tests at around size 60.

The issue in interpreting this feature is how many tests per episode can be clinically justified. The issue is a complicated one, but can be broken down into two separate issues. The first concerns multiple tests of the same type. This is indicated for chronically-ill patients, where it may be convenient to order multiple tests for the next weeks or months. The second concerns the number of different tests that can be ordered in the same episode with clinical justification.
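The episode definitions above can be sketched as a grouping of transactions on patient, doctor and date of referral. Treating the date of referral as the day of consultation is an assumption, as are the tuple layout, the function names and the `repeated` count (tests whose item number already occurs in the same episode). The exclusion of incomplete episodes at the window edges is not shown.

```python
from collections import Counter, defaultdict


def episodes(transactions):
    """Summarise episodes from (patient, doctor, dor, dos, item) tuples.

    Dates are day offsets; the episode key is (patient, doctor, dor).
    """
    groups = defaultdict(list)
    for patient, doctor, dor, dos, item in transactions:
        groups[(patient, doctor, dor)].append((dos, item))
    summary = {}
    for key, tests in groups.items():
        dor = key[2]
        items = [item for _, item in tests]
        summary[key] = {
            "size": len(tests),                                # number of tests
            "duration": max(dos for dos, _ in tests) - dor,    # latest DOS minus DOR
            "repeated": len(items) - len(set(items)),          # repeats of the same item
        }
    return summary


def size_histogram(summary):
    """Number of episodes observed for each episode size."""
    return Counter(ep["size"] for ep in summary.values())
```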

[Figure 4: log-log plot; x-axis: days (DOS - DOR), 1 to 10000; y-axis: number of occurrences.]

Figure 4. Frequency of service lags (time interval between DOR and DOS)

[Figure 5: log-log plot; x-axis: size of episode; y-axis: total number of pathology tests for this size.]

Figure 5. Number of episodes by episode size


5 Implications

We have used the features generated in this paper to identify pathology laboratories and doctors who are outliers with respect to distributions over these features. Additionally, as mentioned in the introduction, the features generated in this paper will be used in new predictive models and their performance compared with existing predictive models using existing features. As one would expect, some of the pathology laboratory utilization patterns discovered using the new features presented here are novel, whereas others are known from alternative sources of knowledge. The client intends to investigate the novel patterns for explanation and significance.

Why are we not reporting on these patterns? Under the Health Insurance Commission Act, as a third-party contractor to the HIC, we can only legally receive de-identified information about patients, pathology laboratories and doctors. This privacy requirement excludes interpretations of results using information that can identify entities (such as their geographic market focus). Instead, in this paper we have given hypothetical examples of how our results might affect HIC policy.

Of longer term interest, outside the scope of the present study, is comparing pathology ordering patterns against a standard of best practice. This is difficult using Australian health data because of the absence of diagnostic information that would explain why the pathology test was ordered. In some cases, clinical diagnoses can be inferred from other administrative data. For example, there is a standard battery of tests for the second trimester of pregnancy, and so deviations from standard practice through under- or over-servicing may be observed. It is an open question whether this type of inference can be derived reliably from administrative claims data. For instance, over-servicing can be confounded with further co-morbidity investigations.

6 Conclusion

We have generated new features from pathology services claims data that were then used to identify outlying laboratories and doctors. The features were also used to visualise doctors’ ordering practices. Algorithms for automating the process of finding outliers, and for clustering entities characterised by features involving multivariate time series and outliers, are needed in this domain and are not currently available. We have extended the range of features available to the HIC beyond those computed from counts of columns in the transaction table. We identified a number of new interesting features for use in predictive modeling. These features were summarised, visualised and used as inputs for clustering and outlier detection methods. Data organisation and data transformation methods were described for the efficient access and manipulation of these new features. Further work is required for feature selection and training of predictive models with the new features and evaluation of performance against the currently deployed models.

Acknowledgements

We thank the Health Insurance Commission (HIC) for access to the data and financial support of the project. Peter Christen was funded by the Swiss National Science Foundation (SNF) and the Novartis Stiftung, Switzerland. We thank the referees for their suggestions, which greatly improved the paper.

References

[1] S. Agarwal et al. On the computation of multidimensional aggregates. In Proc. VLDB’96, pages 506–521, 1996.
[2] R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. of the 11th Int’l Conference on Data Engineering, pages 487–499, 1995.
[3] Commonwealth Department of Health and Family Services. Medicare Benefits Schedule Book. Australian Government Publishing Service, 1997.
[4] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. Advances in Knowledge Discovery and Data Mining, chapter From Data Mining to Knowledge Discovery: An Overview, pages 1–36. AAAI Press, Menlo Park, CA, 1996.
[5] J. Hartigan and M. Wong. A K-means clustering algorithm. Applied Statistics, 28:100–108, 1979.
[6] G. John and B. Lent. Sipping from the data firehose. In Third Int. Conf. on Knowledge Discovery and Data Mining, pages 199–202. AAAI Press, Menlo Park, CA, 1997.
[7] E. Keogh and P. Smyth. A probabilistic approach to fast pattern matching in time series databases. In Third Int. Conf. on Knowledge Discovery and Data Mining, pages 24–30. AAAI Press, Menlo Park, CA, 1997.
[8] H. Liu and H. Motoda. Feature selection for knowledge discovery and data mining. Kluwer Academic Publishers, Boston, 1998.
[9] A. Moore et al. Cached sufficient statistics for automated mining and discovery from massive data sources. Technical report, Robotics Institute and School of Computer Science, Carnegie Mellon University, 1999.
[10] J. Ousterhout. Tcl and the Tk toolkit. Addison Wesley Longman, 1994.
[11] S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. Data Mining and Knowledge Discovery, 4(2/3):89–125, 2000.
[12] L. Wall, T. Christiansen, and R. Schwartz. Programming Perl. O’Reilly and Associates, 1996.
[13] K. Wheelwright. Controlling pathology expenditure under Medicare - a failure of regulation? Federal Law Review, 22(1), 1995.
