Knowledge Discovery from Data: Comparative Study


Transactions on Engineering and Sciences Vol. 2, Issue 6, June 2014

ISSN: 2347-1964 Online 2347-1875 Print

Knowledge Discovery from Data: Comparative Study

M.A.H. Farquad¹, Jabeen Sultana², G. Nagalaxmi³, Gudaas Savankumar⁴

¹Associate Professor, S.R. International Institute of Information Technology, Hyderabad.
²,³Dept. of CSE, RGUKT-BASAR, Andhra Pradesh.
⁴R&D Engineer, Works Applications, Singapore.

Abstract—Knowledge discovery from data is necessary in order to deliver correct decisions to the user in the decision-making process. In this paper, the efficiency of the knowledge learnt by an SVM under a transparent approach is compared with that under an opaque approach. We select DT and NBTree as transparent approaches to evaluate the eclectic rule-extraction approach for SVM, in which the available training dataset is modified according to the predictions of the learned SVM. This modified data is expected to represent the knowledge learnt by the SVM during training. The transparent approaches are then employed, and the understanding of the SVM is evaluated in the form of rules. The conclusion drawn after extensive experimentation is that improved comprehensibility is achieved.

Index Terms—SVM, Rule induction techniques, DT, NBTree

I. INTRODUCTION

Data mining and machine learning techniques extract knowledge from large databases. Despite being more accurate, many machine learning algorithms tend to produce black-box models, which do not reveal the knowledge gained during training. SVM is no exception, and this is one of the major obstacles to its wide acceptance. Rule extraction is the procedure of representing the knowledge learnt by black-box techniques in the form of if-then rules. Real-world applications such as medical diagnosis [20], banking [6], finance [7], insurance [6,7], and defense [17] need and prefer transparent approaches to understand an algorithm's predictions. The transparent procedures Decision Tree [14,16] and NBTree are employed in this paper for evaluation. The Decision Tree is considered one of the simplest and fastest learning algorithms and is expected to give a definite set of if-then-else rules. NBTree is a hybrid decision tree that employs Naive Bayes classification in conjunction with a DT: a DT gives the class prediction at each leaf node, whereas an NBTree has an NB classifier residing at each leaf node, which gives the class probabilities for that leaf.

II. OVERVIEW OF INTELLIGENT TECHNIQUES

Three techniques are used in the rule-extraction study and are described below: A. SVM, B. Decision Tree, C. NBTree.

A. SVM
Support vector machines are a set of related supervised learning methods used for classification, and they belong to a family of generalized linear classifiers. In other terms, the Support Vector Machine (SVM) is a classification and regression prediction tool that uses machine learning theory to maximize predictive accuracy while automatically avoiding over-fitting to the data; it can also be extended to multiclass classification. SVM works by mapping vectors into an N-dimensional space and using an (N-1)-dimensional hyperplane as a decision surface to classify the data. The task of SVM modeling is to find the optimal hyperplane that separates the different class memberships. SVM applies a nonlinear mapping to project the dataset into a high-dimensional feature space and uses a linear separator to classify the data there.

B. Decision Tree
A decision tree [14,16] recursively partitions the training data using attribute tests: each internal node tests an attribute, each branch corresponds to an outcome of the test, and each leaf node stores a class label. A path from the root to a leaf therefore corresponds directly to an if-then rule, which is what makes the model transparent.

C. NBTree
The Naive Bayes Decision Tree (NBTree) is almost identical to a DT, except that Naive Bayes classifiers exist at the leaves instead of single class labels. It requires estimating the conditional probability of each attribute value given the class label, and classification at the leaf nodes is done by the NB classifiers. Compared to a DT, in an NBTree class probabilities exist at each leaf node, and the sum of these probabilities does not exceed unity. NBTree is also available in WEKA, a popular data mining tool.
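To make the transparency contrast concrete, the following sketch (ours, not from the paper) trains an SVM and a Decision Tree on the Iris data with scikit-learn: the tree yields readable if-then rules directly, while the SVM exposes only its support vectors and kernel weights.

```python
# Illustrative sketch (not from the paper): contrast an opaque SVM
# with a transparent decision tree using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)
dt = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

print("SVM accuracy:", svm.score(X_te, y_te))
print("DT accuracy:", dt.score(X_te, y_te))

# The SVM's "knowledge" is a set of support vectors and kernel weights,
# which a user cannot read as decision logic...
print("Support vectors learned:", svm.support_vectors_.shape[0])

# ...whereas the tree's paths print as if-then rules.
print(export_text(dt, feature_names=load_iris().feature_names))
```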


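NBTree itself ships with WEKA; as a rough sketch of the idea only (a simplification, not Kohavi's exact algorithm [14], which uses a utility-based split criterion), one can grow a shallow tree and fit a Naive Bayes model on the instances that reach each impure leaf:

```python
# Rough sketch of the NBTree idea: a shallow tree whose impure leaves
# hold Naive Bayes classifiers (a simplification of Kohavi's NBTree).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Train one NB model per leaf, using the instances routed to that leaf.
leaf_ids = tree.apply(X)
leaf_models = {}
for leaf in np.unique(leaf_ids):
    mask = leaf_ids == leaf
    if len(np.unique(y[mask])) > 1:            # mixed leaf: fit NB on it
        leaf_models[leaf] = GaussianNB().fit(X[mask], y[mask])

def nbtree_predict(x):
    """Route x to its leaf, then ask that leaf's NB model (if any)."""
    leaf = tree.apply(x.reshape(1, -1))[0]
    if leaf in leaf_models:
        return leaf_models[leaf].predict(x.reshape(1, -1))[0]
    return tree.predict(x.reshape(1, -1))[0]   # pure leaf: majority class

print(nbtree_predict(X[0]), y[0])
```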

III. METHODS

In this paper a comparative analysis is carried out between two eclectic rule-extraction approaches. The first employs the Decision Tree as the rule-extraction procedure, whereas in the second experiment NBTree is employed. The comparative approach is as follows (a runnable sketch of the full pipeline is given at the end of this section):

Step 1: The training data is used to train the SVM (the best SVM is selected).
Step 2: Predictions for the SV data are obtained, and the corresponding target values are modified according to the predictions of the trained SVM.
Step 3: The modified training data is obtained.
Step 4: The modified training data is fed to the DT/NBTree for learning.
Step 5: A set of rules is extracted.
Step 6: The test data is supplied to the rules for prediction.
Step 7: The prediction accuracy is obtained.

The proposed opaque approach to knowledge discovery has three important phases:
1. The first phase is to feed the original dataset to the SVM, obtain the support vectors, and form a dataset from the support vectors.

Figure 1: Eclectic rule extraction from SVM (training data → SVM → modified training data → DT/NBTree → rules → predictions on test data).

2. The second phase is to feed this dataset to rule extractors, obtaining suitable rules from decision systems such as DT, NBTree, RSES, and DENFIS.
3. The third phase is to evaluate the rules and find their efficiency using rule-quality criteria such as accuracy, fidelity, comprehensibility, specificity, sensitivity, number of rules, and conditions per rule.

Rule evaluation follows the same procedure in both cases, i.e., the transparent case and the opaque case (we consider the decompositional approach here), except for the parameter fidelity, which is meaningful only in the opaque approach. The rule evaluation parameters are:

1. Accuracy
2. Number of rules
3. Antecedents per rule
4. Comprehensibility
5. Fidelity
6. Sensitivity
7. Specificity

These parameters are explained in detail here:
• The accuracy of a classifier on a given test set is the percentage of test instances that are correctly classified.
• The number of rules is the total number of leaf nodes obtained in the tree, each leaf representing one rule:
  No. of rules = No. of leaf nodes
• Antecedents (conditions) per rule is the average number of conditions in each rule:
  Antecedents per rule = No. of conditions / No. of rules
• The comprehensibility of a rule set is determined by the size of the rule set (number of rules) and the average number of antecedents per rule; their sum gives the comprehensibility, and the lower the number, the better:
  Comprehensibility = No. of rules + Antecedents per rule
• A rule set displays a high level of fidelity if it can mimic the behavior of the machine learning technique from which the rules were extracted.
• Sensitivity is calculated from the confusion matrix obtained for the model:
  Sensitivity = TP / (TP + FN) × 100
• Specificity is likewise calculated from the confusion matrix:
  Specificity = TN / (TN + FP) × 100
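As a minimal end-to-end illustration of Steps 1-7 and the measures above, the sketch below reconstructs the pipeline with scikit-learn on Iris; the paper itself used RapidMiner and WEKA, so names and parameters here are our choices, not the authors' code.

```python
# Sketch of the eclectic rule-extraction pipeline (Steps 1-7) and the
# rule-quality measures; our reconstruction, not the paper's exact code.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)        # Step 1: train the SVM
y_mod = svm.predict(X_tr)                      # Steps 2-3: relabel the targets
dt = DecisionTreeClassifier(random_state=1)    # Step 4: learn from modified data
dt.fit(X_tr, y_mod)

y_rules = dt.predict(X_te)                     # Steps 5-7: apply extracted rules
accuracy = (y_rules == y_te).mean() * 100
fidelity = (y_rules == svm.predict(X_te)).mean() * 100  # agreement with the SVM

n_rules = dt.get_n_leaves()                    # No. of rules = No. of leaf nodes

# Antecedents per rule = average depth of the leaves (one condition per edge).
t = dt.tree_
def leaf_depths(node=0, depth=0):
    if t.children_left[node] == -1:            # node is a leaf
        return [depth]
    return (leaf_depths(t.children_left[node], depth + 1)
            + leaf_depths(t.children_right[node], depth + 1))

antecedents = sum(leaf_depths()) / n_rules
comprehensibility = n_rules + antecedents      # lower is better

print(f"accuracy={accuracy:.1f}%  fidelity={fidelity:.1f}%  rules={n_rules}  "
      f"antecedents/rule={antecedents:.2f}  comprehensibility={comprehensibility:.1f}")
# Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) apply to the two-class
# WBC task and can be read off sklearn.metrics.confusion_matrix.
```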

IV. RESULTS & DISCUSSIONS

Rule extraction is done by two approaches, i.e., with and without using the SVM. Rule extraction is performed on three datasets, namely Iris, Wine, and WBC, using two different algorithms, DT and NBTree, and evaluated on the parameters above. The two approaches to knowledge discovery are compared, and the best rule-extraction approach and technique are identified.

The classification datasets are taken from the UCI machine learning benchmark repository [18]. The datasets are obtained as text files, loaded into an Excel sheet, and converted into .CSV format; ARFF files are then prepared and saved in .arff format for use with the DT and NBTree modules in WEKA. On each dataset 10-fold cross-validation is applied, DT, NBTree, RSES, and DENFIS models are constructed, and rules are extracted. Iris has 150 instances with four attributes and 3 class labels (a 3-class problem). Wine has 178 instances with 13 attributes and 3 class labels (a 3-class problem). WBC has 699 instances with 10 attributes and 2 classes (a 2-class problem).

In the transparent approach the datasets are given directly to the classifiers DT and NBTree (in WEKA) and rules are obtained. In the opaque approach the original datasets are first fed to an SVM implemented in the RapidMiner tool and the support vectors are obtained; a dataset is formed from these SVs and their class labels and fed to the classifiers, and rules are obtained in the same fashion as in the transparent approach. The process of preparing input files for WEKA is the same for both approaches; the only difference in the opaque case is that the support vectors and their corresponding class labels are additionally extracted.

The transparent approach yielded good performance in terms of classification accuracy, sensitivity, and specificity over all the datasets. DT and NBTree gave superior performance when used in transparent mode than when combined with a black-box model as in the opaque approach. The SVM, although a superior classifier as a standalone technique, fails to yield good accuracy in the opaque approach. The rough-set and fuzzy-logic based rule-extraction techniques did not perform well either as standalone methods or when combined with the SVM.

Table 1: Results of DT on Iris, Wine and WBC (10-fold)

                        Iris    SV-Iris   Wine     SV-Wine   WBC      SV-WBC
  Accuracy              96%     80%       83.77%   81.36%    93.99%   92.98%
  Sensitivity           100%    88.33%    95.59%   93.01%    93.47%   84.61%
  Specificity           100%    85.48%    95.68%   75%       95.5%    58.8%
  Number of Rules       5       7         8        18        20       4
  Conditions per Rule   14/5    25/7      25/8     113/18    118/20   9/4
  Comprehensibility     8       11        11       24        26       6
  Fidelity              ---     91%       ---      88%       ---      98%

Table 2: Results of NBTree on Iris, Wine and WBC (10-fold)

                        Iris    SV-Iris   Wine     SV-Wine   WBC      SV-WBC
  Accuracy              83.7%   81.36%    93.26%   74.5%     95.85%   94.76%
  Sensitivity           95.5%   93.01%    99.99%   90.69%    95.7%    82.85%
  Specificity           95.6%   75%       99.15%   96.6%     96.59%   88.46%
  Number of Rules       8       18        4        3         10       2
  Conditions per Rule   25/8    113/18    9/4      6/3       37/10    4/2
  Comprehensibility     11      24        2.25     2         5        4
  Fidelity              ---     91%       ---      88%       ---      98%
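For reference, the 10-fold protocol is easy to reproduce outside WEKA; a scikit-learn equivalent for the transparent DT case (illustrative only, so the exact figures will differ from the tables above) is:

```python
# Illustrative 10-fold cross-validation of the transparent DT approach
# on Iris with scikit-learn; exact figures will differ from WEKA's.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(f"DT 10-fold accuracy: {scores.mean() * 100:.2f}% "
      f"(+/- {scores.std() * 100:.2f})")
```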


Thus the transparent approach to knowledge discovery from data is recommended over the opaque approach. Both approaches show their own advantages in dealing with the various rule-extraction techniques, but they are quite different.

V. CONCLUSIONS AND FUTURE DIRECTIONS

The transparent approach often yields high accuracy, sensitivity, specificity, and fidelity, and in most cases the comprehensibility is also better. The opaque approach yields lower accuracy and lower comprehensibility. An important open topic in knowledge discovery from data is how to extend rule-extraction methods to neural networks, for example by extracting rules with rough sets and integrating them into a neural network for classification datasets. Time-series problems, bankruptcy prediction, and security problems arising in data mining can also be addressed with respect to rule extraction from SVMs. The scope of future work is to improve the rule-extraction step and the accuracy of the opaque approach. A good selection of rules can increase performance further, and rule filtering can be improved by selecting rules from each decision class instead of taking only the rules with the greatest support. Algorithms such as DT and NBTree can be used in WEKA to obtain the best possible rules.

REFERENCES
[1] Andrew Kusiak. Decomposition in Data Mining: An Industrial Case Study. IEEE Transactions on Electronics Packaging Manufacturing, Vol. 23, No. 4, 87-97, 2000.
[2] Confusion Matrix, http://www.compumine.com/web/public/newsletter/20071/precision-recall
[3] David Martens, Bart Baesens, and Tony Van Gestel. Decompositional Rule Extraction from Support Vector Machines by Active Learning. IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 2, 352-358, 2009.
[4] H. Nunez, C. Angulo, and A. Catala. Rule Extraction from Support Vector Machines. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN '02), pp. 107-112, 2002.
[5] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2001.
[6] M.A.H. Farquad, V. Ravi, and S. Bapi Raju. "Rule Extraction using Support Vector Machine Based Hybrid Classifier". TENCON 2008, IEEE Region 10 Conference, Hyderabad, India, November 19-21, 2008.
[7] M.A.H. Farquad. Rule Extraction from Support Vector Machine. PhD dissertation, Department of Computer and Information Sciences, University of Hyderabad, Hyderabad, India, 2010.
[8] M.W. Craven. Extracting Comprehensible Models from Trained Neural Networks. PhD thesis, Department of Computer Science, University of Wisconsin-Madison, USA, 1996.
[9] Nahla Barakat and Joachim Diederich. Learning-based Rule-Extraction from Support Vector Machines: Performance on Benchmark Datasets. In Proceedings of the Conference on Neuro-Computing and Evolving Intelligence, Knowledge Engineering and Discovery Research Institute (KEDRI), Auckland, New Zealand, 13-15, 2004.
[10] NeuCom, http://www.kedri.aut.ac.nz/areas-of-expertise/data-mining-and-decision-support-systems/neucom
[11] Nikola Kasabov and Qun Song. DENFIS: Dynamic Evolving Neural-Fuzzy Inference System and its Application for Time-Series Prediction. IEEE Transactions on Fuzzy Systems, Vol. 10, No. 2, 144-154, 2002.
[12] Payam Refaeilzadeh, Lei Tang, and Huan Liu. Cross-Validation. In Encyclopedia of Database Systems, 532-538, Springer, US, 2009.
[13] RapidMiner, http://rapid-i.com/content/view/26/84/lang,en/
[14] Ron Kohavi. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In Proceedings of KDD-96, Portland, USA, 202-207, 1996.
[15] RSES (Rough Set Exploration System), http://logic.mimuw.edu.pl/~rses
[16] J.R. Quinlan. Induction of Decision Trees. Machine Learning, Vol. 1, 81-106, 1986.
[17] Tom M. Mitchell. Machine Learning. McGraw-Hill, Maidenhead, UK, International Student Edition, 1997.
[18] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/
[19] WEKA, http://www.cs.waikato.ac.nz/~ml/weka
[20] Zhenyu Chen, Jianping Li, and Liwei Wei. "A Multiple Kernel Support Vector Machine Scheme for Feature Selection and Rule Extraction from Gene Expression Data of Cancer Tissues". Artificial Intelligence in Medicine, Vol. 41, 161-175, 2007.

