A Linguistic Fuzzy-XCS classifier system




Javier G. Marín-Blázquez, Gregorio Martínez Pérez and Manuel Gil Pérez

Departamento de la Ingeniería de la Información y las Comunicaciones (DIIC), Facultad de Informática, Campus de Espinardo, Universidad de Murcia, 30.071 Murcia, Spain. [email protected], [email protected], [email protected]

Abstract—Data-driven construction of fuzzy systems has followed two different approaches. One approach is termed precise (or approximative) fuzzy modelling; it aims at the numerical approximation of functions by rules, but pays little attention to the interpretability of the resulting rule base. The other is linguistic (or descriptive) fuzzy modelling, which aims at automatic rule extraction but uses fixed, human-generated and linguistically labelled fuzzy sets. This work follows the linguistic fuzzy modelling approach. It uses an eXtended Classifier System (XCS) as the mechanism to extract linguistic fuzzy rules. XCS is one of the most successful accuracy-based learning classifier systems. It provides several mechanisms for rule generalization and also allows for online training if necessary. It can be used in sequential and non-sequential tasks. Although originally applied in discrete domains, it has been extended to continuous and fuzzy environments. The proposed Linguistic Fuzzy XCS has been applied to several well-known classification problems and the results compared with both precise and linguistic fuzzy models.

I. INTRODUCTION

Computing with words is a fundamental contribution of fuzzy logic [21]. The possibility of attaching a linguistic label to a fuzzy set allows a linguistic interpretation of fuzzy systems to be obtained. However, in order to obtain such transparency, the fuzzy definitions must be human generated and must remain unmodified by any learning or optimization algorithm used to discover or improve such fuzzy systems. Algorithms that keep the original definitions of the sets are called linguistic (or descriptive) fuzzy models. This work is concerned with this linguistic approach. However, it must be noted that many algorithms follow another approach, termed precise (or approximative) modelling. These algorithms allow the fuzzy sets to be modified in order to best fit the data, but in doing so they may destroy the linguistic interpretation.

Knowledge acquisition from human experts is a slow, expensive, and difficult task. However, as data storage costs have dropped, more and more historical data about the behaviour of many systems has become available. This fact has led to great interest in developing algorithms that automatically generate models of these systems from such historical data. The fuzzy community has been very active in this data mining field in recent years, and there is much research into the automatic generation of fuzzy models. Classification problems are one of the typical tasks in this field.

Many of the machine learning algorithms used in classification are offline methods. The learning procedure is performed in a batch style. Once the learning phase is over,

the system is fixed, and no further changes are allowed. But there exist other algorithms, in particular reinforcement learning (RL) based ones, that are designed to keep learning for as long as desired. Online algorithms are very desirable when the behaviour of the system to be modelled is likely to change gradually over time. In this work, one of those online algorithms, an extended variant of classifier systems called XCS, is used to learn a linguistic fuzzy classifier. The algorithm is modified to allow for fuzzy conditions and to use linguistic hedges. Linguistic hedges allow for more flexibility given that the original fuzzy definitions are not allowed to change.

The rest of the paper is organized as follows: Section II explains the linguistic and precise fuzzy modelling approaches. In Section III, classifier systems in general, and XCS in particular, are introduced. Section IV details the experiments performed. Then, the results are interpreted in Section V. Finally, in Section VI conclusions are drawn and future work is outlined.

II. LINGUISTIC AND PRECISE FUZZY MODELLING

In early fuzzy systems the fuzzy rule bases were expected to be generated by human experts. The variable domains were partitioned with a fuzzy partition. Experts provided these definitions of the fuzzy sets, later to be used in the rules. Originally, the fuzzy rules could only use the fuzzy sets defined in such partitions. These partitions usually had certain properties (such as distinguishability or completeness) that made these early systems genuinely transparent [18]. Such an approach led to fuzzy grid partitions, widely used in controllers.

Experts were a bottleneck in knowledge acquisition, but the data available about different systems was growing considerably. It was natural that machine learning algorithms were applied to automatically optimize (and also to create) these systems. Many of these algorithms were based, among others, on supervised learning techniques. Some early systems (mostly controllers) used fuzzy grids, which were fully defined. Unfortunately, in systems with high dimensionality, the total number of rules makes this impractical. Most systems are sparsely populated, with most examples located in certain areas and the majority of the input space being empty. With fuzzy grid partitions the optimization work mostly consists of obtaining a (hopefully small) subset of rules from the Cartesian product of all fuzzy sets of the different variables. The most basic algorithms are based on enumerating all potential rules and selecting those that fulfil some (error) criteria. Other techniques were also applied, in particular evolutionary ones. But with these optimizations the problem

of uncovered examples arose, although the use of fuzzy sets with infinite tails (such as sigmoids) allowed for complete coverage, even if the membership values may be negligible in certain areas. It soon became evident that fuzzy grids imposed a severe limitation on the granularity of the input space, and some relaxations started to appear in some optimization algorithms. On one side, the fuzzy set definitions started to be fine-tuned, allowing changes to best fit the data (although doing this usually negated the linguistic interpretation of the original set). Another relaxation was to start using fuzzy sets individually defined for their use in a particular rule [7]. This led to scattered fuzzy partitions. Soon the objective was to get more and more precise systems while neglecting interpretability. All these changes led to what is termed precise (or approximative) fuzzy modelling.

Standing by the original spirit is linguistic or descriptive fuzzy modelling. This approach is mostly concerned with the development of fuzzy systems where the fuzzy set definitions are human generated and fixed, as in early systems. The number of different concepts that a human seems to be able to handle is around 7±2 [12]. Therefore, the cardinality of each domain partition should be around that range. The problem of granularity, however, still haunts the descriptive approach. One way to keep the set definitions untouched while allowing more precision is the use of linguistic hedges. Hedges were introduced very early by Zadeh. A linguistic hedge changes the original linguistic interpretation of a given set and effectively transforms it, but in a predefined and interpretable way. Another alternative is to perform microtuning [15] on the original definitions. Microtuning is a controlled optimization of fuzzy set definitions that enforces a (very) high similarity between the original and the optimized fuzzy set definitions. Nevertheless, microtuning is less advisable, as it does change the original sets. However, this tiny change of meaning may be accepted by some users.

For simplicity and ease of use, piecewise-linear fuzzy sets have been used extensively. This work also uses such fuzzy sets, in particular trapezoidal and shouldered sets. The traditional hedge definitions do not produce substantial changes in these types of fuzzy sets. A new definition of hedges was proposed in [11] which produced better results with piecewise-linear sets. These definitions are used here.
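To make the idea of hedging a fixed set concrete, the following is a minimal Python sketch of a trapezoidal membership function together with the classical Zadeh hedges (concentration for "very", dilation for "somewhat"). This is only an illustration: the actual hedge definitions used in this work are those of [11], tailored to piecewise-linear sets, and are not reproduced here; all names and numeric values below are illustrative assumptions.

def trapezoid(a: float, b: float, c: float, d: float):
    """Return a trapezoidal membership function with support [a, d] and core [b, c]."""
    def mu(x: float) -> float:
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

# Classical hedges (Zadeh): they transform the membership value, not the set definition,
# so the underlying human-generated set stays untouched and interpretable.
def very(mu):
    return lambda x: mu(x) ** 2      # concentration

def somewhat(mu):
    return lambda x: mu(x) ** 0.5    # dilation

# Illustrative partition of cardinality 3 over a variable ranging in [0, 10]
low    = trapezoid(-1.0, 0.0, 2.0, 4.0)
medium = trapezoid(2.0, 4.0, 6.0, 8.0)
high   = trapezoid(6.0, 8.0, 10.0, 11.0)

print(medium(5.0), very(medium)(5.0), somewhat(medium)(3.0))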

III. XCS LEARNING CLASSIFIER SYSTEM

A. Introduction to Learning Classifier Systems

Michigan style classifier systems [6] are based on a single rule set in the form condition:action. The learning mechanism works by adjusting certain values associated with rules based on the feedback given by the system, as well as by discovering new and better rules. Discovery of new rules is achieved by mating or mutating old rules. Rule conditions usually include "don't care" for many variables, allowing for generality. Likewise, they usually accept partially defined descriptors, an advantage when some values of certain input variables may be missing.

Classifier systems are designed to be used in sequential tasks, also known as multi-step environments. These are problems that require several intermediate steps (some of them with their own rewards) in order to be solved, such as robot navigation or mazes. Usually there is a delay between the action and the reward. But classifier systems can also be used in non-sequential tasks (also known as single-step in XCS terminology) with immediate reward. Supervised classification is one of these single-step tasks.

In early learning classifier systems, rules occasionally performed an action that earned an external reward. This reward would contribute to the fitness of both the current rules and the rules that had fired in the past (the whole chain up to the current ones). Rewards were distributed using algorithms such as "Bucket Brigade" (effectively a trickle-down economy) or the "profit-sharing plan" (essentially communal reward sharing). Besides, in such early systems, the fitness of a rule was not only a measure of the reward it might earn (used to select the rules to fire) but also a measure of the reward it had earned so far (used to select the rules for breeding). That is, the same measure was used in the rule firing mechanism as well as to determine its genetic value. This caused several problems, in particular with rules which fired very rarely but were crucial when they did fire. Such rules would tend to be squeezed out of the population by the evolutionary competition long before they could demonstrate their true value.

B. The eXtended Classifier System (XCS)

XCS [19] was able to alleviate many of these problems. It uses reinforcement learning (RL) expressions to distribute the rewards. Likewise, XCS classifiers have a fitness for the Genetic Algorithm (GA) that is based on the accuracy of the reward estimation, and not on the reward itself (which is still used to choose the rules to fire). In this way it solved the problem of accurate, but rarely fired, rules.

XCS is designed to find complete maps, in the sense that they fully cover the input space, and do so for each possible action as well. It also features mechanisms to obtain the most general rules possible. In particular, it includes a niche genetic algorithm specially designed to obtain general rules. With the same goal it uses a rule subsumption method that also promotes generality in the rules. The GA is not activated periodically, or in a random way, but following a special pattern. This pattern promotes the removal of rules in such a way that the rules are distributed equally among the different existing niches. It allows more rules to cover the areas of the input space that are more difficult to discriminate, that is, where they are most necessary.

The rationale behind XCS is to obtain a system that is accurate in its predictions, that evolves through a niche GA, and that produces a precise and complete map X × A → P. X is the input space, A the action space (or classification if the task is such), and P the prediction space of the rewards.

Note that the map extends the normal dimensionality of the input by combining it with the possible actions to obtain predictions. This departs from the habitual map X → A and from normal learning classifier systems. Although this may add some extra complexity to the problem, it is closer to the philosophy of reinforcement learning, which tries to estimate the consequences of each and every possible action in any given situation.

Many of the features that XCS provides are interesting for the future research of the authors of this work. Pressure towards generality may provide a kind of feature selection. By generality is meant the preference for rules that contain many "don't care" values in the condition part of the classifiers. It is an extra advantage, likewise, that the learning mechanism can be applied to non-sequential as well as sequential tasks. The future research of the authors aims to deal with sequential environments.

A complete and exhaustive explanation of XCS and its features can be found in [9]. An algorithmic description is available in [3]. A brief account of its working procedure follows. Whenever a new example comes up, it is matched against the condition patterns of the classifiers. The activated classifiers constitute the match set and are grouped by the action each classifier advocates. Each of these subsets predicts a value for the reward by combining the prediction of each classifier weighted by that classifier's fitness (see equation 1). Such reward predictions, one for each action, constitute the prediction array. In exploitation mode, the system chooses the greatest prediction (when maximization is intended). In exploration mode an action is chosen randomly. The subset of classifiers that advocated the chosen action forms the action set. After the action is executed (or the class decided) a reward is obtained, and this reward is used to update the prediction, fitness and error values of the classifiers in the action set. It is interesting to note that XCS has an optimization called numerosity that allows several copies of the same classifier to be in the population, with the same values for prediction and fitness. When these "macro-classifiers" activate, their effects are weighted by their numerosity.

As previously mentioned, XCS has two specially designed mechanisms to promote generality (plus a third, a mutation rate for creating "don't care" values). One is GA Subsumption and the other is Action Set Subsumption. The first is activated when a new offspring is created by the GA. If the parent has a minimum experience, has high accuracy, and completely covers (is more general than) the child, then the offspring is discarded and the numerosity of the parent is increased. The second mechanism is activated each time the action set is evaluated. The most general classifier of the action set is located (as long as it is experienced and has high accuracy). Then, each classifier in the action set that is covered by this most general classifier is deleted, increasing the numerosity of the most general one by one for each deleted classifier. Note that this second mechanism is far more frequent (once each exploring cycle) and puts a higher genetic pressure than the first one.
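As an illustration of the subsumption logic just described, the following is a minimal Python sketch (not the authors' implementation) of the check that decides whether an experienced, accurate classifier may subsume another. The names Classifier, theta_sub and epsilon_0 are assumptions chosen to mirror standard XCS parameters; the generality test assumes a condition encoded as one value per input variable, with None standing for "don't care".

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Classifier:
    condition: List[Optional[str]]  # one entry per input variable; None means "don't care"
    action: int                     # advocated class/action
    prediction: float               # estimated reward
    error: float                    # estimated prediction error
    fitness: float                  # accuracy-based fitness
    experience: int                 # times it has belonged to an action set
    numerosity: int = 1             # micro-classifiers this macro-classifier represents

def is_more_general(general: Classifier, specific: Classifier) -> bool:
    """True if `general` matches every input that `specific` matches."""
    return all(g is None or g == s
               for g, s in zip(general.condition, specific.condition))

def could_subsume(cl: Classifier, theta_sub: int = 200, epsilon_0: float = 10.0) -> bool:
    """A classifier may subsume others only if it is experienced and accurate."""
    return cl.experience > theta_sub and cl.error < epsilon_0

def does_subsume(subsumer: Classifier, other: Classifier,
                 theta_sub: int = 200, epsilon_0: float = 10.0) -> bool:
    """GA subsumption test: same action, subsumer accurate, experienced and more general."""
    return (subsumer.action == other.action
            and could_subsume(subsumer, theta_sub, epsilon_0)
            and is_more_general(subsumer, other))

Action Set Subsumption would apply does_subsume from the most general accurate classifier of the action set to every other member, folding each deleted classifier into its numerosity.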

C. Extending XCS to a Linguistic Fuzzy Environment

Although early learning classifier systems were used in binary environments with a ternary alphabet (0, 1 and "don't care"), there are several extensions to real-valued [20], [16] or fuzzy [1] environments, among others. In [10], XCS was extended to handle several types of inputs, including qualitative ones and several types of quantitative ones (discrete and continuous). This work extends it to use linguistic (descriptive) fuzzy sets. The representation for linguistic fuzzy variables allows for a variable number of hedges. However, in order to keep interpretability, only two hedges are allowed in the experiments. In fact, more than two hedges per set severely reduce transparency. Fuzzy variables also include a "don't care" value.

XCS uses three operators: a genetic crossover, a genetic mutation and a cover operator to include examples that are not covered by any classifier. The cover operator generates new classifiers (several, one for each possible action) that include such uncovered examples. Note that the three operators are designed in such a way that the offspring always matches the last state (example) of the problem. The crossover operator is a standard two-point one. If mutation is performed (at the standard XCS rate of 4%; more about parameters in Section IV-A), it generates a "don't care" in 20% of the cases. Otherwise it changes the condition either by adding a hedge (40%), removing a hedge (40%) or simply changing the base set (20%). The offspring, however, still has to match the last state. The cover operator generates rules whose conditions contain 50% "don't care" values (the standard value for this parameter in XCS). The rest of the conditions are randomly chosen, with half of them being sets with no hedges.

Classification is a crisp decision problem. Evidence is accumulated for each potential class and a final decision is made based on that evidence. The normal XCS procedure weights the matched classifiers by their fitness and prediction to obtain a final reward prediction for each class. With fuzzy classifiers a partial (fuzzy) match is now possible for some classifiers. Therefore the contribution of partially matched classifiers should also be weighted by the degree of matching of the fuzzy classifier. The original expression to obtain the prediction in XCS is shown in equation (1), where P(a_i) is the predicted reward if action a_i is taken (or class a_i is chosen), F_c is the fitness of classifier c belonging to the match set for that class, c ∈ [M]_{a_i}, and p_c is the prediction of that classifier. The expression for the linguistic Fuzzy XCS is shown in equation (2). Here S_c stands for the firing strength of the condition. This firing strength is calculated as the T-norm (the minimum in the experiments) of the membership values of each variable of the example in the hedged fuzzy sets of the classifier. This is shown in equation (3), where μ_{cH_i}(x_i) is the membership value of the i-th component of the example x in the i-th hedged fuzzy set H_i of classifier c.

$$P(a_i) = \frac{\sum_{c \in [M]_{a_i}} F_c \cdot p_c}{\sum_{c \in [M]_{a_i}} F_c} \qquad (1)$$

$$P(a_i) = \frac{\sum_{c \in [M]_{a_i}} F_c \cdot p_c \cdot S_c}{\sum_{c \in [M]_{a_i}} F_c \cdot S_c} \qquad (2)$$

$$S_c = \operatorname{T-Norm}_i\left(\mu_{cH_i}(x_i)\right) \qquad (3)$$
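A minimal Python sketch of how equations (2) and (3) combine fitness, prediction and fuzzy firing strength is given below. The classifier structure and helper names are illustrative assumptions (they do not come from the paper); each condition is a list of hedged membership functions, with None standing for "don't care", and the T-norm is the minimum, as in the experiments.

from typing import Callable, Dict, List, Optional

MembershipFn = Callable[[float], float]  # a (possibly hedged) fuzzy set

class FuzzyClassifier:
    def __init__(self, condition: List[Optional[MembershipFn]],
                 action: int, prediction: float, fitness: float):
        self.condition = condition    # one hedged fuzzy set per variable, None = "don't care"
        self.action = action          # advocated class
        self.prediction = prediction  # p_c: estimated reward
        self.fitness = fitness        # F_c: accuracy-based fitness

    def firing_strength(self, x: List[float]) -> float:
        """Equation (3): S_c as the minimum membership over all non-"don't care" variables."""
        degrees = [mu(xi) for mu, xi in zip(self.condition, x) if mu is not None]
        return min(degrees) if degrees else 1.0   # all "don't care" matches fully

def prediction_array(match_set: List[FuzzyClassifier], x: List[float]) -> Dict[int, float]:
    """Equation (2): fitness- and strength-weighted reward prediction per class."""
    sums: Dict[int, List[float]] = {}
    for cl in match_set:
        s = cl.firing_strength(x)
        if s <= 0.0:
            continue                               # classifier does not match this example
        num, den = sums.setdefault(cl.action, [0.0, 0.0])
        sums[cl.action] = [num + cl.fitness * cl.prediction * s,
                           den + cl.fitness * s]
    return {a: num / den for a, (num, den) in sums.items() if den > 0.0}

# In exploitation mode the chosen class is the action with the greatest predicted reward:
#   preds = prediction_array(match_set, example); best = max(preds, key=preds.get)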

The updating expressions for fitness and prediction are similarly modified, again using the firing strength of the classifier as an extra weighting factor. Unless stated otherwise, and apart from the changes mentioned, the XCS follows the procedure described in [3]. Note that, during the training phase, a 50/50 scheme of exploration/exploitation is used. Of course, once the testing phase is reached, the cover mechanism is disabled. Therefore the system may produce "unknown" classifications. This must be highlighted, as some work in the area does not disable covering and the potential unknowns are, effectively, classified in an almost random way.

IV. EXPERIMENTAL EVALUATION

To demonstrate the behaviour of the proposed linguistic fuzzy approach, some traditional benchmark classification problems, extensively used in the literature, are used here [17], including the Wisconsin Breast Cancer, Pima Indian Diabetes, New Thyroid, Wine, and Iris databases. Table I shows some basic statistics of these data sets.

TABLE I
CLASSIFICATION PROBLEMS

Name            # Inputs   # Classes   # Samples
Breast Cancer       9          2          683
Diabetes            8          2          768
Iris                4          3          150
New Thyroid         5          3          215
Wine               13          3          178

Comparing this linguistic approach with other results in the literature is difficult, as most published results are based on precise fuzzy modelling. The results will be compared with a previous work on linguistic fuzzy models [11] (a Pittsburgh style algorithm) and with some well-known machine learning algorithms. Given the small size of the problems tested, an exhaustive-search linguistic algorithm will also be used. As in [11], the databases have been divided into two groups. One group, with 75% of the examples, will be used for training. In order to measure the generalization capabilities, the other group, with 25% of the examples, will be used for testing. All of the problems used have quantitative variables and are extensively used in the literature.

An interesting aspect to examine is the effect that the use of general classifiers has on the results. XCS has two specific mechanisms to promote generality (see Section III-B). The experiments will show the effect of deactivating both of them or of using both.

The experiments will also show the effect of the population size allowed. There is research on this effect [2] suggesting that the value should be proportional to 1/p_niche, where p_niche is the probability that an example of a niche intended to be preserved is selected as a training example. It is difficult to predict, a priori, how many niches there are, or their size (in order to estimate p_niche), so the values used (100, 200) are loosely based on the size of the training set. One future optimization would be to develop an auto-adjusting method for selecting the population size.

To obtain the results, each described experiment is executed 40 times. Each run performs 5000 cycles, half of them exploring and half of them exploiting. To be simple and fair, for each problem considered the fuzzification was carried out proportionally with respect to the size of the universe of discourse of the individual variables. The distance between the maximum and minimum value of a variable within the data set is divided such that all fuzzy sets approximately cover an equal range of the underlying real values, with soft boundaries of course. This follows the setup described in [11] for fair comparison, although the sets obtained seem to differ slightly. Likewise, the partition cardinality is 3 and the maximum number of hedges is 2, as used in that work. Note that the small cardinality of the partitions makes the problem harder.

There is an additional difficulty in comparing the approaches. XCS generates complete and accurate maps. This means that it is very frequent to find many rules with the same antecedent but a different consequent. One of these rules usually has a high reward (the correct classification), and the rest a low reward (meaning that such a consequent is an incorrect classification). All such rules are accurate and therefore all of them are included in the rule set. Note that most rule systems only contain the rules that advocate a particular classification, and not the ones advocating a not-this-class. Therefore the number of rules in XCS tends to be higher (usually by a factor close to the number of actions). Likewise, Michigan style systems tend to include in the population many low-fitness classifiers that are young potential candidates still being evaluated. The Michigan approach combines in the same population old, proven, accurate rules with new potentially good rules. This means that the number of rules in the final system is, again, much higher than in Pittsburgh approaches¹.

¹ In the Pittsburgh style there are several chromosomes (complete sets of rules) processed in parallel, where all rules are evaluated at the same time.

In order to compare in fairer terms, a post-processing step that removes low-reward and low-fitness rules is also presented. This slightly degrades the system (which is designed to include, and needs, such rules) but produces a comparable figure regarding the number of rules. This simple rule removal procedure eliminates rules that predict a reward smaller than 10% of the maximum reward (in this case the threshold is 100). This usually eliminates the rules advocating a not-this-class. Likewise, rules that are below 10% of the maximum fitness allowed (in this case the threshold is 0.1) are removed, taking away the new unproven rules and the bad ones. These values seem to give a fair estimate of the corresponding number of rules of a Pittsburgh approach. Note as well that the procedure described in [11] has a direct mechanism to promote compact sets of rules, whereas XCS does not.
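A minimal sketch of this rule-selection post-processing is shown below, under the assumption that classifiers carry prediction and fitness attributes as in the earlier sketches (names are illustrative, not taken from the paper).

def select_rules(population, max_reward: float = 1000.0, max_fitness: float = 1.0,
                 fraction: float = 0.10):
    """Keep only classifiers whose predicted reward and fitness exceed 10% of the maxima.

    With max_reward = 1000 and max_fitness = 1 this reproduces the thresholds
    used in the paper (100 for the reward prediction, 0.1 for the fitness).
    """
    reward_threshold = fraction * max_reward    # drops "not-this-class" rules
    fitness_threshold = fraction * max_fitness  # drops young, unproven or bad rules
    return [cl for cl in population
            if cl.prediction >= reward_threshold and cl.fitness >= fitness_threshold]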

A. XCS parameters

XCS has a moderate number of parameters that can change its behaviour substantially. Most of them have been kept at the values suggested in the literature. As suggested in [13], the parameter θ_sub (the experience a classifier needs in order to be considered a subsumer) is set to 200 to avoid excessive generalization pressure. An extensive description of the parameters can be found in [9]. A few extra parameters have been added to deal with the linguistic fuzzy extension. In order to be self-contained, the values used in this work are as follows:

N = {100, 200}, α = 0.1, β = 0.2, δ = 0.1, ν = 5, θ_GA = 25, θ_del = 20, θ_sub = 200, ε_0 = 10, P_χ = 0.8, P_μ = 0.04, P_# = 0.5, P_#Mut = 0.2

Finally, the reward given to the classifiers when they have chosen the correct class is 1000. If the class is incorrect, the reward is 0. These values are typical in the XCS literature for classification problems.

V. RESULTS

Tables II to VI show the results of the experiments for each of the classification problems described above. Each table shows the training (Trn) and testing (Tst) classification error. It also shows the number of macro-classifiers² (#R) and the average number of conditions in the macro-classifiers (Rsz). The different methods are as follows: C4.5 is the well-known algorithm described in [14]. ANF stands for the popular ANFIS algorithm described in [8]. Exh stands for an exhaustive algorithm (with no hedges). FEGA stands for Functional Equivalence guided GA, an algorithm to transform precise models into linguistic ones, as described in [11]. The same work includes a Classification Error guided GA (CLGA). LF-XCS is the proposed algorithm. P# stands for the two population sizes used, namely 100 and 200. NS means that no subsumption mechanism was used in LF-XCS and S indicates that both were used. So, for example, P100-NS stands for a run of LF-XCS with a population size of 100 and no subsumption.

² The terms rule and macro-classifier are used as synonyms, although a macro-classifier is the same rule repeated several times.

Table II shows the results on the Breast Cancer problem. Note that there is no significant difference in classification results whether the subsumption mechanisms are used or not. However, the number of classifiers is substantially reduced when subsumption is applied. It also produces more general macro-classifiers. This happens in all problems in the standard results, and holds in most of them after rule selection.

Rule selection results for this problem are also very similar to FEGA, although this is not a general behaviour. FEGA is designed to provide compact rules, while the rule selection in LF-XCS is only a crude approximation. In any case, it suggests that LF-XCS is performing nicely while keeping the potential of online learning. But it also shows that when compact sets of linguistic rules are a priority, FEGA is probably the best option. CLGA tends to give considerably worse results. Its random initialization makes it quite unstable and some runs can be quite poor. LF-XCS also starts from a random start but is much more stable. FEGA, on the other hand, is both initialized and guided heuristically and is very stable, although it has the inconvenience of requiring a pre-trained precise model. The exhaustive method is very poor, but it has to be taken into account that it uses no hedges.

TABLE II
RESULTS FOR BREAST CANCER PROBLEM

              Standard                         Rule selection
Method        Trn    Tst    #R     Rsz         Trn    Tst    #R     Rsz
C4.5/ANF      1.2    7.6    29     9           0.4    4.1    18     9
Exh/FEGA      10.3   15.2   74     2.9         2.5    6.6    9      3.2
CLGA          10.4   14.2   9.1    -           -      -      -      -
LF-XCS
  P100-NS     3.7    5.3    52.6   3.7         5.7    6.8    9.7    3.4
  P100-S      3.1    5.0    43.0   3.3         4.8    6.5    8.9    2.9
  P200-NS     2.4    4.7    83.0   3.9         3.3    6.4    15.4   3.6
  P200-S      2.2    4.6    67.5   3.5         3.1    6.5    14.6   3.2

Table III shows the results on the Diabetes problem. For this problem the results are very similar to those of the precise models, although this is possibly because those models are overtrained. The effect of the rule selection mechanism on the testing error is almost negligible. The number of rules of LF-XCS after selection is better than that of the precise approaches, although still behind FEGA. This is the only problem in which FEGA rules are, on average, more general than the macro-classifiers provided by LF-XCS.

TABLE III
RESULTS FOR DIABETES PROBLEM

              Standard                         Rule selection
Method        Trn    Tst    #R     Rsz         Trn    Tst    #R     Rsz
C4.5/ANF      16.1   29.2   35     8           15.7   26.6   28     8
Exh/FEGA      38.2   39.6   65     2.5         28.5   25.7   9.8    1.9
CLGA          41.0   41.5   11.5   -           -      -      -      -
LF-XCS
  P100-NS     23.1   26.1   51.0   4.0         24.7   26.3   15.0   3.6
  P100-S      24.2   26.5   48.1   3.8         25.2   27.0   13.5   3.5
  P200-NS     22.7   25.0   96.0   4.5         24.2   25.2   22.0   4.2
  P200-S      22.0   24.4   99.4   4.6         23.3   26.3   24.1   4.4

The classic Iris dataset problem is presented in Table IV. This problem is not very informative, as it is almost equally well solved by most methods and the variability is very small. Most individual runs are able to find the group of 4 or 5 rules that almost solves the problem.

TABLE IV
RESULTS FOR IRIS PROBLEM

              Standard                         Rule selection
Method        Trn    Tst    #R     Rsz         Trn    Tst    #R     Rsz
C4.5/ANF      1.8    2.7    5      4           0.8    2.7    4      4
Exh/FEGA      4.5    10.5   11     2.4         3.6    3.5    4.3    2.5
CLGA          3.7    6.2    5.1    -           -      -      -      -
LF-XCS
  P100-NS     3.2    2.4    25.5   1.5         4.7    3.7    5.0    2.1
  P100-S      3.0    2.7    19.8   1.4         3.8    4.8    5.6    1.8
  P200-NS     2.9    3.5    56.8   2.2         3.1    5.9    7.3    2.2
  P200-S      2.1    2.7    35.2   1.7         2.5    3.7    8.1    2.0

Table V shows the New Thyroid problem. It seems that the particular partitioning into training and testing sets led to testing results that are better than the training ones. Other training and testing schemes, such as cross-validation [5], avoid these effects, but in order to compare with literature results the same classic 75/25 method was used.

TABLE V
RESULTS FOR NEW THYROID PROBLEM

              Standard                         Rule selection
Method        Trn    Tst    #R     Rsz         Trn    Tst    #R     Rsz
C4.5/ANF      1.9    7.4    13     5           3.1    1.8    4      5
Exh/FEGA      82.6   83.3   4      3.9         6.2    9.3    6.3    3.3
CLGA          45.9   46.9   6.1    -           -      -      -      -
LF-XCS
  P100-NS     7.2    4.0    38.7   2.9         9.5    6.3    6.0    2.6
  P100-S      7.0    3.8    27.0   2.4         8.3    5.6    6.9    2.8
  P200-NS     5.4    2.3    62.5   3.0         5.4    2.3    10.7   3.1
  P200-S      5.0    1.4    50.7   2.8         4.6    2.7    10.8   2.9

Finally, Table VI shows the results on the Wine problem. This problem has the worst results when rule selection is performed.

A deeper look at this effect (eliminating either rules with low fitness only or rules with low prediction only) was performed. It showed that the removal of rules with low prediction (the ones that advocate not choosing a class) has the deeper impact. It seems that there are more collisions among rules in this problem. Low-prediction rules provide negative evidence for particular classes in their area by reducing the prediction. This allows the soft boundaries to be pushed away from a wrong classification. It also allows for exceptions in general rules (although only if the exceptional cases are few, otherwise the general rule will have low fitness).

TABLE VI
RESULTS FOR WINE PROBLEM

              Standard                         Rule selection
Method        Trn    Tst    #R     Rsz         Trn    Tst    #R     Rsz
C4.5/ANF      0.8    2.2    17     13          0      2.2    6      13
Exh/FEGA      39.8   44.4   21     2.6         6.6    9.1    7.3    2.7
CLGA          36.4   39.7   10.6   -           -      -      -      -
LF-XCS
  P100-NS     10.0   10.9   65.6   5.6         18.7   19.5   9.4    4.9
  P100-S      7.1    9.7    55.1   5.4         15.4   16.5   8.5    5.0
  P200-NS     4.0    7.6    106.1  5.9         9.4    15.1   8.9    5.4
  P200-S      3.1    7.9    79.7   5.7         9.8    12.5   8.7    5.6

VI. CONCLUSIONS AND FUTURE WORK

This work introduces a modified XCS to generate linguistic fuzzy classifiers. The proposed approach has been tested on some classical classification problems. The learning algorithm has been applied using the standard configuration; XCS still offers many potential improvements. Some suggestions have appeared lately in the literature to self-adapt some parameters during training to improve results. While the comparison with other linguistic techniques shows similar results, it must be taken into account that the map of classifiers is complete. This means that more rules appear in the final model. The presence of these extra rules allows for a better online updating of the system. Nevertheless, a simple rule reduction mechanism (such as the one shown) can pinpoint the most important rules, if needed.

There are several lines of research following on from this work. One is to apply some of the aforementioned mechanisms for self-adaptation of the parameters. Some dynamic adjustments have already been suggested in [13]. The experiments in this work already suggest that other parameters, such as the maximum number of classifiers or the pressure towards generality, can have a substantial effect on the results. Taking control of them is an interesting challenge for future research. Another line of work is to modify the inference of XCS itself towards deeper fuzzy principles, along the lines suggested in [4]. Currently ongoing work is applying the proposed algorithm to huge databases. It is the aim of the researchers to use this linguistic XCS as the base of the detection module of an Intrusion Detection System based on misuse detection. A non-fuzzy version was already presented in [10] with very promising results.

REFERENCES

[1] Andrea Bonarini. An introduction to learning fuzzy classifier systems. In IWLCS, volume 1813. Springer, 1999.
[2] Martin V. Butz, David E. Goldberg, Pier Luca Lanzi, and Kumara Sastry. Bounding the population size to ensure niche support in XCS. Technical Report 2004033, IlliGAL, July 2004.
[3] Martin V. Butz and Stewart W. Wilson. An algorithmic description of XCS. Journal of Soft Computing, 6(3–4):144–153, 2002.
[4] Jorge Casillas, Brian Carse, and Larry Bull. Fuzzy-XCS: An accuracy-based fuzzy classifier system. In XII ESTYLF, pages 369–376, Jaén, Spain, September 2004.
[5] P. R. Cohen. Empirical Methods for Artificial Intelligence. MIT Press, Cambridge, 1995.
[6] David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA, 1989.
[7] A. F. Gómez Skarmeta and F. Jiménez. Fuzzy modeling with hybrid systems. Fuzzy Sets and Systems, 104(2):199–208, 1999.
[8] J.-S. R. Jang, C.-T. Sun, and E. Mizutani. Neuro-Fuzzy and Soft Computing. Matlab Curriculum. Prentice Hall, 1997.
[9] Tim Kovacs. Strength or Accuracy: Credit Assignment in Learning Classifier Systems. Distinguished Dissertations. Springer, 2004.
[10] Javier G. Marín-Blázquez, Gregorio Martínez Pérez, and Manuel Gil Pérez. Gestión de intrusiones mediante el sistema de clasificadores XCS. In MAEB 2007, in print, Tenerife, Spain, February 2007.
[11] Javier G. Marín-Blázquez and Qiang Shen. From approximative to descriptive fuzzy classifiers. IEEE Transactions on Fuzzy Systems, 10:484–497, August 2002.
[12] George A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. The Psychological Review, 63:81–97, 1956.
[13] Albert Orriols-Puig and Ester Bernadó-Mansilla. Bounding XCS's parameters for unbalanced datasets. In GECCO 2006, volume 2, pages 1561–1568, 2006.
[14] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
[15] Qiang Shen and Javier G. Marín-Blázquez. Microtuning of membership functions: accuracy vs. interpretability. In FUZZ-IEEE 2002, volume 1, pages 168–173, Honolulu, December 2002.
[16] Christopher Stone and Larry Bull. For real! XCS with continuous-valued inputs. Evolutionary Computation, 11(3):298–336, 2003.
[17] UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
[18] J. Valente de Oliveira. Semantic constraints for membership function optimization. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 29(1):128–138, January 1999.
[19] Stewart W. Wilson. Classifier fitness based on accuracy. Evolutionary Computation, 3(2):149–175, 1995.
[20] Stewart W. Wilson. Get real! XCS with continuous-valued inputs. In IWLCS, volume 1813. Springer, 1999.
[21] Lotfi A. Zadeh. Fuzzy logic = computing with words. IEEE Transactions on Fuzzy Systems, 4(2):103–111, May 1996.
