Somos uma comunidade de intercâmbio. Por favor, ajude-nos com a subida ** 1 ** um novo documento ou um que queremos baixar:

OU DOWNLOAD IMEDIATAMENTE

Graph Genetic Programming for Hybrid Neural Networks Design L. Ferariu and B. Burlacu “Gheorghe Asachi” Technical University of Iasi/ Department of Automatic Control and Applied Informatics, Iasi, Romania [email protected], [email protected]

Abstract— This paper presents a novel approach devoted to the design of feed forward hybrid neural models. Graph genetic programming techniques are used to provide a flexible construction of partially interconnected neural structures with heterogeneous layers built as combinations of local and global neurons. By exploiting the inner modularity and the parallelism of the neural architectures, the approach suggests the encryption of the potential mathematical models as directed acyclic graphs and defines a minimally sufficient set of functions which guarantees that any combination of primitives encodes a valid neural model. The exploration capabilities of the algorithm are heightened by means of customized crossovers and mutations, which act both at the structural and the parametric level of the encrypted individuals, for producing offspring compliant with the neural networks’ formalism. As the parameters of the models become the parameters of the primitive functions, the genetic operators are extended to manage the inner configuration of the functional nodes in the involved hierarchical individuals. The applicability of the proposed design algorithm is discussed on the identification of an industrial nonlinear plant.

I. INTRODUCTION Neural models are intensively used in nonlinear system identification, due to their capability of learning complex variables’ dependency embodied in finite representative data sets [1, 2]. Their increased robustness and computational efficiency resides from the inductive learning which is employed for topology and parameters’ adaptation, and also from valuable particularities of the neural architectures, namely high level of parallelism and strong interconnectivity provided between numerous, yet simple computational units [3]. Based on Kolmogorov decomposition theorem, various mapping neural topologies have been proved to behave universal approximation capabilities in terms of continuous, bounded, nonlinear functions (e.g. multilayer perceptron MLP, neural networks with radial basis functions RBF, etc.) [3]. Although these structural templates are computational equivalent and guarantee the existence of a neural model of any desired degree of accuracy, in practical situations they can lead to significant different performances, especially when high neural input dimension and/or large, noisy training data sets are involved. To ensure enhanced adaptability, this paper adopts the hybrid neural networks (HNN) formalism [4]. It combines various neural units with local and global responses. The aggregation permits a flexible

978-1-4244-7433-2/10/$26.00 ©2010 IEEE

design of compact neural models, leading, however, to increased difficulties in customizing the topology and the learning algorithm. The most common collaborative neuro-genetic systems employ genetic algorithms for neural topology selection and/or learning [4, 5]. Various direct and indirect representations based on fixed or variable-length chromosomes have been proposed. However, the hierarchical encodings implicitly accepted by genetic algorithms are limited to specific interpretations of several “control” genes which, in fact, encode the architecture. Shorter chromosomes can be achieved by working on tree-based or graph–based individuals. Although these encodings seem more “natural”, they require specific crossovers and mutations, able to preserve the validity of the generated offspring, when dealing with various types of hidden neurons and interconnections [5]. In neural identification, previous work based on graph-based evolutionary algorithms refers to a limited variety of structural templates, most approaches considering the homogeneous MLP. The proposed design of hybrid neural networks is based on genetic programming (GP) techniques. Some benefits result form the capacity of GP of concomitantly working on the structure and the parameters of the neural model, which allows a better control on the overall performances of the selected neural network [6, 7, 8]. Moreover, as GP is specialized in the evolvement of the hierarchical individuals, it should be able to ensure a more efficient exploration within the space of the candidate neural architectures, whilst preserving a natural interpretation of the encrypted individuals. At the other side, GP offers the inherent advantages of evolutionary approaches, so it can adjust a large number of neural parameters, while dealing with scarce aprioric information and/or nonlinearities, multimodalities, discontinuities of the objective functions. To ensure a better exploitation of neural architecture’s modularity and parallelism, this paper employs graph GP and introduces original enhancements which allow the modular generation of graph-encrypted individuals. The partially interconnected hybrid networks are generated by means of recursive combinations of primitives. In this context, the authors propose a special configuration of a minimally sufficient functions’ set, which is able to ensure the validity of the generated neural structures. Additionally, the graph GP is equipped with new compatible genetic operators, especially designed to preserve the consistency of the encoded neural architectures and, consequently, to enhance the algorithm exploration ability.

– 547 –

IEEE International Joint Conferences on Computational Cybernetics and Technical Informatics (ICCC-CONTI 2010) • May 27-29, 2010 • Timisora, Romania.

The paper is organized as follows. Section II presents the adopted hybrid neural architecture, whilst Section III discusses the suggested customizations aimed at making GP compliant with neural networks’ design. Section IV illustrates the performances of the proposed on the identification of an industrial system and final remarks are given in Section V. II. HYBRID NEURAL ARCHITECTURE The hybrid neural models accept a heterogeneous structure for each hidden layer [4]. This approach combines both simple and complex neurons with global and local responses, as basis for a flexible tuning of the neural architecture. Implementing the input operator by means of the dot product, the neurons with global response are able to extrapolate beyond the range of known training data, thus leading to increased robustness. On the contrary, the neurons with local response aggregate the inputs by using the Euclidian distance. Being activated in limited areas around specific input patterns, they can provide high computational efficiency for parameters’ calculation. Summing up, the combination of local and global neurons could offer higher approximation capabilities both in interpolation and extrapolation problems [3]. The suggested design procedure accepts any type of global and local neurons, interconnected by means of feed-forward links (Fig. 1). Several potential hidden neural units are described below. Let us denote the neuron’s input vector with z = [ zi ]i =1,.., no _ i and the neuron’s parameters with θ . The most common neural unit with global response is the standard perceptron (SP). When hyperbolic tangent activation function is used, it ensures the following nonlinear input-output mapping: no _ i

y SP = f SP (z, θ) = tanh( ∑ wi zi + b) ,

(1)

i =1

where θ = [ w1 ,.., wno _ i , b]'∈ ℜ no _ i +1 contains the weights associated to all input connections, denoted with [ wi ] i =1, no _ i , and the bias, b. A more complex global

−

y SG = f SG ( z, θ) = e

1

no _ i

2σ 2

i =1

2 ∑ ( ci − z i )

,

(3)

where θ = [c1 , ..., cno _ i , σ ]'∈ ℜ no _ i +1 includes all centers, [ci ]i =1,..no _ i , and the spread, σ . A possible extension is the local neuron with radial basis function and complex weights (RBC), which associates for each input connection a center, ci , yet also a complex weight,

we

−1 ⋅α i

:

y RBC = f RBC (z, θ) =

=e

2 2 no _ i no _ i − w 2 ∑ cos α i ⋅( z i − ci ) + ∑ sin α i ⋅( z i − ci ) i =1 i =1

(4).

leading to θ = [α1 ,..,α no _ i , c1 ,..., cno _ i , w]'∈ ℜ 2 no _ i +1 . Maximum activation is also obtained when all zi = ci , i = 1, no _ i , but the region where non-zero response is achieved can be more flexibly tuned, favoring better approximation capabilities than RBF [2]. This approach extends the previous work devoted to HNN design presented in [4], which was based on fixed length chromosomes and hierarchical genetic encoding with successive levels of control and parametric genes. Increased flexibility is provided by means of GP techniques, which is able to evolve individuals of various sizes and shapes. Moreover, the design procedure allows any number of hidden layers inside the HNNs. Each neuron accepts input stimulus provided by a subset of neurons of the previous layer, yet also by a subset of neural inputs. The input-output formalism, based on series – parallel identification schemes with external delay blocks is considered, meaning that the neural input vector contains lagged plant input and output measurements:

x(k ) = [u(k ),.., u(k − nu ), y (k − 1),.., y (k − n y )] ,

neuron is the one with functional links (FL), which increases the dimensionality of the input space by means of orthogonal functions, e. g. trigonometric ones:

Layer 1

(5)

Layer s

θ 1q

no _ i P

x1

y SFL = f SFL (z , θ) = tanh( ∑ ( ∑ wc (cos πjz i ) i =1

P

j =1

ij

(2)

+ ∑ w s (sin πjz i ) + wi zi + b)). j =1

ϕq . . .

θs

….

xi

ij

Here P indicates the predefined order of the functional expansion and θ ∈ ℜ1+ no _ i ⋅( 2 P +1) includes the weights for all original and expanded input links. Note that SFL-based networks can lead to a reduced number of neurons and, in some cases, to better overall performances than MLP [1]. The Gaussian neuron (SG) with real parameters is a local neuron which attains the maximum activation when the input values are, correspondingly, identical with the associated centers:

. . .

….

ϕ1

) yi

θ1r ϕr

xR

1 ≤ q < r ≤ n1

Figure 1. Hybrid neural architecture. For each neuron, ϕ denotes the input operator and Ө the parameters (real or complex weights, biases, centers and spreads, respectively). The input pattern x contains lagged plant variables.

– 548 –

L. Ferariu, B. Burlacu • Graph Genetic Programming for Hybrid Neural Networks Design

where u ∈ ℜ m and y ∈ ℜ n denote the inputs and the outputs of the system, k indicates the current sampling instant and nu, ny represent the maximum permitted input and output lags, respectively. For the sake of simplicity, all individuals encode single output models. Therefore, for MIMO systems identification problems, one has to design n different models, taking into account possible interconnections between the system outputs:

yˆ i (k ) = f i (x(k )) ,

(6)

where yˆ i denotes the approximation provided by ith model for the ith output of the system. The output neuron is forced to be a global one, in order to allow non-zero model output for a wide range of neural inputs, situation which is frequently met in system identification. III.

GENETIC PROGRAMMING LOOP

A. HNN Encoding GP builds the potential solutions by means of recursive combinations of predefined functions and terminals [9]. Despite classical GP which works with tree-encrypted individuals, this approach considers directed acyclic graphs (DAG). The main motivation lies in the restrictive connectivity allowed by the inner structure of the trees. Graph – based individuals can provide a direct encryption of any neural connections, by simply including corresponding links between the nodes, therefore they conduct to reduced size and simple interpretation. Note that each supplementary link means the reuse of all interconnected nodes from the lower layers, without additional memory consumption. Graph-GP assumes that the leaves of the graphs are terminals, namely elements of the T set, and the others nodes correspond to functions chosen from the O set. Therefore, the success of the algorithm is strongly related to a convenient initialization of O and T. To guarantee the validity of the hierarchical individuals, the primitives’ sets must satisfy the closure and the sufficiency properties [9]. The closure property ensures the data type and the data value consistency, as it states that any combination of functions and terminals is accepted within the individuals. At the other side, the sufficiency property assures that the optimal solution can be obtained by mixing some predefined primitives, so it demands to include in O and T all required elements. Obviously, the use of extraneous primitives leads to a larger searching space and to a harder exploration, therefore minimally sufficient sets are preferred [7]. However, the above mentioned requirements are limited to genotype validity. They guarantee the correctness of the individuals, yet not the validity of the encoded models, in terms of the employed structural template. As example, for HNN design, the set of functions O = {exp,+,−,*, cos, sin} is minimally sufficient, if the algorithm encodes 1 /(2σ 2 ) , instead of σ . Assuming a terminal set with all variable terminals specified in x, a supplementary real parameter-type terminal and some constant-type terminals (such as − 1 , π ), O and T satisfy also the closure property. Though, a

lot of correct individuals do not encode correct hybrid neural networks. In fact, the probability of obtaining a valid neural model by stochastic aggregation of primitives results very low and, consequently, the algorithm will behave unsatisfactory exploration capabilities. To ensure the phenotype validity, the authors recommend additional constraints. Namely, the set of functions is configured based on neural network’s granularity. As the neuron represents the atom at phenotypic level, it should be used as atom at genotypic level, too. This means, O is defined to include the inputoutput functions provided by all accepted neurons:

O = { f SP , f FL , f SG , f RBC ,...} .

(7)

In this case, a supplementary node in the DAG encodes a supplementary neuron in the HNN and a graph connection encrypts a neural link (Fig. 2). The proposed encoding guarantees the correct structure of each compound neural unit and exploits the inherent interconnectivity of DAG. The constant terminals are no more necessary and all the parameter terminals are already captured as generic parameters in the functional nodes of the individuals. The downside is the need of enhanced genetic operators able to work on the inner structure and parameters of the functional nodes. In compliance with the hybrid neural architecture (Fig. 1), x can be directly employed as terminal set. Note that the functions’ set (7) is minimally sufficient and data consistency is ensured for any combination of primitives. Moreover, x results sufficient if appropriate maximum input and output lags are used. nu and n y can be set by trial of error with reduced computational effort, knowing that the algorithm can cope with several alien lagged measurements. In fact, the GP-based selection of appropriate elements of x implements a concomitant variable reduction and external delays configuration. To avoid the production of overfitted models due to the undesired increasing of the complexity order of the graphs, two control parameters are used, namely the maximum depth of a graph ( nl ≥ 1 ) and the maximum number of input connections accepted for a functional node ( ni ≥ 1 ). Despite extended genetic algorithms based on graph encoding, GP offers enhanced capabilities in terms of reuse of valuable structural components of the fitted individuals by means of primitives’ set updating and encapsulation [10]. B. Initial Population Having no a priori information about the rough localization of the optimum points, the algorithm starts with a randomly generated initial population of DAGencrypted solutions distributed over the entire search space. Firstly, individuals of various shapes and depths are generated, by including, recursively, single input single output neurons of any type and terminal leaves. Afterwards, random extra-connections are drawn between the nodes of successive layers, for producing highly interconnected neural architectures, as illustrated in Fig. 2. Naturally, incoming connections are accepted for the

– 549 –

IEEE International Joint Conferences on Computational Cybernetics and Technical Informatics (ICCC-CONTI 2010) • May 27-29, 2010 • Timisora, Romania.

Figure 2. DAG-based encoding of HNN. The filled nodes encodes the functional nodes, the others denote the terminals. Initial connections between the nodes are shown with a solid line, while the created extra connections are indicated with dotted lines. Graph’s layers 1 - 3 contain neurons, yet also terminals which act as inputs for the neurons of the upper layer.

Figure 3. Structural crossover.

neuron-type nodes, only. Any initial individual can use a subset of terminals and any combination of global and local neurons. C. Genetic Operators As the suggested encoding treats the neurons as atomic structural blocks, the classic GP operators have to be extended in order to allow changes within the inner structure of the functional nodes [11]. Increased algorithm exploration capabilities are provided by means of enhanced crossover and mutations, which improve the preliminary proposal presented in [11]. The structural crossover interchanges two sub-graphs randomly selected from the parents. The offspring preserve all the output connections provided from the cutting node and all the internal links of the incoming sub-graph. If some of the swapped neurons are also connected to external nodes (that do not belong to the considered sub-graph), they have to be copied before leaving the individual, in order to keep unaltered the rest of the graph. An example is given in Fig. 3. The node 1 of parent A and the node 11 of parent B have been chosen as cutting points. After swapping, the offspring B will contain two supplementary layers and the output connections of the node 11 provided to the neurons 5 and 7 of the previous layer. In offspring A, the neuron 1 of parent A is replaced with a terminal node. The nodes belonging to the sub-graph selected for swapping can be deleted, except the nodes 4 and 9, which are also connected to nodes 2 and 3.

In the case of hierarchical encodings, the crossover causes two important problems. One is the bloat phenomenon manifested by the tendency of the producing larger and larger graphs, without significant fitness improvement [7]. Although the complexity order of the individuals is controlled by means of nl and ni , the bloat affects the performances of the algorithm, as it increases the chances of obtaining unvaluable overfitted offspring, with expected bad generalisation capabilities, which must be rejected from the population. The second drawback is related to competing conventions [5, 6]. The quality of the offspring could be much lower than the quality of the parents, as the crossover does not detect the structural blocks with different genotype, yet silimar functionality (e.g subgraphs resulted by permutations of their nodes). In this context, this approach considers four types of mutations, applied with high probabilities for modifying the neural topologies and the corresponding neural parameters. Being unary operators, mutations cannot produce competing conventions and permits a simpler control of the complexity order of the resulted offspring. Various mutations can act in sequel on the same individuals, to produce successive changes of their genetic material. The structural mutation replaces a randomly selected sub-graph with another one, stochastically generated by means of recursive combinations of available primitives. The node mutation changes the activation function of the randomly selected functional nodes or the significance of the terminal-type nodes, in accordance to the available alleles indicated in O and T, respectively. If a neuron – type node switches from a global unit to a local one, its neural parameters are randomly reinitialized, because the previous values have no more usability. Also, if the neuron remains global or local, but supplementary parameters are added (e.g. a SP turns to a FL or a GS becomes a RBC), their values are stochastically set. The link mutation introduces supplementary input connections for the functional nodes or cuts some of the existing ones. Only new feed forward connections are accepted, namely input connections for the neuron-type nodes coming from any node of the lower layers. Lastly, the parametric mutation produces small variations on the selected neural parameters, according to predefined parameters ranges. The mutations implement both large and small modifications within the encrypted models. The explorative behavior of the algorithm can be controlled by means of their assigned probabilities. Note that the structural mutation can produce the most important refreshment in terms of encoded neural topologies. D. Fitness Assignment One assumes that a representative training data set, S = {(u(k ), y (k ))} , k = 1,...d, is available. It has to describe the dynamic behavior of the system for a large range of inputs. To avoid ill conditioning and inconvenient aggregation of the neurons’ outputs, the measurements are scaled in [-0.9, 0.9]. The monoobjective design procedure ensures the minimization of the empirical risk, in terms of S:

SSE ( M ) =

– 550 –

1 d 2 ∑ ( yi − yˆ i ) , d i =1

(8)

L. Ferariu, B. Burlacu • Graph Genetic Programming for Hybrid Neural Networks Design

where yi and yˆ i denote the ith plant output and its approximation, respectively, and M denotes the evaluated model, encrypted as graph-based individual. An adequate balance between exploration and exploitation is preserved by means of linear ranking-based fitness assignment. Note that the complexity order of the encoded topologies is controlled by means of nl and ni . E. Selection for Reproduction and Survival At each generation, the universal stochastic sampling is called for filling the recombination pool. The resulted offspring compete with their parents to form the population of the next generation. During insertion, the best adapted solutions are deterministically chosen as survivors. The number of the offspring is smaller than the population size, so the evolutionary process results elitist in nature. The most accurate model encoded in the last generation is selected as the result of the run. IV.

APPLICATION

The proposed design methodology (named GPHNN) was applied for the identification of an evaporator subsystem from the Sugar factory of Lublin, Poland (Fig. 4). The evaporator is part of the evaporation station (ES), being responsible with the reduction of the content of water in the sucrose juice. It admits three inputs (the steam flow and steam temperature at the input of ES, as well as the juice temperature after heater) and one output (the juice temperature after the 1st section of ES) [12]. The training data set contains about 300 measurements acquired during a production shift. S is filtered with a low - pass 4th order Butterworth system. It illustrates the maximum possible excitation of the process achievable during normal plant exploitation. All missing and uncertain values have been replaced by means of polynomial interpolation. To provide a suitable verification of the model generalization capability, the validation data set includes measurements acquired in a different month of ES operation [12]. The suggested design method was implemented in C. For different configurations of the algorithm parameters, Table 1 lists the mean performances of accuracy achieved on training and validation, during 10 independent runs. All experiments correspond to a probability of structural crossover equal to 0.7 and a probability of mutation 0.3. If the terminal set contains few elements – being suspected as insufficient (#1→#5), the neural models can implement only short-term “memory”, leading to the risk of unsatisfactory approximations in the case of high order systems. Table 1 indicates that the recent input samples have higher influence on the evaporator output, yet, for nu = n y = 1 , only a limited level of accuracy can be achieved, even when numerous populations and long evolutionary loops are employed (#2, 4, 5). Therefore, the recommendation is to initially set higher nu and n y , as generally, the extraneous terminals could be less dangerous than the absence of useful ones (#18→#27), due to the inherent GP ability of selecting subsets of x. However, when the terminal set is not minimally sufficient, it is compulsory to ensure higher explorative behavior by means of larger populations and larger number of generations (#19, 22, 26). Note also that the usability of the selected model is strongly dependent to

the complexity constraints imposed by nl (for these experiments, nl = 3 ). The alien terms can act as undesired introns of the model, leading to bad approximations on the validation data sets (#21, #25). Being stochastic and nonlinear in nature, GPHNN does not guarantee the models’ accuracy improvement by solely increasing the population size (#14 vs. #15) and/or the number of generations (#10 vs. #13). Though, this is a common strategy for enriching the exploration within large search spaces, when combined with small selection pressures and highly productive genetic operators, aimed at preserving the population diversity. TABLE I.

EXPERIMENTAL RESULTS.

# nu ny gen n_ind SSE_L SSE_V 1 1 1 100 300 0.00349 0.0347 2 1 1 100 1000 0.00174 0.0198 3 1 1 200 300 0.00189 0.0428 4 1 1 200 1000 0.00136 0.0216 5 1 1 300 1000 0.00087 0.0120 6 2 2 100 300 0.00238 0.0164 7 2 2 100 1000 0.00166 0.0099 8 2 2 200 300 0.00187 0.0235 9 2 2 200 1000 0.00110 0.0100 10 3 3 100 300 0.00250 0.0146 11 3 3 100 700 0.00173 0.0080 12 3 3 100 1000 0.00228 0.0588 13 3 3 200 300 0.00279 0.0500 14 3 3 200 700 0.00129 0.0147 15 3 3 200 1000 0.00162 0.0089 16 3 3 300 1000 0.00155 0.0237 17 4 4 100 300 0.00266 0.0342 18 4 4 100 700 0.00293 0.0230 19 4 4 100 1000 0.00187 0.0090 20 4 4 200 300 0.00315 0.0349 21 4 4 200 1000 0.00146 0.0363 22 4 4 300 1000 0.00103 0.0101 23 5 5 100 300 0.00392 0.0719 24 5 5 100 1000 0.00180 0.0061 25 5 5 200 300 0.00174 0.0314 26 5 5 200 1000 0.00194 0.0076 27 5 5 300 1000 0.00130 0.0319 n_ind - population size; gen - the number of generations. SSE_L and SSE_V represent SSE computed on scaled training and validation data sets, respectively, according to (8). TABLE II.

PERFORMANCES OF HOMOGENEOUS AND HYBRID NNS

NN type

No. of Number of SSE_L SSE_V neurons parameters HNN 6 37 0.00058 0.0051 RBF* 16 256 0.2513 3.2229 MLP** 16 256 0.0102 2.4490 *) RBF was designed with a constructive algorithm, adding, in sequel, GS neurons; **) MLP was trained for 15000 epochs with Levenberg-Marquardt based procedure. For RBF and MLP, the parameters of the design procedure were set by trial and error.

The best neural model was obtained for nu = n y = 3 . It features a very simple partially interconnected architecture with 6 neurons (1 SP and 5SG) and 37 neural parameters (Fig. 4). For a sound comparison, its performances are compared with those provided by homogeneous full connected neural networks, namely MLP and RBF (Table 2). Note that MLP and RBF require much higher number of neurons for a quite satisfactory approximation on the training data set (Fig. 5). This leads to difficulties in managing the training algorithm and

– 551 –

IEEE International Joint Conferences on Computational Cybernetics and Technical Informatics (ICCC-CONTI 2010) • May 27-29, 2010 • Timisora, Romania.

high risk of overfitting. HNN offers significant better generalization (Fig. 5), due to the flexible manner employed by GP for variable reduction and structure/parameters optimization. GP efficiency results also from the fact that all the models encrypted within the population are valid and, consequently, no searching effort is wasted during the evolutionary loop by producing incorrect neural networks. Compared with [4], which uses enhanced supplementary multiobjective techniques for controlling the complexity of HNNs and dynamic internal blocks, the results are encouraging. For GPHNN, on non-scaled data, SSE results 4.8673 for training and 16.09 for validation, whilst [4] indicates 4.1197 and 11.54, respectively, corresponding to a HNN with 48 neural parameters (including here the coefficients of dynamic internal ARMA filters). The convenient performances of GPHNN which implements only a mono-objective optimization and external memory are mainly sustained by the increased flexibility provided by GP in evolving hierarchical individuals of various sizes. Finally, note that the automatic selection of neural architecture permits to ensure a convenient total design time. V.

CONCLUSIONS

The suggested GP-based methodology represents a suitable alternative for nonlinear system identification. At one side, it exploits the flexibility of the HNNs. This heterogeneous structural template combines various types of global and local neurons to provide enhanced approximation capabilities for a large class of applications. Moreover, the neural network formalism offers high adaptability, even in the post-design stage. At the other side, GP techniques permit the simultaneous configuration of the neural parameters and architectures, without requiring rich a priori information. To facilitate the generation of diverse partially interconnected topologies, DAG – based encoding is employed. The validity at both genotypic and phenotypic levels is preserved by setting appropriate primitives and genetic operators. The functions are defined in accordance to the modularity of HNN, by encapsulated the neural parameters inside the neuron-type nodes of the graph. Additionally, specific customized structural crossover and four types of mutations are designed to ensure an efficient exploration within the space of potential neural models. The approach is recommended for off-line nonlinear system identification, when the interdependency between the plant variables is poorly understood or/and high model accuracy is required, yet the system features complex nonlinearities. Further work will focus on the development of specific mechanisms meant to improve the modular configuration of DAG. They will permit the adaptation of the set of function and the temporary protection of valuable subgraphs during the evolutionary loop, in accordance to the accuracy of generated models and the diversity of the current population.

[2]

Igennik B., Tabib – Azar M., Le Clair S. R., “A net with complex weights”, IEEE Transactions on Neural Networks, 12, 236–249, 2001. [3] S. Haykin,. Neural Networks - A Comprehensive Foundation, McMillan College Publishing Company, New York, 2nd Edition, 1999. [4] L. Ferariu, M. Voicu, “Nonlinear System Identification Based on Evolutionary Dynamic Neural Networks wih Hybrid Structure” in Proc. of IFAC Congress, Prague, Czech Republic, 2005. [5] P. J. Flemming, R. C. Purshouse “Evolutionary Algorithms in Control Systems Engineering: A Survey”, Control Engineering Practice 10, 2002, pp. 1223-1241. [6] Affenzeler M., Winkler S., Wagner S., Beham A., Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications (Numerical Insights), CRC Press, 2009. [7] Poli R., Langdon W. B., Mc Phee N. F., A Field Guide to Genetic Programming. Published via http://lulu.com (with contributions of J.R. Koza), [Online]. Available: http://www.gp-field-guide.org.uk, 2008. [8] N. Nedjah, A. Abraham, L. de Macedo Mourelle (Eds.) Genetic Systems Programming - Theory and Experiences, Springer, Netherlands, 2006. [9] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press, 1992. [10] J. A Walker and Miller, J. F. (2008). The Automatic Acquisition, Evolution and Reuse of Modules in Cartesian Genetic Programming. IEEE Transactions on Evolutionary Computation, 12 (4), 397-417. [11] B. Burlacu, L. Ferariu (2010), The design of hybrid neural networks by means of genetic programming. The Bulletin of Politechnical Institute of Iasi, Romania, Section Automatic Control and Computer Engineering, LVI (LX), 1, 9-22. [12] T Marcu., Mirea L., Ferariu L., Frank P. M. “Miscellaneous Neural Networks Applied to Fault Detection and Isolation of an Evaporation Station”, Proc. of 4th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes, SAFEPROCESS , Hungary, 2000.

Figure 4. Best HNN model - a partially interconnected architecture (one output SP neuron, 5 hidden SG neurons).

REFERENCES [1]

Patra J., Pal R., Chatterji, B., Panda G., “Identification of nonlinear dynamic systems using functional link artificial neural networks”, IEEE Transactions on System, Man and Cybernetics, Part B: Cybernetics, 29, 254–262, (1999).

– 552 –

Figure 5. The approximation provided by HNN, MLP and RBF on non-scaled validation data set.

Lihat lebih banyak...
Abstract— This paper presents a novel approach devoted to the design of feed forward hybrid neural models. Graph genetic programming techniques are used to provide a flexible construction of partially interconnected neural structures with heterogeneous layers built as combinations of local and global neurons. By exploiting the inner modularity and the parallelism of the neural architectures, the approach suggests the encryption of the potential mathematical models as directed acyclic graphs and defines a minimally sufficient set of functions which guarantees that any combination of primitives encodes a valid neural model. The exploration capabilities of the algorithm are heightened by means of customized crossovers and mutations, which act both at the structural and the parametric level of the encrypted individuals, for producing offspring compliant with the neural networks’ formalism. As the parameters of the models become the parameters of the primitive functions, the genetic operators are extended to manage the inner configuration of the functional nodes in the involved hierarchical individuals. The applicability of the proposed design algorithm is discussed on the identification of an industrial nonlinear plant.

I. INTRODUCTION Neural models are intensively used in nonlinear system identification, due to their capability of learning complex variables’ dependency embodied in finite representative data sets [1, 2]. Their increased robustness and computational efficiency resides from the inductive learning which is employed for topology and parameters’ adaptation, and also from valuable particularities of the neural architectures, namely high level of parallelism and strong interconnectivity provided between numerous, yet simple computational units [3]. Based on Kolmogorov decomposition theorem, various mapping neural topologies have been proved to behave universal approximation capabilities in terms of continuous, bounded, nonlinear functions (e.g. multilayer perceptron MLP, neural networks with radial basis functions RBF, etc.) [3]. Although these structural templates are computational equivalent and guarantee the existence of a neural model of any desired degree of accuracy, in practical situations they can lead to significant different performances, especially when high neural input dimension and/or large, noisy training data sets are involved. To ensure enhanced adaptability, this paper adopts the hybrid neural networks (HNN) formalism [4]. It combines various neural units with local and global responses. The aggregation permits a flexible

978-1-4244-7433-2/10/$26.00 ©2010 IEEE

design of compact neural models, leading, however, to increased difficulties in customizing the topology and the learning algorithm. The most common collaborative neuro-genetic systems employ genetic algorithms for neural topology selection and/or learning [4, 5]. Various direct and indirect representations based on fixed or variable-length chromosomes have been proposed. However, the hierarchical encodings implicitly accepted by genetic algorithms are limited to specific interpretations of several “control” genes which, in fact, encode the architecture. Shorter chromosomes can be achieved by working on tree-based or graph–based individuals. Although these encodings seem more “natural”, they require specific crossovers and mutations, able to preserve the validity of the generated offspring, when dealing with various types of hidden neurons and interconnections [5]. In neural identification, previous work based on graph-based evolutionary algorithms refers to a limited variety of structural templates, most approaches considering the homogeneous MLP. The proposed design of hybrid neural networks is based on genetic programming (GP) techniques. Some benefits result form the capacity of GP of concomitantly working on the structure and the parameters of the neural model, which allows a better control on the overall performances of the selected neural network [6, 7, 8]. Moreover, as GP is specialized in the evolvement of the hierarchical individuals, it should be able to ensure a more efficient exploration within the space of the candidate neural architectures, whilst preserving a natural interpretation of the encrypted individuals. At the other side, GP offers the inherent advantages of evolutionary approaches, so it can adjust a large number of neural parameters, while dealing with scarce aprioric information and/or nonlinearities, multimodalities, discontinuities of the objective functions. To ensure a better exploitation of neural architecture’s modularity and parallelism, this paper employs graph GP and introduces original enhancements which allow the modular generation of graph-encrypted individuals. The partially interconnected hybrid networks are generated by means of recursive combinations of primitives. In this context, the authors propose a special configuration of a minimally sufficient functions’ set, which is able to ensure the validity of the generated neural structures. Additionally, the graph GP is equipped with new compatible genetic operators, especially designed to preserve the consistency of the encoded neural architectures and, consequently, to enhance the algorithm exploration ability.

– 547 –

IEEE International Joint Conferences on Computational Cybernetics and Technical Informatics (ICCC-CONTI 2010) • May 27-29, 2010 • Timisora, Romania.

The paper is organized as follows. Section II presents the adopted hybrid neural architecture, whilst Section III discusses the suggested customizations aimed at making GP compliant with neural networks’ design. Section IV illustrates the performances of the proposed on the identification of an industrial system and final remarks are given in Section V. II. HYBRID NEURAL ARCHITECTURE The hybrid neural models accept a heterogeneous structure for each hidden layer [4]. This approach combines both simple and complex neurons with global and local responses, as basis for a flexible tuning of the neural architecture. Implementing the input operator by means of the dot product, the neurons with global response are able to extrapolate beyond the range of known training data, thus leading to increased robustness. On the contrary, the neurons with local response aggregate the inputs by using the Euclidian distance. Being activated in limited areas around specific input patterns, they can provide high computational efficiency for parameters’ calculation. Summing up, the combination of local and global neurons could offer higher approximation capabilities both in interpolation and extrapolation problems [3]. The suggested design procedure accepts any type of global and local neurons, interconnected by means of feed-forward links (Fig. 1). Several potential hidden neural units are described below. Let us denote the neuron’s input vector with z = [ zi ]i =1,.., no _ i and the neuron’s parameters with θ . The most common neural unit with global response is the standard perceptron (SP). When hyperbolic tangent activation function is used, it ensures the following nonlinear input-output mapping: no _ i

y SP = f SP (z, θ) = tanh( ∑ wi zi + b) ,

(1)

i =1

where θ = [ w1 ,.., wno _ i , b]'∈ ℜ no _ i +1 contains the weights associated to all input connections, denoted with [ wi ] i =1, no _ i , and the bias, b. A more complex global

−

y SG = f SG ( z, θ) = e

1

no _ i

2σ 2

i =1

2 ∑ ( ci − z i )

,

(3)

where θ = [c1 , ..., cno _ i , σ ]'∈ ℜ no _ i +1 includes all centers, [ci ]i =1,..no _ i , and the spread, σ . A possible extension is the local neuron with radial basis function and complex weights (RBC), which associates for each input connection a center, ci , yet also a complex weight,

we

−1 ⋅α i

:

y RBC = f RBC (z, θ) =

=e

2 2 no _ i no _ i − w 2 ∑ cos α i ⋅( z i − ci ) + ∑ sin α i ⋅( z i − ci ) i =1 i =1

(4).

leading to θ = [α1 ,..,α no _ i , c1 ,..., cno _ i , w]'∈ ℜ 2 no _ i +1 . Maximum activation is also obtained when all zi = ci , i = 1, no _ i , but the region where non-zero response is achieved can be more flexibly tuned, favoring better approximation capabilities than RBF [2]. This approach extends the previous work devoted to HNN design presented in [4], which was based on fixed length chromosomes and hierarchical genetic encoding with successive levels of control and parametric genes. Increased flexibility is provided by means of GP techniques, which is able to evolve individuals of various sizes and shapes. Moreover, the design procedure allows any number of hidden layers inside the HNNs. Each neuron accepts input stimulus provided by a subset of neurons of the previous layer, yet also by a subset of neural inputs. The input-output formalism, based on series – parallel identification schemes with external delay blocks is considered, meaning that the neural input vector contains lagged plant input and output measurements:

x(k ) = [u(k ),.., u(k − nu ), y (k − 1),.., y (k − n y )] ,

neuron is the one with functional links (FL), which increases the dimensionality of the input space by means of orthogonal functions, e. g. trigonometric ones:

Layer 1

(5)

Layer s

θ 1q

no _ i P

x1

y SFL = f SFL (z , θ) = tanh( ∑ ( ∑ wc (cos πjz i ) i =1

P

j =1

ij

(2)

+ ∑ w s (sin πjz i ) + wi zi + b)). j =1

ϕq . . .

θs

….

xi

ij

Here P indicates the predefined order of the functional expansion and θ ∈ ℜ1+ no _ i ⋅( 2 P +1) includes the weights for all original and expanded input links. Note that SFL-based networks can lead to a reduced number of neurons and, in some cases, to better overall performances than MLP [1]. The Gaussian neuron (SG) with real parameters is a local neuron which attains the maximum activation when the input values are, correspondingly, identical with the associated centers:

. . .

….

ϕ1

) yi

θ1r ϕr

xR

1 ≤ q < r ≤ n1

Figure 1. Hybrid neural architecture. For each neuron, ϕ denotes the input operator and Ө the parameters (real or complex weights, biases, centers and spreads, respectively). The input pattern x contains lagged plant variables.

– 548 –

L. Ferariu, B. Burlacu • Graph Genetic Programming for Hybrid Neural Networks Design

where u ∈ ℜ m and y ∈ ℜ n denote the inputs and the outputs of the system, k indicates the current sampling instant and nu, ny represent the maximum permitted input and output lags, respectively. For the sake of simplicity, all individuals encode single output models. Therefore, for MIMO systems identification problems, one has to design n different models, taking into account possible interconnections between the system outputs:

yˆ i (k ) = f i (x(k )) ,

(6)

where yˆ i denotes the approximation provided by ith model for the ith output of the system. The output neuron is forced to be a global one, in order to allow non-zero model output for a wide range of neural inputs, situation which is frequently met in system identification. III.

GENETIC PROGRAMMING LOOP

A. HNN Encoding GP builds the potential solutions by means of recursive combinations of predefined functions and terminals [9]. Despite classical GP which works with tree-encrypted individuals, this approach considers directed acyclic graphs (DAG). The main motivation lies in the restrictive connectivity allowed by the inner structure of the trees. Graph – based individuals can provide a direct encryption of any neural connections, by simply including corresponding links between the nodes, therefore they conduct to reduced size and simple interpretation. Note that each supplementary link means the reuse of all interconnected nodes from the lower layers, without additional memory consumption. Graph-GP assumes that the leaves of the graphs are terminals, namely elements of the T set, and the others nodes correspond to functions chosen from the O set. Therefore, the success of the algorithm is strongly related to a convenient initialization of O and T. To guarantee the validity of the hierarchical individuals, the primitives’ sets must satisfy the closure and the sufficiency properties [9]. The closure property ensures the data type and the data value consistency, as it states that any combination of functions and terminals is accepted within the individuals. At the other side, the sufficiency property assures that the optimal solution can be obtained by mixing some predefined primitives, so it demands to include in O and T all required elements. Obviously, the use of extraneous primitives leads to a larger searching space and to a harder exploration, therefore minimally sufficient sets are preferred [7]. However, the above mentioned requirements are limited to genotype validity. They guarantee the correctness of the individuals, yet not the validity of the encoded models, in terms of the employed structural template. As example, for HNN design, the set of functions O = {exp,+,−,*, cos, sin} is minimally sufficient, if the algorithm encodes 1 /(2σ 2 ) , instead of σ . Assuming a terminal set with all variable terminals specified in x, a supplementary real parameter-type terminal and some constant-type terminals (such as − 1 , π ), O and T satisfy also the closure property. Though, a

lot of correct individuals do not encode correct hybrid neural networks. In fact, the probability of obtaining a valid neural model by stochastic aggregation of primitives results very low and, consequently, the algorithm will behave unsatisfactory exploration capabilities. To ensure the phenotype validity, the authors recommend additional constraints. Namely, the set of functions is configured based on neural network’s granularity. As the neuron represents the atom at phenotypic level, it should be used as atom at genotypic level, too. This means, O is defined to include the inputoutput functions provided by all accepted neurons:

O = { f SP , f FL , f SG , f RBC ,...} .

(7)

In this case, a supplementary node in the DAG encodes a supplementary neuron in the HNN and a graph connection encrypts a neural link (Fig. 2). The proposed encoding guarantees the correct structure of each compound neural unit and exploits the inherent interconnectivity of DAG. The constant terminals are no more necessary and all the parameter terminals are already captured as generic parameters in the functional nodes of the individuals. The downside is the need of enhanced genetic operators able to work on the inner structure and parameters of the functional nodes. In compliance with the hybrid neural architecture (Fig. 1), x can be directly employed as terminal set. Note that the functions’ set (7) is minimally sufficient and data consistency is ensured for any combination of primitives. Moreover, x results sufficient if appropriate maximum input and output lags are used. nu and n y can be set by trial of error with reduced computational effort, knowing that the algorithm can cope with several alien lagged measurements. In fact, the GP-based selection of appropriate elements of x implements a concomitant variable reduction and external delays configuration. To avoid the production of overfitted models due to the undesired increasing of the complexity order of the graphs, two control parameters are used, namely the maximum depth of a graph ( nl ≥ 1 ) and the maximum number of input connections accepted for a functional node ( ni ≥ 1 ). Despite extended genetic algorithms based on graph encoding, GP offers enhanced capabilities in terms of reuse of valuable structural components of the fitted individuals by means of primitives’ set updating and encapsulation [10]. B. Initial Population Having no a priori information about the rough localization of the optimum points, the algorithm starts with a randomly generated initial population of DAGencrypted solutions distributed over the entire search space. Firstly, individuals of various shapes and depths are generated, by including, recursively, single input single output neurons of any type and terminal leaves. Afterwards, random extra-connections are drawn between the nodes of successive layers, for producing highly interconnected neural architectures, as illustrated in Fig. 2. Naturally, incoming connections are accepted for the

– 549 –

IEEE International Joint Conferences on Computational Cybernetics and Technical Informatics (ICCC-CONTI 2010) • May 27-29, 2010 • Timisora, Romania.

Figure 2. DAG-based encoding of HNN. The filled nodes encodes the functional nodes, the others denote the terminals. Initial connections between the nodes are shown with a solid line, while the created extra connections are indicated with dotted lines. Graph’s layers 1 - 3 contain neurons, yet also terminals which act as inputs for the neurons of the upper layer.

Figure 3. Structural crossover.

neuron-type nodes, only. Any initial individual can use a subset of terminals and any combination of global and local neurons. C. Genetic Operators As the suggested encoding treats the neurons as atomic structural blocks, the classic GP operators have to be extended in order to allow changes within the inner structure of the functional nodes [11]. Increased algorithm exploration capabilities are provided by means of enhanced crossover and mutations, which improve the preliminary proposal presented in [11]. The structural crossover interchanges two sub-graphs randomly selected from the parents. The offspring preserve all the output connections provided from the cutting node and all the internal links of the incoming sub-graph. If some of the swapped neurons are also connected to external nodes (that do not belong to the considered sub-graph), they have to be copied before leaving the individual, in order to keep unaltered the rest of the graph. An example is given in Fig. 3. The node 1 of parent A and the node 11 of parent B have been chosen as cutting points. After swapping, the offspring B will contain two supplementary layers and the output connections of the node 11 provided to the neurons 5 and 7 of the previous layer. In offspring A, the neuron 1 of parent A is replaced with a terminal node. The nodes belonging to the sub-graph selected for swapping can be deleted, except the nodes 4 and 9, which are also connected to nodes 2 and 3.

In the case of hierarchical encodings, the crossover causes two important problems. One is the bloat phenomenon manifested by the tendency of the producing larger and larger graphs, without significant fitness improvement [7]. Although the complexity order of the individuals is controlled by means of nl and ni , the bloat affects the performances of the algorithm, as it increases the chances of obtaining unvaluable overfitted offspring, with expected bad generalisation capabilities, which must be rejected from the population. The second drawback is related to competing conventions [5, 6]. The quality of the offspring could be much lower than the quality of the parents, as the crossover does not detect the structural blocks with different genotype, yet silimar functionality (e.g subgraphs resulted by permutations of their nodes). In this context, this approach considers four types of mutations, applied with high probabilities for modifying the neural topologies and the corresponding neural parameters. Being unary operators, mutations cannot produce competing conventions and permits a simpler control of the complexity order of the resulted offspring. Various mutations can act in sequel on the same individuals, to produce successive changes of their genetic material. The structural mutation replaces a randomly selected sub-graph with another one, stochastically generated by means of recursive combinations of available primitives. The node mutation changes the activation function of the randomly selected functional nodes or the significance of the terminal-type nodes, in accordance to the available alleles indicated in O and T, respectively. If a neuron – type node switches from a global unit to a local one, its neural parameters are randomly reinitialized, because the previous values have no more usability. Also, if the neuron remains global or local, but supplementary parameters are added (e.g. a SP turns to a FL or a GS becomes a RBC), their values are stochastically set. The link mutation introduces supplementary input connections for the functional nodes or cuts some of the existing ones. Only new feed forward connections are accepted, namely input connections for the neuron-type nodes coming from any node of the lower layers. Lastly, the parametric mutation produces small variations on the selected neural parameters, according to predefined parameters ranges. The mutations implement both large and small modifications within the encrypted models. The explorative behavior of the algorithm can be controlled by means of their assigned probabilities. Note that the structural mutation can produce the most important refreshment in terms of encoded neural topologies. D. Fitness Assignment One assumes that a representative training data set, S = {(u(k ), y (k ))} , k = 1,...d, is available. It has to describe the dynamic behavior of the system for a large range of inputs. To avoid ill conditioning and inconvenient aggregation of the neurons’ outputs, the measurements are scaled in [-0.9, 0.9]. The monoobjective design procedure ensures the minimization of the empirical risk, in terms of S:

SSE ( M ) =

– 550 –

1 d 2 ∑ ( yi − yˆ i ) , d i =1

(8)

L. Ferariu, B. Burlacu • Graph Genetic Programming for Hybrid Neural Networks Design

where yi and yˆ i denote the ith plant output and its approximation, respectively, and M denotes the evaluated model, encrypted as graph-based individual. An adequate balance between exploration and exploitation is preserved by means of linear ranking-based fitness assignment. Note that the complexity order of the encoded topologies is controlled by means of nl and ni . E. Selection for Reproduction and Survival At each generation, the universal stochastic sampling is called for filling the recombination pool. The resulted offspring compete with their parents to form the population of the next generation. During insertion, the best adapted solutions are deterministically chosen as survivors. The number of the offspring is smaller than the population size, so the evolutionary process results elitist in nature. The most accurate model encoded in the last generation is selected as the result of the run. IV.

APPLICATION

The proposed design methodology (named GPHNN) was applied for the identification of an evaporator subsystem from the Sugar factory of Lublin, Poland (Fig. 4). The evaporator is part of the evaporation station (ES), being responsible with the reduction of the content of water in the sucrose juice. It admits three inputs (the steam flow and steam temperature at the input of ES, as well as the juice temperature after heater) and one output (the juice temperature after the 1st section of ES) [12]. The training data set contains about 300 measurements acquired during a production shift. S is filtered with a low - pass 4th order Butterworth system. It illustrates the maximum possible excitation of the process achievable during normal plant exploitation. All missing and uncertain values have been replaced by means of polynomial interpolation. To provide a suitable verification of the model generalization capability, the validation data set includes measurements acquired in a different month of ES operation [12]. The suggested design method was implemented in C. For different configurations of the algorithm parameters, Table 1 lists the mean performances of accuracy achieved on training and validation, during 10 independent runs. All experiments correspond to a probability of structural crossover equal to 0.7 and a probability of mutation 0.3. If the terminal set contains few elements – being suspected as insufficient (#1→#5), the neural models can implement only short-term “memory”, leading to the risk of unsatisfactory approximations in the case of high order systems. Table 1 indicates that the recent input samples have higher influence on the evaporator output, yet, for nu = n y = 1 , only a limited level of accuracy can be achieved, even when numerous populations and long evolutionary loops are employed (#2, 4, 5). Therefore, the recommendation is to initially set higher nu and n y , as generally, the extraneous terminals could be less dangerous than the absence of useful ones (#18→#27), due to the inherent GP ability of selecting subsets of x. However, when the terminal set is not minimally sufficient, it is compulsory to ensure higher explorative behavior by means of larger populations and larger number of generations (#19, 22, 26). Note also that the usability of the selected model is strongly dependent to

the complexity constraints imposed by nl (for these experiments, nl = 3 ). The alien terms can act as undesired introns of the model, leading to bad approximations on the validation data sets (#21, #25). Being stochastic and nonlinear in nature, GPHNN does not guarantee the models’ accuracy improvement by solely increasing the population size (#14 vs. #15) and/or the number of generations (#10 vs. #13). Though, this is a common strategy for enriching the exploration within large search spaces, when combined with small selection pressures and highly productive genetic operators, aimed at preserving the population diversity. TABLE I.

EXPERIMENTAL RESULTS.

# nu ny gen n_ind SSE_L SSE_V 1 1 1 100 300 0.00349 0.0347 2 1 1 100 1000 0.00174 0.0198 3 1 1 200 300 0.00189 0.0428 4 1 1 200 1000 0.00136 0.0216 5 1 1 300 1000 0.00087 0.0120 6 2 2 100 300 0.00238 0.0164 7 2 2 100 1000 0.00166 0.0099 8 2 2 200 300 0.00187 0.0235 9 2 2 200 1000 0.00110 0.0100 10 3 3 100 300 0.00250 0.0146 11 3 3 100 700 0.00173 0.0080 12 3 3 100 1000 0.00228 0.0588 13 3 3 200 300 0.00279 0.0500 14 3 3 200 700 0.00129 0.0147 15 3 3 200 1000 0.00162 0.0089 16 3 3 300 1000 0.00155 0.0237 17 4 4 100 300 0.00266 0.0342 18 4 4 100 700 0.00293 0.0230 19 4 4 100 1000 0.00187 0.0090 20 4 4 200 300 0.00315 0.0349 21 4 4 200 1000 0.00146 0.0363 22 4 4 300 1000 0.00103 0.0101 23 5 5 100 300 0.00392 0.0719 24 5 5 100 1000 0.00180 0.0061 25 5 5 200 300 0.00174 0.0314 26 5 5 200 1000 0.00194 0.0076 27 5 5 300 1000 0.00130 0.0319 n_ind - population size; gen - the number of generations. SSE_L and SSE_V represent SSE computed on scaled training and validation data sets, respectively, according to (8). TABLE II.

PERFORMANCES OF HOMOGENEOUS AND HYBRID NNS

NN type

No. of Number of SSE_L SSE_V neurons parameters HNN 6 37 0.00058 0.0051 RBF* 16 256 0.2513 3.2229 MLP** 16 256 0.0102 2.4490 *) RBF was designed with a constructive algorithm, adding, in sequel, GS neurons; **) MLP was trained for 15000 epochs with Levenberg-Marquardt based procedure. For RBF and MLP, the parameters of the design procedure were set by trial and error.

The best neural model was obtained for nu = n y = 3 . It features a very simple partially interconnected architecture with 6 neurons (1 SP and 5SG) and 37 neural parameters (Fig. 4). For a sound comparison, its performances are compared with those provided by homogeneous full connected neural networks, namely MLP and RBF (Table 2). Note that MLP and RBF require much higher number of neurons for a quite satisfactory approximation on the training data set (Fig. 5). This leads to difficulties in managing the training algorithm and

– 551 –

IEEE International Joint Conferences on Computational Cybernetics and Technical Informatics (ICCC-CONTI 2010) • May 27-29, 2010 • Timisora, Romania.

high risk of overfitting. HNN offers significant better generalization (Fig. 5), due to the flexible manner employed by GP for variable reduction and structure/parameters optimization. GP efficiency results also from the fact that all the models encrypted within the population are valid and, consequently, no searching effort is wasted during the evolutionary loop by producing incorrect neural networks. Compared with [4], which uses enhanced supplementary multiobjective techniques for controlling the complexity of HNNs and dynamic internal blocks, the results are encouraging. For GPHNN, on non-scaled data, SSE results 4.8673 for training and 16.09 for validation, whilst [4] indicates 4.1197 and 11.54, respectively, corresponding to a HNN with 48 neural parameters (including here the coefficients of dynamic internal ARMA filters). The convenient performances of GPHNN which implements only a mono-objective optimization and external memory are mainly sustained by the increased flexibility provided by GP in evolving hierarchical individuals of various sizes. Finally, note that the automatic selection of neural architecture permits to ensure a convenient total design time. V.

CONCLUSIONS

The suggested GP-based methodology represents a suitable alternative for nonlinear system identification. At one side, it exploits the flexibility of the HNNs. This heterogeneous structural template combines various types of global and local neurons to provide enhanced approximation capabilities for a large class of applications. Moreover, the neural network formalism offers high adaptability, even in the post-design stage. At the other side, GP techniques permit the simultaneous configuration of the neural parameters and architectures, without requiring rich a priori information. To facilitate the generation of diverse partially interconnected topologies, DAG – based encoding is employed. The validity at both genotypic and phenotypic levels is preserved by setting appropriate primitives and genetic operators. The functions are defined in accordance to the modularity of HNN, by encapsulated the neural parameters inside the neuron-type nodes of the graph. Additionally, specific customized structural crossover and four types of mutations are designed to ensure an efficient exploration within the space of potential neural models. The approach is recommended for off-line nonlinear system identification, when the interdependency between the plant variables is poorly understood or/and high model accuracy is required, yet the system features complex nonlinearities. Further work will focus on the development of specific mechanisms meant to improve the modular configuration of DAG. They will permit the adaptation of the set of function and the temporary protection of valuable subgraphs during the evolutionary loop, in accordance to the accuracy of generated models and the diversity of the current population.

[2]

Igennik B., Tabib – Azar M., Le Clair S. R., “A net with complex weights”, IEEE Transactions on Neural Networks, 12, 236–249, 2001. [3] S. Haykin,. Neural Networks - A Comprehensive Foundation, McMillan College Publishing Company, New York, 2nd Edition, 1999. [4] L. Ferariu, M. Voicu, “Nonlinear System Identification Based on Evolutionary Dynamic Neural Networks wih Hybrid Structure” in Proc. of IFAC Congress, Prague, Czech Republic, 2005. [5] P. J. Flemming, R. C. Purshouse “Evolutionary Algorithms in Control Systems Engineering: A Survey”, Control Engineering Practice 10, 2002, pp. 1223-1241. [6] Affenzeler M., Winkler S., Wagner S., Beham A., Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications (Numerical Insights), CRC Press, 2009. [7] Poli R., Langdon W. B., Mc Phee N. F., A Field Guide to Genetic Programming. Published via http://lulu.com (with contributions of J.R. Koza), [Online]. Available: http://www.gp-field-guide.org.uk, 2008. [8] N. Nedjah, A. Abraham, L. de Macedo Mourelle (Eds.) Genetic Systems Programming - Theory and Experiences, Springer, Netherlands, 2006. [9] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press, 1992. [10] J. A Walker and Miller, J. F. (2008). The Automatic Acquisition, Evolution and Reuse of Modules in Cartesian Genetic Programming. IEEE Transactions on Evolutionary Computation, 12 (4), 397-417. [11] B. Burlacu, L. Ferariu (2010), The design of hybrid neural networks by means of genetic programming. The Bulletin of Politechnical Institute of Iasi, Romania, Section Automatic Control and Computer Engineering, LVI (LX), 1, 9-22. [12] T Marcu., Mirea L., Ferariu L., Frank P. M. “Miscellaneous Neural Networks Applied to Fault Detection and Isolation of an Evaporation Station”, Proc. of 4th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes, SAFEPROCESS , Hungary, 2000.

Figure 4. Best HNN model - a partially interconnected architecture (one output SP neuron, 5 hidden SG neurons).

REFERENCES [1]

Patra J., Pal R., Chatterji, B., Panda G., “Identification of nonlinear dynamic systems using functional link artificial neural networks”, IEEE Transactions on System, Man and Cybernetics, Part B: Cybernetics, 29, 254–262, (1999).

– 552 –

Figure 5. The approximation provided by HNN, MLP and RBF on non-scaled validation data set.

Somos uma comunidade de intercâmbio. Por favor, ajude-nos com a subida ** 1 ** um novo documento ou um que queremos baixar:

OU DOWNLOAD IMEDIATAMENTE