A nonlinear system predictor from experimental data using neural networks


Riccardo Carotenuto, Luisa Franchina, Moreno Coli
Dipartimento di Ingegneria Elettronica, Università di Roma "La Sapienza", via Eudossiana 18, 00184 Roma, Italy
Ph. +39-6-44585345, Fax +39-6-4742647
E-mail: [email protected], [email protected]

Abstract - A novel iterative technique is proposed by the authors in order to build a discrete-time nonlinear dynamical system predictor from experimental input-output pairs. The iterative technique is capable of representing a class of dynamical systems as static multidimensional mappings. The practical solution of the prediction problem is strongly related to the availability of suitable representations of multidimensional mappings. The proposed technique, belonging to the memory-based techniques, greatly reduces the amount of memory required to store the representation of the mapping. The iterative technique is very well suited to work in conjunction with an associative memory structure such as the monodimensional CMAC and in the presence of on-the-fly data. An application example to dynamical system output prediction is presented. Moreover, a convergence discussion for the proposed algorithm is provided. Finally, computer simulations verify the stated theory.

Keywords: neural networks, system predictor, iterative algorithm.

Introduction
Modern control techniques are widely based on models acting as predictors of the process being controlled [1-3]. A predictor is an algorithm capable of providing an estimated output of the system under control before the real system output is available. If the predictor is sufficiently fast, it allows us to try off-line many different control inputs and to choose the best one before the next time step. Unfortunately, it is often practically impossible to obtain a sufficiently accurate analytical description of the process. In such a case, controllers can incorporate black-box models built from experimental input-output data. In many cases a black-box model can be seen as a static mapping function from the input and the state space to the output space [4]. The method of representing dynamical systems by difference equations is currently well established in system theory and applies to a fairly large class of systems. In this paper we will be concerned with discrete-time nonlinear SISO systems which can be represented by the difference equation:

    y(k + 1) = F[y(k), ..., y(k - n + 1), u(k), ..., u(k - m + 1)]    (1)

Given the function F, the output at time k + 1 is completely determined by n past outputs and m past inputs. The function F is a static mapping, since it is a memoryless relation defined as F : R^(n+m) → R (see Fig. 1). In order to obtain a predictor for the system (1), we have to construct a suitable representation of the static mapping F. Obviously, we shall assume that a unique solution to the above problem exists for any set of values of u(k) in a compact region U ⊂ R^m, and for any initial condition in a compact region Y ⊂ R^n. Our aim is to construct such a predictor from measurements of system input-output pairs only. To this end, we define an input and output range and a quantization level M for the measurement samples. After the quantization, F needs to be considered only in an (m + n)-dimensional box consisting of M^(n+m) discrete points, so that F can be represented as an (m + n)-dimensional matrix. The entries of the above matrix are experimental values of F, addressed by the (m + n) past input and output samples. However, even for small M, n and m, the exponential growth of the memory requirements makes this approach impracticable in a direct way. For example, if M = 256 and n + m = 10, then 2^80 memory locations must be allocated. Moreover, the input-output pair acquisition time also grows exponentially. At present, many different methods are known in the literature for compactly representing data sets, from the classical interpolation functions to the n-dimensional splines and to the different kinds of neural networks. Among the neural networks most suitable for this application, the multilayered perceptron provides a good approximation only in the case of smooth F; moreover, it requires time-consuming learning sessions and its convergence is not guaranteed [5, 6]. The multidimensional CMAC (Cerebellar Model Arithmetic Computer) of Albus guarantees the convergence for a wide class of mappings, but the memory requirements grow in a nearly exponential manner with the dimension of F.
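To make the growth concrete, here is a tiny back-of-the-envelope check (a Python sketch of ours, not part of the paper) for the quoted example M = 256 and n + m = 10:

    # Direct lookup-table storage of F: one cell per quantized (n + m)-tuple of past samples.
    M = 256         # quantization levels
    d = 10          # n + m past samples addressing the table
    cells = M ** d  # number of table entries

    print(cells == 2 ** 80)        # True: 256^10 = 2^80 memory locations
    print(f"{cells:.3e} cells")    # about 1.209e+24, clearly impracticable to allocate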


Fig. 1. A dynamical system described by m past inputs u(.), n past outputs y(.), and the static mapping F.

A hashing indexing technique was proposed as a partial remedy to this problem, but such a technique makes the data retrieval quite noisy [7-11].

The Iterative Decomposition and Extraction Algorithm (IDEA)
We are searching for an algorithm capable of reconstructing the dynamical behaviour, i.e. the static mapping F, described by the experimental samples, without a plain storage of all possible M^d values. We can take advantage of the following fact: in dynamical systems, the values of F are always produced by some underlying law, i.e. they are not random-like values. So there is always some kind of redundancy in the data set, even if we are not able to describe it explicitly. For example, the best known case is represented by linear systems, which are completely described by a minimal set of parameters: the hyperplane coefficients. This paper deals with the case of F : R^d → R which is decomposable into a sum of monodimensional functions f_j:

    F(x1, x2, ..., xd) = f1(x1) + f2(x2) + ... + fd(xd)    (2)

A wide class of multidimensional mapping functions can be represented by a sum of monodimensional functions, including all linear functions and many others: for example we can consider the following Henon discrete-time chaotic equation:

    y(k + 1) = 1 - 1.4 y(k)^2 + 0.3 y(k - 1)

where d = n = 2, m = 0, x1 = y(k) and x2 = y(k - 1). If a given function y = F(x1, x2, ..., xd) can be written as y = f1(x1) + f2(x2) + ... + fd(xd), i.e. F can be decomposed or "projected" on the coordinate planes without loss of information content, enormous benefits in memory requirements can be obtained whichever representation (and storage) technique we use. In practice, it is possible to take advantage of the intrinsic redundancy in the sample set of such functions, which do not generate truly independent values in the d-dimensional space. After the quantization the f_j's are each represented by a vector of M entries. From (2) such vectors are equivalent to the data matrix representing F, but while the full matrix representation requires M^d cells, d·M cells suffice for the f_j's. In other words, the intrinsic redundancy of the data set describing F allows us to dramatically reduce the memory requirements: from O(exp(d)) to O(d). In the Henon example, 256·2 = 512 memory cells are now required instead of 256^2 = 2^16. As a further problem constraint we assume that the experimental data arrive on-the-fly: only one experimental pair at a time can be processed, according to a real on-line input-output measurement from a dynamical system; moreover, the memory resources needed to store the complete data set are not available. In the past, however, suitable techniques to extract the f_j's from a temporal sequence of experimental data samples were missing. From an analytical point of view, if the complete data matrix representing F were available all at once, the f_j's could be obtained by solving the following linear system of M^d equations in d·M unknowns:

    C X = D

where C, D, and X are [M^d x d·M], [M^d x 1], and [d·M x 1] matrices, respectively. The entries of D are the same as those of the data matrix representing F, now ordered as a vector; X is a vector composed of the entries of the d vectors representing the d f_j's, which are the actual unknowns; and C represents the composition rule (2), for which each element of D must be obtained by summing d suitable elements of X.

For example, with d = 2 and M = 3, X is composed of the vectors representing f1 and f2, say V = [v1 v2 v3] and W = [w1 w2 w3] respectively. In such a case X = [V W]^T and we have:

    C =
    [ 1 0 0   1 0 0 ]
    [ 1 0 0   0 1 0 ]
    [ 1 0 0   0 0 1 ]
    [ 0 1 0   1 0 0 ]
    [ 0 1 0   0 1 0 ]
    [ 0 1 0   0 0 1 ]
    [ 0 0 1   1 0 0 ]
    [ 0 0 1   0 1 0 ]
    [ 0 0 1   0 0 1 ]

where, in each row, the first three columns select one entry of V and the last three columns select one entry of W, so that each row of C X reproduces one sum v_i + w_j, i.e. one entry of D.
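The structure of C and the rank deficiency discussed in the next paragraph can be verified numerically; the following is a small sketch of ours (assuming NumPy) for the d = 2, M = 3 example above:

    import numpy as np

    d, M = 2, 3
    I = np.eye(M)
    # Row (i, j) of C encodes D(i, j) = v_i + w_j: one 1 among the first M columns (selects v_i)
    # and one 1 among the last M columns (selects w_j).
    C = np.hstack([np.kron(I, np.ones((M, 1))),   # v-selector, repeated for every j
                   np.kron(np.ones((M, 1)), I)])  # w-selector, repeated for every i

    print(C.shape)                    # (9, 6): M^d equations in d*M unknowns
    print(np.linalg.matrix_rank(C))   # 5 = d*M - (d - 1): one unknown must be fixed arbitrarily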

Under the assumption (2) only d·M - (d - 1) equations are independent. The structure of the matrix C is given by the relation between the elements of X according to the assumed structure of F. Such a structure implies that the sum of the first M columns is equal to the sum of each of the subsequent groups of M columns. So the rank of the coefficient matrix C is d·M - (d - 1) anyway. This corresponds to the circumstance that the result does not change if a constant c_j is added to each vector representing an f_j, with the constraint Σ_j c_j = 0. The above system is completely determined only if the values of (d - 1) unknowns of X are arbitrarily fixed. Unfortunately this direct method is quite impracticable: we have previously assumed that it is impossible to have all the experimental data both simultaneously and ordered; we can have only one data pair at a time. Then, a practical extraction method must be iterative in order to deal with on-the-fly data. The authors propose a technique to extract the d f_j's from the measurements of the input-output pairs. The following iteration is proposed:

    f_j^(i+1)(x_j) = f_j^i(x_j) + α [ y(x1, x2, ..., xd) - Σ_{l=1..d} f_l^i(x_l) ],    j = 1, 2, ..., d    (3)

where i is the iteration index and j = 1, 2, ..., d. Given a discrete-time dynamical system of the kind of (2) whose n and m (d = m + n) are known, the following steps are needed for the application of the proposed method:

1) d cell arrays, each of M elements, are allocated to represent the d f_j's;
2) n consecutive past samples of the outputs and m consecutive past samples of the inputs, both quantized with M levels, are stored in a suitable memory buffer. Each of these input and output samples, which is now an integer number ranging from 0 to M - 1, is used to index the corresponding memory cell array of M elements;
3) the contents of the d cells now indexed are read, one from each cell array, and summed up. The result is the current prediction of the output, say yp(k + 1);
4) at the next time step the input and output of the dynamical system are sampled and the prediction error is defined as ep = y(k + 1) - yp(k + 1). A certain percentage of this error is accumulated, through the iteration (3), in all the memory cells involved in the current prediction;
5) steps 2) to 4) are repeated, adding the new input-output samples and shifting the oldest samples out of the memory buffer.

It is worth noting that such a sequence corresponds to the series-parallel system-predictor configuration. A useful necessary condition for the convergence of the iteration (3) is given below.

Theorem - Given a function y(x1, x2, ..., xd) = F(x1, x2, ..., xd) : R^d → R, bounded in the compact definition domain V ⊂ R^d, then, ∀ P = {x1, x2, ..., xd} ∈ V and for every initial guess f_j^0(x_j), j = 1, 2, ..., d, the d iterations (3), where i is the iteration index, are contractive only if 0 < α < 2/d.

Proof - By applying the iteration (3), ∀ P = {x1, x2, ..., xd}, it results

    f_1^(i+1)(x1) = f_1^i(x1) + α [ y(x1, x2, ..., xd) - Σ_{j=1..d} f_j^i(x_j) ]
    ...
    f_d^(i+1)(xd) = f_d^i(xd) + α [ y(x1, x2, ..., xd) - Σ_{j=1..d} f_j^i(x_j) ]

and, by summing member to member and subtracting each resulting member from y(x1, x2, ..., xd), we obtain

    y(x1, x2, ..., xd) - Σ_{j=1..d} f_j^(i+1)(x_j) = [ y(x1, x2, ..., xd) - Σ_{j=1..d} f_j^i(x_j) ] - d α [ y(x1, x2, ..., xd) - Σ_{j=1..d} f_j^i(x_j) ].

By defining the approximation error at the i-th iteration as

    e^i = y(x1, x2, ..., xd) - Σ_{j=1..d} f_j^i(x_j),

we have |e^(i+1)| = |e^i - d α e^i| = |e^i| |1 - d α|. The iteration is contractive, i.e. |e^(i+1)| < |e^i|, ∀ P = {x1, x2, ..., xd}, only if |1 - d α| < 1, and then only if 0 < α < 2/d, q.e.d.

This theorem gives the upper bound for the convergence parameter α, even if practical considerations on smoothness of convergence and robustness against noise recommend the use of an α smaller than the theoretical bound. Under the assumption of convergence, the iterative algorithm (3) is capable of constructing the d vectors representing the f_j's from the measurements of input-output pairs of the system (2). Moreover, the proposed method has the following noticeable features:

- the algorithm constructs the vectors locally, only for those regions of F for which experimental data are available;
- the correct construction depends on the availability of many subsequent measurements of the same input-output pair, even if not strictly consecutive;
- white noise with zero mean, always present in real application measurements, is time-integrated and thus cancelled by the iteration itself during the learning phase;
- the quantization mechanism introduces a noise on the predicted system output whose power is strictly dependent on the local shape of F.
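A minimal sketch (Python/NumPy; the naming, the uniform quantizer and the value range are our own assumptions, not the authors' code) of the cell-array predictor and of the update of iteration (3), following steps 1)-5) above:

    import numpy as np

    class IDEAPredictor:
        """d one-dimensional cell arrays trained on-the-fly with iteration (3)."""

        def __init__(self, d, M, alpha, lo=-1.0, hi=1.0):
            assert 0.0 < alpha < 2.0 / d          # convergence bound from the theorem
            self.tables = np.zeros((d, M))        # the d vectors representing the f_j's
            self.d, self.M, self.alpha = d, M, alpha
            self.lo, self.hi = lo, hi

        def _quantize(self, x):
            # Map each past sample onto an integer cell index in 0 .. M-1.
            idx = np.floor((np.asarray(x, float) - self.lo) / (self.hi - self.lo) * self.M)
            return np.clip(idx.astype(int), 0, self.M - 1)

        def predict(self, x):
            # Read the d indexed cells, one per array, and sum them: current estimate of F(x).
            return self.tables[np.arange(self.d), self._quantize(x)].sum()

        def update(self, x, y):
            # Iteration (3): add a fraction alpha of the prediction error to every indexed cell.
            err = y - self.predict(x)
            self.tables[np.arange(self.d), self._quantize(x)] += self.alpha * err
            return err

With α below 2/d, repeated presentations of the input-output pairs drive the d arrays toward the f_j's, up to the additive constants c_j discussed above.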

A problem in the practical application of this method is the high number of experimental samples needed to construct the vectors when M is large. In particular, some vector elements may fail to converge completely despite a long algorithm execution time. The solution of the above problem is related to the well-known generalisation concept. The generalisation property allows an associative memory to handle associations never learned before, under the assumption that these associations are neighbours of those previously learned. This feature represents a real advantage if a complete training is not allowed by the current problem constraints. The proposed simple memory structure has no generalisation properties, but the iteration (3) coincides with the monodimensional CMAC learning rule without both generalisation and hashing. Thus, if we make use of the monodimensional CMAC learning rule with a suitable generalisation coefficient in place of (3), we can take full advantage of the generalisation properties.
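The effect of the generalisation coefficient can be sketched as follows (our own simplified rendering of a monodimensional CMAC-style rule, not the authors' implementation): the read-out sums a window of g neighbouring cells and the error correction is spread over the same window, so that with g = 1 the rule reduces to the plain table update of iteration (3).

    import numpy as np

    def cmac1d_read(table, idx, g):
        """Output of a 1-D CMAC-like table: sum of the g cells centred on idx (window clipped)."""
        lo, hi = max(0, idx - g // 2), min(table.size, idx + g // 2 + 1)
        return table[lo:hi].sum()

    def cmac1d_update(table, idx, err, alpha, g):
        """Spread a fraction alpha of the prediction error equally over the active cells."""
        lo, hi = max(0, idx - g // 2), min(table.size, idx + g // 2 + 1)
        table[lo:hi] += alpha * err / (hi - lo)   # neighbouring cells share the correction

Replacing the single-cell read and update of the sketch above with these two functions, one table per f_j, gives a generalising variant; in the application example below the authors use a generalisation parameter of 64.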

Application Example
A simple process prediction example demonstrates the feasibility of the developed theory. In particular, the capability of the iteration (3) to construct the right f_j's is highlighted. Simulation programs are written in PC-MATLAB and run on an IBM PC compatible machine. The iteration (3) is applied to construct the predictor of the unknown plant described by the equation:

    y(k + 1) = 1/4 cos[4/3 π y(k)] + 1/4 sin[4/3 π y(k - 1)] + 1/2 u(k)^3 = f1[y(k)] + f2[y(k - 1)] + f3[u(k)]

In order to illustrate the system behaviour, the input and output sequences can be seen in Fig. 2. The input and output range is [-1, 1]. The quantization level parameter M is set to 256. Since n = 2 and m = 1, three vectors suffice to represent f1, f2 and f3, i.e. the static mapping. The upper bound for α is given by the above theorem, and in the present case α < 2/3. Nevertheless, the possibility of an oscillatory, even if convergent, behaviour makes it more expedient to choose α well below the theoretical bound. In the present case α = 0.1 was chosen. Here the algorithm was used in conjunction with three monodimensional CMACs with generalisation parameter = 64. A sequence of 5000 uniformly distributed random inputs in the range [-1, 1] is fed into the system during the learning phase. After a learning transient the error reaches a plateau which is due to α, to M and to the local slope of the estimated f_j's: a high local slope in an f_j amplifies, like a gain, the quantization noise. In Fig. 3 the rms prediction error curve is shown: each point represents the mean over 100 time steps. In Fig. 4 the final f1, f2 and f3 estimated after processing 5000 input-output pairs are shown. According to the theory exposed, the algorithm constructed the vectors representing f1, f2 and f3, namely 1/4 cos[4/3 π y(k)] + c1, 1/4 sin[4/3 π y(k - 1)] + c2 and 1/2 u(k)^3 + c3 respectively, all sampled at 256 equally spaced points from -1 to 1. In Fig. 4 the contents of the vectors approximating the f_j's are plotted against the true f_j's: it results that c1 = 0.025, c2 = -0.025 and c3 = 0, satisfying the constraint Σ_j c_j = 0, j = 1, 2, 3. In practice the value of such constants is determined by the sequence of prediction errors in the first iterations. Notice that the computed vectors represent the f_j's well only in the interval covered by the experimental data, here shown by the histogram in Fig. 5. The histogram shows the occurrence of experimental output values in the learning data set. The histogram of the input values is trivial, since the input values cover the input range uniformly, and it is not shown here. Outside the covered region the CMAC generalisation effect can be clearly seen: in the covered region the approximation of the f_j's is very good, but some approximation is also performed where the training data are missing. In order to verify the achieved prediction capabilities, the learning parameter is now set to zero and both the system and the predictor are fed with an input sequence not used in the training phase: the predictor output versus the system output is plotted in Fig. 6 for the same sinusoidal input.
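For concreteness, a compact simulation in the spirit of this example can be put together from the IDEAPredictor sketch given earlier (the plant equation, the ranges, M and α follow the text; the random seed and the plain, non-generalising update are our own simplifications):

    import numpy as np

    def plant(y0, y1, u):
        # Example plant: y(k+1) = 1/4 cos(4/3 pi y(k)) + 1/4 sin(4/3 pi y(k-1)) + 1/2 u(k)^3
        return 0.25 * np.cos(4/3 * np.pi * y0) + 0.25 * np.sin(4/3 * np.pi * y1) + 0.5 * u ** 3

    rng = np.random.default_rng(0)
    pred = IDEAPredictor(d=3, M=256, alpha=0.1)       # n = 2, m = 1; alpha well below 2/d = 2/3

    y_k, y_km1 = 0.0, 0.0                             # y(k), y(k-1)
    errors = []
    for k in range(5000):                             # 5000 uniformly distributed training inputs
        u = rng.uniform(-1.0, 1.0)
        y_next = plant(y_k, y_km1, u)
        errors.append(pred.update((y_k, y_km1, u), y_next))   # predict, then learn from the error
        y_km1, y_k = y_k, y_next

    print(np.sqrt(np.mean(np.square(errors[-100:]))))  # rms prediction error near the end of learning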

Fig. 2. Example of system input-output behaviour: input (...) and output (-).

Fig. 3. Rms prediction error curve over 5000 time steps.

Fig. 4. Vectors representing f1, f2 and f3 estimated (-) after 5000 time steps versus the true f_j's (...).

Fig. 5. Histogram of the output range regions covered by experimental values in the training data set.

Fig. 6. System (o) versus predictor (+) output for the same sinusoidal input.

Conclusions
The authors are concerned with dynamical systems described by decomposable functions as in (2), in order to construct a fast and robust learning predictor. A novel iterative technique (IDEA) is proposed to extract the representation of the f_j's from experimental data on-the-fly. The proposed technique constructs d = n + m vectors whose entries are the M values assumed by the monodimensional f_j functions according to the chosen quantization level. In order to take advantage of the generalisation properties, a second step is added: d monodimensional neural networks are trained on the obtained d sets of values and used as function approximators. The original function surface F can be reconstructed by summing the outputs of the neural networks instead of the original f_j's. So a dramatic memory reduction is achieved with respect to classic memory-based approximating methods. Since the CMAC learning algorithm is very similar to the proposed algorithm, few modifications are needed to perform the monodimensional function construction, so that in practice the above two steps are carried out at the same time. The experimental data samples can be supplied to the algorithm in a random or sequential way, depending on the constraints of the current problem, and there is neither a need for complete storage of past samples nor a necessity to construct the mapping globally. The substantial reduction of the data to be processed allows the algorithm to construct the representing vectors in a time linear in d, instead of the exponential time of classic memory-based mappings. Consequently, the proposed algorithm is very well suited for on-line applications, as required in modern control schemes. A convergence discussion and a computer simulation are provided. Furthermore, the algorithm shows a very good rejection of zero-mean noise on the experimental data and a good behaviour in approximating functions that are not truly decomposable. The latter property is very desirable and is currently under theoretical investigation. First results are very promising.

References
[1] C. G. Atkeson & D. J. Reinkensmeyer, "Using Associative Content-Addressable Memories to Control Robots", in W. T. Miller III, R. S. Sutton & P. J. Werbos, eds., Neural Networks for Control (London, England: MIT Press, 1990), 255-286.
[2] Y. H. Pao, S. M. Phillips & D. J. Sobajic, "Neural-net computing and the intelligent control of systems," Int. J. Control, 56, 1992, 263-289.
[3] H. Tolle, P. C. Parks, E. Ersu, M. Hormel & J. Militzer, "Learning Control with Interpolating Memories - General Ideas, Design Lay-Out, Theoretical Approaches and Practical Applications," Int. J. Control, 56, 1992, 291-317.
[4] K. S. Narendra & K. Parthasarathy, "Identification and Control of Dynamical Systems Using Neural Networks," IEEE Trans. on Neural Networks, 1, 1990, 4-27.
[5] K. Hornik et al., "Layered Neural Networks can be Universal Function Approximators," Neural Networks, 2, 1989, 359-366.
[6] T. Chen & H. Chen, "Approximation of Continuous Functionals by Neural Networks with Application to Dynamic Systems," IEEE Trans. on Neural Networks, 4, 1994, 910-918.
[7] J. S. Albus, "A new approach to manipulator control: The cerebellar model articulation controller (CMAC)," Trans. ASME J. Dynamic Syst. Meas. Contr., 97, 1975a, 220-227.
[8] J. S. Albus, "Data Storage in the Cerebellar Model Articulation Controller (CMAC)," Trans. ASME J. Dynamic Syst. Meas. Contr., 97, 1975b, 228-233.
[9] D. Ellison, "On the Convergence of the Albus Perceptron," IMA J. Math. Control Information, 5, 1988, 315-331.
[10] Y. Wong & A. Sideris, "Learning Convergence in the Cerebellar Model Articulation Controller," IEEE Trans. on Neural Networks, 3, 1992, 115-121.
[11] M. Brown & C. J. Harris, "Comments on 'Learning Convergence in the Cerebellar Model Articulation Controller'," IEEE Trans. on Neural Networks, 6, 1995, 1016-1018.

