A posteriori tests to validate dimension estimates from time series

June 4, 2017 | Autor: Angelo Corana | Categoria: Engineering, Time Series, Mathematical Sciences, Electrocardiogram, Indexation, Noise Measurement, Correlation Dimension, Noise Measurement, Correlation Dimension

Share Embed

Denunciar este link

Descrição do Produto

Chaos, Solitons and Fractals 11 (2000) 2017±2030

www.elsevier.nl/locate/chaos

A posteriori tests to validate dimension estimates from time series Aldo Casaleggio *, Angelo Corana ICE-CNR, Via De Marini 6, 16149 Genova, Italy Accepted 13 May 1999

Abstract A posteriori tests are proposed to evaluate the degree of reliability of the estimate of dimension from a time series, using the method of correlation integrals. Although we consider in particular the correlation dimension (D2 ), our procedure can be applied to any generalized dimension as well as to the point-wise dimension. We propose the computation of two indices that can quantify to what degree the derivative of the log±log plot is constant with the correlation length and with the embedding dimension in the scaling region. An organized set of trials has been performed on time series from known model attractors, with dierent fractions of added measurement noise. On the basis of these trials, we found threshold values for the indices that discriminate between reliable and unreliable D2 estimates. In the last part of the work we apply our procedure to real signals (electrocardiograms), ®nding good accordance between index values and the amount of noise in the time series. Ó 2000 Elsevier Science Ltd. All rights reserved.

1. Introduction Nonlinear time series analysis is a powerful tool for obtaining information on the dynamical system generating the signal. Increasing research interest is being focused on the characterization of systems through the estimation of dimension, Lyapunov exponents and Kolmogorov entropy [1,2]. The estimation of such parameters presents some diculties [3] that essentially lie in the possible dependence of results on the particular choices of working variables and in the need for operator intervention in some steps of the procedure. These problems are pointed out by Glass and Hunter [4]: ``. . . the lack of good operational de®nition for identi®cation of fractals and chaos leads sensible people to reach diering conclusions about the same data.'' Therefore the need becomes evident to de®ne objective criteria able to evaluate the reliability of the estimates [5]. In the present work we deal in particular with the estimation of dimension, which, as is well known, is related to the minimum number of variables needed to model the dynamics. Our results refer to correlation dimension [6,7], but can be easily applied to the computation of other dimensions such as the generalized and point-wise dimensions. Dimension estimation depends on several factors [5]: the number of considered points (N) [8], sampling frequency (fs ), time-delay used in the reconstruction of the attractor (s), embedding dimension (m), and the possible presence of noise in the signal. Such estimation must therefore be carried out with care, choosing

*

Corresponding author.

0960-0779/00/$ - see front matter Ó 2000 Elsevier Science Ltd. All rights reserved. PII: S 0 9 6 0 - 0 7 7 9 ( 9 9 ) 0 0 0 9 3 - 4

2018

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

suitable values of working variables; often it is convenient to repeat computation using dierent values for those variables. The aim of the present work is to propose two reliability indices for dimension estimation: the ®rst index (a) quanti®es the dependence of D2 on the embedding dimension, while the second (b) quanti®es the degree of linearity of the log±log plot in the scaling region. These indices are obtained during the procedure of dimension computation, without extra computational eort.

2. The method 2.1. Dimension estimations Given the mono-dimensional time series N

s fxi gi1

constituted by Ns values sampled with time interval Dt (i.e. sampling frequency fs 1=Dt), we reconstruct the trajectory xNi1 ;

xi 2 Rm ; N Ns ÿ m ÿ 1 s

in the state space of dimension m (embedding dimension) with the time-delay procedure [9]; s is the timedelay expressed in number of samples. Dimensions are usually computed by employing the method of correlation integrals. The point correlation integral is de®ned as Cj m; r

N X 1 Hr ÿ jxi ÿ xj j; N ÿ 1 i1;i6j

1

where r is the correlation length and H is the Heaviside function. When N ! 1, Cj m; r converges to the local density function, i.e. gives the probability of a randomly chosen point having a distance from xj lower than r. We computed distances using the euclidean norm. The generalized correlation integrals C q m; r can therefore be expressed as " q

C m; r

N 1 X Cj m; rqÿ1 N j1

#1=qÿ1 :

2

Using these correlation integrals it is possible to compute the generalized dimensions (of order q) Dq [7,10]. In most applications the D2 or correlation dimension (q 2) is used. On the other hand, considering directly the point correlation integral we obtain the point-wise dimension [7], which seems to oer great potential since it gives local information, not averaged over all points of the attractor. Generally speaking, the procedure for the computation of dimension with the method of correlation integrals requires the computation of the derivative d lnC q m; r=d lnr (or d lnCj m; r=d lnr) as a function of lnr. We denote this derivative with f q m; r. If an interval of r (scaling region) exists in which f q m; r k for m P minf , then the underlying system is characterized by a q-order dimension Dq k. In practice, a scaling region about half a decade wide is required. In the following we will consider only the D2 , because it is widely employed and most of our experiments and results concern D2 . We wish to point out, however, that our considerations can be applied to the other generalized dimensions and to the point-wise dimension as well. The procedure for estimating D2 therefore requires computation of the correlation integral (for simplicity sake we omit the apex 2)

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

2019

Fig. 1. Typical plots of: (a) lnCm; r; (b) d lnCm; r vs. lnr for the Henon attractor. Fig. 2. D2 m vs. m for the Henon, Lorenz and Rossler time series with dierent fractions of measurement noise; continuous line noise free; dots 5%; small-dashes 10%; medium-dashes 15%; large-dashes 30%.

Cm; r

N ÿ1 X N X 2 Hr ÿ jxi ÿ xj j: N N ÿ 1 j1 ij1

Fig. 1 shows a typical plot of lnCm; r and its derivative against lnr. 2.2. The procedure As already stated, in the ideal case we have in the scaling region f m; r k for m P minf , and this value is assumed as the D2 . We can observe this behavior in the case of simple mathematical models (such as sinusoid, Henon, Lorenz and Rossler systems) in the absence of noise. Instead, in several cases we observe that the situation is less straightforward: the scaling region can be very small or dicult to identify, f m; r can vary both with r and with m, and the interval of r most closely resembling a scaling region can only be considered as a `test scaling region'. There may be several causes of such behavior, the main ones being: correlation integral computed using unsuitable working variables (fs ; N ; m; s, etc.), underlying dynamics intrinsically too complex (high-dimension) or stochastic, non-uniform distribution of points on the attractor, presence of noise. Manual intervention is needed to select the interval that most closely resembles the scaling region, and D2 estimation is then carried out on it. The indices of reliability that we propose are useful for checking a posteriori whether the test scaling region was suciently `good' and therefore if the obtained D2 estimate is reliable.

2020

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

Moreover, the proposed method is suitable for inclusion in a fully automatic procedure of computation [11], in which various test scaling regions can be examined and the indices can be used to select the best one. In the following we brie¯y report the proposed procedure: compute Cm; r for m 2; . . . ; mmax compute f m; r for m 2; . . . ; mmax test scaling region determination, with extrema r1 and r2 do m minf ; msup compute D2 m hf m; ri, r 2 r1 ; r2 best ®t f m; r in r1 ; r2 , and ®nd the slope bm and the sum of squared errors errb m enddo compute D2 hD2 mi and rD2 , m minf ; . . . ; msup best ®t D2 m and ®nd the slope a, m minf ; . . . ; msup compute b hjbmji and errb herrb mi, m minf ; . . . ; msup h i denotes averages, r is the standard deviation and jxj indicates absolute values. The derivative f m; r is computed by best-®tting Cm; r over 5 points. mmax is the maximum embedding dimension we consider. We recollect that there exist optimized algorithms for D2 computation which, during the computation of the correlation integral for the highest embedding dimension, allow us to obtain the correlation integrals for all the lower embedding dimensions without extra computational work. In particular, we used the algorithm described in [12]. The embedding dimensions minf and msup are chosen such that minf is large enough not to constrain the attractor and msup 6 mmax . As shown by from the procedure, indices a and b quantify the lack of constancy of f m; r with m and r, respectively in the test scaling region. An important aim of our work is to carry out various trials using time series from various systems in order to ®nd out ranges and thresholds of a and b that can measure the validity of the test scaling region and therefore the D2 estimation. To ®nd out such intervals of a and b, in the case of attractors with known D2 , we tried to relate the values of a and b indices with the relative error in the D2 estimate, de®ned as ref

D2 ref

where D2

jD2 ÿ D2

j rD2

ref D2

;

3

denotes the reference value of D2 , as reported in Table 1.

Table 1 ref The model systems considered and their reference D2 valuea System

D2

fs

s

yi xi1 1 ÿ yi1 bxi x_ ry ÿ x y_ xR ÿ z ÿ y z_ xy ÿ bz x_ ÿy ÿ z y_ x ay z_ b zx ÿ c

a 1:4 b 0:3 r 10 R 28 b 8=3 a 0:15 b 0:2 c 10

1.21

1

1

2.06

10

1

2.01

1

1

Noise

x0 seed xk a xkÿ1 b mod c=c

±

1

1

H-torus

htk t k 1; 5

seed: users supplied a 67 081 293; b 14 181 771; c 226 p fk pk ; pk kth prime number

k

4 fk

2

Lorenz Rossler

a

ref

Parameters

Henon

Equations ax2i

Pk

i1

sin2pfk t;

The sampling frequency fs and the time-delay s used for the attractor reconstruction are also reported.

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

2021

2.3. a: in¯uence of the embedding dimension on the D2 estimation The a index measures the D2 variation with the embedding dimension, and we suppose that the variation is linear with m D2 m a m k1 ;

m P minf ; r 2 r1; r2:

In the ideal case a 0; however, we often obtain a positive and sometimes quite high value of a, especially with real signals. We think this could be related on the presence of noise (additive or dynamic) in the system. Considering an embedding space of dimension m and an attractor of dimension D < m, the attractor remains con®ned in a D-dimensional subspace; on the contrary, noise occupies all the available space. Therefore, we expect to obtain a ' 1 when pure random noise is considered, and 0 < a < 1 for signals aected by noise. In general, increasing a values are thought to be due to increasing noise level, as it is con®rmed by the results shown in Fig. 2. We proposed this idea in [13], where we analyzed real electrocardiographic (ECG) signals for which information about the quantity of noise was available, and observed (Table A1 in [13]) that there was often a good correlation between high values of a (a > 0:2) and signals with high noise. Finally, we note that the error on the ®tting of D2 m is not considered because the number of points (values of m) is usually very small (typically between 5 and 10). 2.4. b: test of the scaling region linearity Ideally, in the scaling region, f m; r should be exactly constant; in practice, this does not often happen. Our aim is therefore to quantify to what extent the situation diers from the ideal one. This is obtained by approximating f m; r with a straight line using the best-®tting approach, i.e. f m; r ' bm r k2 ;

r 2 r1; r2; m minf ; . . . ; msup :

The b and errb values are computed by averaging the absolute values of bm and errb m (sum of squared errors) over all the considered embeddings. In the ideal case, b should be zero or very small, with null or very small error; the less constant f is in the scaling region, the higher b and errb will be, making estimation of dimension dicult. If b is high and errb is small, it means that the log±log plot is quadratic (there is no scaling region). In general, high errb values mean that f m; r cannot be ®tted to a straight line with a good approximation. This can also happen when b is small, in two situations: (i) if f m; r presents large, rapid displacements around a constant trend, and in this case the scaling region could be reliable; (ii) in the presence of a knee in the log±log plot, and in this latter case the scaling region is unreliable.

3. Systems and signals The proposed method has been applied to time series generated by the Henon, Lorenz and Rossler systems, both in the absence of noise and with the addition of uniform noise (additive noise, not dynamic noise), in fractions of 5%, 10%, 15% and 30% (with respect to the peak-to-peak of the signal). To verify the results that the method yields in the case of systems of medium-high dimension, we considered hyper-tori obtained by summing from 1 to 5 sinusoids (ht1 ; . . . ; ht5 ) with irrationally related frequencies. Various trials have shown that satisfactory results are obtained if s 2 and fs ' 2 fN , fN being the Nyquist frequency. Also in this case we analyze the signal both without noise and with the addition of 15% of noise.

2022

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

Finally, we considered pure uniform noise. We chose uniform noise since it is more suitable for simulating the error due to the quantization carried out during the analog-to-digital conversion in the acquisition of real signals. Table 1 reports all the information about the considered model systems and the random number generator we used for noise. In all cases we reconstruct the dynamics from the ®rst variable. The procedure has been applied to a few real signals (electrocardiograms) to verify the usefulness of the method in situations more complicated than those that normally occur when time series generated by models are analyzed. We considered three ECG recordings (mit#105, mit#106 and mit#207) contained in a commercially available database (MIT-BIH arrhythmia database [14]). The ®rst two are tracings obtained from normal heart activity (NSR normal sinus rhythm), while the third is obtained from a subject with episodes of ventricular ®brillation (VF) and ¯utter. For each recording we analyze two segments lasting about 30 s (Ns 10 000, fs 360 Hz) to satisfy both the requirements of (i) stationary dynamics, and (ii) minimal number of orbits needed for a reliable D2 estimation. For recordings 105 and 106 the ®rst segment is noise free and the second is noisy (as reported in [14]); for subject 207 the ®rst segment refers to normal heart activity and the second is obtained during VF. In the context of the present work these real data have been analyzed with the purpose of showing how the noise present in real signals can aect the a and b indices. Another ECG recording (namely mit#117, which refers to normal heart activity) is considered in Section 5.3, in the context of the possible use of the reliability indices as a posteriori test of stationarity. To assess if the signal under investigation can really be considered the output of a low-dimensional system, we also analyze two kinds of surrogate signals: (a) the ®rst derivative of the signal (approximated by the incremental ratio), to verify that the number of degrees of freedom does not change, and (b) phase randomization, which yields the signal constituted by colored noise that most closely resembles the original one (the power spectrum remains unchanged, whereas phases become random with gaussian distribution) [15]. We wish to point out that among the various possible surrogate signals, we chose two with opposing characteristics: the derivative of the signal should not signi®cantly change the estimated dimension, whereas phase randomization completely destroys the underlying dynamics and the expected result is the absence of scaling region. Since the derivative of a signal is very similar to a high-pass ®ltering, we expect that, if the original signal is aected by noise, the derivative will increase the in¯uence of noise, thus making correct dimension estimation more dicult. Surrogate analysis has been applied to Henon, Lorenz and Rossler signals and to electrocardiograms; we did not apply phase randomization to hyper-tori since in this case it would cause only a phase-displacement without changing the function and thus the underlying dynamics. Concerning the number of points, all signals have been analyzed with Ns 10 000, with the exception of hyper-tori that have been analyzed with Ns 20 000. 4. Results Tables 2 and 3 cover the trials involving the reference attractors (without and with noise, respectively) and the pure random noise. We can state that the Henon system (discrete map) is very sensitive to noise because with 5% of noise the D2 estimates we get are already considerably inaccurate (D2 > 30%). However, a and b do not assume suciently high values to indicate clearly the low reliability of the estimate. The derivative of the signal shows a considerable increase in D2 ; a does not change signi®cantly, while b increases quite notably. As expected, phase randomization dramatically increases the estimated D2 value, while D2 , a and b assume very high values. The Lorenz attractor turns out to be less sensitive to noise; in fact, even with fractions of noise up to 15%, D2 is less than 10±20%. Appropriately, a and b slightly increase with noise. The derivative signal yields

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

2023

Table 2 D2 estimates and related parameters for the reference attractors without noise and with varying fractions of uniform additive noise, and for pure noise a System

Noise fraction (%)

Test scaling region

D2

D2 (%)

a

b

errb

Henon Henon Henon Henon Henon

0 5 10 15 30

)2.2 )0.8 )0.2 0.0 0.3

)1.7 )0.3 0.3 0.5 0.8

1:23 0:02 1:47 0:13 1:91 0:29 2:08 0:41 2:71 0:75

3 32 82 106 186

0.00 0.04 0.09 0.13 0.24

0.09 0.20 0.46 0.15 0.44

0.00 0.01 0.02 0.01 0.23

Lorenz Lorenz Lorenz Lorenz Lorenz

0 5 10 15 30

0.5 2.3 2.3 2.5 3.1

1.0 2.8 2.8 3.0 3.6

2:02 0:07 1:84 0:17 2:00 0:27 2:13 0:39 1:98 0:77

5 19 16 22 41

0.02 0.05 0.08 0.12 0.24

0.10 0.13 0.39 0.76 1.49

0.00 0.01 0.01 0.05 0.15

Rossler Rossler Rossler Rossler Rossler

0 5 10 15 30

1.0 1.5 2.0 2.3 3.3

1.5 2.0 2.5 2.8 3.8

1:92 0:02 2:13 0:13 2:31 0:30 2:40 0:49 1:38 0:45

5 12 30 44 54

0.01 0.04 0.09 0.15 0.13

0.06 0.25 0.41 1.05 1.28

0.00 0.01 0.01 0.02 0.08

±

0.5

1.0

5:65 3:55

±

0.82

7.52

0.69

Noise a

Ns 10 000, minf 4 and msup 12.

Table 3 Errors on D2 estimates and index values for surrogate signals (derivative and phase randomization) of reference attractors Noise

Derivative

fraction (%)

D2 (%)

a

b

errb

D2 (%)

a

b

errb

Henon Henon Henon Henon Henon

0 5 10 15 30

6 43 104 151 264

0.00 0.02 0.06 0.12 0.37

0.05 0.92 0.76 0.72 0.96

0.00 0.06 0.01 0.03 0.03

664 662 673 667 693

0.89 0.88 0.90 0.89 0.92

2.87 2.86 2.96 2.80 2.80

0.46 0.62 0.62 0.38 0.47

Lorenz Lorenz Lorenz Lorenz Lorenz

0 5 10 15 30

2 29 53 73 164

0.00 0.05 0.14 0.24 0.52

0.08 0.20 0.75 0.91 3.14

0.00 0.01 0.02 0.06 0.01

126 138 142 173 258

0.45 0.48 0.49 0.57 0.76

2.41 2.55 2.75 3.26 3.99

0.00 0.01 0.00 0.01 0.03

Rossler Rossler Rossler Rossler Rossler

0 5 10 15 30

2 10 27 35 41

0.00 0.02 0.08 0.13 0.08

0.17 0.13 0.51 0.73 0.92

0.01 0.01 0.00 0.01 0.07

57 58 57 56 52

0.11 0.11 0.11 0.11 0.13

1.29 1.31 1.31 1.35 1.60

0.03 0.03 0.03 0.03 0.03

System

Phase randomization

a signi®cantly greater error in D2 value, and a corresponding increase in a and b values. The same remarks reported for the Henon system hold for phase randomization, although the increase in the error on D2 and in a is lower. Fig. 3(a)±(c) show the function f m; r for the Lorenz attractor and its surrogates. The Rossler attractor displays a more complex behavior. The log±log plot obtained from the original signal shows the existence of two scaling regions (Fig. 4(a)); the addition of a large fraction of noise causes the one on the left, the more signi®cant of the two, to disappear, and therefore we consider the second, smaller one, which gives an underestimate of the D2 value (see Table 2). The derivative signal yields results which are very similar to those obtained with the original signal (see Fig. 4(b)); with phase randomization the estimated values of D2 remain low (D2 ' 1:3±1:5) (see Fig. 4(c)), a remains low and b increases signi®cantly but much less than in the Henon and Lorenz systems. Such behavior could be due to the strong periodic component of the Rossler system.

2024

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

Fig. 3. Plots of f m; r vs. lnr obtained from Lorenz time series with additive noise (10%) and related surrogates: (a) original; (b) derivative; (c) phase randomized. Fig. 4. Plots of f m; r vs. lnr obtained from noise free Rossler time series and related surrogates: (a) original; (b) derivative; (c) phase randomized.

The analysis of pure random noise gives very high values of a and b (see Fig. 5); in particular, a tends towards 1, as can be expected. Dealing with the hyper-tori (see Table 4) we can observe that: (i) with the considered number of points, we obtain not overly high errors in the D2 estimate; (ii) both a and b increase with k. Analysis of the hyper-tori with added noise shows that the error in the D2 estimate increases in the noisy time series; correspondingly, we always observe an increase in the a and b values. We also analyzed the derivative of hyper-tori with and without noise; the results obtained (see Table 5) do not dier signi®cantly from those we get from the original signals. 5. Discussion 5.1. Choice of minf and msup As already stated, a depends on the choice of minf and msup . On the other hand, we are interested in the study of attractors of unknown dimension. So, taking into account that in the reconstruction of an at-

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

2025

Fig. 5. Plots of f m; r vs. lnr for pure uniform noise.

Table 4 D2 estimates and related parameters for hyper-tori without noise and with additive noise (15%)a

a

System

Noise fraction (%)

Test scaling region

D2

D2 (%)

a

b

errb

ht1 ht1

0 15

)1.0 0.0

)0.5 0.5

1:00 0:00 1:14 0:05

1 19

0.00 )0.01

0.01 0.22

0.00 0.01

ht2 ht2

0 15

)1.0 0.3

)0.5 0.8

2:06 0:04 2:46 0:20

5 33

)0.01 0.04

0.18 0.66

0.66 0.02

ht3 ht3

0 15

0.0 0.8

0.5 1.3

3:09 0:20 3:33 0:83

10 39

0.02 0.23

0.36 1.00

1.70 0.06

ht4 ht4

0 15

0.0 1.1

0.5 1.6

3:90 0:64 3:52 1:42

19 48

0.19 0.44

0.53 2.01

2.27 0.07

ht5 ht5

0 15

0.5 0.8

1.0 1.3

4:80 0:93 5:42 1:86

23 46

0.25 0.58

1.06 1.84

0.77 0.04

Ns 20 000, minf 4 and msup 12.

Table 5 Errors on D2 estimates and index values for derivative of hyper-toria

a

System

Noise fraction (%)

D2

ht1 ht1

0 15

0 13

ht2 ht2

0 15

ht3 ht3

a

b

errb

0.00 0.01

0.00 0.14

0.00 0.00

8 41

ÿ0.02 0.17

0.09 1.81

0.65 0.31

0 15

14 31

0.06 0.19

0.52 0.83

0.16 0.04

ht4 ht4

0 15

22 37

0.21 0.36

0.44 1.14

0.14 0.03

ht5 ht5

0 15

26 33

0.26 0.49

0.76 1.37

0.76 0.07

Ns 20 000, minf 4 and msup 12.

tractor of dimension D an embedding dimension m P D 1 is needed [6,9], we can adopt two dierent approaches. The ®rst approach requires the setting an minf value high enough to allow the analysis of systems with medium-high D2 ; this was adopted in [13]. The problem with this approach is that it does not

2026

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

consider D2 m drift that might have already occurred for m < minf (see Fig. 2). This error may yield a signi®cant D2 overestimate which we cannot monitor by means of the a index. For this reason, an overly high minf can lead to large errors in D2 estimates, even though it helps to avoid constraint of the attractor. In the second approach, which we used in this work, we are interested in the identi®cation of low-dimension attractors. Thus we ®x minf 4 and msup 12. If the underlying dimension is higher than three, we have constraint of the attractor in the case of D2 minf , and this may cause error; however, as D2 is obtained as an average of D2 m between minf and msup , we expect the error not to be too large, as was veri®ed in practice. Indeed, even when we considered hyper-tori with a dimension up to 5, we obtained quite reasonable dimension estimates (see Tables 4 and 5). If we get high a values and expect a deterministic dynamic (although with medium-high dimension), it may be useful to repeat the analysis with higher minf and msup values. If this choice yields a signi®cantly reduced a value, this might indicate the presence of an underlying high-dimensional determinism. If the underlying dynamic is random and the number of points N is suciently high, a does not change much with higher minf and msup values. Thus, if a remains practically the same, we may expect a stochastic underlying dynamic or a deterministic dynamic hidden by noise that cannot be recognized by means of the correlation dimension. The dependence of b on minf and msup is weaker. Nevertheless, we have observed that an embedding space of lower dimension gives a smoother derivative of the log±log plot. We suppose this may be due to a very widespread distribution of points in an embedding of large dimension m. We observed an increasing b value with increasing minf ±msup range. 5.2. Thresholds for a and b From the set of trials we observed that a varies from 0 to 0.9 (negative a values with large jaj indicates that some problem has occurred, see Section 6); b has a larger dynamic and varies from 0 to about 7. b is very sensitive to the ¯atness of the scaling region: the less ¯at the scaling region, the higher the b value. We have obtained b > 0:6 even for visually acceptable scaling regions (see Fig. 3(b)). We propose that those test scaling region with b > 0:5±0:7 should be considered as unreliable. For the reasons explained at the end of Section 2.4, we prefer in this paper to consider the errb value only as a qualitative information. In order to determine an appropriate threshold for a and b that can measure D2 reliability, we proceed as follows: Firstly we decided to consider that reliable D2 estimates are those with an error D2 6 20%: This error might be considered too high, but this is justi®able because: (a) this work aims to the analysis of real time series that are almost always aected by noise, thus sharp estimates are very dicult to be obtained; (b) D2 de®nition (see Eq. (3)) takes into account the standard deviation of the D2 , which can be large especially when real systems are considered. The next step was to select (from the set of analyzed model attractors in Tables 2±5) all D2 estimates leading to an error D2 6 20% and to report the respective a and b values. We set the thresholds ath and bth as the maximum a and b values in the subset, obtaining ath 0:2

and

bth 0:6:

Finally we counted the number of trials with a 6 0:2 and b 6 0:6 for which D2 > 20%. As reported in Tables 2±5, this occurs in six cases. Summary of our results leads to the following report: · total number of trials 65, · number of trials with D2 6 20% 19, · number of trials with D2 > 20% 46, six of which are erroneously considered as reliable by the proposed a and b indices when ath 0:2 and bth 0:6: Since the threshold values for a and b are obtained from a quite large set of trials, we propose to consider as reliable those D2 estimates (in general computed without any a priori information) with index values:

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

2027

a 6 0:2 and b 6 0:6: Although we are aware that this work may be considered as a rough early attempt at a systematic determination of the correlation dimension, we believe our results represent a solid step in this direction, especially when considering that the problem of de®ning reliability tests for D2 estimates is not often tackled in the literature. Improvements can be achieved by introducing these indices in a fully automatic correlation dimension determination algorithm, following the method already proposed in [11]. 5.3. Correlation dimension reliability and stationarity of the underlying dynamics Although this topic requires more analysis, we wish to speculate about a possible use of the reliability indices a and b to investigate the stationarity of the underlying dynamics. The idea is very simple and it is based on the well-known fact that a greater number of data points can lead to a clearer scaling region and therefore to a more reliable estimate of D2 , but only if the time series is approximately stationary. An example of such behavior can be seen by considering a signal from a model attractor and a real ECG signal. In Fig. 6 we compare the f m; r obtained from the Lorenz attractor (panels (a) and (b)) and from the recording mit#117 (panels (c) and (d)). Panels (a) and (c) are computed with N 2000, panels (b) and (d) are computed with N 16 000, starting from the same point. The Figure shows that as N increases, for both signals, f m; r becomes smoother for every m. However, as N increases, the overall scaling region becomes clearer for the Lorenz time series, but it gets worse for the ECG signal. This result can be quanti®ed by the a and b indices. In particular, for the Lorenz signal a and b remain almost constant: in the scaling region 0:1 6 lnr 6 1, we obtained a ' 0:03 and b ' 0:1.

Fig. 6. Comparison between f m; r obtained with dierent N values for the Lorenz attractor ((a) and (b)) and for the mit#117 electrocardiogram ((c) and (d)); (a) and (c): N 2000, (b) and (d): N 16 000.

2028

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

The analysis of the ECG signal reveals a change, which is more pronounced for the b index. Precisely, in the scaling region 3:2 6 lnr 6 3:8 we obtain a 0:07 and b 0:34 for N 2000, and a 0:10 and b 0:62 for N 16 000. Given our thresholds we cannot consider the D2 estimate obtained with N 16 000 as reliable. Such behavior may be due to a lack of stationarity. 6. Real signal analysis Real signal analysis has been performed using the electrocardiograms described in Section 3. Tables 6 and 7 collect the results referring to the original and surrogate signals, respectively. Table 6 shows that the b value is signi®cantly higher when the signals are largely aected by noise or generated by an underlying high-dimensional dynamics (the ®brillation condition). In these cases we also observe an increment in the a index. In contrast with the results obtained from model attractors, the derivative of the original signals gives much lower than expected D2 estimates. This is due to the presence of a quite clear interval of r (selected as test scaling region), immediately before the saturation of the log±log plot, where the log±log plot is ¯at, and therefore its derivative is null (see for example Fig. 7(c)). Such behavior indicates the absence in the phase space of pairs of points with a distance in that range (a sort of prohibited range for the correlation integral). This may follow from the ECG morphology, which has sharp peaks, enhanced by the derivative operation. Such a scaling region can be considered as an artifact, but this cannot be monitored by a and b, which remain very small. Since we can consider as a trivial case the ®xed point attractor (which has a dimension D 0 by de®nition), a way of solving this problem might be to consider as artifacts all those cases where D2 < 1. Finally, analysis with phase randomized signals yields results in agreement with expectations: D2 values signi®cantly increase and we have a > 0:2 and b > 0:6. The particular case of mit#207 is shown in greater detail in Fig. 7. A similar pattern can be seen in the other analyzed cases. It is evident from Fig. 7(e) that no clear scaling region can be seen when ®brillation

Table 6 D2 estimates and index values from ECG signalsa ECG #

Start

Description

Test scaling region

D2

a

b

errb

105 105 106 106 207 207

15 min 16 sec 22 min 02 sec 1 min 37 sec 10 min 52 sec 10 min 00 sec 26 min 19 sec

NSR, NSR, NSR, NSR, NSR, VF

3.0 5.0 3.7 3.0 4.0 4.5

1:93 0:31 1:61 0:34 1:21 0:33 4:09 1:22 1:24 0:17 1:99 0:84

0.09 0.11 0.10 0.38 0.04 0.27

0.47 0.68 0.52 1.77 0.34 1.91

0.01 0.04 0.03 0.96 0.02 0.00

clean noisy clean noisy clean

3.5 5.5 4.2 3.5 4.5 5.0

a

For each subject the starting time from the beginning of the recording and a short description are reported. Ns 10 000, minf 4 and msup 12.

Table 7 D2 estimates and index values from surrogates (derivative and phase randomization) of the ECG signalsa ECG #

a

Start

Derivative

Phase randomization

D2

a

b

errb

D2

a

b

errb

105 105

15 min 16 sec 22 min 2 sec

0:20 0:02 1:13 0:34

0.01 0.11

0.28 0.58

0.00 0.05

3:61 1:48 3:48 1:31

0.47 0.41

2.09 1.81

0.16 0.13

106 106

1 min 37 sec 10 min 52 sec

0:35 0:13 3:01 1:10

0.04 0.35

0.26 2.20

0.00 0.09

3:90 1:80 4:63 2:42

0.57 0.77

2.95 5.19

0.09 0.18

207 207

10 min 00 sec 26 min 19 sec

0:27 0:02 5:73 2:30

0.01 0.73

0.19 2.66

0.01 0.26

2:89 1:12 3:68 1:38

0.36 0.43

1.83 2.01

0.00 0.06

Signals are in the same order as in Table 6, Ns 10 000, minf 4 and msup 12.

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

2029

Fig. 7. Comparison of two segments from mit#207 tracings of NSR (a) and VF (d); panels (b) and (e) plot the corresponding f m; r vs. lnr; panels (c) and (f) show f m; r vs. lnr for the derivative of the original tracings.

tracing is considered. This is con®rmed by the high values of a and b as reported in the table. Following our procedure, we can therefore state that the test scaling region is unreliable. Fig. 7(e) also explains how we can sometimes get negative a values with large jaj: if we choose a test scaling region centered around ln 3 we obtain negative a values, since f m; r is higher for smaller embeddings. This typically happens when the log±log plot presents knees. 7. Conclusions In this paper we have proposed an a posteriori test to evaluate the degree of reliability of correlation dimension estimates from a time series. Our procedure can be applied to any generalized dimension, as well as to the point-wise dimension.

2030

A. Casaleggio, A. Corana / Chaos, Solitons and Fractals 11 (2000) 2017±2030

We propose the computation of two indices: the ®rst, a, for monitoring the dependence of the D2 estimation on the embedding dimension, and the second, b, which is sensitive to the ¯atness of the scaling region in which D2 is computed. An organized set of trials has been performed on time series from known model attractors, with dierent fractions of added measurement noise. In addition, surrogate tests have also been taken into account; in particular we considered derivative and phase randomization of the signal. On the basis of these trials, we found threshold values for the indices, allowing us to discriminate between `good' and `bad' scaling regions and correspondingly between reliable and unreliable D2 estimates. Analyzing a set of 65 time series, we found that only in six cases were the proposed indices unable to recognize D2 estimates with D2 > 20% as unreliable. In the ®nal part of the work we applied the procedure to electrocardiograms obtained from a commercially available database (MIT-BIH arrhythmia database), which contains tracing with varying signal quality. We analyzed the dierences that occur when segments of the signal are very clean as opposed to those that are very noisy and found a good accordance between index values and amount of noise in the time series. We consider these results an intermediate step toward the development of a fully computerized procedure for reliable dimension estimates from time series in the absence of any further a priori information. Acknowledgements The authors are very grateful to Prof Arun Holden for his advice and comments on this paper. References [1] Ott E, Sauer T, Yorke JA. Coping with chaos. New York: Wiley, 1994. [2] Abarbanel HDI, Brown R, Sidorowich JJ, Tsimring LS. The analysis of observed chaotic data in physical systems. Rev Mod Phys 1993;65:1331±92. [3] Rapp PE. Chaos in the neurosciences: cautionary tales from frontier. Biologist 1993;40:89±94. [4] Glass L, Hunter P. There is a theory of heart. Physica D 1990;43:1±16. [5] Isliker H. A scaling test for correlation dimensions. Phys Lett A 1992;169:313±22. [6] Ding M, Grebogi C, Ott E, Sauer T, Yorke JA. Estimating correlation dimension from a chaotic time series: when does plateau onset occur. Physica D 1993;69:404±24. [7] Gershenfeld NA. Dimension measurements on high-dimensional systems. Physica D 1992;55:135±54. [8] Eckmann JP, Ruelle D. Fundamental limitations for estimating dimension and Lyapunov exponents in dynamical systems. Physica D 1992;56:185±7. [9] Sauer T, Yorke JA, Casdagli M. Embedology. J Stat Phys 1991;65:579±616. [10] Kurths J, Herzel H. An attractor in a solar time series. Physica D 1987;25:165±72. [11] Bortolan G, Casaleggio A. Correlation dimension analysis of long term ECG: Study with a computerized determination of scaling regions. In: Murray A, Arzbaecher R, editors, Computers in Cardiology 1994. Los Alamitos: IEEE Computer Society, 1994. p. 713±16; Casaleggio A, Bortolan G. Automatic estimation of the correlation dimension for the analysis of electrocardiograms. Biol Cybern 1999;81:279±90. [12] Corana A, Casaleggio A, Rolando C, Ridella S. Ecient computation of the correlation dimension from a time series on a LIW computer. Parallel Comput 1991;17:809±20. [13] Casaleggio A, Corana A, Ridella S. Correlation dimension estimation from electrocardiograms. Chaos, Solitons & Fractals 1995;5:713±26. [14] The MIT-BIH arrhythmia database tape directory and format speci®cation. BMEC TR010. MIT Division of Health Sciences and Technology, 1982; CD-ROM edition (1992). [15] Provenzale A, Smith LA, Vio R, Murante G. Distinguishing between low-dimensional dynamics and randomness in measured time series. Physica D 1992;58:31±49.

Lihat lebih banyak...

A posteriori tests to validate dimension estimates from time series

Descrição do Produto

Comentários