A robust transform domain echo canceller employing a parallel filter structure


ARTICLE IN PRESS

Signal Processing 86 (2006) 3752–3760 www.elsevier.com/locate/sigpro

Jiaquan Huo (a, b), Ka Fai Cedric Yiu (b, corresponding author), Sven Nordholm (a), Kok Lay Teo (c)

(a) Western Australian Telecommunications Research Institute (WATRI), a joint venture between The University of Western Australia and Curtin University of Technology, Perth, Australia
(b) Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, PR China
(c) Department of Mathematics and Statistics, Curtin University of Technology, Perth, Australia

Received 9 August 2005; received in revised form 5 March 2006; accepted 8 March 2006. Available online 6 April 2006.

Abstract

The proper control of acoustic echoes is of vital significance in modern communication systems. Acoustic echoes are commonly combated by means of acoustic echo cancellation. A critical issue in acoustic echo cancellation is controlling the adaptation of the echo cancellation filter in the possible presence of near-end speech. In this paper, a novel echo cancellation structure is proposed. The proposed structure makes use of two parallel adaptive filters, each tailored for a specific operating situation. By so doing, the conflicting requirements of robustness against near-end disturbance and of fast convergence and tracking are met simultaneously.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Echo cancellation; Double-talk; Filter bank; Subband adaptive filter

1. Introduction

The proper control of acoustic echoes is of vital importance in modern communication systems. Acoustic echoes arise from the acoustic coupling between the loudspeaker and the microphone at user terminals. With the long round-trip delay typical of today's mixed-signal networks, unsuppressed echoes are very annoying to end users and, in the extreme, can make a conversation impossible [1,2].

Corresponding author. Tel.: +852 22415956; fax: +852 28592583. E-mail address: [email protected] (K. Fai Cedric Yiu).

Echo cancellers (EC) have been developed to suppress echoes in communication networks [3]. An EC is essentially an adaptive filter, as illustrated in Fig. 1, that generates an echo estimate from the far-end signal. This echo estimate is subsequently subtracted from the microphone signal. The most attractive feature of an EC is its ability to provide a natural conversation pattern, i.e., to allow the users at both ends to interrupt each other. Nevertheless, during periods in which users at both ends are active (known as double talk, or DT), the adaptation of the EC filter is perturbed by the strong near-end speech signal. Such a strong perturbation drives the EC filter away from its converged state, resulting in an annoyingly high level of echo returned to the far-end user.

0165-1684/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.sigpro.2006.03.004


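Fig. 1. An illustration of an echo canceller for acoustic channels. [Block diagram: the far-end signal x(n) drives the echo path h(n) and the adaptive filter ĥ(n); the echo estimate ŷ(n) is subtracted from the near-end microphone signal y(n), which also contains noise v(n) and double-talk, to give the residual echo e(n).]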

A common way to alleviate the DT problem is to halt the adaptation of the EC filter whenever near-end speech activity is detected [4]. Because of statistical fluctuation and delay in parameter estimation, detection errors are inevitable. In [5], an adaptive filtering algorithm based on robust statistics is developed to cope with such detection errors. In this paper, a novel parallel adaptive filter structure is proposed which is robust against double-talk, fast in tracking echo path variations due to random or systematic changes, and does not rely on a double-talk detector (DTD). It comprises two FIR filters, each tailored for a specific operating situation. A control algorithm governs the exchange of filter coefficients between the two filters and decides which filter is used for cancelling the echoes.

One feature of the acoustic echo path is its extreme length [6]. Modelling the impulse response of a typical office requires an FIR filter of about 1000 taps at an 8 kHz sample rate. The implementation cost of such a long adaptive filter is high. More importantly, when it is adapted in the time domain with input signals of high spectral dynamics, its convergence is very slow. Adapting the EC filter in the frequency domain or in subbands [7] can substantially speed up convergence. Moreover, frequency domain and subband algorithms generally employ block processing and the fast Fourier transform, which considerably reduces computational complexity. In this paper, we implement the echo canceller in subbands with a delayless structure. Clear benefits of the proposed algorithm in terms of convergence rate, DT robustness and tracking are demonstrated with simulation results.

2. Robust statistics based adaptive filtering

Fig. 1 illustrates the basic idea of acoustic echo cancellation. The echo path is considered as a linear time-invariant system. The microphone signal $y(n)$ can therefore be written as

$$y(n) = \sum_{k=0}^{\infty} h_k\, x(n-k) + v(n), \tag{1}$$

where $h_k$ is the $k$th tap of the impulse response of the echo path, $x(n)$ is the loudspeaker signal, and $v(n)$ is the near-end signal. The echo canceller models the room impulse response with an $N$-tap FIR echo cancellation filter (EC filter) and generates an estimate of the echo signal as

$$\hat{y}(n) = \sum_{k=0}^{N-1} \hat{h}_k\, x(n-k), \tag{2}$$

where $\hat{h}_k$ is the $k$th tap of the impulse response estimate. The echo estimate $\hat{y}(n)$ is then subtracted from the microphone signal and the error signal

$$e(n) = y(n) - \hat{y}(n) \tag{3}$$

is transmitted to the far-end.

The EC filter $\hat{h}_k$ is made adaptive so that it can be adjusted to match vastly different room impulse responses and to track time variation of the echo paths. Typically, the EC filter is adjusted to minimize the power of the error signal. It is well known that an EC filter so adjusted is sensitive to extraneous disturbances. A single large uncorrelated error is sufficient to send the EC filter far off its optimal state. This makes the operation of the echo canceller unreliable in the presence of near-end speech. In order to reduce this sensitivity, the notion of robust statistics [8,9] has been explored to develop adaptive filtering algorithms. The basic idea of robust statistics based adaptive filtering algorithms is to apply a nonlinearity to the error signal before correlating it with the input signal, giving an update formula of the form [10,5,11]

$$\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \frac{\mu}{\mathbf{x}^T(n)\,\mathbf{x}(n)}\, f[e(n); s(n)]\, \mathbf{x}(n), \tag{4}$$

where $\hat{\mathbf{h}}(n) = [\hat{h}_0(n)\ \hat{h}_1(n)\ \ldots\ \hat{h}_{N-1}(n)]^T$ and $\mathbf{x}(n) = [x(n)\ x(n-1)\ \ldots\ x(n-N+1)]^T$ are the EC filter coefficient and far-end input signal vectors, respectively, $\mu$ is a constant stepsize, $f[\cdot]$ denotes the nonlinear function applied to the error signal, and this nonlinearity depends on a time-varying scale parameter $s(n)$. The comparative study [12] of different combinations of nonlinear function $f[\cdot]$ and scale parameter update indicates that the following is a good combination for the application of acoustic echo cancellation (AEC):

$$f[e(n); s(n)] = \frac{e(n)}{|e(n)|} \min[\,|e(n)|,\ k_0 s(n)\,], \tag{5}$$

$$s(n) = \alpha\, s(n-1) + \frac{1-\alpha}{\beta} \min[\,|e(n)|,\ k_0 s(n-1)\,]. \tag{6}$$
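As a rough illustration of how (4)-(6) fit together, the following Python sketch performs one sample update. It is not the authors' implementation: the function name, the use of the a priori error (computed with the coefficients from the previous iteration), the order in which the scale and the nonlinearity are evaluated, and the small regularization constant `eps` are all assumptions made for this example. The default parameter values are those quoted in the text for the "weakly robust" setting.

```python
import math

def robust_nlms_step(h, x_vec, y, s_prev,
                     mu=0.5, alpha=0.997, beta=0.60665, k0=1.1, eps=1e-12):
    """One robust normalized update in the spirit of (4)-(6).

    h      : current filter coefficients (list of floats)
    x_vec  : far-end tap-input vector [x(n), x(n-1), ...], same length as h
    y      : microphone sample y(n)
    s_prev : previous scale estimate s(n-1)
    Returns (updated coefficients, error e(n), scale s(n)).
    """
    # Echo estimate and error, as in (2) and (3), using the previous coefficients.
    y_hat = sum(hk * xk for hk, xk in zip(h, x_vec))
    e = y - y_hat

    # Scale update (6): the error magnitude is limited by k0 * s(n-1).
    s = alpha * s_prev + (1.0 - alpha) / beta * min(abs(e), k0 * s_prev)

    # Nonlinearity (5): keep the sign of e(n), limit its magnitude to k0 * s(n).
    f = math.copysign(min(abs(e), k0 * s), e)

    # Normalized coefficient update (4); eps (an assumption) avoids division by zero.
    energy = sum(xk * xk for xk in x_vec) + eps
    h_new = [hk + mu / energy * f * xk for hk, xk in zip(h, x_vec)]
    return h_new, e, s
```

With a large error sample the correction applied to the coefficients is bounded by the limited value `k0 * s`, whereas a small error passes through the nonlinearity unchanged, which is the de-emphasis of contaminated samples described in the text.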

In (6), $\alpha$ controls the memory in the scale parameter update, $k_0$ specifies the threshold with which potentially contaminated residual echo samples are detected, and $\beta$, chosen such that $s(n)$ converges to one for zero-mean, unit-variance Gaussian distributed $e(n)$, controls the bias of the scale parameter. Moreover, the parameter $\beta$ also serves to emphasize the current error signal sample and thus makes the algorithm react quickly to large error samples.

The robust statistics based adaptive filtering algorithms share a clear physical interpretation. The scale parameter represents an estimate of the residual echo level. Residual signal samples contaminated by near-end speech are detected based on this estimate and de-emphasized by the nonlinear function $f[\cdot]$. Such an interpretation allows us to take the update of the filter coefficients and that of the scale parameter as two separate steps, instead of deriving them from a joint optimization procedure as in [5].

By using different values of the parameters for the scale parameter update, it is possible to construct algorithms with very different properties. An example is shown in Fig. 4. In the example, the EC filter is updated as in (4) with the nonlinear function $f[\cdot]$ chosen as (5), and $s(n)$ is updated according to (6). For "strongly robust", $\alpha = 0.997$, $\beta = 1$ and $k_0 = 1.1$, whilst for "weakly robust", $\alpha = 0.997$, $\beta = 0.60665$ and $k_0 = 1.1$. In both cases, the stepsize $\mu$ is set to 0.5. The value $\beta = 0.60665$ for the "weakly robust" algorithm was selected in [5] so that the scale $s$ is approximately an unbiased estimate of the standard deviation of the underlying signal when it follows a Gaussian distribution. In fact, different values of $\beta$ between 0.5 and 0.8 have also been tested for the "weakly robust" algorithm, and the results do not seem to be particularly sensitive to it. In the plots, misalignment is defined as

$$z(n) = 20 \log_{10} \frac{|\hat{\mathbf{h}}_{opt}(n) - \hat{\mathbf{h}}(n)|}{|\hat{\mathbf{h}}_{opt}(n)|}. \tag{7}$$
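The misalignment measure in (7) can be sketched as a small helper. The function name is hypothetical, and the Euclidean norm is an assumption here, since the text writes the norm with plain bars without naming it.

```python
import math

def misalignment_db(h_opt, h_hat):
    """Misalignment in dB between a reference filter and its estimate, as in (7).

    Assumes the Euclidean norm for both numerator and denominator.
    """
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(h_opt, h_hat)))
    den = math.sqrt(sum(a * a for a in h_opt))
    if den == 0.0:
        raise ValueError("reference filter must be non-zero")
    if num == 0.0:
        return float("-inf")  # perfect match
    return 20.0 * math.log10(num / den)
```

An all-zero estimate gives 0 dB, and each tenfold reduction of the coefficient error norm lowers the value by 20 dB.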

From the example, we see that by varying the way the scale parameter is updated, the algorithm can be made very robust against near-end speech disturbance but very slow in convergence and tracking, or fast in convergence and tracking but not robust in the presence of near-end speech.

It is beneficial, both in terms of computational efficiency and convergence speed, to perform the update of the EC filter in the frequency domain. We employ the closed-loop delayless subband structure [13] for the frequency domain adaptation. In such a structure, the far-end signal $x(n)$ and the error signal $e(n)$ are transformed into $M$ non-overlapping frequency subbands by a set of analysis filters $\{a_m(n) \mid m \in [0, M-1]\}$, with $m$ being the subband index, and decimated by a factor of $D$, resulting in the subband signals

$$x_m(l) = \sum_k a_m(k)\, x(lD-k), \tag{8}$$

$$e_m(l) = \sum_k a_m(k)\, e(lD-k). \tag{9}$$
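The filter-and-decimate operation in (8) and (9) can be sketched for a single subband as follows. This direct-form version is for illustration only: the function name is hypothetical, samples before time zero are assumed to be zero, and a practical filter bank would normally use a polyphase/FFT implementation instead.

```python
def analyze_and_decimate(signal, analysis_filter, D):
    """Compute out[l] = sum_k a(k) * signal[l*D - k] for one subband.

    signal          : time-domain samples, e.g. x(n) or e(n)
    analysis_filter : impulse response a_m(k) of one analysis filter
                      (real or complex coefficients)
    D               : decimation factor
    Samples with negative index are treated as zero (an assumption).
    """
    out = []
    for n in range(0, len(signal), D):      # n = l*D for l = 0, 1, 2, ...
        acc = 0
        for k, a_k in enumerate(analysis_filter):
            if n - k >= 0:
                acc += a_k * signal[n - k]
        out.append(acc)
    return out
```

For example, with the two-tap filter `[1, 1]` and `D = 2`, the signal `[1, 2, 3, 4, 5, 6]` gives `[1, 5, 9]`: each output is the sum of the sample at `n = lD` and the one before it.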

A set of adaptive filters $\hat{\mathbf{h}}_m(l)$ is updated with these subband signals for the corresponding frequency bands as

$$\hat{\mathbf{h}}_m(l) = \hat{\mathbf{h}}_m(l-1) + \frac{\mu}{\mathbf{x}_m^T(l)\,\mathbf{x}_m(l)}\, f[e_m(l); s_m(l)]\, \mathbf{x}_m(l), \tag{10}$$

where $\hat{\mathbf{h}}_m(l) = [\hat{h}_{m,0}(l)\ \hat{h}_{m,1}(l)\ \cdots\ \hat{h}_{m,N_{sub}-1}(l)]^T$ and $\mathbf{x}_m(l) = [x_{m,0}(l)\ x_{m,1}(l)\ \cdots\ x_{m,N_{sub}-1}(l)]^T$ represent the impulse response and the tap input of the EC filter in the $m$th subband, respectively, and $s_m$ is the scale parameter for the $m$th subband. Given these $M$ subband EC filter weights, a time domain EC filter is obtained from the frequency domain filter weights through a set of zero phase synthesis filters $g_m(n)$ [14] as

$$\hat{h}_{pD+q}(l) = \sum_m \sum_k g_m(kD+q)\, \hat{h}_{m,p-k}(l), \tag{11}$$

where $\hat{h}_p(l)$ and $\hat{h}_{m,p}(l)$ are the $p$th sample of the time domain EC filter and the $p$th sample of the $m$th subband EC filter, respectively. Frequency domain adaptive filtering offers a substantially lower computational complexity due to block processing with the fast Fourier transform, and significantly faster convergence due to the independent normalization of far-end signal power in different frequency bands.

3. A parallel filter structure

In order to enable the adaptation of the EC filter to meet the conflicting requirements of fast convergence and DT robustness, sophisticated DTDs are needed. However, designing a reliable DTD is a very challenging task. An AEC using parallel adaptive filters, as illustrated in Fig. 2, is proposed in this work. The proposed AEC uses two continuously updated adaptive filters, one designed to ensure robustness during DT, and the other designed to achieve fast convergence and tracking. Because neither of the adaptive filters is required to meet any conflicting performance requirements, a sophisticated DTD is not necessary. Simple power comparison can be used to choose the adaptive filter that yields the better model of the echo path (Figs. 3 and 4).

In the proposed AEC, both adaptive filters are updated for each block of $D$ input signal samples with robust statistics based adaptive filtering algorithms in subbands. For the $m$th subband, two scale parameters are estimated as

$$s_{m,a}(l+1) = \alpha\, s_{m,a}(l) + (1-\alpha) \min[\,|e_{m,a}(l)|,\ k_0 s_{m,a}(l)\,], \tag{12}$$

$$s_{m,b}(l+1) = \alpha\, s_{m,b}(l) + \frac{1-\alpha}{\beta} \min[\,|e_{m,b}(l)|,\ k_0 s_{m,b}(l)\,], \tag{13}$$

where $l$ is the block index, and $e_{m,a}(l)$ and $e_{m,b}(l)$ are the error signals in the $m$th subband generated by the strongly and weakly robust EC filters, respectively. The variables $s_{m,a}(l)$ and $s_{m,b}(l)$ are the corresponding scale estimates. The echo path model in the $m$th subband is updated at each iteration using (4) with $f[\cdot]$ chosen as (5) and the corresponding scale estimate.

With two echo path models available, a criterion is needed for determining which one is more reliable. Such a comparison of echo path model reliability can be done by comparing the levels of the signals $e_a(n)$, $e_b(n)$ and $y(n)$. The signal levels are calculated at time instances $n = lD$ as

$$p_{e,a}(l) = \sum_{n=lD-D+1}^{lD} |e_a(n)|, \tag{14}$$

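Fig. 2. A parallel filter structure. [Block diagram: the far-end signal drives a strongly robust adaptive filter and a weakly robust adaptive filter; their echo estimates ŷ_a and ŷ_b are subtracted from the near-end signal to give the errors e_a and e_b, and a controller produces the output e.]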

Fig. 3. Delayless subband adaptive filter structure. [Block diagram with labels: x(n), d(n), ŷ(n), e(n), time domain filter ĥ(n), open-loop and closed-loop paths, analysis filters A(z) with decimation by D, subband filters ĥ_0(n), ĥ_1(n), ..., ĥ_{M−1}(n), and the weight transform.]

$$p_{e,b}(l) = \sum_{n=lD-D+1}^{lD} |e_b(n)|, \tag{15}$$

$$p_y(l) = \sum_{n=lD-D+1}^{lD} |y(n)|. \tag{16}$$
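The block levels in (14)-(16) are sums of absolute values over the most recent D samples. A minimal sketch is given below; the function name and the choice to raise an error for incomplete blocks are assumptions of this example.

```python
def block_level(signal, l, D):
    """Sum of |signal[n]| for n = l*D - D + 1, ..., l*D, as in (14)-(16).

    signal : time-domain samples (e_a(n), e_b(n) or y(n)), indexed from 0
    l      : block index
    D      : block length
    """
    start = l * D - D + 1
    stop = l * D
    if start < 0 or stop >= len(signal):
        raise IndexError("block is not fully contained in the signal")
    return sum(abs(v) for v in signal[start:stop + 1])
```

The same helper would be applied to each of the three signals to obtain the levels that are compared when selecting an echo path model.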

First of all, for an echo path model to be considered reliable, it should provide echo suppression. This is the same as requiring that the corresponding error signal level be below the microphone signal level. Moreover, the EC filter that matches the echo path more closely is expected to produce a lower residual echo level, and thus a lower error signal level. Therefore, an echo path model, say $\hat{\mathbf{h}}_a(l)$, is chosen over the other, say $\hat{\mathbf{h}}_b(l)$, when the following conditions hold:

$$p_{e,a}(l) < R_a\, p_{e,b}(l), \qquad p_{e,a}(l) < R_y\, p_y(l),$$

where $R_a \le 1$ and $R_y \le 1$ are predetermined constants. In order to reduce the chance of making a false decision, the above requirement needs to be satisfied over $K$ consecutive blocks. In short, the selection of the better echo path model can be presented as the following hypothesis test:

$$H_0:\ \{p_{e,a}(n) < R_a\, p_{e,b}(n)\}\ \text{and}\ \{p_{e,a}(n) < R_y\, p_y(n)\} \quad \forall n \in [l-K+1,\ l], \tag{17}$$

$$H_1:\ \{p_{e,b}(n) < R_b\, p_{e,a}(n)\}\ \text{and}\ \{p_{e,b}(n) < R_y\, p_y(n)\} \quad \forall n \in [l-K+1,\ l]. \tag{18}$$
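The test in (17) and (18) can be sketched as a function over per-block level histories. The function name, the string labels, and the behaviour when fewer than K blocks are available are assumptions of this example; the default constants are the values reported in Section 4.

```python
def select_model(pe_a, pe_b, p_y, l, K=4, R_a=1.0, R_b=1.0, R_y=0.125):
    """Evaluate hypotheses (17) and (18) at block l.

    pe_a, pe_b, p_y : sequences of block levels indexed by block number
    Returns "a" if H0 holds, "b" if H1 holds, and None if neither holds
    or if fewer than K blocks are available (an assumption).
    """
    first = l - K + 1
    if first < 0:
        return None
    blocks = range(first, l + 1)
    if all(pe_a[n] < R_a * pe_b[n] and pe_a[n] < R_y * p_y[n] for n in blocks):
        return "a"
    if all(pe_b[n] < R_b * pe_a[n] and pe_b[n] < R_y * p_y[n] for n in blocks):
        return "b"
    return None
```

Because the conditions must hold on every one of the K most recent blocks, a single block in which the comparison flips is enough to make the function return `None`.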

When a filter is selected, its coefficients are transferred to the other filter so as to accelerate the latter's convergence toward the echo path. It should be noted that the two hypotheses do not cover all possible situations. When neither of the two hypotheses holds, no filter coefficient transfer occurs, and the error signal generated by the most recently selected filter is transmitted.

Furthermore, one of the two error signals should be transmitted to the far-end. When $p_{e,a}(l)$ falls below $p_{e,b}(l)$, it is likely that the adaptation of the weakly robust filter $\hat{\mathbf{h}}_b(l)$ is disturbed by the near-end speech signal. In order to ensure robustness during DT, the error signal $e_a(n)$ should be transmitted in such a situation. On the other hand, the event that $p_{e,a}(l)$ exceeds $p_{e,b}(l)$ may occur due to random fluctuations of the signal while the near-end speaker is active. Therefore, the error signal $e_b(n)$ is not transmitted until $\hat{\mathbf{h}}_b(l)$ is recognized as the more reliable echo path model. Once one of the two error signals has been chosen for transmission to the far-end, the AEC keeps transmitting this error signal until determined otherwise. Note that a far-end signal energy detector should be included to halt the update of the filters and scale parameters, as well as the filter selection, when there is insufficient far-end excitation.
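A far-end energy detector of the kind mentioned above could be sketched as follows, using the asymmetric averaging constants (0.9 when the level drops, 0.9999 when it rises) and the factor 1.5 reported in Section 4. The class name, the initial noise estimate, and the decision to compare the power against the noise estimate after it has been updated are assumptions of this example, not details given in the paper.

```python
class FarEndActivityGate:
    """Fast-dropping, slow-rising background noise tracker for one frequency band."""

    def __init__(self, initial_noise, fall=0.9, rise=0.9999, threshold=1.5):
        self.noise = float(initial_noise)  # background noise level estimate
        self.fall = fall                   # averaging constant when the level drops
        self.rise = rise                   # averaging constant when the level rises
        self.threshold = threshold         # required ratio of power to noise level

    def update(self, power):
        """Update the noise estimate; return True if adaptation may proceed."""
        lam = self.fall if power < self.noise else self.rise
        self.noise = lam * self.noise + (1.0 - lam) * power
        return power > self.threshold * self.noise
```

One such gate would be kept per frequency band; when `update` returns `False`, the filter update, the scale update and the filter selection for that band would be skipped.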


Fig. 4. Different properties of robust statistics based adaptive filtering algorithms with different scale parameter updates. (a) Tracking behavior, echo path changes at sample 100 000. (b) Double-talk behavior, near-end speech between sample 80 000 and sample 100 000. [Both panels plot misalignment (dB) against time (samples, ×10^4) for the weakly robust and strongly robust algorithms.]

4. Simulation results

In the experiments, data were recorded in an office. The EC filter has 1024 taps and is adapted in 128 subbands for each block of 64 samples. The analysis filter bank and the synthesis filter bank for the weight transform are both uniform DFT modulated FIR filter banks with a 256-tap prototype filter. The parameter settings for the robust statistics based adaptive filtering algorithms are $\alpha = 0.95$, $\beta = 0.60665$, $k_0 = 1.1$. The subband adaptive filters are updated with a uniform stepsize $\mu = 0.5$. The parameters of the filter coefficient transfer control are set as $R_a = R_b = 1$, $R_y = 0.125$, $K = 4$.


Fig. 5. Experiment 1. (a) Echo and near-end speech signals. Microphone displaced at sample 100 000. (b) Misalignment curves. [Panel (b) plots misalignment (dB) against t (samples, ×10^4) for NCC, the effective filter, the weakly robust filter and the strongly robust filter.]

Misalignment is used as the performance indicator. For the proposed parallel structure, the filter that generates the error signal selected for transmission to the far-end is taken as the effective EC filter, and the misalignment is calculated from the coefficients of this effective EC filter. The optimal EC filter $\hat{\mathbf{h}}_{opt}(n)$ is estimated offline.

In the experiments, a far-end signal energy detector is included. The far-end signal energy detector estimates the far-end background noise level using a fast-dropping, slow-rising averaging scheme [15]. The exponential averaging parameter is 0.9 for a declining signal and 0.9999 for a rising signal. The adaptive filter in a frequency band and the corresponding scale estimate are only updated when the far-end signal power in this frequency band exceeds 1.5 times the background noise level.

In the first experiment, a 2.5 s near-end speech segment is added to the echo at approximately 0 dB echo to near-end speech energy ratio, as illustrated in Fig. 5(a). An abrupt change of the echo path is emulated by displacing the recording microphone by 4 cm at sample 100 000. The misalignment curves of the effective, the strongly robust and the weakly robust EC filters are plotted in Fig. 5(b). From the misalignment curves it can be seen that the strongly robust EC filter is robust against near-end disturbance whilst the weakly robust EC filter is capable of closely following environmental changes. By combining the two filters, the effective EC filter retains their advantages yet avoids their disadvantages. For comparison, we also plot the misalignment curve for the system using normalized cross-correlation (NCC) for double-talk detection [16]. The proposed parallel filter structure clearly outperforms the NCC method in terms of tracking speed.

In the second experiment, the echo path change occurs at sample 80 000 and the near-end speech signal is introduced right after the microphone is displaced. From the misalignment curves in Fig. 6, it is observed that the effective EC filter converges slowly to the new echo path during the near-end active period, thanks to the strongly robust filter. After the near-end speaker ceases talking, the weakly robust filter starts a rapid tracking of the environmental change, which accelerates the tracking speed of the effective filter substantially. The proposed algorithm also performs better in tracking the echo path change.

Fig. 6. Experiment 2. Microphone displaced at sample 80 000. [Plot of misalignment (dB) against t (samples, ×10^4) for NCC, the effective filter, the weakly robust filter and the strongly robust filter.]

Fig. 7 plots the misalignment curves for different near-end speech strengths. It is seen that the proposed AEC works properly over a wide range of echo to near-end speech signal energy ratios. Also, the misalignment curve of the system using NCC at 6 dB echo to near-end speech signal energy ratio is plotted. This example demonstrates that the proposed algorithm outperforms the NCC method not only in tracking but also in DT robustness.

Fig. 7. Experiment 3. Performance with different echo to near-end speech signal energy ratio. [Plot of misalignment (dB) against t (samples, ×10^4) with curves labelled 0 dB, -6 dB, 6 dB and 12 dB.]

In summary, the simulation results illustrate that by combining two EC filters of different properties, the proposed AEC is robust against near-end disturbance and is capable of speedy tracking of environmental changes.

5. Conclusions

A new subband DT robust AEC has been proposed. In the proposed AEC, a parallel FIR filter structure is employed with two adaptive filters with different properties. This structure has the overall advantages that it is robust against DT, does not require a sophisticated DTD, and is very efficient in tracking echo path variations. The study has shown that the method can handle DT perfectly without slowing down tracking, even when double-talk and echo path variations occur very closely in time, or when the near-end speech is at the same level as the residual echo. The study has also shown that it is beneficial to combine the algorithm with subband processing. This improves the performance of the algorithm significantly, and will allow us to do time-frequency analysis of the signals involved. Further studies can be carried out to investigate the performance for real time-varying channels.

References

[1] International Engineering Consortium, Echo cancellation tutorial, <http://www.webproforum.com/echo_cancel/topic07.htm>, 2000.
[2] M.M. Sondhi, D.A. Berkley, Silencing echoes on the telephone network, Proc. IEEE 68 (8) (1980) 948–963.
[3] M.M. Sondhi, An adaptive echo canceller, Bell System Technical J. 46 (March 1967) 497–511.
[4] T. Gänsler, J. Benesty, S.L. Gay, Double-talk detection schemes for acoustic echo cancellation, in: S.L. Gay, J. Benesty (Eds.), Acoustic Signal Processing for Telecommunication, Kluwer Academic Publishers, Dordrecht, 2000, pp. 81–97 (Chapter 5).

[5] T. Gänsler, S.L. Gay, M.M. Sondhi, J. Benesty, Double-talk robust fast converging algorithms for network echo cancellation, IEEE Trans. Speech and Audio Process. 8 (6) (2000) 656–663.
[6] J.C. Baumhauer, et al., Audio technology used in AT&T's terminal equipment, AT&T Tech. J. (March/April 1995) 57–70.
[7] J.J. Shynk, Frequency-domain and multirate adaptive filtering, IEEE Signal Process. Mag. (January 1992) 14–37.
[8] P.J. Huber, Robust Statistics, Wiley, New York, 1981.
[9] F.R. Hampel, et al., Robust Statistics: The Approach Based on Influence Functions, Wiley, New York, 1986.
[10] J.F. Weng, S.H. Leung, Adaptive nonlinear RLS algorithm for robust filtering in impulse noise, in: Proceedings of the IEEE International Symposium on Circuits and Systems, 1997, June 1997, pp. 2337–2340.
[11] Y. Zou, Z. Chan, T. Ng, Least mean m-estimate algorithms for robust adaptive filtering in impulsive noise, IEEE Trans. Circuits Systems II: Analog and Digital Signal Process. 47 (12) (2000) 1564–1569.
[12] J. Huo, K.F.C. Yiu, S. Nordholm, K.L. Teo, On the robust filter design for echo cancellation with double talk, in: Proceedings of the ICOTA 2002, 2002.
[13] D.R. Morgan, J.C. Thi, A delayless subband adaptive filter architecture, IEEE Trans. Signal Process. 43 (8) (1995) 1819–1830.
[14] J. Huo, S. Nordholm, Z. Zang, New weight transform schemes for delayless subband adaptive filters, in: Proceedings of the Globecom 2001, 2001.
[15] E. Hänsler, G. Schmidt, Acoustic Echo and Noise Control: A Practical Approach, Wiley, New York, 2004.
[16] J. Benesty, D.R. Morgan, J.H. Cho, A new class of doubletalk detectors based on cross-correlation, IEEE Trans. SAP 8 (2) (2000) 168–172.
