Low power parallel spread-spectrum correlator

June 12, 2017 | Autor: Mircea Stan | Categoria: Power Consumption, Spread Spectrum, Low Power, Direct Sequence Spread Spectrum Cdma, Power Reduction, Electrical And Electronic Engineering, Register File, Electrical And Electronic Engineering, Register File

Share Embed

Denunciar este link

Descrição do Produto

Low power parallel spread-spectrum correlator D.Garrett a n d M.Stan Abstract: Direct sequence spread spectrum (DSSS) transmissions require a despreading stage w i t h the standard receiver block to recover the spread spectrum signal. For long spread spectrum codes, the correlation block can be a large portion of the receiver size, hence a considerable portion of the power consumption. The authors look at two power reduction alternatives for a parallel spread spectrum correlator, by analysing the algorithm and designmg a basehe correlator and by investigating how to streamline the arithmetic operations in one case, and optimising the sample storage in the other. The two correlator designs are compared with a mix of analytical techniques and simulation data to determine the optimal correlator alternative for the DSSS application. The final analysis shows that the register file based correlator can reduce the power by over 30% for bus widths greater then 6 by using a structure which maintains the multi-bit data samples in a static area and by rotating the single bit coefficients around the data with a circular shift register.

1

Introduction

A type of spread-spectrum encoding that is widely used is direct sequence spread spectrum (DSSS). DSSS takes a message stream and adds a wideband spreading code before the signal is modulated onto a carrier frequency. For binary coefficients, each symbol (bit) is represented by a pseudorandom pattern of chips. The despreading output is the convolution of the discrete-time sample signal with the original code coefficients as seen in eqn. 1:

enters the top, the oldest sample wdl shift out of the storage area as it is no longer required, thus providing additional flexibility for chaining several correlation blocks together in order to handle larger codes.

[

I rn

2 -1 registers

L

I

r=l

where L is the length of the DSSS code and k is the discrete-time index. For every new sample, the correlator must perform L multiplications and additions. Depending on the application, DSSS systems can use design alternatives with different code lengths, bits per sample, samples per symbol, and acquisition times [l-lo]. This paper deals with a parallel correlator, where the correlations are performed at the chip rate and the DSSS code can be acquired in every symbol period. There are structures that sacrifice the acquisition time for a reduction in the hardware size, but their correlations cannot be performed every cycle [lO-12]. 2

I

‘k

f

‘p-clk

-_-

elk

1

___

clk

‘k-+

multiply and add

_--

k%--+clk

---

I

I

Fig.I Shift regfster storage

1-

1

filter

Correlator structures

2. I Shift register sample storage A conceptually simple structure to handle the sample streams, and provide the L samples to an adder tree is a shift register (Fig. 1). The structure size is determined by the sample resolution n and the length of the DSSS codes, 2m - 1 for maximal length sequences. As a new sample OIEE, 1999 IEE Proceedings online no. 19990473 DO1 10.1049hpCds:19990473 Paper receivad 15th September 1998 The authors are with the Department of ~lectrical~ n & . h g ,univetsity of Virginia, Thornton Hall, Charlottesville,Virginia 22903-2442, USA IEE Proc.-Circuits Devices Syst., Vol. 146, No. 4, August 1999

Fig.2 Correlator with caefzient mult@licatiomfedinto summation black

In the arithmetic unit, the samples are first multiplied by the code coefficients and then are added together. Fig. 2 shows the block diagram of the arithmetic unit taking the samples from the registers (the control logic for the registers has been removed for simplicity). In the case of binary coefficients (+1 and -l), the multiplications can be reduced to selecting the original sample for a +1 coefficient, or the 2’s complement of a sample for a -1 coefficient. To provide 191

a fast summation, the adder tree can be organised in a binary tree structure, reducing the calculation time for eqn. 1 on the order of log2(2" - l), or simply m.

2.2 Reducing power in the adder tree Given the shift register storage, there are potential power savings that can be realised in the adder tree by observing the data statistics. In our case the DSSS code is based on maximal length pseudorandom sequences that can be generated with a linear-feedback shift register (LFSR) as in Fig. 3 [16]. These sequences have unique properties that define.how the coefficients are changing between successive calculations of the matched filter equation.

clk

clk

clk

clk

clk

clk

clk

the sum, and then the sample with the new polarity must be added. Thls can be handled in one step by adding the sample with the new polarity twice, which explains the factor of 2 before the summation symbol in eqn. 2. Also, in each cycle, the newest sample that enters the chain must be added and the outgoing sample must be subtracted from the overall correlation sum.

clk

Fi .3 84it LFSR maxbnal length (255) psdrandom code generator with

X 8 R UrpS

E$I -

h2

1

bypassbit,

Zb

H

7-

bypass bit3

w

b

r

Fig.5 M o d f ~ d&r

cell with byparsftazctwn

a Bypass adder

b Control signal generation

Fig.4 B y w s bit generation

The run property of a maximal length sequence defines the number of runs (streams of consecutive 1s or Os) to be dependent solely on the code length [16]. The run property of maximal-length codes allows us to reduce considerably the number of transitions in the adder tree as only half of the terms in the correlation sum will have a dflerent coeflicient from one calculation to the next (and thus change their contributions to the overall sum in each cycle). As the data is shlfted by one position, the previous coefficient and the new coefficient will remain the same for half the number of samples (in runs of length 2 or greater). To capture this behaviour we define 'bypass bits' (see Fig. 4), which tell the adder stages if a term is not c h a n p g and thus it has zero contribution to the difference between the present and next correlation sum. Sirmlar work has been reported for differential coding [17], but the emphasis was not in reducing power consumption. By storing the previous sum and identifying the factors that are changing, we can streamhe the arithmetic operation to reduce the number of terms. The overall number of adders cannot be reduced as different codes change the locations of inactive adders, but we can shutdown unused adders, and limit their power consumption. The equation for the correlator can be rewritten to express the new computation as follows:

3

zm-2

101

b

Fig.6 Haraivare cornpariron of the regular adder with the adder with bypan a Simple adder

b Adder with bypass

If the coefficient for a sample has not changed from the previous calculation, h*, is 0 in eqn. 2, otherwise h*, will reflect the new polarity (Cl or -1). When the coefficient changes, the o r i p a l sample value must be removed from

To take advantage of the new method of calculating the correlation sum, a specialised adder cell was developed as in Fig. 5. In the case where a coeflicient has not changed as ZEE Proc.-Circuits Devices Syst., Vol. 146, No. 4, August 1999

192

'.-

a sample is slufted, its particular contribution to the overall sum should be zero. When a term is bypassed, the adder can be configured to ignore its value (using cs), and only pass the other input as the result (using cu or cb). Fig. 5 shows a full-adder surrounded by passgates, and the truth table for the control bits Za and Zb, which are set when an input is zeroed in the calculation. Fig. 6 shows a slice of the first level of the adder tree, and how the regular adder tree differs from the adder with the bypass cells. In Fig. 6 4 regardless of the coefficients, the data is recomputed on every cycle in every adder. In Fig. 6b, on the other hand, the arithmetic only performs computations when the coeficients are different. In the case where both input to an adder are zero (as determined in Fig. 4), the bypass adder propagates a zero control line into the next level of the tree (in this case the adder has an invalid result, but with with zero signal into the next level of the tree, the sum is not used).

the energy a single transition defined as 1/2CLV2DD. Lowering the supply voltage will decrease power consumption quadratically [13], but at a cost of reducing the maximum chip clock rate. The total switchmg energy can then be determined by summing all switching events (which multiplied by the system frequency will provide power dissipation): (3) To simplify the analysis, we assume a constant logic volt-

age swing throughout the design, ignore the routing overhead, and consider that each gate provides a unit load to the corresponding driver. The power consumed by the sMt register ulll then have two components proportional to: (i) transitions on the register inputs and outputs; (ii) clock transitions.

+ +

PSR c( bitshifts + clocktrans funout n n c( -(2m - 1) + n ( 2 m - 1) 2 ( 2 m - 1) 2 K 2 n ( P - 1) (4)

2.3 Register file storage Another approach to reduce power dissipation is to reduce the activity in the storage area. One of the biggest causes of transition activity in the shift register implementation is that every sample is moved on every clock cycle. A possible approach to reduce the unnecessary activity on the datalines is to use a register file (with pointer) implementation instead of the n-bit wide shift register [18], as seen in Fig. 7. With this scheme, only one register out of the total of 2’” I will experience clock and output transitions for each new sample. However, a global bus must be connected to each register, increasing the load due to the inputs of all the registers. To create the behaviour of a FIFO structure, we use a one-hot address register of length (2” - l), which acts as the clock pulse to the single active destination. In addition, whereas the DSSS coefficients are static for the shift register correlator, the register file structure requires that the coefficients be shifted in a one-bit shift register ring. Different adders are inactive in each clock cycle as runs of coefficients pass the multipliers.

I

I

I

To provide an absolute power number, one section of an 6-bit wide shift register chain with the appropriate fanout load was simulated using a transistor level model for the HP 0 . 5 ~process available through MOSIS. For a lOMHz clock and a 3.3V supply, the average power dissipation per 6-bit register for uncorrelated data was 41pW (average power per bit was 6.83pW). For our system the majority of the time for the correlator will be spent looking for a correlation, thus the samples will look like noise. Using eqn. 4, the average power dissipation for a 2” - 1 length shift register, where Pbit is the measured parameter 6.83pW for the I-bit register, n is the sample size and the fdter is running at IOMHi, becomes:

PSR = nPbit(2” - 1) (5) To fmd the power dissipation in the adder tree, a 6-bit ripple-any adder was simulated in the HP 0 . 5 process ~ with uncorrelated data inputs. At a IOMHz data rate, the 6-bit adder consumed 19pW of power. For an estimate of the entire tree, the 6-bit adder power measurement was bitsliced, and applied to all the adders in the entire binary tree. Each successive level has half the adders of the last, with the number of bits increased by one (i.e. two 6-bit additions generate a 7-bit result). The overall power

I

Fig. 7 RegistwJile compt

Because the global bus feeds every register in the register file, minimising the transitions on this bus can reduce the overall power consumption considerably. A simple, but effective coding technique to reduce power is the bus invert method, and for a 6-bit bus, it reduces transition activity by approximately 20% [19]. A large portion of the decoding overhead for bus invert can be incorporated into the binary coefficient multiplications units. 3

’

1

lo

Correlator power analysis

I

10 filter size

3 10

Fig.8 Power &s@atrbnfor sh$ rqister correhtor

3.7 Shift register correlator

-4

16 bits

The dynamic power consumed by CMOS circuits is directly proportional to the number of transitions over time, with

0 X-X

414 bits 12 bits

IEE Proc.-Circuits Devices Syst., Vol. 146, No. 4, August 1999

2

‘1

10

+-+

10 bits

t--X 8 bits W 6 bits O d 4 bits V-V

2 bits

193

dissipation for the adder tree can be approximated with the following relationship: PAT

= 2m-1n~bit

+ 2m-2(n + 1)Pbit + .

*

+ (n + m - 1)Pb,t m-l

= Pbit

2"-"'(k

+ n)

(6)

k=O

The total power dissipation of the shift register correlator as a function of filter size is shown in Fig. 8.

3.2 Bypass adder correlator power analysis To estimate the correlator power for the bypass adder, we must first consider the performance of the bypass adder as compared to the simple adder block. When both inputs are active, the bypass adder suffers from the overhead of the passgates. On average, a 6-bit adder in the H p 0 . 5 proc~ ess has a 5.8% power dissipation increase as measured in SPICE simulations for uncorrelated data. The bypass adder has the overhead of the passgates in the case where it is adding two numbers, but it uses two orders of magmtude less power in cases of bypass or shutdown (the adder is disconnected, and the power is only consumed in charging the bypass hes). The run property statistics determine how many bypass bits are set and the particular spread spectrum code dictates where the bypass bits are located. Using these observations and the run property for maximal length sequences, we can generate the expected number of adders in each row that will be in each of the three modes: off, bypassed, or on. Table 1 summarises the expected number of adders in the three modes of the states bypass adders (on, off, and bypass) for maximal length codes. Since adders that are off or bypassed consume neghgible power, we can estimate the power dissipation in any row of the adder tree with the following equation, where POnis the power dissipation of the single adder:

11

10

.1

Fig.9 Power &quationfor bypass ndder correlator 4-4

0 X-x

+-+

%-%

16 bits

8 bits

46 bits

0

4 14 bits

0-0 V-V

12 bits 10 bits

4 bits 2 bits

3.3 Register file correlator power analysis As with the shift register, the transition activity for the register file can be modelled as a function of the code lengths and bus widths. The switching activity in the register file comes from six main components: (i) transitions due to the global bus (w/ bus invert); (ii) the fanout of the registers into the correlation block; (iii) the clocked registers; (iv) the clocks on the address bit registers; (v) the hot-bit shifting through the address register bits; (vi) the filter coefficients that must now be rotated. The total number of transitions per cycle becomes:

+

+

PRF 0: bus fanout regclk + bitclk + bitshift coefclk 0.8n n oc -(P - 1) - n 2 2

+ + +

PTO,= J73[m]POn + J ~ ~ [ ~ Y P +~ E[off]p,ff s s ] P ~ ~ ~ ~ ~ ~

= E[m]P,n (7) To provide a fair comparison with the first correlator structure, we must factor in the additional overhead of using the bypass adder trees: (i) additional register which holds the previous correlation results; (ii) additional adder which includes the previous result in the sum. A plot of the power dissipation for the bypass adder correlator is shown in Fig. 9.

,

3 10

2 10 filter size

10

1 + 2 + (2m - 1) + 5(2") 0.8n + 5 (3n + 5 ) (2" - 1 ) +

+ (2"

oc-

+ coefs

-

1)

(8) 2 2 When the ratio of the transition activity for the register file (eqn. 8) and the shift register (eqn. 4) are plotted (Fig. 10) for various filter sizes and bus widths, the register file has a clear advantage in reducing the transitions in the sample ~

Table 1: Expectednumbers of adders in each mode (on, off, bypass) Length

First level

Mode

7

15

31

63

127

off

0.5 1.5 2

1.5 3 3.5

3.5 6 6.5

7.5 12 12.5

15.5 24 24.5

0 0.5 1.5

0 1.5 2.5

0.25 3 4.75

0.75 1.75 3.75 7.75 12 6 48 24 9.25 18.25 36.25 72.25

0 0 1

0 0 2

0 0 4

0 0.5 7.5

bypass

on Second level

off bypass

on Third level

off bypass

on 194

0 1.5 14.5

255

511

1023

31.5 48 48.5

63.5 96 96.5

127.5 192 192.5

-

0 3 29

0.13 6 57.87

15.75 96 144.25 0.38 12 115.62

IEE Proc.-Circuits Devices Syst., Vol. 146, No. 4, August 1999

storage area over the shift register. As both the filter size and bus width increase, the register file approaches less then one-quarter of the transition in the shift regjster.

By calculating the power dissipation for a given size shift register (eqn. 5) and then applying the ratio of transition activity (eqn. 9), we can generate an estimate of the power dissipation for the register file storage. The register file also si&icantly changes the data transition statistics of the inputs on the first level of the adder tree. Instead of each adder seeing pseudorandom data, with on average only half the bits changing values, the register file forces every input bit to the adder to change when its corresponding sample changes polarity. A simulation of a single adder with stable sample data and the polarity changing according to the DSSS code coefficients had an average power dissipation of 23pW (compare with S9pW for the same adder with uncorrelated samples) in the first row. Similarly, simulations on the second row for the regular adder tree showed an average power dissipation of 31pW per adder cell for the register file (compare with 22vW as estimated for the shift register adder tree). A plot of the register fie correlator power dissipation in shown in Fig. 1S.

Overall comparison

4

In all,we have presented three variants to realise the correlation block for fast acquisition of an incoming DSSS code. Fig. 12 shows a graph with the normalised power dissipations of each of the variants: a shift register storage area with regular binary adder tree, a shift register storage with the bypass adder tree configuration, the register file implementation with the Bus Invert technique. 1.2r

m

$

0.7

E

p 0.6 0.5'

',

3

2

10

10

10

filter size

Fig. 12 Normalbed power &siption f i r the correlator rinplementutwn choices a--O 10-bit R F correlator 6 4 shift register correlator "0 x-x

+-+ e-+

o.2

t

1oL 10" filter size Fig. 10 Ratio of trm?wnal activity f i r registerjle agaimt sh$ register 4-4 16 bits %-% 8 bits 0-0 14 bits 0--(36 bits X-X 12 bits 0 4 4 bits V-V 2 bits +-+ IO bits

10

1

I

'I

2

10 filter size

10

Fig. 11 Power &spationfor register f i correlator 6 0 X-X

+-+

4 16 bits 414 bits 12 bits 10 bits

%-%

8 bits

o--O 6 bits 0

0-V

4 4 bits

2 bits

IEE Proc.-Circuits Devices Syst., Vol. 146, No. 4, August 1999

3 10

0

V-V

*-*

4 8-bit R F correlator

6-bit R F correlator Cbit R F corielator 2-bit R F correlator

The first correlator, the shlft register design, was chosen as the baseline case, and the curves in Fig. 12 were all normalised with respect this correlator. The bypass-adder correlator asymptotically approached a 12% reduction in power over the shift register correlator across for large filter sizes. The relationship between the power in the shift register and the bypass adder correlator remained constant across all bus widths. In contrast, the register fie correlator has a large power savings over both the shift register and bypass adder correlators. The register file correlator power dissipation shows a strong dependence on the bus width, and at 16 bits it has a W ?reduction over the normalised correlator. Even with a 6-bit sample width the register file has a 30% power reduction. 5

lo

bypass adder correlator 16-bit R F correlator 14-bit R F correlator 12-bit R F correlator

Conclusions

The starting point for this research was the shlft register implementation, its structure is easy to understand, and is most readily seen directly from the algorithm. The bypass adder techmque was explored to try and remove some of the computational expense per cycle. For the first row of the binary adder trees, the bypass techniques was effective in shutting down over half of the adders in the first row. While large gains were achieved in the first row of the binary adder tree, the lower levels of the adder are active in almost every clock cycle. While the bypass adder tree did achieve a si&icant power reduction in the arithmetic unit, the storage area was still consuming most of the overall power. The most substantial power reduction was achieved by the register file implementation, regardless of the filter size. Coupling the register file correlator with bus invert data encodings produced a correlator with over 30% reduction over the shift register case on bus sizes of 6 or greater. We have presented several power minimisation techniques for a direct sequence spread spectrum correlator 195

working at the chp rate. Depending on the FIFO implementation (shift register or register file), different adder tree solutions are optimal for low power design. When samples are shifted each cycle (as for the shift register FIFO), an adder tree with bypass reduces the overall power by 12%. When the samples are static and only the coefficients are shlfted (as for the register file FIFO), a regular adder tree gives the best results for more than 30% power reduction for most bus widths. The most effective technique found to reduce the dynamic power caused by switchmg activity was simply moving the data that would cause less transitions (the 1-bit coefficients) and keeping the multi-bit samples static. 6

Acknowledgments

The authors thank Dr Jim Harris for inspiring some of ths work and Dr Ron Wdliams, Dr Steve Jones, Max Salinas, Arnaud Forestier, Adam Von Ancken, and Peter Schaefer for many interesting discussions. The authors also thank Marcy Garrett for help in reviewing the paper. Ths work was partially supported by NSF Career Grant MIP9703440. 7

References

1 HAYKIN, S.: ‘Communication systems’ (John Wiley, 1994)

2 GARRETT, D., and STAN, M.: ‘Power reduction techtuques for a spread spectrum correlator’. Proceedings of the 1997 internal symposium on Low power electronics and design, 1997, pp. 225-230 3 ZIEMER, R., and PETERSON, R.: ‘Digital communications and spread spectrum systems’ (Macmillan Publishing Co., 1985), Chap. 7 4 NAMGOONG, W., and MENG, T.: ‘Power consumption of parallel spread spectrum correlator architectures’. Proceedings of the 1998 international symposium on Low power electronics and design, 1998, pp, 13x135

196

5 WU, J.: ‘A 2.6-V 44MHz all-digital QPSK direct-sequence spreadspectrum transceiver IC‘, IEEE J. Solid-State Circuits, 1997, 32, (IO), pp. 1499-1510 6 HAHM, M., FRIEDMAN, E,, and TITLEBAUM, E.: ‘A comparison of analog and digital circuit implementations of low power matched fdters for use in portable wireless communication terminals’, IEEE Trans. Circuits and Syst. II, Analog Digit. Signal Process., 1997, 44,(6), pp. 498-506 7 LEIBOWITZ, L.: ‘Multiplexingtechniques for digital correlator speed improvement’, IEEE Trans. Commun., 1985, COM-33, (6), pp. 579588 8 POVEY, G., and GRANT, R.: ‘Simplified matched filter receiver designs for spread spectrum communications applications’, Electr. Commun. Eng. J., 1993,5, pp. 5944 9 IBRAHIM, B., and AGHVAMI, A.: ‘Direct sequence spread spectrum matched filter acquisition in frequency-selective Rayleigh fading channels’, IEEE J. Sel. Areus Commun., 1994, 12, (5), pp. 885-890 10 CHUNG, B., CHIEN, C., and SAMUELI, H.: ‘Performance analysis of an all-digital BPSK direct-sequence spread-spectrum IF receiver architecture’, IEEE J. Sel. Areas Commun., 1993, 11, (7), pp. 1 0 9 6 1107 11 MILSTEIN, L., and GEVARGIZ, J.: ‘Rapid acquisition for direct sequence spread-spectrum communications using parallel SAW convolvers’, IEEE Trans. Commun., 1985, COM-33, (7), pp. 593600 12 VON ANCKEN, A.: ‘Techniques in rapid phase acquisition of PN sequences’. Master’s thesis, University of Virginia, 1997 13 CHANDRAKASAN, : ‘Design considerations and tools for low-voltage digital system design’. Proceedings of the Design automation conference, 1996, pp. 113118 14 JSANG, S.: ‘Accurate simulation of power dissipation in VLSI circuits’, IEEE J. Solid-state Circuits, 1986, SC-21, (5), pp. 899-901 15 STAN, M., and BURLESON, W.: ‘Low-power encodings for global communication in CMOS VLSI’, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 1997, 5, pp. 444-455 16 ZIEMER, R., and PETERSON, R.: ‘Digital communications and spread spectrum systems’ (Macmillan Publishmg Co., 1985), Chap. 8 17 SANKARAYYA, N., ROY, K., and BHAlTACHARYA, D.: ‘Algorithms for low-power and high speed FIR filter realisation usmg differential coefficients’, IEEE Trans Cirnrits and Syst II, Analog Digit. Signal Process., 1997, 44,(6), pp. 488-497 18 TSERN, E., and MENG, T.: ‘A low power video-rate pyramid VQ decoder’, IEEE J. Solid-State Circuits, 1996, 31, pp, 1789-1794 19 STAN, M., and BURLESON, W.: ‘Bus-invert coding for low power V O , IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 1995, 3, pp. 49-58 20 RABAEY, J.: ‘Digital integrated circuits’ (Prentice Hall Electronics and VLSI Series, 1996), p. 148

IEE Proc.-Circuits Devices Syst., Vol. 146, No. 4, August 1999

Lihat lebih banyak...

Low power parallel spread-spectrum correlator

Descrição do Produto

Comentários