A dual-DSP microprocessor system for real-time digital correlation

June 2, 2017 | Author: Subra Ganesan | Category: Computer Hardware, DSP, Electrical And Electronic Engineering



Subramaniam Ganesan outlines the design of a dual-DSP system and its application for real-time correlation

The design of a dual TMS320C25 DSP microprocessor system with dual-port common memory and its application for real-time correlation are explained. System design details, brief explanations for design decisions and timing analysis, theory of digital correlation, a software flowchart and a description of hardware-software debugging using the XDS 320 simulator and a software simulator are presented.

digital correlation, DSP, dual-port memory

DSP microprocessors have on-chip multipliers, compute a pipelined running sum of products and perform data scaling, memory shifts and pointer incrementing in parallel with other operations. They can accomplish each multiply-accumulate instruction in a single cycle of 60 ns (or 100 ns, depending on the processor clock), while a typical general-purpose 32-bit microprocessor requires nearly 1250 ns. These features are very useful for signal processing, especially image processing, where a large amount of data needs to be processed in real time. Multiple-microprocessor systems using advanced microprocessors can provide an appropriate solution to the demand for additional computing power to support complex applications [1, 2]. A primary design objective of a multiple-microprocessor system is the enhancement of system performance, throughput and real-time control. This, however, assumes that the computational task lends itself to partitioning into smaller tasks, where one processor can be allocated to the execution of each task. The developments in technologies and multimicroprocessor architecture complement each other. The availability of a faster microprocessor provides a faster processing element in a multiple-microprocessor system. In the last few years a number of high-performance digital signal processing microprocessors (DSPs) have been announced [3-5]. Ideally, with n processors, the system should be n times faster than a single-processor system [6]. Since most applications contain a lot of parallelism (parallel tasks), the speed-up obtained is almost linearly proportional to the number of processors.

Department of Computer Science and Engineering, Oakland University, Rochester, MI 48309-4401, USA
Paper received: 25 February 1991. Revised: 24 July 1991

CORRELATION - THEORY

Correlation is a measure of the similarity between two waveforms. It is a method of time-domain analysis that is particularly useful for detecting periodic signals buried in noise, for establishing coherence between random signals and for establishing the sources of signals. Applications are found in many engineering fields such as radar, radio astronomy, medical, nuclear and acoustical research. The auto-correlation function of a signal is a graph of the similarity between the signal and a time-shifted version of itself, as a function of the time shift. The auto-correlation function of any signal, random or periodic, depends not on the actual waveform, but on its frequency content.

The mathematics of correlation

The auto-correlation function of a waveform a(t) is defined as:

$$R_{aa}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} a(t)\,a(t - \tau)\,dt \qquad (1)$$

that is, the waveform a(t) is multiplied by a delayed version of itself and the product averaged over T seconds. In a digital system, the averaging is equivalent to sampling the signal every Δt seconds and then summing a finite number, N, of the sample products.

0141-9331/91/070379-06 © 1991 Butterworth-Heinemann Ltd Vol 15 No 7 September 1991


$$R_{aa}(\tau) = \frac{1}{N} \sum_{k=1}^{N} a(k\Delta t)\,a(k\Delta t - \tau) \qquad (2)$$

This would be computed for several values of τ. The range of τ over which Raa(τ) is of interest depends on the bandwidth of the signal a(t). For example, the auto-correlation function of a 1 MHz signal could be computed for values of τ ranging from zero to 100 μs with 100 ns resolution. The cross-correlation function of two non-identical waveforms a(t) and b(t) is given by:

$$R_{ab}(\tau) = \frac{1}{N} \sum_{k=1}^{N} a(k\Delta t)\,b(k\Delta t - \tau) \qquad (3)$$

On-line correlation has the advantage of providing results as the measurements are being made. The design of a dual DSP TMS320C25 processor based system for on-line digital correlation is described below. The hardware and software details are given in the following sections.
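The sampled definitions in equations (2) and (3) can be sketched in a few lines of Python. This is an illustrative sketch only, not the TMS320C25 implementation: the function names are hypothetical, the delay is restricted to whole sample intervals, and the sum is assumed to start at index `max_lag` so that every delayed sample exists (the equations themselves do not say how samples before the start of the record are handled).

```python
import math


def cross_correlation(a, b, max_lag):
    """Estimate the correlation of a with delayed b for lags 0..max_lag.

    Assumption: the sum starts at index max_lag so that b[k - m] always
    exists; N is the number of products actually summed per lag.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    n = len(a) - max_lag
    if n <= 0:
        raise ValueError("need more samples than max_lag")
    return [
        sum(a[k] * b[k - m] for k in range(max_lag, len(a))) / n
        for m in range(max_lag + 1)
    ]


def auto_correlation(a, max_lag):
    """Auto-correlation is the cross-correlation of a signal with itself."""
    return cross_correlation(a, a, max_lag)


if __name__ == "__main__":
    # Unit-amplitude sine wave with a period of 8 samples.
    samples = [math.sin(2 * math.pi * k / 8) for k in range(88)]
    for lag, value in enumerate(auto_correlation(samples, 8)):
        print(lag, round(value, 3))
```

For a sine wave the estimate varies between roughly plus and minus half the squared amplitude as the lag sweeps through one period, which is the shape the auto-correlation of a periodic signal is expected to have.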

DUAL PROCESSOR ARCHITECTURE

Figure 1 shows a block diagram of two TMS320C25 DSPs communicating through a dual-port common memory. This arrangement allows two separate processors running on independent clocks (asynchronously) to operate on the same data space.

Memory system

A program memory of size 2K × 16 is created by using two CMOS EPROMs, TBP38L165-35 (or Cypress CY7C292-35). This EPROM has an access time of 35 ns from valid addresses and 25 ns from chip select, and can be interfaced to a 40 MHz TMS320C25 processor with no wait states. The time parameters for memory IC interfacing are discussed in Reference 8. Local data memory of size 4K × 16 is created by using the Cypress CY7C169-25. These are 4K × 4 SRAMs with a 25 ns access time from valid addresses and a 15 ns access time from chip select. This access time is fast enough for a wait state generator not to be required. The dual-port SRAMs CY7C132-25 and CY7C142-25 were used to create a 2K × 16 global memory.

Dual-port memory

The new, fast dual-port SRAMs provide parallel (duplicate) address and data buses [7]. They nearly double the communication bandwidth when used as a shared memory in a dual-processor system by storing variables in unique shared memory or local memory locations. Dual-port SRAMs readily support structured global data techniques and help software designers eliminate branching and multiple accesses to redundant local variables in task-switching multiprocessing systems. The dual-port memory device furnishes on-chip arbitration logic to handle write

Figure 2. WSG circuit diagram

Figure 3. WSG timing diagram (signals shown: CLKOUT1, CLKOUT2, address, CS*, BUSYL, READY, R/W, D15-D0 for read and for write; READY remains low until BUSYL goes high)

Figure 4. Memory map (on-chip block B2 at >0060 to >007F, on-chip block B0 at >0200 to >02FF, on-chip block B1 at >0300 to >03FF, external SRAM (local memory) at >0400 to >13FF, dual-port SRAM (global memory) at >1400 to >1BFF, unused external memory at >1C00 to >FFFF; external 2K EPROM in program memory)

port memory only. The number of wait states introduced depends on the BUSY* signal coming from the dual-port memory. The timing diagram for the WSG is shown in Figure 3. When the TMS320C25 processor wishes to read data from dual-port memory, it places a valid address on the address bus. The address decoder decodes the address and sends a CS* signal to the dual-port memory after a time delay t1. If this memory location is being accessed by the second processor, the dual-port RAM sends a BUSY* signal after a time delay t2. This triggers the WSG, which in turn pulls low the READY signal to the first C25 processor. This READY signal remains low until the BUSY* signal from the dual-port memory goes high. The READY signal transition to high level takes place at the negative edge of the following CLKOUT2 signal. Valid data from the dual-port memory is now placed on the data bus (D0-D15) and is read by the first C25 processor during the positive edge of the CLKOUT2 signal. The valid address is removed by the processor at the negative edge of the CLKOUT1 signal. Similarly, a write operation in dual-port memory will be synchronized to the CLKOUT2 and BUSY* signals.

The external private memory (SRAM) and the program memory (EPROM) of each processor have fast access times and need no wait states. When they are selected for access, the READY input of the TMS320C25 is pulled high, so that these memory accesses can be done with no wait states.
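The READY release rule described above (READY stays low while BUSY* is low and returns high at the next negative edge of CLKOUT2) can be modelled with a small calculation. This is a rough sketch under stated assumptions: the function name is hypothetical, CLKOUT2 falling edges are assumed to occur at integer multiples of the clock period, and the 100 ns period in the demo is only an example value.

```python
import math


def ready_release_time(busy_high_ns, clkout2_period_ns):
    """Return the time at which READY goes high again.

    Assumption: CLKOUT2 falling edges occur at t = 0, T, 2T, ... and READY
    is released on the first falling edge strictly after BUSY* goes high.
    """
    if clkout2_period_ns <= 0:
        raise ValueError("clock period must be positive")
    edges_passed = math.floor(busy_high_ns / clkout2_period_ns)
    return (edges_passed + 1) * clkout2_period_ns


if __name__ == "__main__":
    # Example: BUSY* returns high 130 ns into the access, 100 ns clock period.
    print(ready_release_time(130, 100))
```

The longer the other processor holds the contested location, the later BUSY* returns high and the more clock periods pass before READY is released, which is why the number of wait states depends on the BUSY* signal.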


Address decoding

The memory map of the system is shown in Figure 4. The addressing scheme is designed so that a global memory cell has the same address for both the processors. The address decoder will send chip select signals to local EPROM, local SRAM and global dual-port SRAM. The overall propagation time delay from valid address availability to chip select signal generation is an important factor in the circuit design. The address decoding is implemented by using the 74AS138 decoder IC, which has a propagation delay of 10 ns (max).
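As an illustration of the decoding task, the sketch below maps a data-memory address to the external device it selects and adds the decoder delay to a chip-select access time. It is a behavioural sketch only: the address ranges are the external data-memory ranges as read from the Figure 4 labels, the range comparisons stand in for the 74AS138 wiring (which the text does not give), and all names are hypothetical.

```python
# External data-memory ranges as read from the Figure 4 labels (assumption).
LOCAL_SRAM = range(0x0400, 0x1400)   # >0400 to >13FF, 4K words
DUAL_PORT = range(0x1400, 0x1C00)    # >1400 to >1BFF, 2K words

DECODER_DELAY_NS = 10  # 74AS138 maximum propagation delay quoted in the text


def select_data_device(address):
    """Return the external data device selected by a 16-bit address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if address in LOCAL_SRAM:
        return "local SRAM"
    if address in DUAL_PORT:
        return "dual-port SRAM"
    return None  # on-chip memory or unused external space


def address_to_data_ns(chip_select_access_ns):
    """Delay from valid address to data along the chip-select path."""
    return DECODER_DELAY_NS + chip_select_access_ns


if __name__ == "__main__":
    print(select_data_device(0x1400))
    print(address_to_data_ns(15))  # local SRAM: 15 ns from chip select
    print(address_to_data_ns(25))  # EPROM: 25 ns from chip select
```

With the quoted figures, the chip-select path comes to 25 ns for the local SRAM and 35 ns for the EPROM, the same as their access times from valid address, so under this simplified model the decoder delay does not lengthen the access.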

Analogue interface

One TLC32040 analogue interface chip (AIC) is connected to each of the TMS320C25 processors, as shown in Figure 5. The TLC32040 integrates a bandpass switched-capacitor anti-aliasing input filter, a 14-bit resolution A/D converter, four serial port modes, a 14-bit D/A converter and a low-pass switched-capacitor output reconstruction filter [9, 10]. The TLC32040 AIC was chosen because it has both a 14-bit A/D and a 14-bit D/A, interfaces easily without any additional gates and has low cost, whereas a 14-bit parallel A/D interface would need select logic, tristate gates and a higher component count, and would cost more. Through serial lines the digital output of the TLC32040's A/D is connected directly to the TMS320C25 serial receive register inputs. The digital output from the TMS320C25 in bit-serial format is sent to the D/A within the TLC32040. The transmit and receive sections are operated asynchronously and with the SHIFT CLK of the TLC32040. Continuous mode operation without frame sync pulses is used. This allows transmission and reception of a continuous bit stream without requiring frame sync pulses every 8 or 16 bits. Using the RFSM instruction, the FSM bit is set to 0. From then on, the FSX and FSR inputs to the C25 are ignored. Transmission occurs every CLKX cycle and reception occurs every CLKR cycle.

Figure 5. AIC TLC32040 and TMS320C25 interface

Advantages of this architecture are:

• A dedicated common memory area is allocated for communication between processors.
• Since fast dual-port RAMs are available in the market, the design is simple.
• The dual-port common memory is accessible to both processors and no external arbitration is required.
• No special protocol is required for communication between processors.
• The common memory size is fairly large and is enough for most applications.

The disadvantages are:

• The size of the message shared at any time is limited by the dual-port common memory size.
• When processor 1 loads a message in common memory for processor 2, a protocol is required to inform processor 2 that a new message is available. An interrupt line or software polling can be used for this purpose.

AUTO- AND CROSS-CORRELATION COMPUTATION

In this section an application of this multiprocessor system for auto- and cross-correlation is described. If the signals a(t) and b(t) are sampled every Δt seconds and mΔt is the time delay between the signals, the cross-correlation expression (3) can be written as [11]:

$$R_{ab}(m\Delta t) = \frac{1}{N} \sum_{k=1}^{N} a(k\Delta t)\,b(k\Delta t - m\Delta t) \qquad (4)$$

where N is the number of samples taken and m = 0, 1, ..., 99. For each value of m, the N product values are summed and divided by N. The auto-correlation expression (2) can be written as:

$$R_{aa}(m\Delta t) = \frac{1}{N} \sum_{k=1}^{N} a(k\Delta t)\,a(k\Delta t - m\Delta t) \qquad (5)$$

The dual-DSP processor system computes the values of the auto- and cross-correlation functions of analogue signals at 100 equally spaced points on the time delay axis. Input waveforms a(t) and b(t) are sampled as shown in Figure 6. Each processor samples at 2Δt intervals and stores the samples in its local memory and also in the dual-port RAM. The first processor samples at odd Δt intervals and the second at even Δt intervals. The computation of intermediate correlation functions is completed by each processor within a 2Δt interval. Whenever a new sample is taken, the processor stores it in the dual-port memory and its local memory and also transfers the last sample value stored in the dual-port memory by the other processor. During the computation process, the most recent value of a(t) is multiplied by the 100 delayed samples of b(t) and summed with the previous corresponding value, and the resulting 100 intermediate Rab values are stored in local memory. In other words, the 100 locations in the local memory of each processor contain intermediate values of Rab(mΔt). When new values of b(t) and a(t) are read into local memory, the old values of b(t) are moved using the DMOV instruction, so that the previously stored samples become delayed samples for the present calculations. After N/2 samples have been taken by each processor, the second processor transfers the Rab(mΔt) values into the dual-port memory. The first processor reads these values, adds them to the corresponding values in its local memory, and displays the Rab results on the output.

Figure 6. Sampling of a(t) and b(t)

Software

The software flow chart for the computation of auto-correlation functions is given in Figure 7. After processors 1 and 2 are initialized, processor 1 collects the first 100 samples of a(t) at Δt intervals and stores them in the dual-port memory and local memory. Processor 1 then sends a synchronizing signal to processor 2 through the XF pin. Processor 2 starts operation when a synchronizing signal arrives at the BIO input. Processor 2 reads the 100 b(t) values from the global dual-port memory. Processor 1 samples at even Δt intervals while processor 2 samples at odd Δt intervals, and each processor samples at 2Δt intervals. The interval is obtained by using the internal timer interrupt.

Figure 7. Software flowchart (flow chart for processor 1; flow chart for processor 2)

On power-up or reset, processors 1 and 2 execute initialization routines. The 100 internal memory locations $201 to $264 in block B0 are used for storing the 100 input samples. (The symbol '$' indicates a hex value.) Location $201 stores the A0 sample, and location $264 stores the A99, or most recent, sample. A0 is sampled 99 Δt intervals before A99. Internal memory locations $300 to $363 are in block B1. The dual-port memory locations $1400 to $1463 are used to store the initial 100 samples. Later in the program, location $1400 is used by processor 1 to store its most recent input sample, and location $1401 is used by processor 2 to store its most recent input sample. At the end of all computations, locations $1400 to $1463 are used by processor 2 to store the intermediate results. The operation of processor 1 is described below in detail.

1. Disable interrupts.
2. Set the XF pin and the XF status bit in the ST register to 1. Processor 2 will be in a wait state until XF from processor 1 is 0.
3. Block B0 is configured as data memory.
4. Store a value in memory to be used later by the timer period register to obtain an interrupt after a 2Δt delay. Disable the timer interrupt.
5. Since the A/D is connected to the serial port, initialize the serial port registers.
6. Clear the 100 locations in block B1 (locations $300 to $363) where R0 to R99, the auto-correlation results, will be stored.


7. Enable the timer interrupt. Input 100 samples at an interval of Δt and store them in dual-port memory at addresses $1400 to $1463. The Δt interval is obtained by using the timer interrupt.
8. Processor 1 resets the XF pin to 0 and signals processor 2 to start operation.
9. Processor 1 waits for an acknowledge signal on the BIO pin from processor 2, and on receiving the acknowledgement sets the XF pin to 1.
10. Transfer the 100 inputs stored earlier at addresses $1400 to $1463 in dual-port memory to on-chip RAM locations $264 to $201 in block B0, page 4. Note: $264 now contains the A99 value and $201 contains the A0 value.
11. The processor calculates the following 512 (or N) times, for i = 0 to 511:

R0 = R'0 + A0*a(i)
R1 = R'1 + A1*a(i)
...
R99 = R'99 + A99*a(i)

Here a(i) is the input sample read most recently by the processor, and R' represents the previous value. Note that there is a delay of 2Δt between each input sample. After every 2Δt seconds the timer interrupts and the interrupt service routine (ISR) is executed. In the ISR, processor 1 reads the input sample and stores it in location $1400 in dual-port memory. Processor 2 stored a new input sample value at location $1401 one Δt earlier. These two most recent sample values are transferred to on-chip RAM block B0 locations $201 and $202. Before the new values are brought in, the old 100 values at locations $201 to $264 are moved up twice using the move operation instructions LTD and DMOV.
12. After computing 512 times, the 100 results in locations $300 to $363 are divided by 512.
13. After the BIO signal arrives from processor 2, processor 1 reads the results R0 to R99 from dual-port memory and adds them to its own values. The final 100 results are output at Δt intervals to the D/A.

Processor 2 operates in a similar fashion, except that it waits for a start signal from processor 1 and stores the final result in dual-port memory. This dual-processor system was built on a custom layout printed circuit board.
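The division of work in the steps above can be checked with a small simulation. The sketch below is a behavioural model in Python, not the TMS320C25 assembly program: two modelled processors take alternate samples, each multiplies its newest sample by the contents of a delay line that holds the most recent samples from both, and the two sets of running sums are added at the end. The function names are hypothetical, and the normalisation (dividing the combined sums by the total number of streamed samples) is an assumption made so that the result can be compared with a direct single-processor sum.

```python
import math


def dual_processor_autocorrelation(samples, lags=100):
    """Model two processors sharing the auto-correlation running sums."""
    if len(samples) <= lags:
        raise ValueError("need more samples than lags")
    # Preload the delay line with the first `lags` samples, newest first.
    history = list(reversed(samples[:lags]))
    partial = [[0.0] * lags, [0.0] * lags]  # running sums R1 and R2
    stream = samples[lags:]
    for i, x in enumerate(stream):
        # Shift the delay line by one place (the role DMOV plays in the text).
        history = [x] + history[:-1]
        sums = partial[i % 2]  # the processor that took this sample
        for m in range(lags):
            sums[m] += x * history[m]
    n = len(stream)
    # Combine the partial results: Ri = R1i + R2i, then normalise.
    return [(r1 + r2) / n for r1, r2 in zip(partial[0], partial[1])]


def direct_autocorrelation(samples, lags=100):
    """Single-processor reference over the same streamed samples."""
    n = len(samples) - lags
    return [
        sum(samples[k] * samples[k - m] for k in range(lags, len(samples))) / n
        for m in range(lags)
    ]


if __name__ == "__main__":
    wave = [math.sin(2 * math.pi * k / 16) for k in range(8 + 64)]
    dual = dual_processor_autocorrelation(wave, lags=8)
    ref = direct_autocorrelation(wave, lags=8)
    print(max(abs(d - r) for d, r in zip(dual, ref)))
```

Because every product is accumulated by exactly one of the two modelled processors, the combined sums match the direct computation up to floating-point rounding, which is why the work can be split without changing the result.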
An auto-correlation output waveform for a sine wave input is shown in Figure 8. A software simulator [12] with fixed a(t) and b(t) values in dual-port memory locations was used for testing. The XDS/22 Extended Software development system was used for testing the hardware. The assembled program from the floppy disk was transferred through an IBM PC clone, which was acting as a terminal to the XDS/22. The XDS/22 executes commands and instructions and drives the target system (the system wired here) as if it contained the TMS320C25. In tracing mode the XDS/22 stores selected program execution information for later display, with the option to halt processing after storing a selected number. With breakpoint the XDS/22 halts processing under selected conditions. Trace and breakpoint features were used to check the program behaviour.

Performance

For real-time correlation with this dual-processor system we could sample the signal at a minimum interval of 40 μs. This limitation is due to the fact that the minimum sampling and computing time needed for this algorithm by each processor is 80 μs. The speed-up factor S is defined as the ratio of the total computation time for one processor to that for P processors. The speed-up factor for this dual-processor system for correlation computation was found to be 1.998 (i.e., approaching 2).

Figure 8. Output graph (input of amplitude ±A; auto-correlation output between +A²/2 and -A²/2)
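The speed-up figure quoted in this section can be restated compactly. In the worked equation below, the symbols T_1 and T_P (computation time on one processor and on P processors) and the efficiency E = S/P are introduced here for illustration; the efficiency is a standard derived quantity that the paper itself does not quote.

$$S = \frac{T_1}{T_P}, \qquad E = \frac{S}{P} = \frac{1.998}{2} = 0.999$$

On this reading, each of the two processors spends about 99.9% of its time on useful computation, and only about 0.1% is lost to synchronization and transfers through the dual-port memory.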

CONCLUSION

In this paper, design details of a dual TMS320C25 processor system and its application for correlation are given. Dual-port memory is an ideal component for multiprocessor systems. The system has been used for finding the noise transmission path and for vibration testing of automobile and aircraft structures by computing the correlation between a vibration signal and signals at different points.

REFERENCES

1 Ganesan, S, Raja, P V, Kumari, V, Kun-Shan Lin and Ehlig, P 'Multiprocessor architecture using DSP microprocessors' 4th Conference on Hypercube, Concurrent Computers and Applications, Monterey, California (May 6-8, 1989)
2 Gass, W S, Tarrant, R T, Pawate, B I, Gammel, M, Rajasekaran, P K, Wiggins, R H and Covington, C D 'Multiple digital signal processor environment for intelligent signal processing' Proc. IEEE Vol 75 No 9 (September 1987) pp 1246-1259
3 Special issue on DSP microprocessors, IEEE Micro (December 1986 and 1988)
4 Lee, E A 'Programmable DSP architectures: Part I' IEEE ASSP (October 1988)
5 Computer Design (March 13, 1989)
6 Marsan, M A, Balbo, G and Conte, G 'Comparative performance analysis of single bus multiprocessor architecture' IEEE Trans. Comput. (Dec 1982) pp 1179-1191
7 Cormier, D 'Dual-port SRAMs bolster system design' ESD (November 1988) pp 83-86
8 Troullinos, G and Bradley, J 'Hardware interfacing to TMS 320C25' Product Application Notes SPRA014, Texas Instruments (1987)
9 Pipenger, D E and Tobaben, E J Linear and Interface Circuits Applications, Volume 3, Texas Instruments, SLYA003, pp 11-195 to 11-213
10 'Interfacing the TLC 32040 family to the TMS320 family' User's guide, Texas Instruments SLAU007 (1987)
11 Ganesan, S, Raja, P V and Kumari, V 'A multi-DSP microprocessor system for real-time digital correlation' Proceedings of IEEE International Conference on System Engineering (September 1989)
12 TMS320C1x/TMS320C2x Assembly Language Tools, Texas Instruments, SPRU018

Subramaniam Ganesan is an associate professor at Oakland University. He received BE, MTech and PhD degrees from PSG Tech and the Indian Institute of Science, India. From 1971 to 1983 he was at National Aeronautical Laboratory, India. From 1979 to 1980 he was a research fellow with a DAAD fellowship at Ruhr University, FRG. During 1983 he was a research associate at Concordia University, Montreal, Canada. During 1984-86, he was a faculty member at Western Michigan University, Kalamazoo, USA. His research interests are in multiprocessing, parallel architectures and signal processing applications.

Microprocessors and Microsystems
