A formal design methodology for parallel architectures

June 6, 2017 | Autor: Khaled Elleithy | Categoria: Design Methodology, Matrix Multiplication, Parallel Architecture

Descrição do Produto

A FORMAL DESIGN METHODOLOGY FOR PARALLEL ARCHITECTURES KHALED M. ELLEITHY & MAGDY A. BAYOUMI The Center For Advanced Computer Studies University of Southwestern Louisiana Lafayette, LA 70504, U.S.A.

ABSTRACT-- In this paper, we introduce a formal approach for synthesis of array architectures. Four different fovms are used t o express the input

algorithm: simultaneous recursion, recursion with respect t o dsferent variables, fixed nesting and variable nesting. Four different architectures for the same algorithm are obtained. As an example, a matrix-matrix multiplication algorithm is used t o obtain four different optimal architectures. The dtfferent architectures of this example are compared in terms of area, time, broadcasting and required hardware.

1. INTRODUCTION Reported techniques for high level synthesis surer from the following disadvantages: some systems are not suitable for large problems (Emerlad[l]), some systems require to know the target architecture in advance in order to have the structural description [2-4], a number of restrictions are imposed on the input description (Flamel[5], HART[6]), Certain phases of the design process are not automated due to the lack of algorithms to transform between different representations; such limitations require the designer’s responsibility for designing these phases (HARP[6]), the designer is responsible for specifying the operations sequencing and communications among different units [7,8], and some systems are limited to a special class of algorithms [9-113. A formal design methodology for high level synthesis has been introduced in [12-141 to overcome the previous disadvantages. The architectures produced by this methodology can be classified as uniprocessor architectures. T o exploit the parallelism in a given algorithm the methodology has been generalized so that it can be applied t o the simultaneous recursion form [15,16]. I n this paper the methodology is applied to the following forms: [l] Recursion with respect to several variables. [2] Fixed nested recursion. [3] Variable nested recursion. The methodology provides two main features: completeness and correctncrs. Completeness means the ability to use tlic approach for any general algorithm. Correctness is achieved by using a set of transformations that are proved t o be correct. A formal framework for the synthesis procedure has bcen developed which can be easily automated. A design example of matrix-matrix multiplication is used with each one of the forms t o obtain a parallel architecturc. These different architectures for this

CH2920-7/9010000/0603$01 .OO 0 1990 IEEE

603

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

604

International Coqference on Application Specific Array Processors

example are compared in terms of area and speed.

2. RECURSION WITH RESPECT TO SEVERAL VARIABLES If xi (16i< n ) are n-1 place functions, z is n place functions, y is 2n place function and w!? are n place functions, then z is defined by the following Algorithm Specification Linguage (ASL)[12-141 code:

Transformation Algorithm to RSL To transform the system of recursion with respect to several variables to the Realization Specification Language RSL[12-14] representation we implement each equation using the same method described in[12-14]. Here is the RSL representation of the system: Initp(l,avgl ; 2,arg2 ; . . ' ;n,ayB,)

I

= p; +I m c ( I )

Ready

-

z ( 0 , a r g 2 ; . . ,a a , , ) p

d

Y(0,a

w,,

. . . ,a a , )

eg?(I , m )

-

=A

Comp(arg2,...., a w n # xl) n d ( a r g y ,a r g y ,...,arg,fi'"iY)

(1)

(3) (4)

(5)

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

605

Potpourri 111 z (argl+l

..

0 arg,

.

.....awn)

-

Comp(arBl+l

..

0 arg3

.

. . . . . awn #

z f i d y ( a r g l + l 0 , a V 3 . . . . . a V I j )= A n d ( a r g l + l f i d y arg3fidy

......................................

2

)

.....arg, f i d Y

~

)

Equation 1 is used to show that we use n registers to be initialized with the arguments (argl, . . . ,ay&,,). Equation 2 means that the unit Suc which is a basic function has its inputs control, ( l < i < Y ) connected to the readyi output of the unit computing xi t o be sure that I is not incremented until xi is computed. Equation 3 is used t o represent the fact that I is incremented every clock cycle using the Suc unit, and I is initialized to the value 1 using the register number n + l . Equation 4 determines the end of operation when I reaches the value m . Equations 5,6,7 represent the composition operation in equations 1,2,3 of the ASL representation respectively. Proof of correctness for the algorithm can be found in[17]. Example: Let us use the recursion with several variable to define the greatest common divisor (GCD).T w o variables are required to dcfinc the GCD.The definition is as follows: G C D (0,n ) n G C D ( m +1,0) m +1 G C D ( m + l , n + l ) p(m,n,GCD(iv(m,Iz),n+l),GCD(m+l,w(n,m))) w(m,n)< m+1 w(m,n) < n + l P(m,n,a,b)= a n > m

--

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

606

International Coderence on Application Specijic Array Processors

b

m a )= x ( a ) ,y(n , w ( n > f i ,y(n , a ) ) ) )

Although the solution of the function y gets more complicated as we go for higher values, we are still able to solve it in the same previous method. The solution we obtain for y(2,a) is expressed in the functions z , w $ but not the function y . This means if the functions z ,w+ are primitive recursive than y is primitive recursive since it is driven from z ,w,x by composition. Transformation Algorithm to RSL To transform the system of nested recursion to RSL we implement each equation using the same method described in[12-14]. Here is the RSL representation of the double nested recursion:

Initp (1,rz ;2,a )

(1)

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

608

International Conference on Application Specific Array Processors = x (a

~*C,O,,,l

I

= p: IUC (I)

-

Ready

e q ? ( l ,n )

Temp, = C o m p ( n , a # y ) = And (prRIady , p Temp?

-

y)

Temp, Comp (n,a,Templ # w ) TempFdy - A n d (prRId' , p y , Temp?) Temp Temp:&'

y

-

,

Comp ( n ,Temp # y )

= And (p;&'dY

,Temp?)

Comp(n,a,Temp3 # z )

yaady - A n d (praady

, p y d y , Temp?)

Proof of correctness for the algorithm can be found in[17],

3.2 Variable Number of Nestings If x is 1-place function, and y is 2-place function then y is a double nested recursion function defined by the following ASL: Y(0 n ) - x ( n ) r ( m + 1 , 0 ) = r ( m 1) y(m+1, n+l) -r(m~(m+l,n)) 1

3

To see that the previous definition doesn't behave in the same manner as the definition in section 3.1. let us computc values ofy for different m , n .

From the computation of y for different m ,n we notice that the number of ncstings is not constant and depends on earlier values. This means that nested recursion can leads out of primitive recursion. Transformation Algorithm to RSL To transform the system of nested recursion to RSL we implement each equation using the same method described in [12-141. Here is the RSL representation of

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

Porpourri 111

609

the double nested recursion: Inirp(1,n ; 2,m) J*C,,,,,,I

I

I

--

x (a

p;

SUC(Il)

p,'

SllC(I*)

Ready - A n d (eg?( I , , n ) , eg? ( I , ,n))

-

y ( O , n ) cOWlP(7Z # yfi.dy(O,n) p d v

-

X )

( n) y(m+l,O) Comp(m,l # y ) yfin'"(m+l,O) - y f i a d y ( m , l ) Temp, Comp(m+l,n # y ) Temp? - A n d (prfindy , p r f i d y ) ~

-

-

(7)

,

y Comp ( m ,a ,Temp # Y 1 y f i p d y- A n d ( p T f i a d v , T e m p y d y ) Temp? y

randy

- A n d (pYfindy , Temp?) Comp(n,a,Temp3 # z ) A n d ( p l ) f i d y , p? , Temp?)

-

(9)

I

Proof of correctness for the algorithm can be found in[17]. Table 1 shows a comparison between diff'erent forms of recursion in terms of architecture, broadcasting and complexity of the controller. The simultaneous recursion is the only form that gives a two dimensional array. All forms have broadcasting except the variable nesting. The controller of the variable nesting is complex compared with the other three forms. Table 1. Comparison Between Different Forms of Recursion.

Simultaneous

Several variables

Fixed nesting

Variable nesting

tWO

dimensional

one dimensional

one dimensional

one dimensional

Broadcasting

yes

yes

Yes

no

Complexity of controller

simple

simple

simple

complex

Architecture

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

610

Internutiowl Corlference on Application Specific Array Processors

5. Matrix-Matrix Multiplication Example An example of matrix multiplication is introduced as an application of different forms of recursion. The architecture has two matrices A and B as inputs, and matrix Cas an output. The multiplication is done in a recursive way and can be described by the following high level subroutine:

mat*-multiplication (A,B) begin for i-1 t o n for j = l to n begrn CiJ 0 for &-I t o n C. C;j,k-l + Ai,k nexidl end next i next j end

-

* Bkj

The ASL and RSL representation using the simultaneous recursion form is as follows:

...................................... s14cconmi

r Remltl

Resultl

-

-

Ready

-

[()““‘Y

p;+l $Uc(r)

= eq?(l ,m

comp (aT,I,p$’

Result

) #

innerproduct)

#

innerproduct)

...................................... comp (fiT,I,p$+’ Result

Figure 1 shows the architecture obtained for matrix multiplication. The details of im$cmcnting the inner-product cell arc shown in [14]. The architecture consists of

N inner-product cells. The number of cycles required to perform the multiplication

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

Potpourri III

611

is N . Figure 2 shows the architecture using recursion with respect to several variables. The architecture consists of N multiplication cqls and one adder. The number of cycles required to perform the multiplication is N . Figure 3 shows the architecture using fixed nesting recursion. The architecture consists of N in2er-product cells. The number of cycles required to perform the multiplication is N . Table 2 shows a comparison between different architectures of the matrix-matrix mdtiplication.

-

-3

)

,A12,A11

Aln,

I

1

LE?

, Ready

suc

ready Inner Product cell

A2n,..,i22,Azi

$E-$

Product

Product

I

H An. ..,AnZ.An 1

Product cell

Product cell

Product cell

Fig. 1. Matrix-Matrix Multiplication Architecture Using Simultaneous Recursion.

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

612

International Coderenee on Application Spec@c Array Processors

Table 2. Comparison Between Different Matrix-Matrix Multiplication Architectures.

Simultaneous

Area

Several variables

Nesting

e(= )

*)

Time

e(@

2,

W3)

A+T

6(n3)

0(n3)

Broadcasting

Yes

no

no

Hardware

inner-product

inultiplier (n) adder (1)

inner-product

)

(n)

Ready Control

'

-

ready

Result

@

Cl 1 c2 1 C31

control t

c12 An1

A11

A31A21AIl

ready

Bln

Bln

E11 B I I 811

L U

An2

A12

A32A22A12

B2n

B2n

821 821 021

Aln

A3nA2nAln

Cnn

Ann Bnn

.. Bnn ......

B n l Bn 1 0 n l

1 Dl

1

Fig. 2. Matrix-Matrix Multiplication Architecture Using Recursion With Respect to Several Variables.

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

Potpourri III

613

6.CONCLUSIONS I n this pa er an formal approach for transforming different forma o f recurtion parallel arc&tectures has been introduced. Four different forms are used to express a given algorithm. Four optimal architectures for a matrix-matrix multiplication are compared. The approach has the following advantages: __ It is suitable for large problems since the transformation algorithm is linear. It does not require to know the target architecture in advance. There are no restrictions imposed on the input description. The technique is fully automated. The designer is not responsible for specifying the operations sequencing and communications among different units. The approach is applicable to any general algorithm. Parallcl properties of algorithms are explored. to

n

I

SUC

Ann

.._.. A l n A l n 0000 .. 00

Bnn

....

’

Bn2 Bnl 0000 .. 00 +

1

Ann

.....

Bnn

.....

......

Result

pr*d’ct

>

(1)

Ready I

m

Ann

/nnpl;

A12 A 1 2 0

IT

Fin. 3. Matrix-Matrix Multiolication UsinPr Fixed Number of Nestinzs.

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

614

International Codererace on Application Specific Array Processors

References C. J. Tseng, "Automated Synthesis of Data Paths in Digital Systems," Ph.D. Dissertation, CMU, Apr. 1984. T. J. Kowalski, D. J. Geiger, W. H. Wolf, and A. C. Parker, "The V U 1 Design automation assistant: From Algorithms to Silicon," IEEE Design and Test, pp. 33-43, Aug. 1985. T. J. Kowalski, "The V U 1 Design Automation Assistant: A Knowledge-based Expert System," P1i.D. thesis, CMU, 1984. T. J. Kowalski and D. E. Thomas, "The VLSI Design Automation Assistant: Prototype System," Twenty Design Automation Conf. Proc., Miami, pp. 479483, 1983. H . Trickey, "Flamel: A High Level Hardware Compiler," IEEE Trans. on Computer Aided Design, vol. CAD-6, no. 2, pp. 259-269, March 1987. T. Tanaka, T. Kobayashi and 0. Karatsu, "HARP: Fortran t o Silicon," IEEE Trans. on computer-aided design, vol. 8., no. 6., pp. 649-660, June 1989. C. Niessen, C.H. van Berkel, M. Rem, and R W. Saeiji, "VLSI Programming and Silicon Compilation; A novel Approach from Philips Research," Intl Conf. o n Computer Design: VLSI in Computer and processors, pp. 150-151, Oct. 1988. C.H. van Berkel, M. Rem and R W. Saeijs, "VLSI Programming," Intl Conf. on Computer Design: V U 1 in Computer and processors, pp. 152-156, Oct. 1988. S. D. Johnson, "Synthesis of Digital Designs from Recursion Equations," Ph.D. Dissertation, Comp. Sc. Dept., Indiana Uni., 1983. [lo] P. Quinton, "Automatic Synthesis of Systolic Arrays From Uniform Recurrent Equations," Proc. of the 11 th, Annual Intl Symp. on Computer Architecture, pp. 208-214, 1984. [ll] S. K. Rao, "Regular Iterative Algorithms and Their Implementationr o n Processor Arrays," Ph.D. Dissertation, Dept. of Electrical Eng., Stanford Uni., 1985. [12] K. M. Elleithy and M. A. Bayoumi, "Synthesizing DSP Architectures from Behavioral Specifications: A Formal Approach," Proceedings of the 1990 IEEE International Symposium for Circuits and Systems, May 1990. [13] K. M. Elleithy and M. A. Bayoumi, "A Formal High Level Synthesis Approach for DSP Architectures," Proceedings of the 1990 International Conf. on Acoustics, Speech and Signal Processing, April 1990. [14] K. M. Elleithy and M. A. Bayoumi, "A Frame-work for High Level Synthesis of Digital Architectures from precursive Algorithms," Proc. of the ACM Eighteenth Annual Computer Science Conference, pp. 305-311, Feb. 1990. [15] K. M. Elleithy and M. A. Bayoumi, "Formal Synthesis of Parallel Architectures from Recursive Equations," Accepted for inclusion in the 1990 International Conference on Parallel Processing, Aug. 1990. [16] K. M. Elleithy and M. A. Bayoumi, "A Formal Framework for Synthesis of Parallel Architectures," Proceedings of the Fourth Annual Symposium on Parallel Processing, April 1990. [17] k. M. Elleithy, "A Formal Framework for High Level Synthesis of Digital Designs," Ph.D. Dissertation, The Center for Advanced Computer Studies, University of SW Louisiana, 1990.

Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.

Lihat lebih banyak...

A formal design methodology for parallel architectures

Descrição do Produto

Comentários