A PARALLEL HYBRID SPARSE LINEAR SYSTEM SOLVER


Computing Systems in Engineering Vol. 1, Nos 2-4, pp. 183-195, 1990

0956-0521/90 $3.00+0.00 © 1990 Pergamon Press plc

Printed in Great Britain.


K. GALLIVAN,† A. SAMEH† and Z. ZLATEV‡
†Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, 305 Talbot Laboratory, 104 South Wright Street, Urbana, IL 61801, U.S.A.
‡Air Pollution Laboratory, Danish Agency of Environmental Protection, Risoe National Laboratory, DK-4000 Roskilde, Denmark
(Received 27 April 1990)

Abstract: Consider the system Ax = b, where A is a large sparse nonsymmetric matrix. It is assumed that A has no sparsity structure that may be exploited in the solution process, its spectrum may lie on both sides of the imaginary axis, and its symmetric part may be indefinite. For such systems direct methods may be both time consuming and storage demanding, while iterative methods may not converge. In this paper, a hybrid method, which attempts to avoid these drawbacks, is proposed. An LU factorization of A that depends on a strategy that drops small non-zero elements during the Gaussian elimination process is used as a preconditioner for conjugate gradient-like schemes: ORTHOMIN, GMRES and CGS. Robustness is achieved by altering the drop-tolerance and recomputing the preconditioner in the event that the factorization or the iterative method fails. If after a prescribed number of trials the iterative method is still not convergent, then a switch is made to a direct solver. Numerical examples, using matrices from the Harwell-Boeing set of test matrices, show that this hybrid scheme is often less time consuming and storage demanding than direct solvers, and more robust than iterative methods that depend on preconditioners based on classical positional dropping strategies.

1. THE HYBRID ALGORITHM

Consider the system of linear algebraic equations Ax = b, where A is a nonsingular, large, sparse and nonsymmetric matrix. We assume also that matrix A is generally sparse (i.e. it has neither any special property, such as symmetry and/or positive definiteness, nor any special pattern, such as bandedness, that can be exploited in the solution of the system). Solving such linear systems may be a rather difficult task. This is so because commonly used direct methods (sparse Gaussian elimination) are too time consuming, and iterative methods whose success depends on the matrix having a definite symmetric part, or on the spectrum lying on one side of the imaginary axis, are not robust enough. Direct methods have the advantage that they normally produce a sufficiently accurate solution, although a direct estimation of the accuracy actually achieved requires additional work. On the other hand, when iterative methods converge sufficiently fast, they require computing time that is several orders of magnitude smaller than that of any direct method. This brief comparison of the main properties of direct methods and iterative methods for the problem at hand shows that the methods of both groups have some advantages and some disadvantages. Therefore it seems worthwhile to design methods that combine the advantages of both groups, while minimizing their disadvantages.

Throughout we assume that sparse Gaussian elimination is the direct method chosen for the solution of Ax = b. It is well known that this is the best choice in the case where A is large and generally sparse; see for example Ref. 1 or 2. The arithmetic operations during stage k (k = 1, 2, ..., n-1) of Gaussian elimination are carried out by the formula

$a_{ij}^{(k+1)} = a_{ij}^{(k)} - a_{ik}^{(k)} \bigl(a_{kk}^{(k)}\bigr)^{-1} a_{kj}^{(k)}$,    (1)

where i = k+1, k+2, ..., n and j = k+1, k+2, ..., n, while $a_{ij}^{(1)} = a_{ij}$ are the elements of matrix A. It is clear that if $a_{ij}^{(k)} = 0$ while neither $a_{ik}^{(k)}$ nor $a_{kj}^{(k)}$ vanishes, then a new non-zero element, a fill-in, is created in position (i, j). Unfortunately, fill-in does occur when large sparse matrices are factored by Gaussian elimination, and this method becomes rather expensive (in terms of time and storage requirements) when many fill-ins are introduced. Therefore, reducing the number of fill-ins is one of the main tasks during the development of sparse matrix codes, at least on sequential and vector computers. Such minimization of fill-in is achieved by adopting a suitable pivoting strategy; see for example Refs 3-5. The amount of fill-in may be large, however, even when a good pivoting strategy is adopted. For such systems direct methods may lose competitiveness with iterative methods if a convergent method can be found for the system.

The successful use of iterative methods often depends upon the effective use of preconditioning. Preconditioning a system Ax = b involves using the iterative method to solve the related system $M^{-1}Ax = M^{-1}b$, where the preconditioner M is easily invertible and $M^{-1}A \approx I$. The choice of M is an art in itself and depends on the iterative method used, the application from which the system arises, and, for high-performance machines, the architecture on which the algorithm is to execute; see Ref. 6.

For the purpose of obtaining an approximate factorization of A for preconditioning, while maintaining efficiency in the sparse Gaussian elimination process, we attempt to reduce further the number of fill-ins as follows (for more details see Ref. 7). Let τ be a parameter that satisfies

$0 \le \tau < 1$    (2)

and let

$a_i^{(k)} = \max\bigl(|a_{i,k+1}^{(k)}|, |a_{i,k+2}^{(k)}|, \ldots, |a_{i,n}^{(k)}|\bigr)$,    (3)

where the elements over which the maximum is taken form the active part of row i at stage k. The parameter τ will be called the drop-tolerance, because any element at stage k that satisfies

$|a_{ij}^{(k)}| \le \tau a_i^{(k)}$    (4)

is dropped (removed from the arrays where the non-zero elements, together with their indices, are kept, and neglected in the computations after stage k). It is clear that by choosing a sufficiently large drop-tolerance τ the number of fill-ins can be reduced considerably. This leads to an approximate factorization stage in which both the computing time and the storage requirements are substantially reduced compared to the classical direct methods. It is also clear, however, that there are two difficulties with this approach: (1) there is no guarantee that the factorization will be completed successfully when a large drop-tolerance is specified; and (2) the solution obtained with the factors calculated by using a large drop-tolerance (assuming that the factorization is successfully completed) will normally be inaccurate. While it is assured [see Eqs (2)-(4)] that not all the elements in the active part of a row will be removed, it is possible that all non-zero elements in the active part of a column will be dropped when the drop-tolerance is large. Therefore, it is useful to enhance the dropping criterion defined by Eqs (2)-(4) by adding an extra requirement that a dropped non-zero element must not be the last one in the active part of its column. In this way, structural singularity is assured not to appear at least at the stage under consideration. Although this enhancement does not guarantee that structural singularity will be avoided throughout the procedure, it works rather well in practice.
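As a concrete illustration, the Python fragment below is a minimal sketch of the stage-k dropping test of Eqs (2)-(4) together with the column safeguard just described; it operates on a dense array for clarity (a real sparse code works on packed row storage), and all names are illustrative rather than taken from the paper's code:

import numpy as np

def drop_stage_k(A, k, tau, col_count):
    # Apply the dropping criterion of Eqs (2)-(4) at stage k.
    # col_count[j] holds the number of non-zeros left in the active part
    # of column j; the last such non-zero is never dropped (the safeguard).
    n = A.shape[0]
    for i in range(k + 1, n):
        a_i = np.abs(A[i, k + 1:]).max()                      # Eq. (3)
        for j in range(k + 1, n):
            if A[i, j] != 0.0 and abs(A[i, j]) <= tau * a_i:  # Eq. (4)
                if col_count[j] > 1:                          # safeguard
                    A[i, j] = 0.0                             # drop it
                    col_count[j] -= 1
    # Since tau < 1 [Eq. (2)], the largest element of each active row
    # always survives the test.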

Once the approximate factors of A are successfully obtained as a preconditioner, an iterative method must be used in an attempt to obtain a good approximation of the solution. Depending on the drop-tolerance, however, the iterative method may not converge. Experiments show that conjugate gradient-type methods perform rather satisfactorily in this situation. Three such methods, ORTHOMIN, GMRES and CGS, are used in the experiments.

The improvement made in the dropping criterion and the use of an iterative method (preconditioned with the factors L and U obtained by Gaussian elimination) enhance the chances of solving the system Ax = b with the desired accuracy with reasonable efficiency. These two steps do not by themselves guarantee such success, however. A third step is necessary to yield a solution with the accuracy requested by the user when either the factorization is not completed (due to singularity resulting from dropping too many non-zero elements) or the preconditioned iterative method is not convergent (or converges too slowly). This third step is rather obvious. If either the factorization process fails or the iterative scheme does not converge, then the drop-tolerance must be reduced, new approximate factors L and U computed, and the iterative method restarted. This action can be repeated for a prescribed number of trials, after which the drop-tolerance is set equal to zero, i.e. a switch is made to a direct method. Let A(τ) denote the matrix such that, in the absence of rounding errors, A(τ) = LU is formed by performing an approximate factorization of A using drop-tolerance τ (permutations required to form L and U have been ignored for simplicity of presentation). Note that A(τ) → A as τ → 0. The hybrid solver can then be described as follows:

DROP-TOLERANCE τ IS GIVEN
DESIRED ACCURACY ε IS GIVEN
DO UNTIL (x IS ACCEPTED)
  IF (LU = A(τ) EXISTS) THEN
    M ← LU
    x ← (LU)^{-1} b
    CALL PCG_TYPE_METHOD (M, A, x, b, ε)
    IF (NOT CONVERGED OR TOO SLOW) THEN
      τ ← p1(τ)
    END IF
  ELSE
    τ ← p2(τ)
  END IF
END DO

The functions p1(τ) and p2(τ) adjust the value of τ given an unsatisfactory performance by the iterative method and an unsuccessful factorization, respectively. The outer loop around the classical form of preconditioning, which makes use of the two reduction functions, yields a robust algorithm: in the worst case a direct method will eventually be used. By recomputing the preconditioner with smaller τ when the iterative method does not appear to be performing well, we avoid the use of a poor preconditioner and the subsequent inefficiency. The adaptive behavior of the algorithm can therefore be used, starting with a relatively large initial τ, to allow the algorithm to find a drop-tolerance that is natural for the problem. The early iterations with large τ require some extra time, but the fact that many elements are dropped reduces the number of operations performed (significant for a single processor) and provides more opportunity for the creation of parallel pivot sets (important for parallel processors). The effort is usually repaid with rapid convergence of the iterative method and can be very worthwhile if a sequence of problems is to be solved with similar matrices, i.e. those with effective values of τ that are about the same.
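To make the driver loop concrete, the following Python fragment is a minimal sketch built on SciPy, which is an assumption of this illustration rather than the software of the paper: spilu supplies an incomplete LU factorization whose drop_tol argument plays the role of τ (its dropping rule is not the row-wise criterion of Eqs (2)-(4)), GMRES serves as the conjugate gradient-type method, and the reduction functions p1 and p2 are simply taken as multiplication by 0.1.

import scipy.sparse as sp
import scipy.sparse.linalg as spla

def hybrid_solve(A, b, tau=1e-2, eps=1e-8, max_trials=5):
    # Hybrid scheme: approximate LU as preconditioner, GMRES as the
    # iterative method, tau reduced on any failure, tau = 0 (a complete,
    # i.e. direct, factorization) as the last resort.
    A = sp.csc_matrix(A)
    for trial in range(max_trials + 1):
        if trial == max_trials:
            tau = 0.0                              # switch to a direct method
        try:
            lu = spla.spilu(A, drop_tol=tau)       # LU = A(tau)
        except RuntimeError:                       # factorization failed
            tau *= 0.1                             # tau <- p2(tau)
            continue
        M = spla.LinearOperator(A.shape, matvec=lu.solve)  # M = LU
        x0 = lu.solve(b)                           # x <- (LU)^{-1} b
        # rtol is named tol in older SciPy releases
        x, info = spla.gmres(A, b, x0=x0, M=M, rtol=eps)
        if info == 0:                              # converged
            return x
        tau *= 0.1                                 # tau <- p1(tau)
    raise RuntimeError("hybrid solver failed")

When a sequence of problems with similar matrices is solved, the final value of τ from one solve is a natural starting value for the next, in the spirit of the adaptive behavior described above.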

Assuming that the above steps are properly incorporated in a code, we must show the robustness and effectiveness of the hybrid. Below we illustrate by numerical examples that (see also Ref. 7): (1) this hybrid method is more robust than other preconditioned iterative schemes available in the literature, and often there is no need for recalculating the approximate factors, in which case the hybrid scheme is much faster than direct methods; (2) the global computing time of the hybrid scheme is less than that of direct methods even if one has to recalculate the approximate factors once or twice; (3) even in the worst case (when we have to switch to a direct method), the increase of the computing time is not that high.

It is important to carry out the third step (the reduction of τ and refactorization) properly. This step is rather time-consuming and, therefore, should be performed only when either the factorization process with a given drop-tolerance fails, or when the iterative method does not converge. In the first case, the third step is activated when the code detects a zero column or row. In the second case, failure of the iterative scheme, it is crucial to determine when to abandon the iterations and to resort to determining new approximate factors with a more stringent drop-tolerance. In the next section we shall show that it is possible to satisfy these two requirements and, thus, that all three steps can be implemented efficiently.

2. THE STOPPING CRITERIA

Designing stopping criteria for iterative methods applied to systems Ax = b, where A is a general matrix, is a critical task. Since Krylov subspace methods (such as the conjugate gradient-type methods mentioned above) are not guaranteed to converge for general linear systems, the first task is to determine whether the iterates are converging. The residual vector r_i, defined by

$r_i = b - Ax_i = A(x - x_i)$,    (5)

where x_i is the ith iterate, is often used in formulating stopping criteria. It is clear from Eq. (5) that some norm of the residual vector may provide a reliable estimate of the norm of the error if the norm of matrix A (corresponding to the vector norm chosen) is, roughly speaking, of order 1. If $\|A\|$ is large, a stopping criterion based on the use of $\|r_i\|$ may not detect that the error in the solution, $\|x - x_i\|$, is small. This can be illustrated by taking only one equation with A = b = 10^5. Assume that the accuracy required is ACCUR = 10^-4 and that the current approximation x_i is such that |x - x_i| = 10^-5. Then |r_i| = 1, and a stopping criterion based on the use of the residual will be misleading. If $\|A\|$ is small, a stopping criterion based on the use of $\|r_i\|$ does not yield useful information about the


error of the approximation, $\|x - x_i\|$. To illustrate this, consider again one equation only, this time with A = b = 10^-5. Let the accuracy required again be defined by ACCUR = 10^-4. Assuming that the current approximation x_i is such that |x - x_i| = 1, then |r_i| = 10^-5, and a stopping criterion based on the use of the residual alone will be misleading. The examples given above are extreme (and in practice the situation will often be better). Nevertheless, these examples indicate that it is not a good idea to have the matrix A (or some norm of this matrix) involved, directly or indirectly, in the stopping criteria; a remedy is discussed next.
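The two one-equation examples can be checked directly; the short Python fragment below also shows how normalizing by $\|r_0\|$ (the criterion of Eq. (6) below, with x_0 = 0 so that r_0 = b) recovers the relative error in both cases:

A, b = 1e5, 1e5          # exact solution x = b/A = 1
xi = 1.0 - 1e-5          # small error: |x - xi| = 1e-5
ri = b - A * xi
print(abs(ri))           # 1.0: the residual alone looks large
print(abs(ri) / abs(b))  # 1e-5: equals the relative error |x - xi| / |x - x0|

A, b = 1e-5, 1e-5        # exact solution x = 1
xi = 2.0                 # large error: |x - xi| = 1
ri = b - A * xi
print(abs(ri))           # 1e-5: the residual alone looks small
print(abs(ri) / abs(b))  # 1.0: equals the relative error |x - xi| / |x - x0|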

An interesting attempt to eliminate the influence of A is made in the so-called simple version of GMRES (the theoretical background of the code GMRES is discussed in Ref. 8). In this code the iterations are stopped if

$\|r_i\| / \|r_0\| < ACCUR$,    (6)

where ACCUR is the accuracy required by the user. Assume that

$\|A(x - x_i)\| \approx \|A\| \|x - x_i\|$,    (7)

where i = 0, 1, .... Under this assumption

$\|r_i\| / \|r_0\| \approx \|x - x_i\| / \|x - x_0\|$    (8)

and, if the starting approximation x_0 is equal to zero, then an attempt to evaluate the relative error is made by the stopping criterion in GMRES (at the same time eliminating the influence of matrix A). While the assumption under which this criterion works, Eq. (7), is not unrealistic, there will be difficulties when the starting approximation is close to the exact solution. This could be the case, for example, when solving time-dependent problems. It follows that it is worthwhile to determine whether or not the iterative process is convergent without using the matrix A in the criterion developed.

In many iterative processes the new approximation is obtained from the old one by using a simple transformation:

$x_{i+1} = x_i + \alpha_i p_i$,    (9)

where i = 0, 1, ..., the α_i are constants and the p_i are certain correction vectors. Based on this observation, a convergence check with the desired properties can be derived. Let us assume that Eq. (9) holds for the iterative method selected. This is true for ORTHOMIN (see Ref. 9) and for CGS (see Ref. 10), but not for GMRES (see Ref. 8). Let us assume also that the computations with the above formula are carried out without rounding errors. Then, if the iterative method under consideration is convergent, the exact solution x of the system Ax = b can be written as

$x = x_i + \sum_{j=i+1}^{\infty} \alpha_j p_j$,    (10)


and an upper bound on the norm of x - x_i can, under some mild assumptions, be given in terms of the norms of the corrections α_j p_j, as sketched below.
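One standard way to obtain such a bound, under the assumption (implied rather than stated here) that the correction norms decay geometrically, i.e. $\|\alpha_{j+1} p_{j+1}\| \le RATE \cdot \|\alpha_j p_j\|$ for all j ≥ i with some RATE < 1, is to sum the geometric series arising from Eq. (10):

$\|x - x_i\| \le \sum_{j=i+1}^{\infty} \|\alpha_j p_j\| \le \frac{RATE}{1 - RATE} \|\alpha_i p_i\|.$

The right-hand side involves only quantities computed by the iterative method itself, so the matrix A does not appear in the estimate.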

The iterations may accordingly be abandoned as non-convergent when the observed convergence factor at step i exceeds the prescribed bound, i.e. when RATE_i > RATE. Such a criterion is rather restrictive in connection with conjugate gradient-type methods and very often stops the iterative process very early and, what is even more important, unnecessarily (because the process will often converge, although slowly, even when RATE_i > RATE is occasionally true). An attempt to evaluate the rate of convergence, especially for iterative refinement, is made by using the parameter RATE. In connection with conjugate gradient-type methods, however, it is often useful to set RATE close to 1 (RATE = 0.999 is used in our experiments). This is so because conjugate gradient-type methods often converge very slowly at the beginning, with remarkable improvements later. In fact, these slow and fast regimes of convergence could occur several times over the course of solving a system. On the other hand, when solving time-dependent problems, for example, it is advisable to keep the value of RATE small. The code in such a case will, after several trials and reductions of τ, find some sufficiently accurate preconditioner for which ACCUR