Context-free parsing on O(n) processors

Comput. Lang. Vol. 17, No. 1, pp. 61-66, 1992
Printed in Great Britain. All rights reserved
0096-0551/92 $3.00+0.00
Copyright © 1991 Pergamon Press plc

CONTEXT-FREE PARSING ON O(n) PROCESSORS

D. T. BARNARD and D. B. SKILLICORN
Department of Computing and Information Science, Queen's University, Kingston, Ontario, Canada K7L 3N6

(Received 3 April 1990; revision received 28 September 1990)

Abstract--The growing availability of multiprocessors makes it interesting to treat compilation itself as an algorithm to be parallelized. One approach is to allocate a processor to each symbol or token of the source program and find massively parallel algorithms for the various phases of compilation. The well-known Cocke-Younger-Kasami (CYK) algorithm for parsing context-free languages runs in O(n³) time on a uniprocessor. The CYK algorithm can be adapted to run on O(n) processors in O(n²) time. The number of processors in the present generation of multiprocessors makes this a reasonable practical approach.

Keywords: Parallel parsing; Cocke-Younger-Kasami algorithm; Natural language recognition; Syntactic pattern recognition; Context-free languages

1. PARALLEL COMPILATION

Many multiprocessors of different types are currently available, ranging from those with four or eight processors up to massively parallel machines with thousands of processors. All of the programs that execute on these machines must be compiled. Yet invariably the compilation task is done using the well-understood sequential techniques, either on a supporting host machine or on one of the multiprocessor's own processors. The reason for the lack of attention to improving the performance seems to be that the primary use of high performance multiprocessors is for compute-intensive numerical computations. Because these typically have long execution times, the overhead of sequential compilation is negligible. However, we must expect that proliferation of such machines will make them the standard workhorses of computing, much as workstations are now. In this new environment compilation will be a much more significant part of the workload.

Sequential compilation can have time complexity varying from linear to almost cubic in the length of the source program, depending on the complexity of the grammar. In contrast, parallel compilation holds out the possibility of time complexity that is poly-logarithmic in the length of the source program.

Apart from a great deal of theoretical work on the complexity of compilation in parallel, there are several different approaches to practical parallel compilation. The most obvious approach is to pipeline the various phases of the compiler: lexing, parsing, semantic analysis, optimization etc. [1]. This approach is limited by the number of phases into which the compiler can reasonably be broken and by the amount of global information required by each phase. For example, if the language is such that it is not possible to decide what a program is until the last token has been seen then the overlap between the parsing and semantic phases will be negligible.
Thus this approach can be reasonably successful on machines with limited parallelism but it seems unlikely to scale.

A second approach is to make an initial sequential pass over the source program to determine important tokens such as module and procedure boundaries. This information is used to divide the program up so that different program units are compiled on separate processors. This approach has the advantage that it uses existing compiler technology for the work in each processor, with new phases added at the beginning and possibly the end of compilation. It will scale better than the pipelined approach since the number of procedures will typically be large. Research using this approach can be found in [2-4].

A more radical approach is to assume massive parallelism and re-think the compilation task on the assumption that there are about as many processors as there are symbols in the program being compiled. In most programming languages modules can be separately compiled. Thus we need multiprocessors with about as many processors as the number of tokens in a module--a generous amount of parallelism but one that is available even today. This approach opens up many new theoretical problems as well as practical ones. It is not yet clear to what extent this approach can be made to work.

The early phases of compilation have been investigated. Hillis and Steele [5] describe a parallel lexing algorithm, based on an idea of Schell's [6], that runs in time logarithmic in the length of the source program. Although designed for the Connection Machine, it can be easily generalized for other architectures. An implementation can be found in [7]. In an earlier paper we described an algorithm with a logarithmic number of phases that parses LL grammars [8]. For a large subclass, the operations of each phase are constant time so that the overall parsing algorithm is of logarithmic time complexity. Other work of the same general kind includes evaluating parallel attribute grammars [9], and approaches built on parallel tree contraction such as [10] and [11]. These last construct fast parsing algorithms but at the expense of severe restrictions on the grammars used. A summary of recent work in parallel compilation can be found in [12].

In this paper we describe a modification to the Cocke-Younger-Kasami (CYK) algorithm that requires only a linear number of processors and runs in time quadratic in the length of the source program. This algorithm allows any context-free grammar to be parsed. It thus provides an upper bound on the time complexity of parsing using a linear number of processors. There are some applications in which the full power of context-free grammars is required, such as natural language recognition and syntactic pattern recognition. For such applications our algorithm is the best known with linear hardware requirements.
Other fast parsing algorithms are known--unfortunately they require a super-linear number of processors and are thus not of great practical interest. Systolic recognition algorithms using O(n²) processors and running in O(n) time are well known [13-15], and several parsing algorithms with the same complexity have been recently discovered [16]. A PRAM parsing algorithm running in O(log² n) time on O(n⁶) processors is known [17]. If the context-free language is unambiguous, this can be improved to O(log n) time and O(n³) processors [18], and if it is deterministic this can be further improved to O(log² n) time and O(n²) processors [19].

2. PARSING

The standard CYK algorithm [20] parses context-free languages by constructing a table in which the entries represent possible interpretations of portions of the input string. It assumes that the grammar is in Chomsky normal form, so that productions are either of the form A → a for some terminal a, or of the form A → BC. The restriction to Chomsky normal form is useful for expository purposes--any context-free grammar can be transformed into Chomsky normal form. The algorithm runs in O(n³) time on a uniprocessor and requires O(n²) storage for the table. A variant suitable for vector processors of size O(n) was developed by Partsch [21]. Kosaraju [15] also described a systolic version requiring O(n) processors and O(n²) time, but our version is much simpler.

In this paper, we show how to adapt the CYK algorithm to run on a linear number of processors. Algorithms that require hardware linear in the size of programs to be compiled are attractive in a way that algorithms requiring quadratic hardware or worse are not. We hope that a class of grammars can be discovered that is analogous, in the parallel world, to LR--almost as useful as general context-free grammars, but substantially faster to parse.
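As a concrete reference point, the standard sequential table construction can be sketched as follows. The grammar encoding (dictionaries mapping a terminal, or a pair of nonterminals, to the possible left-hand sides) and the toy grammar are illustrative assumptions, not taken from the paper; for brevity the table stores only left-hand nonterminals rather than full productions.

```python
# Minimal sequential CYK recognizer for a grammar in Chomsky normal form.
# M[i][j] holds the nonterminals that derive tokens[i..j].

def cyk_recognize(tokens, unary, binary, start):
    """unary maps terminal a -> set of A with A -> a;
       binary maps (B, C) -> set of A with A -> B C."""
    n = len(tokens)
    M = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(tokens):
        M[i][i] = set(unary.get(a, ()))
    for length in range(1, n):                # length = j - i
        for i in range(n - length):
            j = i + length
            for k in range(i, j):             # compose M[i][k] with M[k+1][j]
                for B in M[i][k]:
                    for C in M[k + 1][j]:
                        M[i][j] |= binary.get((B, C), set())
    return start in M[0][n - 1]

# Toy grammar (an assumption for illustration): S -> A B, A -> 'a', B -> 'b'
unary = {'a': {'A'}, 'b': {'B'}}
binary = {('A', 'B'): {'S'}}
print(cyk_recognize(['a', 'b'], unary, binary, 'S'))   # True
```

The three nested loops over length, start position and split point are the source of the O(n³) uniprocessor bound quoted above.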
In the meantime, the result of this paper shows that no worse than quadratic time is required to parse any context-free language. We are continuing to investigate this question. Our revised algorithm has the same time-processor-memory product as the standard uniprocessor version of the CYK algorithm.

3. THE NEW ALGORITHM

The parallel algorithm builds the same table as the CYK algorithm, but builds it a row at a time. We regard the table as being in left upper triangular form. Each entry in the table is a set of productions and hence is bounded above by the size of the grammar. Each entry M[i,j] is the set of productions A → α such that α ⇒* a_i a_{i+1} ... a_j, i.e. α derives a_i a_{i+1} ... a_j. Each entry in the first row of the table is generated by listing all productions that might produce the terminal in that position in the string. Subsequent entries are produced by composing table entries--the set of table entries M[i,j] can arise in j - i distinct ways, based on the entries

    M[i,i] . M[i+1,j]
    M[i,i+1] . M[i+2,j]
    ...
    M[i,j-1] . M[j,j]

The composition of two matrix entries requires examining all productions in the table entries being composed and checking if any of them are right-hand sides of productions in the grammar. Hence the composition operation is O(1) in the length of the input.

For practical parsing, it is of interest to ask how large the constant must be. The composition operation involves string matching the composed table entries against the right-hand sides of productions. This is at worst a linear function of the size of the grammar, and standard techniques can reduce it substantially. There may be multiple table entries which need to be composed, and these can increase the time for composition. However, in most practical settings they are rare since they correspond to multiple possible interpretations of the substring they represent. For example, in a natural language application, a word may be a noun or an adjective, or a clause can be dependent or an entire sentence, but such uncertainties are usually quickly resolved by further content. Thus we expect uncertainty to occur locally but not to persist for many steps. All of this suggests that even for rather small input strings the composition operation can be treated as constant time.

We now show how to generate the entries of this table on a linear array of n processors, one for each symbol in the input string.
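Before turning to the processor array, the composition operation itself can be sketched as a lookup against the grammar's binary productions. The dictionary encoding is an assumed convention (not the paper's), and the entries here hold only left-hand nonterminals; the point is that the cost depends only on the grammar, never on the input length.

```python
# Composition of two table entries: collect every nonterminal A with a
# production A -> B C where B labels the left entry and C the right one.
# `binary` maps the pair (B, C) to the set of such A's, so the cost is
# bounded by the grammar size alone.

def compose(left, right, binary):
    result = set()
    for B in left:
        for C in right:
            result |= binary.get((B, C), set())
    return result

# Toy grammar fragment (illustrative assumption): S -> A B
binary = {('A', 'B'): {'S'}}
print(compose({'A'}, {'B'}, binary))   # {'S'}
```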
Each processor will compute, in turn, the table entries M[i,i], M[i,i+1], M[i,i+2], ..., M[i,n]. The leftmost processor will therefore compute entry M[1,n]. The string represents a valid context-free program if the start symbol is present on the left-hand side of one of the productions in this entry.

The information required for processor i on step j is: the entries M[i,i], ..., M[i,i+j] which it has computed on its previous steps; and the entries M[i+1,i+j+1], M[i+2,i+j+1], ..., M[i+j+1,i+j+1] which were computed on the previous step by the processor to its right. Processor i will compute the entry M[i,i+j+1] by composing the entries

    M[i,i] . M[i+1,i+j+1]
    M[i,i+1] . M[i+2,i+j+1]
    ...
    M[i,i+j] . M[i+j+1,i+j+1]

For each entry a pointer, indicating at which position in the input string it was composed, must be kept for the subsequent parsing step.

We assume that it takes one time unit to perform a composition and to send an entry from one processor to its neighbour. A processor may receive a table entry, transmit a table entry, and perform a composition simultaneously. Now processor i performing step j receives the table entries from its right-hand neighbour in the order given above, beginning at time instant x. By time x + 1 it has performed the first of the compositions listed above and received the second table entry from its right neighbour. By time x + 2 it has performed the second composition and received the third entry from the right. After j time instants it has computed all of the compositions listed above and can send the entry M[i,i+j+1] to its left neighbour, followed by all of the subsequent entries M[i+1,i+j+1], ... that it received. Figure 1 shows the activity of some processors in the middle of a string, with time shown downwards. Figure 2 shows the activity of a general processor.
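The schedule above can be simulated sequentially: each "processor" keeps only its own row of entries M[i,i], M[i,i+1], ..., and on each step composes its row against the column of entries its right neighbour produced on the previous step. This is a sketch of the data movement only, under the same assumed grammar encoding as before; the one-time-unit timing and the pointers needed for the parsing step are omitted.

```python
# Sequential simulation of the linear array. rows[i] plays the role of
# processor i's local store: rows[i][d] holds the entry M[i, i+d].

def array_step(rows, binary, j):
    """Step j: every processor i that still has work computes M[i, i+j]
    by composing M[i,i].M[i+1,i+j], M[i,i+1].M[i+2,i+j], ...,
    M[i,i+j-1].M[i+j,i+j]."""
    n = len(rows)
    for i in range(n - j):
        # The entries M[i+1, i+j], ..., M[i+j, i+j], which in the real
        # machine arrive one per time unit from the right neighbour.
        incoming = [rows[k + 1][i + j - (k + 1)] for k in range(i, i + j)]
        entry = set()
        for own, recv in zip(rows[i], incoming):
            for B in own:
                for C in recv:
                    entry |= binary.get((B, C), set())
        rows[i].append(entry)

def recognize(tokens, unary, binary, start):
    rows = [[set(unary.get(a, ()))] for a in tokens]   # step 0: the M[i,i]
    for j in range(1, len(tokens)):
        array_step(rows, binary, j)
    return start in rows[0][-1]        # leftmost processor ends with M[1,n]

# Toy grammar (illustrative): S -> A B, A -> 'a', B -> 'b'
unary = {'a': {'A'}, 'b': {'B'}}
binary = {('A', 'B'): {'S'}}
print(recognize(['a', 'b'], unary, binary, 'S'))   # True
```

Note that processor i reads exactly j earlier entries on step j, matching the communication count used in the timing argument that follows.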
Because each processor cannot send its computed entry after step j until it has received j entries from its right neighbour, it is clear that

    completion time for step j = completion time for step (j - 1) + j
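Unwinding this recurrence, writing T(j) for the completion time of step j and taking T(0) as the constant time to produce the first row of the table:

```latex
T(n) \;=\; T(0) + \sum_{j=1}^{n} j \;=\; T(0) + \frac{n(n+1)}{2} \;=\; O(n^2)
```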

[Figure 1: Information flow and entries computed. The figure shows processors 3-6 at time instants t = 4 through t = 10, with time running downwards. Each processor keeps the entries it has computed; it takes one time unit for a table entry to be transmitted to an adjacent processor.]

There are n steps and hence the total time for the computation is O(n²). Since each step takes time only a small constant, we can expect the parallel algorithm to outperform the sequential algorithm as soon as the input string is larger than a small multiple of the number of productions in the grammar.

Each processor must keep all of the previous table entries that it has computed. Hence the storage requirement for each processor is linear in the number of symbols in the input, for a total of O(n²) storage (actually, an extra logarithmic factor should be included to account for pointers that are used during the parsing step).

[Figure 2: Detail of computation at a general processor, showing processor i receiving the entries M[i+1,i+j+1], M[i+2,i+j+1], ..., from its right neighbour and forming the compositions M[i,i].M[i+1,i+j+1], M[i,i+1].M[i+2,i+j+1], ..., M[i,i+j].M[i+j+1,i+j+1].]

The algorithm so far has only been concerned with recognition. However, parsing is straightforward by reversing the flow of information. The first processor selects a production with the start symbol on its left-hand side, takes the right-hand side and keeps the leftmost symbol for itself, sending the rightmost to its neighbouring processor to the right. Because a marker has been kept indicating how the production was composed, both processors know which symbols of the input string result from each symbol of the production. Hence they can determine which processors should repeat the parsing step. We assume that the parsing algorithm is a prelude to further compilation steps, so that we are not concerned with extracting the parse from the processors. However, no processor can be responsible for more than a logarithmic number of productions in the parse, so that an extraction step adds no more than a logarithmic time to the overall algorithm.

4. SUMMARY

We have presented a variant of the CYK algorithm for parsing context-free languages which is of interest in the development of parallel compilers. The combined time-processor-storage product for this algorithm is the same as that of the CYK algorithm on a uniprocessor, except for some added storage for holding the pointers to where compositions took place. Note that the CYK algorithm must essentially remember this information as well. We have traded a linear number of processors for a linear reduction in the time complexity.

Multiprocessors of the present generation have enough processors for it to be realistic to allocate one processor to each token of an input program--at least it will become so when I/O has developed to the point where processors can access the file store in a non-sequential way. Hence algorithms that assume a number of processors linear in the input size are realistic in a way that algorithms that assume quadratic numbers of processors are not.

Acknowledgements--We would like to thank Laurent Langlois for his careful reading of this paper. This work was supported by the Natural Sciences and Engineering Research Council of Canada and by the Information Technology Research Centre of Ontario.

REFERENCES

1. Vandevoorde, M. T. Parallel compilation on a tightly coupled multiprocessor. Technical report, DEC report SRC TR-546; 1987.
2. Seshadri, V., Wortman, D. B., Junkin, M. D., Weber, S., Yu, C. P. and Small, I. Semantic analysis in a concurrent compiler. In Proceedings ACM SIGPLAN '88 Conference on Programming Language Design and Implementation, pp. 233-240; 1988.
3. Seshadri, V., Small, I. S. and Wortman, D. B. Concurrent compilation. In Proceedings of the IFIP Conference on Distributed Processing, Amsterdam; 1987.


4. Gross, T., Zobel, A. and Zolg, M. Parallel compilation for a parallel machine. In ACM SIGPLAN '89 Conference on Programming Language Design and Implementation, pp. 91-100; 1989.
5. Hillis, W. D. and Steele, G. L. Data parallel algorithms. Commun. ACM 29: 1170-1183; 1986.
6. Schell, R. M. Jr. Methods for constructing parallel compilers for use in a multiprocessor environment. Ph.D. thesis, University of Illinois at Urbana-Champaign; 1979.
7. Barnard, D. T. and Skillicorn, D. B. Parallel parsing: a status report. Technical Report 90-267, Department of Computing and Information Science, Queen's University; 1990.
8. Barnard, D. T. and Skillicorn, D. B. Parallel parsing on the Connection Machine. Inform. Process. Lett. 31: 111-117; 1989.
9. Boehm, H.-J. and Zwaenepoel, W. Parallel attribute grammar evaluation. Technical Report TR87-55, Rice University Computer Science; 1987.
10. Srikant, Y. N. and Shankar, P. Parallel parsing of programming languages. Inform. Sci. 43: 55-83; 1987.
11. Khanna, S., Ghafoor, A. and Goel, A. A parallel compilation technique based on grammar partitioning. In Computer Science Conference; 1990.
12. Barnard, D. T. and Skillicorn, D. B. (Eds). Proceedings of a Workshop on Parallel Compilation, Kingston, Ontario, Canada; 1990.
13. Guibas, L. J., Kung, H. T. and Thompson, C. D. Direct VLSI implementation of combinatorial algorithms. In Caltech Conference on VLSI, pp. 509-525; 1979.
14. Kosaraju, S. R. Computations on iterative automata. Ph.D. thesis, University of Pennsylvania; 1969.
15. Kosaraju, S. R. Speed of recognition of context-free languages by array automata. SIAM J. Comput. 4: 331-340; 1975.
16. Langlois, L. Systolic parsing of context-free languages. Technical Report 693, Université de Montréal; 1989.
17. Ruzzo, W. L. Tree-size bounded alternation. J. Comput. Syst. Sci. 21: 218-235; 1980.
18. Rytter, W. Parallel time O(log n) recognition of unambiguous CFLs. In Fundamentals of Computation Theory, pp. 380-389. Springer Lecture Notes in Computer Science 199; 1985.
19. Chytil, M. P. and Monien, B. Caterpillars and context-free languages. In STACS 90, 7th Annual Symposium on Theoretical Aspects of Computer Science (Edited by Choffrut, C. and Lengauer, T.), pp. 70-81. Springer Lecture Notes in Computer Science 415; 1990.
20. Aho, A. and Ullman, J. D. The Theory of Parsing, Translation and Compiling: Parsing, Vol. I. Englewood Cliffs, N.J.: Prentice-Hall; 1972.
21. Partsch, H. Transformational derivation of parsing algorithms executable on parallel architectures. In Programmiersprachen und Programmentwicklung (Edited by Ammann, U.), pp. 41-57. Berlin: Springer; 1984.

About the Author--DAVID T. BARNARD studied computer science at the University of Toronto before joining the faculty of Queen's University at Kingston in 1977, where he is a Professor in the Department of Computing and Information Science. His research interests are in applications of formal language theory to compilation and document processing. Dr Barnard is a member of the Association for Computing Machinery, the IEEE Computer Society and the European Association for Theoretical Computer Science.

About the Author--DAVID SKILLICORN received a B.Sc degree from the University of Sydney and a Ph.D. from the University of Manitoba in 1981. He joined the faculty of Queen's University at Kingston in 1982 and is now an Associate Professor. His research interests are in the practical exploitation of parallel architectures and the development of programming languages and calculi for them. Professor Skillicorn is a member of the Association for Computing Machinery and of the IEEE Computer Society.
