Data structures for order-sensitive predicates in parallel nondeterministic systems


Acta Informatica 37, 21–43 (2000)

© Springer-Verlag 2000

Data structures for order-sensitive predicates in parallel nondeterministic systems

Desh Ranjan, Enrico Pontelli, Gopal Gupta

Laboratory for Logic, Databases, and Advanced Programming, Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA (e-mail: {dranjan,epontell,gupta}@cs.nmsu.edu)

Received: 20 April 1998 / 3 April 2000

Abstract. We study the problem of guaranteeing correct execution semantics in parallel implementations of logic programming languages in the presence of built-in constructs that are sensitive to the order of execution. The declarative semantics of logic programming languages permit execution of the various goals in any arbitrary order (including in parallel). However, goals corresponding to extra-logical built-in constructs should respect the sequential order of execution to ensure correct semantics. Ensuring this correctness in the presence of such built-in constructs, while efficiently exploiting maximum parallelism, is a difficult problem. In this paper, we propose a formalization of this problem in terms of operations on dynamic trees. This abstraction enables us to: (i) show that the existing schemes to handle order-sensitive computations used in current parallel systems are sub-optimal; (ii) develop a novel, optimal scheme to handle order-sensitive goals that requires only a constant-time overhead per operation. While we present our results in the context of logic programming, they apply equally well to most parallel nondeterministic systems.

1 Introduction

Logic programming is a popular programming paradigm that has been used in a wide variety of applications, ranging from Artificial Intelligence, Genetic Sequencing, Database Programming, Expert Systems, Natural Language Processing, and Constraint-based Optimization to general-purpose programming and problem solving. The most popular logic programming language is Prolog [45]. A nice property of logic programming languages is that parallelism can be automatically extracted from logic programs by the


compiler or the runtime system, without requiring any effort from the programmer. However, the implementation of a parallel logic programming system poses many interesting and challenging problems. Several solutions have been proposed for these problems, and parallel implementations have been realized in the past. To date, however, little attention has been paid to a formal analysis of the complexity of the various operations involved.

The problem of executing programs in a logic programming language can be abstracted as the process of maintaining a dynamic tree. The operational semantics of the logic programming language determines how this tree is built and what operations on the tree are of interest. As execution proceeds according to the operational semantics, this tree grows and shrinks; in a parallel implementation the tree can grow and shrink in parallel. Various operations are needed during parallel execution to guarantee correct behavior. For example, at runtime we may need to determine whether, at a particular moment, a given node is in the leftmost branch of the tree that exists at that moment; or, given two nodes, whether one node is an ancestor of the other; etc. Although dynamic data structures have been studied extensively [4, 11, 48, 17, 44, 46, 10], to the best of our knowledge the specific data structures and operations needed to support parallel logic programming have not been formally studied.

In this paper we focus on the problem of handling order-sensitive computations during parallel execution of Prolog programs. The need to support order-sensitivity originates from the use of the extra-logical predicates offered by Prolog (e.g., input/output, cut, assert). We propose to analyze the problem in a formal context and to derive data structures that are optimal with respect to this formalization. In order to study the problem it is necessary to formalize parallel execution of Prolog. The development of an abstract framework where all issues of parallel execution of Prolog are represented in terms of operations over data structures allows one to have a clear understanding of the requirements for correctly implementing order-sensitive predicates. Furthermore, the use of an abstract framework allows us to compare and evaluate the existing schemes proposed in the literature, as well as to define a notion of optimality.

We believe that abstracting out the problems in parallel execution of logic programs as problems of building data structures to support certain operations of interest, and using this formalization for comparison and optimality studies, are the major contributions of this paper. All the results presented are novel, as these problems have never been considered before. The only work that comes close is that of Gupta and Jayaraman [15], which establishes some results on the complexity of supporting or-parallel execution. In related research we have already demonstrated how this sort of formalization of parallel execution of logic programs


can be used to provide comparisons between existing models [35, 33]. We have also demonstrated how the results of this type of formal analysis can be concretized in implementations which are provably superior to those previously presented in the literature [30]. We hope this paper will initiate further research in the area of data structures for the dynamic graphs that arise in parallel implementations of declarative languages¹ and parallel AI systems. It should be noted that our work on the formal analysis of parallel logic programming systems has already provided opportunities for cross-fertilization, leading to novel results in other research areas (e.g., object-oriented programming [34]).

We discuss various results on the complexity of the operations involved in parallel execution of Prolog programs. The main result we describe relates to the side-effect problem in parallel logic programming, which is described in Sect. 2.3. We show that this problem can be solved with a constant-time overhead per operation. This result is important, as all the existing schemes to handle order-sensitive predicates in parallel Prolog systems are non-optimal and require ω(1) (i.e., non-constant) overhead per operation. We investigate the problem in the context of or-parallel execution of logic programs; the results are easily generalizable (as argued in this work) to the case of and-parallelism as well. In the context of this work we limit our attention to models which deal with a single form of parallelism at a time (i.e., either and-parallel or or-parallel systems); we do not consider models which attempt to exploit multiple forms of parallelism concurrently.

This result has not only theoretical value but also great practical relevance. In fact, the scheme is sufficiently simple to warrant an efficient implementation, which should provide for an arguably superior parallel Prolog system. Nevertheless, the issues related to an actual implementation of this scheme are beyond the scope of this work. The result is also meant to complement the other results on the complexity of implementing parallel logic programming, briefly described in this paper and fully developed in [33, 35, 36].

The rest of the paper is organized as follows. In the remainder of this section we give a brief overview of logic and parallel logic programming. Section 2 describes the major issues in supporting the different forms of parallelism, and Sect. 3 introduces the pointer machine model used in our analysis. Section 4 formalizes these issues in terms of operations on trees. The remaining sections present the lower and upper bounds for the side-effect problem. We conclude with some open problems and conjectures.

¹ Dynamic data structures also arise in functional languages; for example, in graph-reduction-based execution models.


1.1 Logic programming

Logic Programming (LP) [26] is a programming paradigm based on a subset of first-order logic (Horn logic). A program P consists of a set of (Horn) clauses of the form A :- B1, ..., Bn, where A and the Bi are atomic formulae and n ≥ 0. Execution starts with an initial goal of the form B1, ..., Bk and aims to verify whether there exists a substitution σ for the variables in the goal such that (B1, ..., Bk)σ is a logical consequence of the program P, i.e., P |= (B1 ∧ ... ∧ Bk)σ. The Bi's in a goal are called subgoals. Execution of a logic program is typically based on SLD-resolution [37]. We assume the reader to be familiar with the execution model of Prolog [45].

In pure logic programming—i.e., logic programming without any extra-logical component—given a goal, we have considerable latitude in choosing which subgoal in the goal should be solved next. We will term this class of non-determinism don't-care non-determinism, to emphasize that the alternative choices will not affect the final outcome [28, 16]². Once a subgoal is selected, we also have considerable freedom in choosing which clause to use for reducing it, as the heads of many clauses may be unifiable with a given subgoal. This will be termed don't-know non-determinism, since different choices will potentially lead to distinct solutions. The operational semantics of an LP language is determined by the strategy chosen for selecting a subgoal in the resolvent and the clause to reduce it.

The most widely used logic programming language is Prolog [7]. Prolog uses the following selection strategy: at each resolution step the leftmost subgoal in the resolvent is picked and, among the multiple matching clauses, the one that occurs textually first (in the file containing the program) is chosen. Prolog also includes several extra-logical features [13], such as built-in side-effect predicates (for input/output, etc.) and meta-programming predicates (cut, assert, retract, etc.). One of the most important aspects of these features is their strong dependence on Prolog's selection strategy—i.e., the adoption of different strategies in the selection of clauses and subgoals during resolution may lead to different, and incorrect, results—e.g., different strategies will lead to the execution of output operations (such as write) in different orders. In the rest of this work we will often refer to these predicates as order-sensitive predicates. The semantics of the order-sensitive predicates is strongly related to the selection strategy adopted. A small executable sketch of this strategy is given below.

² Some authors have used the two terms don't-know and don't-care non-determinism with a slightly different meaning (e.g., [25]).
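To make the selection strategy concrete, the following sketch (in Python, over a made-up propositional program; variables and unification are omitted) enumerates successful derivations by always reducing the leftmost subgoal and trying matching clauses in textual order:

    # A made-up propositional program: (head, body) pairs in textual order.
    PROGRAM = [
        ("a", ["b", "c"]),   # a :- b, c.
        ("b", []),           # b.
        ("b", ["c"]),        # b :- c.
        ("c", []),           # c.
    ]

    def solve(goal):
        """Yield the clause indices used in each successful derivation."""
        if not goal:
            yield []
            return
        first, rest = goal[0], goal[1:]            # leftmost subgoal first
        for idx, (head, body) in enumerate(PROGRAM):
            if head == first:                      # clauses in textual order
                for trace in solve(body + rest):   # clause body replaces the subgoal
                    yield [idx] + trace

    for trace in solve(["a"]):
        print(trace)   # [0, 1, 3] then [0, 2, 3, 3]: or-tree branches, left to right

Executing the subgoals or the clause choices in a different order would enumerate the same answers, but any output operation embedded in the clauses would be performed in a different order, which is precisely the problem studied in this paper.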


1.2 Parallel logic programming

The presence of non-determinism is a peculiar feature of logic programming, and this non-determinism can be used as a source of implicit parallelism. Two major forms of parallelism can be identified [8, 42]:
– And-Parallelism (AP) [21, 31, 40], which arises from don't-care non-determinism: given a goal, several subgoals can be selected for simultaneous execution.
– Or-Parallelism (OP) [2, 27], which arises from don't-know non-determinism: given a subgoal that has been selected from the goal, the multiple clauses whose heads unify with it can be solved in parallel.

A major research direction in parallel logic programming has been the design of parallel implementations of Prolog [12, 16, 21, 2, 27]. This implies that the parallel execution mechanisms have to be designed in such a way that a user sees the same externally observable behavior during parallel execution as is observed during sequential Prolog execution. We say that such a parallel system preserves Prolog semantics. Thus, in a parallel system that preserves Prolog semantics: (i) the same solutions are produced, in the same order, as in sequential execution; (ii) the same side-effects are observed, in the same order, as in sequential execution. In this paper we only consider those parallel implementations that aim to preserve Prolog semantics. While these restrictions may appear too stringent, considerable success has been achieved in parallelizing Prolog programs and obtaining good speedups from them, even in the presence of side-effects, cuts, etc. [31, 32, 40, 1, 27, 49]. The most attractive feature of this approach is that Prolog programs can be written for sequential machines and then automatically run in parallel on parallel machines.

Parallel implementations of other logic programming languages, and of extensions and modifications of Prolog [18, 39], have also been considered. However, these options are less interesting, since they require programmer intervention. Also, in many of these languages parallelism has to be explicitly programmed, making their semantics quite complex.

2 Problems in parallel logic programming

Various parallel implementations of Prolog have been proposed and realized [16, 23]. Most of the current systems suffer from severe efficiency problems. Many of these inefficiencies, especially in and-parallel systems, arise from the need to guarantee Prolog semantics during parallel execution. Other inefficiencies, such as those arising in or-parallelism, are due to the complexity entailed in implementing don't-know non-determinism. In both


cases, preservation of Prolog semantics requires various operations (e.g., execution of side-effects) to be ordered correctly; this creates dependences between the concurrent executions, and only executions that respect all such dependences can be accepted. The major problems that arise in and-parallelism and or-parallelism are discussed next. We treat the two separately: it is possible to combine and-parallelism and or-parallelism [12, 13]; however, for the sake of simplicity, the two forms of parallelism are considered independently in this work.

2.1 And-parallelism

Given a goal B1, ..., Bn—where the various Bi are subgoals—the subgoals in the goal can be executed concurrently during an and-parallel execution. And-parallel execution can be represented using an and-tree. The root of the and-tree is labeled with the initial goal, and each node in the tree contains a conjunction of subgoals. If a node contains a conjunction B1, ..., Bn, then it will have n children: the ith child of the node is labeled with the body of the clause used to solve the subgoal Bi.

The main problem in the implementation of and-parallelism is the efficient management of the unifiers produced by the concurrent reduction of different subgoals. Two subgoals Bi and Bj (1 ≤ i < j ≤ n) in the resolvent B1, ..., Bn should agree on the bindings of all the variables that are common to them (such variables are termed dependent variables or shared variables in parallel logic programming terminology [40]). During and-parallel execution, however, Bi and Bj may produce conflicting bindings for a dependent variable. In sequential Prolog execution, usually, Bi, the goal to the left, binds the common variable, and Bj works with the binding produced by Bi. In and-parallel execution, when Bj is started the common variable will be unbound, and Bj may attempt to instantiate this variable first, violating Prolog semantics—which requires Bj to wait for the bindings produced by Bi.

The key problem is to ensure that bindings to common variables are made in the same order as in a sequential Prolog execution. This requirement is much stronger than just requiring that Prolog semantics be preserved (i.e., that the externally observable behavior during parallel execution be the same as in the sequential execution). If a parallel system satisfies this stronger requirement, we say that it preserves strong Prolog semantics. Preserving strong Prolog semantics is important for an and-parallel system, as otherwise a considerable amount of redundant computation may be performed [40, 41, 30]. Note that if strong Prolog semantics is preserved, then Prolog semantics is guaranteed, but not vice versa.
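As a toy illustration of this rule (the construction is ours, not a scheme from the literature), a consumer goal to the right of the producer suspends instead of binding the shared variable itself:

    # Toy sketch of the strong-Prolog-semantics binding rule for a shared
    # variable: only the leftmost (producer) goal may bind it; goals to the
    # right must suspend until the binding exists.

    class SharedVar:
        def __init__(self):
            self.value = None
            self.bound = False

    def producer_bind(var, value):      # executed by the leftmost goal B_i
        var.value, var.bound = value, True

    def consumer_read(var):             # executed by a goal B_j to the right
        if not var.bound:
            return "suspend"            # wait for the binding from the left
        return var.value

    X = SharedVar()
    print(consumer_read(X))             # suspend: B_j arrived before B_i bound X
    producer_bind(X, 1)
    print(consumer_read(X))             # 1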


2.2 Or-parallelism

In the case of OP the parallel computation can be described as a tree, called the or-tree. Each node contains a goal obtained during the computation, and the root node is labeled with the initial goal. To expand a node further, the leftmost subgoal in the goal of that node is selected and the matching clauses are found; for each matching clause, a child node is created. If B1, ..., Bn is the goal at the node, then for each clause Hj :- Dj1, ..., Djm such that θj = mgu(Hj, B1), a child node labeled with the goal (Dj1, ..., Djm, B2, ..., Bn)θj (also known as the resolvent) is created. The operation mgu computes the most general unifier of two terms—i.e., given two terms s and t, mgu(s, t) returns the most general substitution µ such that µ(s) is identical to µ(t). Intuitively, the most general unifier is the weakest instantiation of the variables in s and t which makes the two terms identical; for instance, mgu(p(X, b), p(a, Y)) = {X/a, Y/b}. Note also that the child node ni corresponding to a matching clause Ci that textually precedes another matching clause Cj in the program is placed to the left of nj, where nj is the child node corresponding to Cj. Sequential execution corresponds to building the or-tree one node at a time, in depth-first order.

In the or-tree, each branch is forced to maintain a local view of the substitution computed. This requirement emerges from the fact that, during a reduction like the one mentioned above, the substitutions θj produced by unification may potentially conflict, and must be kept separate in different branches of the or-tree. For example, if a node containing the resolvent p(X) is expanded into two branches due to the two matching clauses

    p(1) :- ....
    p(2) :- ....

then the variable X will get a conflicting binding from each of the two clauses. These conflicting bindings need to be maintained separately, as they will each lead to a different solution (X=1 and X=2) for the initial resolvent. The main issue, then, in designing an implementation scheme capable of efficiently supporting OP is the development of an efficient way of associating the correct set of bindings to each branch of the or-tree. The naive approach of keeping θj for each branch is clearly highly inefficient, since it requires the creation of complete copies of the substitution (which can be arbitrarily large) every time a branching takes place [8, 42, 15].
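For concreteness, here is a small Python sketch of the mgu computation itself; the term representation (variables as strings, compound terms and constants as name/argument pairs) is a choice made for this illustration:

    def is_var(t):
        return isinstance(t, str)

    def walk(t, subst):
        """Dereference a variable through the substitution."""
        while is_var(t) and t in subst:
            t = subst[t]
        return t

    def occurs(v, t, subst):
        t = walk(t, subst)
        if is_var(t):
            return v == t
        _, args = t
        return any(occurs(v, a, subst) for a in args)

    def mgu(s, t, subst=None):
        """Return a most general unifier of s and t, or None if none exists."""
        subst = dict(subst or {})
        s, t = walk(s, subst), walk(t, subst)
        if is_var(s):
            if s == t:
                return subst
            if occurs(s, t, subst):
                return None               # occur check: no finite unifier
            subst[s] = t
            return subst
        if is_var(t):
            return mgu(t, s, subst)
        (f, f_args), (g, g_args) = s, t
        if f != g or len(f_args) != len(g_args):
            return None
        for a, b in zip(f_args, g_args):
            subst = mgu(a, b, subst)
            if subst is None:
                return None
        return subst

    # mgu(p(X, b), p(a, Y)) = {X/a, Y/b}
    print(mgu(("p", ["X", ("b", [])]), ("p", [("a", []), "Y"])))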

2.3 Side-effects

Preserving Prolog semantics during and-parallel or or-parallel execution requires that order-sensitive predicates be processed in the correct order. Since the goal of a parallel Prolog system is to reproduce the same external behavior as in a sequential execution, we need to guarantee that all the order-sensitive predicates are performed in the same order as in a sequential execution—e.g., all the input and output operations are required to appear in the same order as during sequential execution.

The problem of performing parallel execution of logic programs in the presence of side-effects has been thoroughly investigated [6, 29, 20, 13], and a number of different schemes have been proposed. The various schemes differ in the amount of parallelism that can be exploited in the presence of side-effects. Nevertheless, all the schemes have to address a common basic problem: the need to verify that a given node is leftmost with respect to the whole and-tree in the case of and-parallelism, or to the whole or-tree in the case of or-parallelism. For the sake of simplicity, in the rest of this discussion we will focus on side-effects in the context of or-parallelism; the results can be easily generalized to the case of and-parallelism.

Verifying whether a node is in the leftmost branch of the tree is needed for the following reason: to execute side-effects in the correct order, we need to delay the execution of a side-effect in a parallel thread of computation until all the "preceding" side-effects have been completed ("preceding" in the sense of Prolog's sequential execution order). Unfortunately, the problem of deciding whether all the side-effects which precede a given one have been performed is akin to the problem of deciding termination—thus it is undecidable. Since all the preceding side-effects are guaranteed to lie in the branches of the computation tree to the left of the current one, the most natural solution is to delay the side-effect until the corresponding node in the computation tree is in the leftmost branch of the tree—i.e., until all the preceding computations have been completed. This may seem a very conservative solution, but it is correct and no better solution is known. This approach has been adopted by all existing parallel Prolog systems [1, 27], with acceptable performance.

In the context of and-parallelism, the situation is analogous [29, 6]; e.g., in the scheme proposed in [6] synchronization is achieved by "...check all the done bits of left sibling subgoals in the parallel conjunction which contains the side effect predicate and its ancestor parallel conjunctions", which simply translates to verifying the leftmostness of the side-effect predicate in the and-parallel tree. In this paper we analyze the problem arising from the ordering of side-effects only for or-parallel execution; the scheme can be applied virtually unchanged to the case of and-parallelism.

3 Pointer machines

We will establish our results in the context of a formal model for describing algorithms, specifically the Pointer Machine model [3]. Since we are proving


upper bounds, our results clearly hold also for more powerful models, such as the RAM. Pointer machines allow us to capture the complexity of the problem to a finer degree than the RAM, e.g., by avoiding the (unreasonable) assumption, common in the RAM model, that operations on numbers of size up to lg n can be executed in constant time, where n is the size of the input. This makes pointer machines particularly suitable for lower-bound analysis (see Tarjan's paper [46] for further justification of pointer machines).

A pure pointer machine consists of a finite but expandable collection R of records, called the global memory, and a finite collection of registers. Each record is uniquely identified through an address (let N be the set of all addresses). A special address nil is used to denote an invalid address. Each record is a finite collection of named fields. All the records in the global memory have the same structure, i.e., they all contain the same number of fields, with the same names. Each field may contain either a datum (from a generic data domain Data) or an address (i.e., an element of N). If F is the finite collection of field names, then the function

    field_access : F × N → Data ∪ N

allows one to access the value stored in a field of a record. In the rest of this paper, for the sake of simplicity, we will use the notation field_name(record) to denote the value returned by

    field_access(field_name, address(record))

The pointer machine is also supplied with two finite collections of registers, d1, d2, ... (data registers) and r1, r2, ... (pointer registers). Each register di can contain an element of Data, while each register ri can contain one element of N. The machine executes programs. A program P is a finite, numbered sequence of instructions (each instruction in P is uniquely identified by its number). The instructions allow addresses and data to be moved between registers, and between registers and record fields. The only "constant" which can be explicitly assigned to a register is nil. Special instructions are used to create a new record (returning its address) and to perform conditional jumps. The only conditions allowed in jumps are true and equality comparison between two pointer registers. Observe that the content of the data fields never affects the behavior of the computation. For the analysis of complexity, each instruction is assumed to have unit cost. Observe that the basic pointer machine model does not allow any data operations—i.e., data registers and fields can be used only as "passive" data containers, and their use cannot affect the complexity of the algorithms. For further details on the structure and behavior of pure pointer machines the reader is referred to [24, 46, 38, 3].
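The following Python sketch models this memory organization; the integer addresses are merely an implementation convenience of the sketch, since the model itself only permits equality tests on addresses:

    # Sketch of a pure-pointer-machine global memory: uniform records with a
    # fixed set of named pointer fields, a distinguished nil address, and
    # record creation / field access as the only primitives.
    NIL = 0
    FIELDS = ("parent", "left", "right")     # the field-name set F (fixed here)

    class Memory:
        def __init__(self):
            self.records = {}                # global memory R: address -> record
            self.next_addr = 1

        def create(self):
            """Create a new record; every field is initialized to nil."""
            addr = self.next_addr
            self.next_addr += 1
            self.records[addr] = {f: NIL for f in FIELDS}
            return addr

        def field_access(self, field, addr):    # field_access : F x N -> Data u N
            return self.records[addr][field]

        def field_store(self, field, addr, value):
            self.records[addr][field] = value

    mem = Memory()
    r10 = mem.create()                               # allocate a record (the root)
    print(mem.field_access("parent", r10) == NIL)    # True: no parent yet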


4 Abstraction of the problems

A tree T = ⟨N, E⟩ is a connected acyclic graph. A rooted tree contains a distinguished node, the root, which has no incoming edges. Each node in the tree has bounded degree, and each node is labeled. Without loss of generality we will focus exclusively on binary trees. The frontier of a tree is the list containing all the leaves of the tree, collected from left to right [9]. Trees are manipulated through three instructions:
(i) create_tree(), which creates a tree containing only the root;
(ii) expand(n, b1, b2), which, given a leaf n and two labels b1 and b2, creates two new nodes (one for each label) and adds them as children of n (b1 as left child and b2 as right child);
(iii) remove(n), which, given a leaf n of the tree, removes it from the tree.
These three operations are assumed to be the only ones available for modifying the "physical structure" of the tree.

The tree induces a partial ordering between the nodes. Given two nodes n and n′, we write n ⪯ n′ if n is an ancestor of n′; n ≺ n′ additionally says that n ≠ n′. We will often refer to the notion of leftmost branch. Given a node n, the leftmost branch of the subtree rooted at n can be defined inductively:
– if n does not have any children, then the branch containing only n is the leftmost branch;
– if n has children, and l is the leftmost child, then the leftmost branch is n followed by the leftmost branch of l.
In particular we can define the following partial order ✂:

    ∀ nodes n, m: (n ✂ m ⇔ n is a node in the leftmost branch of the subtree rooted at m)

Given a node n, let µ(n) denote the node min{m ∈ N | n ✂ m}, the minimum taken with respect to ⪯. Intuitively, µ(n) is the highest node m in the tree (i.e., the one closest to the root) such that n is in the leftmost branch of the subtree rooted at m. µ(n) is also known in the parallel logic programming community as the subroot node of n [19, 20, 27].

4.1 Abstracting side-effects

Let us consider the problem of correctly ordering side-effects in the case of OP (the case of AP is perfectly symmetrical and is omitted). During OP execution, nodes are added to and deleted from the or-tree. The execution steps can be directly expressed using the tree operations described above. In particular:
– each or-parallel computation is described by a branch in the tree; the tip of each branch represents the current execution "point" of that computation;


– whenever a computation (currently located at a certain leaf n of the tree) generates a new node, an expand operation is executed to expand n with the newly created branches³;
– whenever a computation either terminates or fails, it immediately enters backtracking; each step of backtracking deletes the current leaf of the branch (remove operation).

³ Following the assumption made before, we assume that each subgoal has at most two matching clauses.

Whenever a branch encounters a side-effect, it must check whether it can execute it. This check boils down to verifying that the branch containing the side-effect is currently the leftmost active computation in the tree. If n is the current leaf of the branch where the side-effect is encountered, its computation is allowed to continue only if µ(n) = root. Thus, in OP, checking whether a side-effect can be executed requires the ability to perform the operation find_subroot(n) which, given a leaf n, computes the node µ(n). We let SE denote the problem of supporting the operations create_tree, expand, remove, and find_subroot for a dynamic tree.
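The following Python sketch (names and representation ours) renders the SE abstraction directly; find_subroot is implemented here in the naive way, by climbing toward the root, which costs Θ(n) in the worst case. This is precisely the cost that the construction of Sect. 5 reduces to O(1):

    class Node:
        def __init__(self, label=None, parent=None):
            self.label = label
            self.parent = parent
            self.left = self.right = None

    def create_tree():
        return Node()                      # a tree containing only the root

    def expand(n, b1, b2):
        assert n.left is None and n.right is None, "expand applies to leaves"
        n.left, n.right = Node(b1, n), Node(b2, n)
        return n.left, n.right

    def remove(n):
        assert n.left is None and n.right is None, "remove applies to leaves"
        if n.parent is not None:
            if n.parent.left is n:
                n.parent.left = None
            else:
                n.parent.right = None

    def find_subroot(n):
        """Return mu(n): the highest m such that n is leftmost below m."""
        m = n
        # climb while m is the leftmost child of its parent (a removed left
        # sibling also leaves m leftmost in the parent's subtree)
        while m.parent is not None and (m.parent.left is m or m.parent.left is None):
            m = m.parent
        return m

    root = create_tree()
    l, r = expand(root, "b1", "b2")
    print(find_subroot(l) is root, find_subroot(r) is r)   # True True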

4.2 Abstracting and-parallelism and or-parallelism

In this section we briefly overview the abstraction of the mechanisms required to handle and- and or-parallelism. The details of these abstractions and the proofs of the results mentioned here can be found elsewhere [33].

The development of the OP computation takes place as described in the previous section on side-effects. Variables that arise during execution, whose multiple bindings have to be correctly maintained, can be modeled as attributes of nodes. Determining the correct binding for a variable in a certain branch of the or-tree can be abstracted as the problem of determining whether a given label (representing the variable) has been associated with any node in the current branch. Let us refer to the problem of managing the or-tree and the related attributes as the OP problem. It is possible to prove the following result [33, 35].

Theorem 4.1 If n is the number of nodes in the or-tree, then the OP problem on pointer machines has a lower bound time complexity of Ω(lg n) per operation. The OP problem can be solved on a pointer machine with a single-operation worst-case time complexity of O(n^{1/3} (lg n)^k) for a small k.

And-parallelism can be abstracted in a similar way. The nodes in the and-tree represent parallel conjunctions, and the children of each node represent the concurrent resolution of each goal in the conjunction. The management of variables shared between concurrent subgoals can be reduced to a problem very similar to the SE problem. Strong Prolog semantics translates to the following condition for binding a shared variable: if node n represents the current subgoal, it is allowed to bind a shared variable (and continue the expansion of the branch) only if n is leftmost in the subtree rooted at the node where the shared variable was first introduced [40, 47, 42, 33]⁴. Let us refer to this problem as the AP problem. It is possible to prove the following result [33, 35]:

⁴ The situation is actually slightly more complex due to the possibility of aliasing between variables.

Theorem 4.2 The problem AP can be solved on a pointer machine with amortized complexity O((n + m α(n, m)) (lg lg n)^k) for some small k, where n is the number of nodes in the tree, m is the number of operations performed, and α is the inverse of the Ackermann function (thus a very slowly growing function). The problem can also be solved with single-operation worst-case time complexity O(lg n / lg lg n).

More recently [36, 33] we have also shown that:

Theorem 4.3 The problem AP has an amortized lower bound time complexity of Ω(n + m α(n, m)) on pointer machines, where n is the number of nodes in the tree, m is the number of operations performed, and α denotes the functional inverse of Ackermann's function.

5 Complexity of SE on pointer machines

The issue of managing side-effects in the context of parallel logic programming has been tackled in a large number of proposals [19, 6, 13, 2]. Several schemes to manage the side-effect issue have been presented, but none of them is optimal. Here we present an optimal scheme to solve the side-effect problem. We prove the following result for the SE problem:

Theorem 5.1 The SE problem can be solved with a worst-case time complexity of O(1) per operation.

An instance of the SE problem consists of a correct sequence of operations that manipulate a tree structure (using the create_tree, expand, and remove operations) and perform find_subroot tests (which, given a reference to a leaf n of the tree, compute the node m such that m = µ(n)). A sequence of tree operations is said to be correct if the first operation is the only occurrence of create_tree, and expand, remove, and find_subroot are applied only to leaves of the tree. We show how to maintain a data structure that supports all the above operations with a time complexity of O(1) per operation. The construction of a solution can be developed in two steps: identification of the frontier and identification of the subroot nodes.


5.1 Identification of the frontier

We first show that it is possible to keep an explicit representation (e.g., a doubly-linked list) of the frontier of a tree built with the operations described above. Let us assume that the tree is implemented in the straightforward way on a pure pointer machine. Each record contains (at least) the following pointer fields: parent, which points to the parent node in the tree (nil if the node is the root of the tree); left, which points to the record representing the left child of the current node; and right, which points to the record representing the right child of the current node. The tree is constantly pointed to by the r10 register. These fields are managed in the obvious way during creation and deletion of nodes; code for these operations is presented in Fig. 1. Each operation to create, expand, and remove nodes has complexity Θ(1). The code assumes that the arguments of each procedure are placed in the first r registers—e.g., the remove operation expects one argument, which is assumed to be placed in register r1 before entering remove.

    create_tree:
        r10 := create

    expand:
        r2 := create
        r3 := create
        parent(r2) := r1
        parent(r3) := r1
        left(r1) := r2
        right(r1) := r3

    remove:
        r4 := parent(r1)
        if r4 = nil goto 1:
        r2 := left(r4)
        r3 := right(r4)
        if r1 = r2 goto 2:
        right(r4) := nil
        if true goto 3:
    2:  left(r4) := nil
        if true goto 3:
    1:  r10 := nil
    3:  end

Fig. 1. Basic tree operations

The tree representation described above can be extended to maintain an explicit representation of the frontier of the tree at each point during the execution. The frontier is represented as a linked list connecting, in the proper left-to-right order, the nodes belonging to the frontier. To maintain this structure we simply need two additional fields in the record representing a node of the tree: pred and succ, used to link the node to its predecessor and successor in the list representing the frontier (see Fig. 2). These are pointer fields and are defined only for nodes that are currently leaves of the tree.


Fig. 2. Structure of each record (fields: parent, pred, succ, left, right)

The basic ideas are the following (see Fig. 3(i) and 3(ii)):
– During an expansion, a leaf n is expanded by adding two new nodes n1 and n2 below it. n was part of the frontier before the expansion; thus its pred field points to the leaf immediately to its left, and its succ field points to the leaf immediately to its right. The new frontier can be computed by simply removing n from the list and replacing it with the sequence n1, n2. This can clearly be achieved by modifying a constant number of pointers (only 4 records need to be touched). The process is illustrated in Fig. 3(i), where the dashed arrows represent the previously existing pointers and the solid arrows represent the new pointers after the expansion is completed.
– During a deletion, a leaf n is removed from the frontier. If n is the only child of its parent m, then it is sufficient to replace n with m in the list that represents the frontier. If, on the other hand, n is the left (right) child of its parent m, and m also has a right (left) child, then the deletion simply eliminates one element from the frontier: the predecessor of n in the list must be linked directly to the successor of n. Both the predecessor and the successor of n in the frontier can be accessed directly from n. Thus, in any case, at most 3 records need to be modified, and all of them can be immediately obtained from the node to be removed. The case where the removed node is one of two children of its parent is illustrated in Fig. 3(ii).

Fig. 3. Maintaining the frontier: (i) expansion of a leaf n, (ii) removal of a leaf n; dashed arrows show the old pred/succ pointers, solid arrows the current ones

We can state the following result:

Lemma 5.2 The frontier of a tree created via a valid sequence of operations create_tree, expand, and remove can be maintained as a doubly-linked list with Θ(1) time per operation.

Observe that the idea of maintaining an explicit representation of the frontier of the computation tree has been explored in the past for different purposes, e.g., to support advanced scheduling techniques for or-parallelism [5, 43]. Our scheme has the advantage of providing a constant-time computation of the frontier (a feature missing in the previous proposals)—thus, it could provide advantages also in supporting these classes of scheduling algorithms. A similar frontier-type structure is also used in the Andorra-I system to maintain the order of the subgoals in a query [49].
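The following Python sketch (our rendering, with node labels omitted) shows this constant-time frontier maintenance: expand splices the two new leaves into the list in place of n, and remove either unlinks n or lets its parent take its place:

    class Node:
        def __init__(self, parent=None):
            self.parent = parent
            self.left = self.right = None
            self.pred = self.succ = None   # frontier links, defined only for leaves

    def create_tree():
        return Node()            # a one-node tree: the root is the whole frontier

    def expand(n):
        """Replace leaf n by its two new children in the frontier list."""
        n1, n2 = Node(n), Node(n)
        n.left, n.right = n1, n2
        n1.pred, n1.succ = n.pred, n2      # n1 takes n's place ...
        n2.pred, n2.succ = n1, n.succ      # ... immediately followed by n2
        if n.pred is not None: n.pred.succ = n1
        if n.succ is not None: n.succ.pred = n2
        n.pred = n.succ = None
        return n1, n2

    def remove(n):
        """Remove leaf n; an only child is replaced by its parent in the frontier."""
        p = n.parent
        if p is None:
            return                                   # removing the root empties the tree
        if p.left is not None and p.right is not None:   # n has a sibling: unlink n
            if n.pred is not None: n.pred.succ = n.succ
            if n.succ is not None: n.succ.pred = n.pred
        else:                                        # only child: p replaces n
            p.pred, p.succ = n.pred, n.succ
            if n.pred is not None: n.pred.succ = p
            if n.succ is not None: n.succ.pred = p
        if p.left is n: p.left = None
        else: p.right = None

    root = create_tree()
    a, b = expand(root)
    c, d = expand(a)          # frontier: c, d, b
    remove(c)                 # frontier: d, b -- a constant number of pointer updates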

5.2 Leftmost checks

The operation missing from the above construction is find_subroot, which computes the node µ(n) for each leaf n of the tree. In what follows we show that the previous data structure can be augmented to support the find_subroot operation in constant time. The result is based on the following lemma:

Lemma 5.3 Let n and m be two different leaves of a binary tree. The value of µ(m) changes when n is removed from the tree if and only if n is the left child of a node k and µ(m) is the right child of k.


Proof. Recall that µ(m) is the highest node k on the path from m to the root such that m is leftmost in the subtree rooted at k. Since k is the subroot node for m, it follows that, before the removal of n, k must either be the root of the tree or the right child of a node parent(k) which also has a left child—otherwise we would be able to find a node (namely parent(k)) which is closer to the root and for which m is still leftmost. Let us consider the case in which the removal of n affects µ(m). Clearly k cannot be the root of the tree, as otherwise the removal of n would not affect µ(m). Thus, the only way to change µ(m) is to remove the left child of parent(k); since n is the only node removed, n must be the left child of parent(k). The other direction is obvious. ✷

Lemma 5.4 If n and m are two distinct leaves in a binary tree, then µ(n) ≠ µ(m).

Proof. Since two distinct leaves cannot both be leftmost in any subtree, the lemma follows. ✷

The previous results have the following consequence.

Corollary 5.5 The removal of a leaf n from a binary tree will affect the µ of at most one leaf.

The record structure is modified by adding one additional pointer field, named sub. The field sub is defined (i.e., it contains a meaningful value) only for the records representing leaves of the tree. If n is a leaf of the tree and r is its record, then sub(r) contains the address of the record representing the node µ(n). The field sub can be managed as follows during the manipulation operations.

During tree creation, if r10 contains the address of the (newly created) root of the tree, then sub(r10) := r10. This is justified by the fact that, if the tree contains only one node, then that node is leftmost only in the subtree rooted at itself. When a node n is expanded by appending two children to it (see Fig. 4(i)), the subroot nodes for the two new leaves can be determined as follows (as usual we assume that r1 contains the address of the record for n):
– if r2 contains the address of the newly created node which represents the new left child of n, then sub(r2) := sub(r1). In fact, r2 represents a node which is on the same leftmost branch as n, and as such it must have the same µ node.
– if r3 contains the address of the newly created node which represents the new right child of n, then sub(r3) := r3. In fact, the parent of the node for r3 is n, which has a left child; thus this right child is not leftmost in any subtree rooted at a higher node.


During the removal of a leaf n (whose address is in register r1), three cases may occur:
1. If n is the only child of its parent, then the removal of the node will not affect the µ of any other leaf in the tree. In fact, the only leaves whose µ could be affected are those lying to the right of n; but the only way to affect their µ is to make one of them leftmost in the tree rooted at the parent of n, and this cannot happen since n is the only child of its parent. The only operation required in this case is thus to copy the value of the sub field of the leaf to the sub field of the parent, which now becomes a leaf. This is illustrated in the top part of Fig. 4(ii), where the dashed arrow represents the new pointer.
2. If n is the right child of a node m which also has a left child, then the removal of n will not modify any µ: from Lemma 5.3 we know that only the removal of a left child can affect some µ. This is illustrated in the left part of Fig. 4(ii).
3. If n is the left child of k, and k has a right child, then the removal of n can affect the µ of at most one leaf (Corollary 5.5), and it is easy to see that this is the case where one leaf is indeed affected. Let m be the leaf which follows n in the frontier of the tree. From the way the tree is built, m must lie on the leftmost branch of the subtree rooted at the right child of k. Thus, when n is removed, µ(m) must be given the value µ(n). This is illustrated in the right part of Fig. 4(ii).

Once again it is straightforward to see that in every possible case the cost of maintaining the sub field is constant. The details of the additional steps required to maintain these pointers are illustrated in Fig. 5.

Fig. 4. Management of the subroot nodes: (i) sub pointers after the expansion of a leaf, (ii) the three cases in the removal of a leaf (dashed arrows represent the new pointers)

    expand:
        ...
        r5 := pred(r1)
        r6 := succ(r1)
        if r5 = nil goto 1:
        succ(r5) := r2
    1:  if r6 = nil goto 2:
        pred(r6) := r3
    2:  pred(r2) := r5
        succ(r3) := r6
        pred(r3) := r2
        succ(r2) := r3
        end

    remove:
        if (parent(r1) ≠ nil) then
            if (left(parent(r1)) = r1) then
                if (right(parent(r1)) ≠ nil) then
                    pred(succ(r1)) := pred(r1)
                    if (pred(r1) ≠ nil) then succ(pred(r1)) := succ(r1)
                else
                    pred(parent(r1)) := pred(r1)
                    succ(parent(r1)) := succ(r1)
                endif
            else
                if (left(parent(r1)) ≠ nil) then
                    succ(pred(r1)) := succ(r1)
                    if (succ(r1) ≠ nil) then pred(succ(r1)) := pred(r1)
                else
                    succ(parent(r1)) := succ(r1)
                    pred(parent(r1)) := pred(r1)
                endif
            endif
        endif

Fig. 5. Management of the frontier

The availability of the sub field finally allows us to implement the find_subroot operation. If n is the leaf under consideration, let us assume that r1 contains the address of the record representing n:

    find_subroot:
        r1 := sub(r1)

The operation returns in r1 itself the address of the record representing µ(n), at cost Θ(1). It should be noted that this algorithm for finding the subroot has hitherto been unknown to the parallel logic programming community; the algorithms used in existing or-parallel systems [1, 27] have a worst-case complexity of Θ(n) per operation.
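Putting the two steps together, the following Python sketch (again with our naming, and labels omitted) maintains both the frontier links and the sub pointers, so that find_subroot is a single pointer dereference:

    class Node:
        def __init__(self, parent=None):
            self.parent = parent
            self.left = self.right = None
            self.pred = self.succ = None   # frontier links (leaves only)
            self.sub = None                # mu(n) (leaves only)

    def create_tree():
        root = Node()
        root.sub = root                    # a lone root is its own subroot
        return root

    def expand(n):
        n1, n2 = Node(n), Node(n)
        n.left, n.right = n1, n2
        # frontier: n is replaced by the sequence n1, n2
        n1.pred, n1.succ, n2.pred, n2.succ = n.pred, n2, n1, n.succ
        if n.pred is not None: n.pred.succ = n1
        if n.succ is not None: n.succ.pred = n2
        n1.sub = n.sub                     # left child: same leftmost branch as n
        n2.sub = n2                        # right child: leftmost only below itself
        n.pred = n.succ = n.sub = None
        return n1, n2

    def remove(n):
        p = n.parent
        if p is None:
            return                         # the tree becomes empty
        if p.left is not None and p.right is not None:   # n has a sibling
            if p.left is n:
                # case 3: n's frontier successor (the leftmost leaf under the
                # right sibling) inherits mu(n)
                n.succ.sub = n.sub
            # cases 2 and 3: unlink n from the frontier
            if n.pred is not None: n.pred.succ = n.succ
            if n.succ is not None: n.succ.pred = n.pred
        else:                              # case 1: only child; p becomes a leaf
            p.sub = n.sub
            p.pred, p.succ = n.pred, n.succ
            if n.pred is not None: n.pred.succ = p
            if n.succ is not None: n.succ.pred = p
        if p.left is n: p.left = None
        else: p.right = None

    def find_subroot(n):                   # Theta(1): one pointer read
        return n.sub

    root = create_tree()
    a, b = expand(root)
    c, d = expand(a)
    print(find_subroot(c) is root)                          # True
    remove(c)                                               # d inherits mu(c)
    print(find_subroot(d) is root, find_subroot(b) is b)    # True True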

5.3 Discussion

As mentioned before, all the schemes to handle side-effects and other order-sensitive predicates proposed in the literature can be shown to be non-optimal with respect to the abstraction we propose. In the context of or-parallelism, the most relevant work on this problem is that of Hausman et al. [20] and Ali and Karlsson [2]. The scheme proposed by Hausman et al. is based on verifying leftmostness by climbing up the current branch and checking whether each node encountered has another active branch on its left. To avoid repeated checking of nodes, an agent detecting that its branch is leftmost marks all the visited nodes; a later leftmost check can then stop as soon as a node marked as leftmost is found. The algorithm has a single-operation worst-case time complexity of Θ(n), and it is easy to exhibit realistic situations where the optimization produced by marking yields no improvement on this complexity. The scheme also has an amortized complexity of Θ(n).

The scheme used in Muse [2] forces each agent to maintain an explicit summary of the organization of the current branch: a stack contains the sequence of active choice-points in the current branch. A leftmost check scans the stack looking for any choice-point which has other active alternatives on its left. To avoid repeated traversals of certain nodes, each choice-point contains an additional pointer: if node n contains a pointer to node m, then m is an ancestor of n and all nodes between n and m have no active alternatives on the left. This scheme also has a single-operation worst-case time complexity of Θ(n). Other schemes have been described [22, 43], but with similar complexity.

In the context of and-parallelism, the two major schemes proposed are those of Hermenegildo [29] and Chang and Chiang [6]. The latter scheme uses a variation of the naive approach: each node decides its leftmostness by climbing up the branch and verifying the existence of other branches on the left. This produces a single-operation worst-case time complexity of Θ(n). The scheme presented in [29] is closer to our proposal: additional arguments are introduced in the various predicates, and these arguments are used as semaphores to synchronize the execution of side-effects. The overall effect of these semaphores is the creation of a "chain" connecting the leaves of the and-tree which correspond to the side-effects currently present in the and-tree; at any point in time, only the leftmost element of this chain is allowed to proceed. The scheme is similar to what we describe, but it relies on a priori knowledge of whether a side-effect may appear in each branch of the computation—in order to decide whether additional arguments are needed to extend the chain of side-effects. The scheme we propose in this paper does not require such advance knowledge.

It is clear from the above discussion that, in addition to the non-constant-time overhead, all these schemes also pay additional overheads throughout the computation (i.e., not only when side-effects occur, but also in the other basic operations) to support or optimize the leftmost checks and the other activities related to side-effects. For example, this is the case of maintaining


the additional stack in the Muse scheme, maintaining cut counters in each node in the scheme described in [20], and maintaining additional counters for each and-parallel subgoal in the scheme described by Chang and Chiang [6]. These simple observations strongly suggest that a practical implementation of our scheme will perform better than these schemes—as all the schemes incur a small constant-time overhead during standard execution, while only our scheme avoids any non-constant-time cost during the execution of side-effects. Additionally, the most advanced schedulers designed for or-parallelism [43, 5] rely on maintaining an explicit representation of the frontier of the computation tree. Our scheme provides an efficient way to maintain the frontier, and allows the data structures used for scheduling to be reused to support side-effects efficiently.

6 Conclusions and future work

In this paper we studied the complexity of supporting order-sensitive predicates during parallel execution of Prolog programs. The following results were achieved:
– we presented an abstraction of the issues involved in parallel execution of Prolog in terms of operations on labelled dynamic trees;
– we showed that side-effects in a parallel Prolog system can be supported in constant time per operation on a pure pointer machine.

These results are of theoretical importance, considering that none of the existing proposals achieves these bounds. They provide a strong theoretical framework that researchers in parallel logic programming can use to compare and analyze alternative solutions. The existence of an optimal model and the sub-optimality of the existing schemes suggest that better practical schemes are feasible and should be researched. We have also argued that there are strong reasons to believe that the proposed optimal scheme can be highly competitive in practical implementations.

Various open questions remain. From the theoretical point of view, some aspects of the complexity of or-parallelism and and-parallelism are still under investigation: we have been able to determine tight lower and upper bounds for the AP problem [33, 36], but the upper and lower bounds for OP are still quite distant [35]. We are also investigating the application of the scheme designed for order-sensitive predicates to other application domains (e.g., object-oriented programming). From the practical point of view, we are currently integrating our novel scheme for supporting order-sensitive predicates into an actual parallel Prolog system, the ACE system [14] developed by the authors.


Acknowledgements. The authors wish to thank K. Mehlhorn and L. Longpre for the discussions and suggestions. The work has been partially supported by NSF grants CCR-9900320, CCR-9875279, EIA-9810732, HRD-9906130, CDA-9729848, and CCR-9820852.

References

1. K.A.M. Ali, R. Karlsson: The Muse Or-parallel Prolog Model and its Performance. In: 1990 North American Conf. on Logic Prog., pp. 757–776. MIT Press 1990
2. K.A.M. Ali, R. Karlsson: Full Prolog and Scheduling Or-parallelism in Muse. Int J Parallel Program 19(6):445–475 (1991)
3. A.M. Ben-Amram: What is a Pointer Machine? Technical Report, DIKU, University of Copenhagen, 1995
4. N. Blum: On the single-operation worst-case time complexity of the disjoint set union problem. SIAM J Comput 15(4):1021–1024 (1986)
5. P. Brand: Wavefront Scheduling. Technical report, SICS, Gigalips Project, 1988
6. S.-E. Chang, Y.P. Chiang: Restricted AND-Parallelism Execution Model with Side-Effects. In: E.L. Lusk, R.A. Overbeek (eds) Proceedings of the North American Conference on Logic Programming, pp. 350–368, 1989
7. W.F. Clocksin, C.S. Mellish: Programming in Prolog. Berlin Heidelberg New York: Springer 1981
8. J.S. Conery: The And/Or Process Model for Parallel Interpretation of Logic Programs. PhD thesis, University of California at Irvine, 1983. Technical Report 204
9. T.H. Cormen, C.E. Leiserson, R.L. Rivest: Introduction to Algorithms. MIT Press 1992
10. M.L. Fredman, M.E. Saks: The Cell Probe Complexity of Dynamic Data Structures. In: Procs. of the 21st ACM Symposium on Theory of Computing, pp. 345–354. ACM 1989
11. H.N. Gabow: Data structures for weighted matching and nearest common ancestors. In: ACM Symp. on Discrete Algorithms, pp. 434–443, 1990
12. G. Gupta: Multiprocessor Execution of Logic Programs. Dordrecht: Kluwer 1994
13. G. Gupta, V. Santos Costa: Cuts and Side-effects in And/Or Parallel Prolog. Journal of Logic Programming 27(1):45–71 (1996)
14. G. Gupta, M. Hermenegildo, E. Pontelli, V. Santos Costa: ACE: And/Or-parallel Copying-based Execution of Logic Programs. In: Proc. ICLP'94, pp. 93–109. MIT Press 1994
15. G. Gupta, B. Jayaraman: Analysis of or-parallel execution models. ACM TOPLAS 15(4):659–680 (1993)
16. G. Gupta, E. Pontelli, K.M. Ali, M. Carlsson, M. Hermenegildo: Parallel Execution of Prolog Programs: a Survey. Technical report, New Mexico State University, 1999
17. D. Harel, R.E. Tarjan: Fast Algorithms for Finding Nearest Common Ancestors. SIAM J Comput 13(2):338–355 (1984)
18. S. Haridi, S. Janson: Kernel Andorra Prolog and its Computation Model. In: Proc. 7th International Conf. on Logic Programming, pp. 31–46. MIT Press 1990
19. B. Hausman: Handling speculative work in or-parallel Prolog: Evaluation results. In: S. Debray, M. Hermenegildo (eds) North American Conference on Logic Programming, pp. 721–736, Austin, TX, October 1990
20. B. Hausman, A. Ciepielewski, A. Calderwood: Cut and Side-Effects in Or-Parallel Prolog. In: International Conference on Fifth Generation Computer Systems, pp. 831–840. Tokyo, November 1988
21. M. Hermenegildo, K. Greene: &-Prolog and its Performance: Exploiting Independent And-Parallelism. In: 1990 Int'l Conf. on Logic Prog., pp. 253–268. MIT Press, June 1990


22. L.V. Kalé, D.A. Padua, D.C. Sehr: OR-Parallel execution of Prolog with Side Effects. J Supercomput 2(2):209–223 (1988)
23. J.C. Kergommeaux, P. Codognet: Parallel logic programming systems. ACM Comput Surv 26(3):295–336 (1994)
24. D.E. Knuth: The Art of Computer Programming, Vol. 1. Reading, MA: Addison-Wesley 1968
25. R.A. Kowalski: Logic for Problem Solving. Amsterdam: Elsevier North-Holland 1979
26. J.W. Lloyd: Foundations of Logic Programming. Berlin Heidelberg New York: Springer 1987
27. E. Lusk, R. Butler, T. Disz, R. Overbeek, D.H.D. Warren, A. Calderwood, P. Szeredi, S. Haridi, P. Brand, M. Carlsson, A. Ciepielewski, B. Hausman: The Aurora Or-parallel Prolog System. New Generation Computing 7(2/3):243–271 (1990)
28. K. Marriott, P. Stuckey: Programming with Constraints. MIT Press 1997
29. K. Muthukumar, M. Hermenegildo: Efficient Methods for Supporting Side Effects in Independent And-parallelism and Their Backtracking Semantics. In: 1989 International Conference on Logic Programming. Cambridge, MA: MIT Press, June 1989
30. E. Pontelli, G. Gupta: Implementation Mechanisms for Dependent And-parallelism. In: International Conference on Logic Programming. Cambridge, MA: MIT Press 1997
31. E. Pontelli, G. Gupta, M. Hermenegildo: &ACE: A High-performance Parallel Prolog System. In: International Parallel Processing Symposium, pp. 564–571. IEEE Computer Society, Santa Barbara, CA, April 1995
32. E. Pontelli, G. Gupta, D. Tang, M. Carro, M. Hermenegildo: Efficient Implementation of Independent And-Parallelism. Comput Lang 22(2/3), 1996
33. E. Pontelli, D. Ranjan, G. Gupta: On the Complexity of Parallel Implementation of Logic Programs. In: Proceedings of the International Conference on Foundations of Software Technology and Theoretical Computer Science, pp. 123–137. Berlin Heidelberg New York: Springer 1997
34. E. Pontelli, D. Ranjan, G. Gupta: The Complexity of Late Binding in Dynamic Object-Oriented Programming Languages. J Funct Logic Program, Special Issue #2, November 1999
35. D. Ranjan, E. Pontelli, G. Gupta: On the Complexity of Or-Parallelism. New Generation Comput 17(3):285–308 (1999)
36. D. Ranjan, E. Pontelli, L. Longpre, G. Gupta: The Temporal Precedence Problem. Algorithmica, 1999
37. J.A. Robinson: A Machine Oriented Logic Based on the Resolution Principle. Journal of the ACM 12(1):23–41 (1965)
38. A. Schönhage: Storage Modification Machines. SIAM J Comput 9(3):490–508 (1980)
39. E.Y. Shapiro (ed): Concurrent Prolog: Collected Papers. Cambridge, MA: MIT Press 1987
40. K. Shen: Exploiting Dependent And-parallelism in Prolog: The Dynamic Dependent And-parallel Scheme. In: Proc. Joint Int'l Conf. and Symp. on Logic Prog., pp. 717–731. MIT Press 1992
41. K. Shen: Overview of DASWAM: Exploitation of Dependent And-Parallelism. J Logic Program 29(1/3):245–293 (1996)
42. K. Shen, M. Hermenegildo: A Simulation Study of Or- and Independent And-parallelism. In: Proc. 1991 International Logic Programming Symposium. MIT Press 1991
43. R. Sindaha: The Dharma Scheduler — Definitive Scheduling in Aurora on Multiprocessor Architecture. In: Proc. of ILPS'93, 1993
44. D.D. Sleator, R.E. Tarjan: A data structure for dynamic trees. J Comput Syst Sci 26:362–391 (1983)


45. L. Sterling, E.Y. Shapiro: The Art of Prolog. Cambridge, MA: MIT Press 1994
46. R.E. Tarjan: A Class of Algorithms which Require Nonlinear Time to Maintain Disjoint Sets. J Comput Syst Sci 18(2):110–127 (1979)
47. H. Tebra: Optimistic And-Parallelism in Prolog. In: PARLE 87. Berlin Heidelberg New York: Springer 1987
48. A.K. Tsakalidis: The Nearest Common Ancestor in a Dynamic Tree. Acta Inf 25(1):37–54 (1987)
49. D.H.D. Warren, V. Santos Costa, R. Yang: The Andorra-I Engine: a Parallel Implementation of the Basic Andorra Model. In: Procs. of the International Conference on Logic Programming. Cambridge, MA: MIT Press 1991
