SPS-Parallelism + SETHEO = SPTHEO


Journal of Automated Reasoning 22: 397–431, 1999. © 1999 Kluwer Academic Publishers. Printed in the Netherlands.


CHRISTIAN B. SUTTNER
Institut für Informatik, Technische Universität München, Arcisstr. 21, D-80290 München. e-mail: [email protected]
(Received: 12 October 1997; in final form: 9 January 1998)
Abstract. This paper describes the parallel automated theorem prover SPTHEO, a parallelization of the sequential first-order theorem prover SETHEO. The parallelization is based on the SPS-model (Static Partitioning with Slackness) for parallel search, an approach that minimizes the processor-to-processor communication. This model allows efficient computations on hardware with weak communication performance, such as workstation networks. SPTHEO offers the utilization of both OR- and independent-AND parallelism. In this paper, a detailed description and evaluation of the OR-parallel part are given.
Key words: parallel automated theorem proving, SPTHEO, OR-parallelism.
This work was supported by the Deutsche Forschungsgemeinschaft within the Sonderforschungsbereich 342, Teilprojekt A5 (Parallelization of Inference Systems).

1. Introduction
Automated theorem proving (ATP) research is concerned with the development of systems that, given a set of axioms and a conjecture, determine whether the conjecture is a logical consequence of the axioms. Although most ATP systems are complete (i.e., in principle every theorem can be proven), many interesting theorems cannot be proven within reasonable runtime and memory constraints. A major goal of ATP research is to increase the number of ATP problems solvable in practice. This goal can be approached in various ways, such as the investigation of new and refined proof calculi, improved search control, and better search-pruning techniques. A particularly promising approach for improving the performance of ATP systems is their parallelization. Parallelization offers enormous potential, for several reasons:
− The parallel exploration of alternative search paths can lead to an exponential reduction of the required search, due to the avoidance of an early commitment to particular search steps. (This effect can also be achieved to a limited degree on a single processor, either by running multiple processes under a time-sharing regime or by simulating OR-parallel search via a breadth-first strategy. Both approaches, however, are limited to very small degrees of parallelism in practice.)



− The declarative nature of ATP search problems offers the potential for parallelization with low overhead, since there are no side-effects requiring special treatment or issues limiting the available degree of parallelism as in logic programming.
− The increase in processing capacity can allow significantly larger search spaces to be explored in the same amount of time.
− The increase in memory capacity can significantly increase processing effectiveness, by avoiding swapping due to memory limitations.
More than a dozen operational parallel ATP systems have been developed over the years (see [4, 22] for overviews), with varying success. Analyzing existing systems, one finds that cases of weak performance can be attributed to either synchronization bottlenecks or excessive communication. Ideally, processors involved in the computation would neither synchronize nor communicate between the beginning and the end of the computation. In that case the parallelization overhead would be at a minimum, and all processing power could be spent productively. The SPS-model (Static Partitioning with Slackness, [26, 27]) for parallelizing search-based systems is a parallelization approach developed with the objective of minimizing parallelization overhead without losing performance due to ineffective processor utilization. It is particularly well suited for distributed-memory architectures, including high-latency configurations such as computer networks.
This paper describes the application of the SPS-model to parallelizing the SETHEO ATP system, version 3.1 [13, 16]. SETHEO is a state-of-the-art theorem prover, regarding both the proof procedure and its implementation: the underlying model elimination calculus has been extended by various types of constraints to prune off unnecessary search paths [13], and the implementation in C is based on an extension of the Warren Abstract Machine [28], providing high inference rates. The goal has been to start with a very efficient sequential system and to develop a parallelization that allows the effective utilization of hundreds of workstations. The resulting SPTHEO system is a portable parallel theorem prover that outperforms the underlying sequential SETHEO system on hard problems, not only in terms of speed but also in terms of problem-solving ability: SPTHEO solved a large number of problems that SETHEO had not solved.
The remaining parts of this paper describe the approach taken and its results. Section 2 describes the SETHEO system, and Section 3 describes the abstract parallelization model and the SPTHEO system resulting from its application to SETHEO. Section 4 gives an evaluation of SPTHEO-specific properties, and a detailed performance evaluation and comparison between SETHEO, SPTHEO, and RCTHEO (a previous parallelization of SETHEO). Finally, conclusions are presented in Section 5.


2. The Model Elimination Prover SETHEO
The SETHEO system is a sequential ATP system for first-order logic. It is based on weak model elimination [15], a modified version of the model elimination proof procedure [14] that excludes factoring and lemmas. Letz [12] showed that model elimination can be viewed as a particular refinement of connection tableaux, in which open tableau nodes are selected for expansion by a depth-first left-to-right search strategy.
The SETHEO system consists of two units: a preprocessing unit and a proof search unit. The preprocessing unit performs certain optimizations on the input clauses (such as removing clauses containing pure literals) and generates the input for the proof search unit. The proof search unit (called SAM, for SETHEO Abstract Machine) is an extension of the Warren Abstract Machine [28]. The abstract machine concept has been developed for the efficient execution of logic programming languages: by tailoring operations (e.g., unification) specifically to the given input, the runtime complexity is reduced. In the following, the prover is identified with the SETHEO abstract machine, and problems are assumed to be already represented in an appropriate format. The preprocessing time is typically very small and will not be considered further.

2.1. INPUT LANGUAGE

The input to the prover consists of a formula in clause normal form. A formula in clause normal form is a list of clauses, where each clause is a list of literals. A literal L is an atom or the negation (¬) of an atom, called a positive or negative literal, respectively. An atom is a k-ary predicate symbol followed by k terms (e.g., p(t1, ..., tk)), and a term is either a variable (e.g., X, Y, ...) or an l-ary function symbol (e.g., f, g, ...) followed by l terms (e.g., f(t1, ..., tl)). A substitution σ is a function that maps variables to terms, and σ(S) denotes the result of substituting variables in a set of literals S by terms according to σ. A substitution σ is called a unifier for a set of literals S iff σ(S) is a singleton. A unifier σ is called a most general unifier (mgu) for a set of literals S iff for every unifier ρ of S there exists a substitution τ such that τσ(S) = ρ(S). Two literals L and K are said to be contradictory iff they have opposite sign (i.e., one is positive and one is negative) and contain the same atom.
Since SETHEO is a refutation proof system, an input formula is assumed to contain the negation of the conjecture to be proved. If the conjecture to be proved is a theorem in the theory described by the remaining clauses, its negation contradicts the theory.
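To make the unification notions above concrete, the following is a minimal sketch (in Python, not SETHEO's WAM-based implementation) of computing an mgu of two terms; the term representation (variables as capitalized strings, compound terms and constants as functor/argument tuples) is an assumption for illustration only.

# A minimal sketch of first-order unification (illustration only).
# Variables are capitalized strings; a compound term or constant is a (functor, args) tuple.

def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def walk(t, sigma):
    # resolve a term under the substitution sigma
    while is_var(t) and t in sigma:
        t = sigma[t]
    if is_var(t):
        return t
    f, args = t
    return (f, tuple(walk(a, sigma) for a in args))

def occurs(v, t):
    return v == t if is_var(t) else any(occurs(v, a) for a in t[1])

def mgu(s, t, sigma=None):
    sigma = {} if sigma is None else sigma
    s, t = walk(s, sigma), walk(t, sigma)
    if is_var(s):
        if s == t:
            return sigma
        return None if occurs(s, t) else {**sigma, s: t}
    if is_var(t):
        return mgu(t, s, sigma)
    if s[0] != t[0] or len(s[1]) != len(t[1]):
        return None                      # clash: different functors or arities
    for a, b in zip(s[1], t[1]):
        sigma = mgu(a, b, sigma)
        if sigma is None:
            return None
    return sigma

# p(X, f(Y)) and p(a, f(b)) unify with {X: a, Y: b}
print(mgu(("p", ("X", ("f", ("Y",)))),
          ("p", (("a", ()), ("f", (("b", ()),))))))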


In the preprocessing, the input to the prover is transformed to a proof problem, which consists of a set of problem clauses, each of which is of the form
H ; B1, ..., Bk,
where H is called the head (literal) and B1, ..., Bk are called the body literals (subgoals) of the problem clause. A problem clause that contains only negative literals is called a start clause. A set of problem clauses is obtained from an original clausal formula F consisting of clauses Ci of the form L1i, ..., Lki, where L1i, ..., Lki are literals, in the following way: An artificial start clause of the form ¬start is generated, and for each clause Ci ∈ F that contains only negative literals (a refutation requires the use of at least one clause consisting of negative literals only; if no such clause exists, the formula is known to be satisfiable) a clause with head 'start' is generated: start; L1i, ..., Lki. Then, for each clause Ci ∈ F a set of ki problem clauses PCi,j, j ∈ [1..ki], is generated of the form
Lji ; L1i, ..., L(j−1)i, L(j+1)i, ..., Lki.
Thus, for each clause Ci ∈ F a set of problem clauses is generated, in which each literal of Ci occurs once as head literal and all remaining literals belong to the body. These problem clauses are called the contrapositives of Ci.
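As an illustration of this transformation, the following sketch (with an assumed literal representation as sign/atom pairs rather than SETHEO's internal format) generates the start clauses and contrapositives of a clausal formula:

# Sketch of the transformation into problem clauses described above.
# A literal is (sign, atom) with sign True for positive; a clause is a list of
# literals; a problem clause is a (head, body) pair.

def to_problem_clauses(clauses):
    # the artificial start clause ¬start labels the tableau root and is handled
    # separately by the prover; here we only build the derived problem clauses
    problem = []
    for clause in clauses:
        if all(not sign for sign, _ in clause):  # purely negative input clause
            problem.append(((True, "start"), list(clause)))
        for j, head in enumerate(clause):        # one contrapositive per literal
            body = clause[:j] + clause[j + 1:]
            problem.append((head, body))
    return problem

# The clause {p(a), ¬q(a)} yields the contrapositives  p(a) ; ¬q(a)  and  ¬q(a) ; p(a).
for head, body in to_problem_clauses([[(True, "p(a)"), (False, "q(a)")]]):
    print(head, ";", body)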

2.2. PROOF PROCEDURE

The prover operates on a tree of literals called a tableau. Initially, the tableau consists of the root node labeled with the symbol '¬start'. The aim is to generate a closed (or solved) tableau, that is, a tableau in which every path from the root to a leaf contains a pair of nodes labeled with contradictory literals. Leaf nodes of paths that do not contain contradictory literals are called open nodes and are labeled with an open subgoal to be solved. Nodes that are on the path from the root to a node are the predecessors of that node. A given tableau can be expanded by the application of one of the model elimination inference rules, tableau extension and tableau reduction. The application of such a rule is called an inference step. Furthermore, since model elimination is not proof confluent (i.e., not every constructible tableau for an unsatisfiable formula can be closed), backtracking is necessary. These steps are explained in more detail in pseudo-code notation in the procedure expand, shown in Figure 1. The recursive procedure expand captures the basic mechanism for searching for a closed tableau. The intuitive interpretation is that, in order to solve a particular problem clause, each of its body literals needs to be solved.


expand( tableau ){
    if( no open nodes in tableau )
        Terminate with SUCCESS
        [Comment: Tableau is closed => Conjecture is a theorem]
    else{
        Choose leftmost open node N in the tableau labeled with a subgoal L;
        if possible, perform a reduction step:
            - Choose a predecessor N' of N not tried for solving N before,
              labeled with a literal K, where K and L can be made contradictory
              by a mgu K-L;
            - Apply the mgu K-L to all literals in tableau;
        else if possible, perform an extension step:
            - Choose a problem clause C not tried for solving N before, with a
              head H where H and L can be made contradictory by a mgu H-L;
            - Rename all variables in C to be different from variable names
              occurring in the tableau and in the formula;
            - Append to node N a node N' labeled with H;
            - Append to node N a set of nodes N1..Nk labeled with the subgoals
              s1, ..., sk of C; [Comment: N1, ..., Nk are open]
            - Apply the mgu H-L to all literals of tableau;
        else if possible, backtrack:
            Undo changes in tableau due to the last tableau expansion for which
            an alternative exists;
        else
            Terminate with FAILURE
            [Comment: Conjecture is not a theorem]
        expand( tableau );
    }
}

Figure 1. Pseudo-code notation of the SETHEO proof procedure.

Reduction steps are necessary (and possible) only for non-Horn formulas, that is, formulas that contain clauses with more than one positive literal (this is due to the restriction of start clauses to negative clauses). The selection of the leftmost open subgoal in the tableau realizes a depth-first search strategy. However, the uncontrolled application of depth-first search can lead to a path of infinite length. Then a closed tableau may not be found, because alternative tableaux are never considered. In order to ensure completeness of the search, iterative-deepening search is utilized.


Iterative-deepening search (also called consecutively bounded search) [11, 25] is a common control mechanism for achieving search completeness for depth-first search. Briefly, it works as follows. Search tree nodes are expanded depth-first until a quantitative limit (called a search bound) on the constructed data structure is exceeded. Whenever this occurs, backtracking is enforced; that is, the problem represented by the state in which the bound failure occurred is treated as being unsolvable. The search then proceeds until either a solution is found or the complete search space induced by the search bound is exhausted. If no solution is found within the search bound, and backtracking caused by a bound failure has occurred, the search is started from scratch with an increased search bound value. If no such backtracking occurred, the problem is unsolvable, and the computation terminates. In order to achieve search completeness with this scheme, it is necessary that a search bound guarantee the finiteness of the search space. Several search bounds can be used simultaneously, in which case a bound failure occurs as soon as one of the individual search bounds is exceeded.
SETHEO provides several search bounds restricting the size of a tableau. The most important bounds are the depth and inference bounds. The depth bound limits the maximal allowed length of a path from the root to a leaf of the tableau. The inference bound limits the number of rule applications that are used to construct a tableau (this does not limit the number of rule applications required during the search, since rule applications that are backtracked do not count). An example of a formula, its corresponding proof problem, and tableaux that can be generated from it are given in Figure 2. From the top, each row of tableaux represents the tableaux allowed for the inference bound limits 0, 1, 2, and 3, respectively.
Figure 2. Example for an initial part of the search tree for a formula.
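The control loop can be summarized as follows (a hedged sketch only; bounded_depth_first_search stands in for the actual SETHEO search with a given inference or depth bound and is assumed to report whether a bound failure occurred):

# Sketch of iterative-deepening (consecutively bounded) search as described above.
# bounded_depth_first_search(problem, bound) is assumed to return a pair
# (solution_or_None, bound_failure_occurred).

def iterative_deepening(problem, initial_bound=1):
    bound = initial_bound
    while True:
        solution, bound_failure = bounded_depth_first_search(problem, bound)
        if solution is not None:
            return solution            # closed tableau found within the bound
        if not bound_failure:
            return None                # search space exhausted: no proof exists
        bound += 1                     # restart from scratch with a larger bound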

3. The SPTHEO System
This section describes the SPTHEO system. First, the static partitioning with slackness (SPS) parallelization model is explained. Then, the realization of SPTHEO based on this model is described in detail.
3.1. THE SPS-MODEL

The operation of the SPS-model can be separated into three phases. In the first phase (task generation), an initial, finite segment of the search space is explored. During this phase, tasks for parallel execution are generated according to the partitioning rules (defined later) employed by the system. Since task generation occurs only during this phase, and independently of the later processing of the tasks, this is called static partitioning (in contrast to dynamic partitioning, where new tasks are generated during parallel execution in order to continually adapt to the search space structure). The number of generated tasks (denoted by the letter m) is equal to or larger than the number of processors (denoted by the letter n).


The term slackness is used to express the relation between the number of tasks and the number of processors. In order to specify a degree of slackness, the slackness parameter spp = m/n denotes the average number of tasks per processor (slackness per processor).
In the second phase (task distribution), the tasks generated in the first phase are distributed among the available processors. The availability of all tasks prior to their distribution can be exploited in several ways. First, redundancies among the tasks can be detected and removed. Second, relationships among the tasks can be used to guide the mapping onto processors, for example, to reduce the potential for load imbalance. Third, the complete mapping information can be given to each processor. This allows direct processor-to-processor communication for distributed task management (such as termination control), or in case information exchange between particular tasks is desired (e.g., lemma exchange).
In the third phase (task execution), the tasks are executed independently on their assigned processors. Since possibly more than one task has to be processed per processor, a service strategy is required. For completeness, quasi-parallel processing is necessary. For each task, a copy of the original sequential search-based system, extended by the ability to process a task, is started under a time-sharing regime.


Altogether, the following process structure occurs. An initial process first generates and distributes the tasks. Then, for each task a process is started at a processor. The task management can be centralized or distributed. For centralized task management, each task that terminates sends its result to a master process, which takes care of potential task dependencies and global termination. For distributed task management, the information spread during task distribution needs to be sufficient for each process to take appropriate actions depending on its result.
The abstract SPS-model is very general, leaving many issues important for system realization and performance unspecified. Among these are the questions of how tasks are actually generated, how many tasks (given a particular number of processors) should be generated (i.e., the degree of slackness), and how the tasks should be distributed (i.e., which tasks should be given to which processor). A general, application-independent analysis of these issues can be found in [26]. In the following section, the particular choices made for the SPTHEO system are described.

3.2. THE DESIGN OF THE SPTHEO SYSTEM

3.2.1. Generation Phase
3.2.1.1. Partitioning Rules. The proof procedure expand allows OR- and AND-partitioning of the search space:
− OR-partitioning: Instead of selecting one of the possible extension or reduction steps for a subgoal, all of these steps are performed in parallel. This strategy results in the construction of alternative tableaux in parallel. For a proof, it is sufficient to close any one of these tableaux.
− AND-partitioning: Instead of working on the leftmost open subgoal in the tableau, all open subgoals are worked on in parallel. Since each of the subgoals needs to be solved to obtain a closed tableau, this represents the construction of a single tableau in parallel. Since variables may be shared between two subgoals, the mgu's obtained in parallel need to be compatible.
In SPTHEO, both OR-partitioning and independent AND-partitioning are realized. Independent AND-partitioning is a restricted form of AND-partitioning in which dependent subgoals are grouped together into variable-disjoint subgoal groups. This form of AND-partitioning is frequently possible in practice [9] and can be processed very efficiently because the subgoal groups can be treated independently (compatibility of the mgu's is implicitly guaranteed). However, independent AND-partitioning leads to a number of complications with respect to the


completeness and correctness of the SETHEO calculus. A combined presentation of both OR- and independent AND-partitioning is therefore beyond the scope of this paper. A detailed description and analysis of independent AND-partitioning can be found in [26]. In the following, the discussion will be limited to the OR-partitioning part of the system.
3.2.1.2. Generation Strategy. The standard search strategy of SETHEO is depth-first search using iterative deepening on the depth of tableaux (empirically, the most effective single bound). For experimentation, depth-bounded and inference-bounded generation as well as their combination are possible in SPTHEO. Experiments [9] showed that iterative-deepening search on the inference bound is particularly well suited for task generation, because it allows good control over the resulting number of tasks and because it is more likely to produce independent subgoals than iterative deepening on the depth bound.
3.2.1.3. Generation Control. For users interested in applications rather than ATP research itself, the specification of search bounds for the generation mode is undesirable, because it does not explicitly constrain the number of tasks to be generated. Therefore SPTHEO allows the desired number of tasks to be specified. Since the partitioning of the search space depends on the problem structure, it is not always possible to generate exactly the desired number of tasks. A heuristic control is used that combines tasks from the last two iterative-deepening levels of the generation phase, such that at least the desired number of tasks is generated, while attempting to minimize the number of surplus tasks (see [26] for details).
3.2.1.4. Task Encoding. In order to achieve small message sizes for task distribution, instead of the tableau itself the expansion steps necessary to reconstruct the tableau are stored. These are the steps that lead from the root of the search tree to the state representing the tableau, and they can be efficiently extracted from the data structures of the abstract machine. A step can be encoded as an integer that represents the code address for a jump to the instruction of the abstract machine program that performs the desired step. Thus, a task consists of a sequence of k integers, where k is the number of steps required to build the tableau. For reduction steps, an integer specifying the path literal to use is also required. When processing a task later, the tableau is recreated by performing the specified sequence of steps. This method for storing and recomputing the sequence of steps to reconstruct a desired state was first proposed in [5]. It has been used successfully in a number of systems [1, 2, 21] and has proved superior to alternative approaches (e.g., as in [3]). In particular, this task representation scheme has been employed and shown to be well suited to SETHEO in the PARTHEO system [18, 20], the first parallelization of the SETHEO system.
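The task encoding can be pictured as follows (a sketch under assumed names; the real encoding stores abstract-machine code addresses extracted from the WAM data structures, and step.code_address and machine.jump_to are illustrative, not SETHEO's actual interface):

# Sketch of the task encoding and replay described above (illustrative names only).

def encode_task(expansion_steps):
    # one integer per expansion step; reduction steps additionally carry the
    # index of the path literal to use
    return [step.code_address for step in expansion_steps]

def replay_task(machine, encoded_task):
    for address in encoded_task:
        machine.jump_to(address)       # deterministically re-perform the step
    # the tableau is now reconstructed; a standard bounded search starts from here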


3.2.2. Distribution Phase
The mapping of tasks to processors influences the load balance and the iterative-deepening balance (the difference between the iterative-deepening levels active at the same time at different processors). Experiments [10] have shown that good load balance without processing overhead can be achieved with a modulo distribution: Task i → Processor (i mod n). The reason is that this scheme leads to a low number of common ancestors among the tasks that are assigned to the same processor, which heuristically means that the diversity of the search spaces at the same processor is large. While the SPS-model offers further techniques for improving load and iterative-deepening balance (see [26]), the low average imbalance encountered for SPTHEO made their realization unnecessary. In SPTHEO a modification of the modulo distribution is used, which adapts to the load situation of the individual processors of the network. The master process that generates the tasks distributes them in their order of creation.
3.2.3. Execution Phase
For the execution phase, three issues need to be discussed: first, how each individual task is processed; second, how the complete set of tasks is processed; and third, how the SETHEO search pruning techniques are affected by the parallelization.
3.2.3.1. Individual Task Processing. A task consists of the encoding of the expansion steps required to reconstruct a particular tableau. For simplicity, the term 'task' will be used equivalently to refer to the encoded information, the tableau, and the associated processing of such a tableau, depending on the context. A task is processed by starting a UNIX process with the task encoding as a command-line argument. This process then reconstructs the tableau by deterministically performing the encoded expansion steps. After the tableau has been reconstructed, the iterative-deepening depth bound is set to the current tableau depth plus one, and a standard SETHEO search begins. Each task is started with a runtime limit. Upon termination, a message indicating the cause (runtime limit expired, success, or failure) is sent to the master process.
3.2.3.2. Task Set Processing. If more than one task is mapped to a processor, each task is processed by a separate SPTHEO UNIX process. The time-sharing mechanism of the UNIX operating system is utilized to share the capacity of a processor among its tasks. As soon as a task terminates successfully (i.e., it finds a refutation), the master process is notified, and all tasks still running are terminated. If a task terminates unsuccessfully (i.e., no refutation for the task exists), the master process is notified; if all tasks terminated unsuccessfully, the conjecture is not a theorem, and the master process terminates.
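Putting the distribution and execution phases together, the overall scheme looks roughly as follows (a sketch only; spawn_prover, wait_for_result, and terminate are assumed helpers standing in for the actual PVM-based process handling):

# Sketch of modulo distribution, time-shared task execution, and centralized
# termination handling as described above (illustrative helpers, not SPTHEO code).

def run_or_parallel(tasks, hosts, time_limit=20):
    n = len(hosts)
    procs = [spawn_prover(hosts[i % n], task, time_limit)  # Task i -> Processor i mod n
             for i, task in enumerate(tasks)]
    outcomes = []
    while len(outcomes) < len(procs):
        proc, outcome = wait_for_result(procs)             # "success", "failure", or "timeout"
        if outcome == "success":                           # a refutation was found
            for p in procs:
                terminate(p)                               # stop all remaining tasks
            return "theorem"
        outcomes.append(outcome)
    if all(o == "failure" for o in outcomes):
        return "not a theorem"                             # every task exhausted its search space
    return "unknown (runtime limit reached for some tasks)"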


3.2.3.3. Partitioning and Search Pruning. SETHEO employs several search pruning techniques that reduce the search space of standard model elimination. These are implemented by the use of inequality constraints on variable instantiations [17]. SPTHEO is based on SETHEO version V3.1, which incorporates tautology elimination, a limited form of tableau subsumption, and regularity [12]. Partitioning parallelization (regardless of whether it is static or dynamic) can influence the effectiveness of the pruning if the persistence of some constraints is reduced, because of the partitioning of the search space. Since, in the SPS-model, partitioning occurs only in the initial generation phase, only anti-lemma constraints produced during task generation are affected. Suttner [26] gives a detailed discussion of the individual pruning techniques, together with reasons and empirical evidence why little loss of pruning effectiveness occurs in SPTHEO.
4. Evaluation of the SPTHEO System
This section describes a comprehensive evaluation of SPTHEO. Performance issues specific to the SPS-model are assessed, and performance comparisons with the SETHEO and RCTHEO [6] systems are made. For these comparisons, the first statistically significant evaluations of the SETHEO and RCTHEO systems were performed [26]. The performance comparison of SPTHEO with SETHEO provides information on the absolute improvement over a comparable sequential system, instead of the typically less interesting relative metrics, which compare a parallel system on one processor with itself on n processors. The comparison of SPTHEO with RCTHEO shows the performance relative to a different parallelization approach. The RCTHEO system is a parallelization of SETHEO based on random orderings of the alternative extension steps for solving a subgoal. Thus, like SPTHEO, RCTHEO explores different OR-branches in parallel. A major difference is that, since the reordering extends to all OR-branches encountered during a search, RCTHEO exploits a dynamic partitioning scheme. Another difference is that in RCTHEO the same sequence of search steps can occur on different processors, which is not possible in SPTHEO. RCTHEO achieves good asymptotic performance and has been suggested as a performance reference for OR-parallel approaches [6, 7].
The first part of this section treats issues relevant to all of the following parts. The general evaluation setting is described, and the runtime performance assessment is discussed. Also, information about the communication overhead for SPTHEO and RCTHEO is given. The second part assesses performance data specific to the SPS-model. The third part contains the main evaluation of SPTHEO, an extensive comparison between SPTHEO, SETHEO, and RCTHEO.

4.1. GENERAL EVALUATION ISSUES

4.1.1. Evaluation Setting
All of the experiments were based on the TPTP v1.1.3 problem library for automated theorem proving [23, 24]. The evaluation complies with the guidelines for the use of the TPTP; in particular, the problems were not modified in any way, and no additional information was given to the provers. From the 2652 problems contained in TPTP v1.1.3, 81 problems were excluded from the experiments (different problems were excluded for different reasons, such as errors in their formulation or problems with the preprocessing unit of SETHEO; see [26] for a detailed description). Thus 2571 problems were used.
All the systems (SPTHEO, SETHEO, and RCTHEO) originate from the same source code of SETHEO v3.1. All versions are integrated into one code package. Because of the WAM architecture, the implementation influence of one version on the others is negligible; that is, there is no significant slowdown of one version due to code necessary for another version. Both parallel systems utilize a mainly unchanged version of the SETHEO abstract machine during parallel proof search. The control options are the same for all systems and for all experiments, and enable depth-bounded iterative-deepening search with pruning by tautology, clause subsumption, anti-lemma, and regularity constraints. Modifications to SETHEO were necessary in order to facilitate automatic testing of a large number of test problems (e.g., modified output format, adaptation to allow background processing, implementation of a runtime limit). Because all the systems share most of the code, the measurements for the different systems are comparable. The performance differences are attributable to the different search space explorations and the approach-specific overheads of the individual systems. All the systems utilize the same preprocessing component for generating abstract machine code. The runtime for the preprocessing is equivalent for all the systems. The preprocessing takes a comparatively short amount of time (usually up to a few seconds) and depends on the number of literals rather than the problem difficulty. Since the focus here is on the parallelization of the search, the preprocessing time has been excluded from the evaluation.
All experiments were performed on a network of 121 HP workstations connected by Ethernet. The network consists of 110 HP-720 workstations (58 Dhrystone MIPS, 38.5 SpecInt'92) with 32 MByte main memory (and a 200 MByte disk for swap space) each, and 11 HP-715/50 workstations (62 Dhrystone MIPS, 36 SpecInt'92) with 80 MByte main memory (and a 2 GByte storage disk and two 500 MByte disks for system and NFS data) each. Each HP-715/50 workstation operates as a file server for 10 HP-720 clients. The network is partitioned with bridges into five segments with two servers each, and one segment with one server.
In order to obtain statistically significant data, more than 1.5 million individual proof searches were run, requiring more than 6000 hours of CPU time and resulting in approximately 700 MByte of output data. To make a large number of experiments possible, the runtime for each individual task processing for SPTHEO and RCTHEO was limited to 20 seconds.


Runtime in this text always refers to CPU time, unless noted otherwise. All system components requiring interaction, including reaction to system errors such as the inability to allocate sufficient memory, had to be changed to facilitate automatic control. The data had to be filtered automatically to detect erroneous data (e.g., caused by machines reset by other users), and affected experiments had to be repeated.
4.1.2. Runtime Assessment
A major difficulty in the evaluation of a network-based parallel system is the influence of other users on the available processing capacity, both in terms of processor load and of available communication bandwidth. Ideally, exclusive usage of the network would be possible, allowing the true runtimes to be obtained. Measurements showed that the network is highly utilized, even during the night and on weekends. Therefore, extensive experiments based on exclusive usage were not possible. Fortunately, both the SPS-model and the random competition model (RCTHEO) allow analyses which reliably approximate the results that would be obtained under exclusive usage.
For SPTHEO, this is achieved as follows. Instead of the standard SPTHEO operation, where termination occurs as soon as a proof has been found, each task is independently processed until it fails, succeeds, or is terminated due to the runtime limit. The performance statistics for all tasks are then collected in a file, together with the performance statistics for the task generation phase. Given this data, the wall-clock runtime on some hardware platform can be estimated: Let σ be the first task that leads to a proof. The wall-clock runtime of the parallel system then consists of the runtime Tgen for task generation, the time Tdist(σ) until σ is distributed, the runtime Tσ for processing task σ, the time Tspp-delay for the processing delay of σ due to time sharing, and the time Ttermination until the success message from σ is received and processed by the master, i.e.,
wall-clock runtime = Tgen + Tdist(σ) + Tσ + Tspp-delay + Ttermination.
Tgen and Tσ are already contained explicitly in the log file. Tdist(σ) can be estimated by measuring the average time it takes to distribute a task and multiplying this by the position of σ in the distribution order. Tspp-delay depends on the number of tasks, the times at which the processing of the tasks starts, and the runtimes of the tasks that are processed on the same processor as σ. Given a particular distribution of tasks, the number and runtimes of these tasks are contained in the log file. An upper bound on Tspp-delay is obtained by assuming that all these tasks start at the same time as σ. In case the task-switching overhead due to quasi-parallelism is not negligible, measurements need to be performed; it is then straightforward to compute Tspp-delay from the log data and measurements. Finally, Ttermination can be obtained by appropriate communication time measurements. In case no proof is found within the runtime limit, the runtime can be computed as above, based on the task that terminates last. Note that an estimate of the wall-clock runtime with a bounded error, as done above, is not possible for dynamic partitioning schemes. For RCTHEO, the wall-clock runtime can be approximated similarly, with the simplification that no time for task generation occurs and no time-sharing delay for tasks needs to be considered (assuming one task per processor).
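For illustration, the estimate can be computed from the logged data along the following lines (a sketch under simplifying assumptions; in particular, it uses the upper bound on Tspp-delay obtained by letting all co-scheduled tasks start together with σ and ignores task-switching overhead):

# Sketch of the wall-clock estimate
#   T = Tgen + Tdist(sigma) + Tsigma + Tspp_delay + Ttermination
# from per-task log records (runtime, result), listed in distribution order.

def estimate_wallclock(t_gen, avg_dist_time, task_log, co_scheduled_runtimes,
                       t_termination):
    # sigma: index and runtime of the first task (in distribution order) with a proof
    sigma, t_sigma = next((i, t) for i, (t, result) in enumerate(task_log)
                          if result == "success")
    t_dist = avg_dist_time * sigma
    # under fair time-sharing, each task on sigma's processor consumes at most
    # min(its runtime, t_sigma) CPU time before sigma finishes
    t_spp_delay = sum(min(t, t_sigma) for t in co_scheduled_runtimes)
    return t_gen + t_dist + t_sigma + t_spp_delay + t_termination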


The wall-clock time required for distributing a single task (including process startup) using the PVM message-passing library on the HP network under nonexclusive usage has been measured. The measurements were obtained for SPTHEO, and similar values can be assumed for RCTHEO. The average time per task for up to 20 tasks is below 0.1 seconds, with maximal values for individual problems from 0.2 to 0.7 seconds. The average value increases slightly (up to 0.13 seconds) as the number of tasks increases, with a significant increase of the maximal value (up to 2.1 seconds). The reason is that as the number of tasks increases, the probability increases that a processor with other load is used. Such a processor responds slowly, increasing the average distribution time. The experiments show that for exclusive network usage an average distribution time of approximately 0.1 wall-clock seconds can be expected for up to 60 tasks.
The approach of processing tasks independently and then computing the actual performance of the parallel system afterwards has an important advantage: one set of experiments can be used to obtain results for different model parameters and realization choices. For example, different slackness values can be simulated: a set of m tasks can be virtually mapped to any number of processors from 1 (spp = m) to m (spp = 1).
4.1.3. Calculating Averages
For system evaluation based on a large number of test problems, average values of metrics are of particular interest. The usual arithmetic average of k values v1, ..., vk, defined by Ak = (v1 + · · · + vk)/k, has the drawback that exceptionally large values significantly influence the average. Intuitively, it is often more appropriate to consider average values that are less distorted by exceptionally large values. Two commonly used alternatives are the geometric average Gk = (v1 × · · · × vk)^(1/k) and the harmonic average Hk = k/(1/v1 + · · · + 1/vk) (both are defined only for vi > 0). The averages Gk and Hk reduce the effect of large values. It holds that Hk ≤ Gk ≤ Ak for all value series v1, ..., vk with vi > 0, i ∈ [1..k], k ∈ N. Similar to the tendency of Ak to be too large, Hk leads to an unintuitively small average value if some exceptionally small values occur. Gk is commonly considered to be the best single-value approximation to an intuitive average, and it will be used in the following (in [26], the arithmetic and harmonic averages are additionally given).
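For reference, the three averages can be computed as follows (a small illustrative snippet, not part of the original evaluation tooling):

import math

# Arithmetic, geometric, and harmonic averages of positive values, as defined above.
def averages(values):
    k = len(values)
    arithmetic = sum(values) / k
    geometric = math.exp(sum(math.log(v) for v in values) / k)
    harmonic = k / sum(1.0 / v for v in values)
    return arithmetic, geometric, harmonic

# For example, averages([1, 10, 100]) gives (37.0, 10.0, about 2.7), illustrating Hk <= Gk <= Ak.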

4.1.4. Search Steps and Runtime
In the following evaluations, the number of search steps will frequently be given as a machine- and load-independent indication of the work performed. In order to provide some intuition about these numbers, measurements were performed with SETHEO to determine the (machine-dependent) runtime for some number of search steps. For these inference rate measurements, the 572 problems that SETHEO could solve and that required more than 0.01 seconds of search time were used (problems that are solved in less time were not used, because the limited clock resolution would then bias the result). The search time is the runtime of the abstract machine excluding the time for processing overhead before the search starts. Since this overhead consists mainly of the time required to read the problem into memory, it will be called the input time. Excluding input time, it was found that SETHEO performs approximately G572 = 8558 search steps (H572 = 5200, A572 = 10800) per second on one of the HP hosts. The minimal and maximal values observed were 100 search steps and 36,400 search steps per second, respectively.

4.2. SPS-SPECIFIC PERFORMANCE ISSUES

This section treats issues that arise specifically in the context of the SPS-model. In the first part the task generation phase is analyzed, and in the second part load balance is investigated.
4.2.1. Task Generation
In the following, three topics regarding the task generation phase are addressed. First, simple problems for which no parallel execution is required are discussed. Second, the accuracy of the task generation in producing a desired number of tasks mdesired is examined. Third, the effort required to generate tasks is assessed. In order to measure the actual runtime spent on the task generation, the runtimes shown in this section exclude the input time.
4.2.1.1. Avoided Parallelization. A particular advantage of the SPS-model is the avoidance of parallel execution for very simple problems. A parallel execution phase is initiated only if at least the desired number of tasks can be generated. For problems that exhibit a low degree of inherent parallelism, this may not arise before the problem is solved during the generation phase. Table I gives an overview of the problems that are processed purely sequentially by SPTHEO, for two different values of the desired number of tasks to generate. When demanding 16 tasks, 9% of all problems are solved during the generation phase, with a very low geometric average runtime of only 0.07 seconds. It is interesting to observe how these numbers change as the number of desired tasks is increased to 256. Then nearly 17% of all problems are solved during task generation, with a low geometric average runtime of about 0.09 seconds. A multiprocessor search for one of these problems causes overhead that is likely to increase both the runtime and the accumulated processing time, compared with purely sequential processing. The fact that the number of problems solved sequentially is nearly twice as large


for the larger mdesired value shows how the problem difficulty needs to increase to facilitate parallel processing.

Table I. Problems that are processed purely sequentially by SPTHEO. Work and runtimes are given in number of search steps performed and seconds, respectively.

                                                        mdesired = 16          mdesired = 256
  Number of problems processed purely sequentially      232 (9.02%)            430 (16.72%)
  . . . of which are solved during task generation      230 (99.14%)           428 (99.53%)
  Geometric av. work / runtime                          37.8 / 0.07 s          94.3 / 0.09 s
  Maximal work / runtime                                35998 / 6.32 s         53366 / 4.31 s
  . . . of which do not exhibit enough parallelism      2 (0.86%)              2 (0.47%)

A performance comparison between SETHEO and SPTHEO for the problems solved sequentially by SPTHEO is inappropriate, because different search strategies are used. While the standard search mode of SETHEO is iterative deepening on the tableau depth, the SPTHEO task generation is controlled by iterative deepening on the number of tableau expansion steps. As a result, the relative performance depends on the appropriateness of either strategy for solving the problem. The difference for the problems that can be solved during the generation phase is insignificant, because the average time required for solving one of them is only fractions of a second. For those problems that are not solved during the SPTHEO task generation phase, a SETHEO-SPTHEO comparison is not problematic, because for those problems both systems utilize the same search strategy (depth-bounded iterative-deepening search).
All of the following evaluations will be concerned with problems for which parallel processing occurs. For clarity, these problems will be referred to as the parallelized problems. For mdesired = 16 and 256 there are 2339 and 2141 parallelized problems, respectively.
4.2.1.2. Control of the Number of Generated Tasks. Since the partitioning depends on the search space structure of an individual problem, it is not always possible to generate exactly the desired number of tasks. The SPTHEO task generation attempts to approximate the desired number, with a bias toward producing more rather than fewer tasks than specified. For mdesired = 16, 22.1 tasks are generated on (geometric) average, and 267.0 tasks for mdesired = 256. The numbers of problems for which exactly the desired number of tasks was generated were 405 (17.3%) for mdesired = 16 and 110 (5.2%) for mdesired = 256.


4.2.1.3. Effort for Generating Tasks. The generation phase in SPTHEO operates sequentially and increases the serial fraction of the computation. Therefore it is important to consider the number of search steps and the runtime required for generating tasks. For the parallelized problems, the geometric averages for the runtime and the number of search steps are 0.06 seconds/86 for mdesired = 16, and 0.22 seconds/1346 for mdesired = 256. Although the task generation phase is optimized toward approximating the desired number of tasks rather than toward achieving a short runtime, the average duration is quite short. A reduction of the generation time by parallelization would not lead to a significant performance improvement.
4.2.2. Load Balance
Because of the static partitioning concept, load imbalance may arise whenever tasks terminate with no proof found (failing tasks) while the overall computation still continues. Idle processors can be avoided by attempting to avoid the generation of failing tasks, and by the use of slackness. In this section, the effectiveness of a simple lookahead for reducing the number of failing tasks is measured, and the remaining occurrence of failing tasks is assessed. Finally, the effect of slackness on reducing load imbalance is investigated.
4.2.2.1. Effectiveness of Lookahead. In SPTHEO, a simple lookahead is used to avoid the generation of tasks that would fail immediately. Besides reducing load imbalance, such a task set optimization is also important in order to ensure a high payoff for the communication overhead. The lookahead checks that at least one successor tableau exists. Since a tableau usually has several successor tableaux, there is a good chance that further successor tableaux exist, and so on. The lookahead requires little overhead (one unification operation for each task and some constraint tests), but empirically it decreases the occurrence of failing tasks considerably. Figure 3 shows the effectiveness of the lookahead in avoiding the generation of failing tasks. It reveals that for some problems the generation of about 70–80% of potential tasks is avoided by the lookahead, and that on average 24–42% are avoided for the most common iterative-deepening levels for task generation (inference bounds from 2 to 9).
The questions are now how many failing tasks still occur, and whether further lookahead could be used to prevent their generation. Table II shows that failing tasks occur within 20 seconds in only about 11% of the problems, for both values of mdesired. Moreover, for a problem in which failing tasks occur, the ratio of failing tasks to all tasks for the problem decreases with increasing mdesired. The reason for both is that failing tasks are more likely to occur in easy problems than in difficult ones. Since more easy problems are solved purely sequentially for larger mdesired values, the number of problems where failures are observed (and also of problems where many tasks fail) decreases for increasing mdesired.


Figure 3. Arithmetic average (♦) and maximal (+) percentages of tasks deleted for a problem by one-step lookahead during task generation for mdesired = 256. The dotted line shows the percentage of problems from all 2141 parallelized problems for which task generation is started at that inference bound value.

Table II. Occurrences of tasks that fail before the time limit Tlimit = 20 seconds is reached. Note that not all of these failures actually occur in practice, since a proof may be found before a failure occurs.

                                                        mdesired = 16          mdesired = 256
  Tasks failing within Tlimit = 20 seconds occur
  in how many problems                                  269 (11.5% of 2339)    245 (11.4% of 2141)
  G269, G245 of occurrences per problem                 26.6%                  15.0%
  G269, G245 of search steps                            19.9                   33.4
    tasks with ≤ 10 search steps                        43.8%                  11.4%
    tasks with ≤ 50 search steps                        83.1%                  78.4%

Table II also provides information about the sizes of failing tasks. Since most of these tasks are quite small, further lookahead could be used to reduce the number of failing tasks. For example, a lookahead of 50 inference steps per task would remove about 80% of failing task occurrences. Note that the true effectiveness of lookahead is larger than indicated by the numbers, because the search step counts given include the search steps required for recomputing the tasks.


In the task generation phase, those steps have already been performed, and a failure is therefore detected with fewer search steps per task. This is also the reason why the percentages for mdesired = 256 are lower than for mdesired = 16: tasks generated for mdesired = 256 are usually obtained at a deeper inference level and therefore require more search steps to rebuild during their recomputation. There are only a few tasks that fail within 10 search steps, since (as seen in Figure 3) between 2 and 9 inferences are usually spent on their recomputation alone. In summary, it is found that the one-step lookahead is sufficient to achieve a small average number of failing tasks for the runtime limit of 20 seconds used here.
4.2.2.2. Load Balance and Slackness. Load imbalance can occur as a result of different runtimes for tasks and a nonuniform number of tasks per processor. For measuring the degree of load imbalance quantitatively, we define LI(n) (for Load Imbalance) as
LI(n) = ( Σ i=1..n (Tpe(n) − Ti) ) / ( (n − 1) × Tpe(n) ),
where Ti denotes the total runtime spent at processor i and Tpe(n) denotes the total system runtime (i.e., Tpe(n) = max i=1..n Ti and Ti ≤ Tpe(n)), and n > 1. The term n − 1 in the denominator represents the largest number of terms that can be different from zero, because there is at least one processor i with Ti = Tpe(n). LI(n) is an absolute measure, not taking into account the best or worst possible balance that can be obtained for a particular set of tasks. It ranges from perfect balance (LI(n) = 0), which means that all processors finish working at the same time, to maximal imbalance (LI(n) = 1), where exactly one processor is busy during the execution.
Figure 4 shows the arithmetic average (the geometric average is not defined, because of problems with no load imbalance) and the maximal load imbalance observed over all parallelized problems, as a function of the number of processors.
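As a small worked example of the definition (illustrative code, not part of the evaluation scripts):

# Load imbalance LI(n) computed from the per-processor runtimes T1, ..., Tn (n > 1).
def load_imbalance(runtimes):
    n = len(runtimes)
    t_pe = max(runtimes)                         # total system runtime Tpe(n)
    return sum(t_pe - t for t in runtimes) / ((n - 1) * t_pe)

# load_imbalance([10, 10, 10]) == 0.0  (perfect balance)
# load_imbalance([10, 0, 0])   == 1.0  (maximal imbalance: one busy processor)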

Figure 4. Maximal values observed and arithmetic average for the load imbalance as a function of the slackness.


It can be seen that a significant reduction of load imbalance is already obtained with sppdesired = 4 for mdesired = 16 and sppdesired = 8 for mdesired = 256, where sppdesired = mdesired/n. Further measurements showed that if the number of processors used is adjusted so that the number of tasks is a multiple of it, the LI values are noticeably smaller.

4.3. ATP-SPECIFIC EVALUATION OF SPTHEO

The OR-partitioning performance of SPTHEO is evaluated by comparisons with the SETHEO and RCTHEO systems. The evaluation is split into two main parts. In the first, the problem-solving performance is assessed. For this, the number of problems that are solved under two different types of time limitation is investigated. In the second, the speedup and productivity are investigated. After that, the relationship with the PARTHEO system [20] is discussed.
SPTHEO and RCTHEO utilize alternative choices for search steps as a source of parallelism: SPTHEO partitions the search space, while RCTHEO relies on competition between different orderings of the choices. In both systems, communication occurs only for the distribution of tasks and for termination handling. For both systems, a message for task distribution includes the desired control parameters for the abstract machine and the problem name. For RCTHEO, a seed for the random number generator is also required. Thus, an RCTHEO task consists of the problem name, the control parameters, and a seed. For SPTHEO, instead of a seed, a task encoding (usually consisting of 10 to 50 integers) is required.
4.3.1. Problem-Solving Performance
Commonly, the most important performance characteristic of a theorem prover is the number of problems that can be solved within some time limit. For parallel systems, it is furthermore interesting to evaluate the problem-solving scalability, that is, the ability to solve more problems as the number of processors is increased. In the following, first the system performance under runtime limitation is investigated, and then the effect of limiting the accumulated processing time is analyzed.
4.3.1.1. Problem-Solving Performance under Limited Runtime. Figure 5 displays the number of problems solved by SETHEO depending on the runtime limit, with runtime limits ranging from 0 to 1000 seconds (≈ 17 minutes). The left plot uses a linear scale for the runtime limit, while the right plot employs a logarithmic scale. The left plot shows that almost all problems are solved within a comparatively short time, and the chances of obtaining a solution decrease as the runtime increases. This can be explained by the nature of combinatorial search: if no solution is found after a short time, the combinatorial explosion results in such a vast search space that the chances of finding a solution decrease rapidly. The logarithmic plot on the right allows a more detailed analysis. It incorporates the difference arising from considering the input time or not (this difference is not noticeable in the left plot).


Figure 5. The number of problems solved by SETHEO as a function of the runtime limit.

Recall that input time refers to the time required to read a proof problem file into the system and to initialize data structures. Runtime limits usually refer to the complete system runtime, including the input time. However, for judging the quality of a search process, the input time is irrelevant. It is therefore interesting to consider runtime limits that apply only to the search runtime. The right plot contains the curve for this case, which shows that the number of solved problems increases smoothly, with decreasing gains, as the runtime limit is increased (the steps in the beginning are due to the time resolution employed). Including the input time, however, leads to a significant step in the problem-solving performance at a runtime limit of approximately three seconds. The reason for this is that there are several hundred problems that cannot be solved within shorter runtime limits, simply because reading the problem into the prover uses up too much of the available runtime. For runtime limits above five seconds, the effect of the input time becomes negligible. Therefore, on the given hardware, SETHEO runtime limits should be at least five seconds.
Similarly, it is possible to determine a runtime at which most of the theorems provable within some larger runtime limit have already been solved. For example, given a runtime limit of 10 seconds (including the input time), SETHEO solves 747 problems. Extending this runtime limit to 100 and 1000 seconds leads to the solution of 809 (+8.3%) and 858 (+6.1%) problems, respectively. It can be expected that increasing the runtime limit to several hours per problem would lead to a comparatively small increase in the number of problems solved. This analysis also justifies the 20-second runtime limit per task that was chosen for the parallel systems.
Figure 6 provides evaluation results for the case that the CPU runtime per processor of each of the three ATP systems is limited to 20 seconds. The number of problems solved by SETHEO is indicated by the horizontal dashed line. There is basically no difference if input time is excluded or included in the limitation (the value shown is for excluding the input overhead; SETHEO solves one problem fewer otherwise). For SPTHEO, results are shown for mdesired = 16 and mdesired = 256.


Figure 6. The numbers of problems solved by SETHEO, RCTHEO, and SPTHEO, allowing 20 seconds CPU time per processor. The value for SETHEO is denoted by the horizontal dashed line.

As mentioned in Section 4.2.1, on average this results in about 22 and 267 tasks, respectively. The results for SPTHEO for different numbers of processors therefore relate to different slackness values. For the case mdesired = 16, n = 1, each task receives a runtime of 0.91 seconds, due to time-sharing. Figure 5 showed that for proof processes limited to less than three seconds the input time can lead to a significant limitation of the proof performance. This effect can be observed here as well. The rectangles show the number of problems solved if the input time is considered. In that case, large slackness values (small n) limit the performance drastically, because many tasks do not obtain a reasonable amount of runtime for actual search. This is particularly limiting for SPTHEO because each problem needs to be read into the prover at least twice: it first needs to be read in for the task generation phase, and then each individual task again needs to read in the formula for task processing. The significant influence of the input time becomes obvious when considering the number of problems solvable when excluding the input overhead. A lower bound for this is shown by the line bars on top of the rectangles for SPTHEO. (The reason that only a lower bound can be given is the following: for computing the SPTHEO runtime for some number of processors, the runtime used excludes the input time. However, the runtime limit used for each task is still 20 seconds, including the input time. To obtain the true values, a new set of experiments would need to be performed in which the runtime limit excludes the input time, which is merely approximated here. The same applies for RCTHEO.)

JARSEL29.tex; 3/03/1999; 10:53; p.22

SPS-PARALLELISM + SETHEO = SPTHEO

419

rectangles for SPTHEO. Neglecting the input time, SPTHEO achieves a significant improvement over SETHEO even for the single-processor case. Altogether, Figure 6 allows the following conclusions. For small numbers of processors, good performance for SPTHEO requires a balance between the runtime limit and the slackness. If the time available per task becomes too small, many problems will remain unsolved because of the input overhead. The previous analysis shows that a runtime limit for SPTHEO should be at least 3 + 3 × spp seconds (time for task generation plus time for timesharing of spp tasks). For slackness values less than or equal to about three, SPTHEO outperforms both SETHEO and RCTHEO significantly. For example, using 256 processors allowing 20 seconds CPU time per processor, and including the input time, SPTHEO solves 252 (32.9%) problems more than SETHEO, and 145 (16.6%) problems more than RCTHEO. The example still neglects the communication times required for SPTHEO and RCTHEO. Since the overheads for SPTHEO and RCTHEO are approximately the same, the relation between them is not affected by this omission. Assuming a distribution time of 0.1 seconds per task, the distribution of 256 tasks would take 25.6 seconds. In the worst case (assuming that the task distributed last is the best), all SPTHEO and RCTHEO results may be interpreted as obtained for a runtime limit of 20 + 25.6 = 45.6 seconds. For such a runtime limit, SETHEO would solve 786 problems, and SPTHEO would still solve 30.4% more. SPTHEO also outperforms RCTHEO consistently as long as the ratio between the runtime limit and the slackness is acceptable. Regarding the problem-solving scalability of the parallel systems, Figure 6 shows that both systems solve increasingly more problems as the number of processors is increased. For both, the increase decreases for increasing numbers of processors. This is expected, for similar reasons as the reduced effectiveness of increasing the runtime limit for the sequential system (Figure 5). A more detailed examination of the problem-solving performance of SPTHEO for different runtime limits (including the input time) is given in Figure 7. It shows the number of problems solved by SPTHEO for different numbers of processors. The left plot is obtained for mdesired = 16 and 1 to 16 processors. The right plot is for mdesired = 256 and 1 to 256 processors. The bottom curve always shows the number of problems solved with one processor. Each curve immediately above another curve shows the performance for twice as many processors as for the lower one. In particular for the case mdesired = 256, the existence of a minimal threshold value for the runtime limit (depending on the slackness) can be observed, as for SETHEO in Figure 5. Note that the asymptotic upper bound on the number of problems proven by SPTHEO is due to the runtime limit of 20 seconds per task. This limit becomes the constraining factor if the ratio Tlimit /spp is sufficiently large, instead of the limit shown on the horizontal axis. 4.3.1.2. Problem-Solving Performance under Limited Accumulated Processing Time. The previous system comparisons are based on a common runtime limit for
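To make the interplay between the runtime limit, the slackness, and the input overhead concrete, the following minimal Python sketch reproduces the arithmetic used above (the 20-second limit, the roughly 22 tasks for mdesired = 16, and the 3 + 3 × spp rule come from the text; the helper functions themselves are illustrative and not part of SPTHEO):

    # Illustrative sketch of the runtime-limit arithmetic discussed above.
    # The helper functions are hypothetical; only the constants (20 s limit,
    # ~22 tasks for mdesired = 16, 3 + 3 * spp rule) are taken from the text.

    def per_task_share(runtime_limit, tasks_per_processor):
        """Runtime each task receives under time-sharing on one processor."""
        return runtime_limit / tasks_per_processor

    def min_runtime_limit(spp, t_taskgen=3.0, t_min_per_task=3.0):
        """Rule of thumb from the text: time for task generation plus a
        minimum useful share for each of the spp time-shared tasks."""
        return t_taskgen + t_min_per_task * spp

    print(per_task_share(20.0, 22))   # ~0.91 s per task (mdesired = 16, n = 1)
    print(min_runtime_limit(22))      # 69.0 s would be needed for spp = 22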


Figure 7. The problem-solving performance of SPTHEO for an allowed CPU time per processor between 10 and 120 seconds (maximal 20 seconds per task). The horizontal lines show the SETHEO performance for runtime limits of 10 and 120 seconds, respectively.

4.3.1.2. Problem-Solving Performance under Limited Accumulated Processing Time.

The previous system comparisons are based on a common runtime limit for all systems. It is not surprising that parallel systems with many processors outperform a sequential system, simply because they have much more processing capacity available. In order to assess and compare the computational efficiency, it is appropriate to limit the accumulated processing time instead of the runtime. Formally, if Ti, i ∈ {1, . . . , n}, denotes the individual processing time spent by processor i on a given problem, then Tap(n) = Σ_{i=1..n} Ti denotes the accumulated processing time of the processors. If Tpe(n) (pe for parallel execution) denotes the parallel runtime utilizing n processors, then 0 ≤ Ti ≤ Tpe(n), and Tpe(n) ≤ Tap(n) ≤ n × Tpe(n) (in particular, Tpe(1) = Tap(1)).

Figure 8 shows the number of problems solved by the three systems, given a limit on the accumulated processing time of 1000 seconds. For the parallel systems, the additional runtime limit of 20 seconds per task persists. Thus, for small numbers of processors, the product 20 × n may be less than 1000, and more problems might be solvable if the limit per task were increased such that the accumulated processing time limit is matched. The values of n for which the full limit of 1000 seconds is not reached are shown in parentheses in the figure. As before, the SETHEO performance is denoted by a horizontal line.

The SPTHEO performance is nearly independent of the number of processors, but decreases for more than about 100 processors. Again, this is due to the input overhead. For example, for 256 processors, the accumulated processing time limit results in about 4 seconds of runtime per task, which becomes too short for some difficult problems (compare Figure 5). Nevertheless, for all n, significantly more problems can be solved by SPTHEO than by SETHEO, although both systems utilize the same total amount of CPU processing.

RCTHEO does not perform so well. There is no number of processors for which more problems are solved than by SETHEO. The reason is that the large amount of redundant computation in the competition approach leads to an inefficient use of the computational resources. (Note that the runtime limit of 20 seconds per task is, for the numbers of processors shown in parentheses, more restrictive for RCTHEO than for SPTHEO: for RCTHEO, Tap(n) is limited to n × 20 seconds, while for SPTHEO, Tap(n) is limited to m × 20 seconds.)
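Which of the two limits actually binds can be illustrated with a small sketch (the function name is hypothetical; the task counts of about 22 and 267 are the averages mentioned earlier, and the m × 20 versus n × 20 caps correspond to the remark on RCTHEO above):

    # Illustrative check of which limit binds: the accumulated processing time
    # limit of 1000 seconds or the 20-second limit per task. The per-task limit
    # caps Tap(n) at m * 20 seconds for SPTHEO (m tasks) and at n * 20 seconds
    # for RCTHEO (one task per processor), as noted above.

    def effective_accumulated_limit(tasks, limit_per_task=20.0, t_ap_limit=1000.0):
        return min(t_ap_limit, tasks * limit_per_task)

    print(effective_accumulated_limit(22))    # SPTHEO, mdesired = 16: 440 s (per-task limit binds)
    print(effective_accumulated_limit(267))   # SPTHEO, mdesired = 256: 1000 s (accumulated limit binds)
    print(effective_accumulated_limit(16))    # RCTHEO with n = 16: 320 s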


Figure 8. Problem-solving performance with an accumulated processing time limit of 1000 seconds and a runtime limit of 20 seconds per task (both including input times). For values of n shown in parentheses, less than 1000 seconds of accumulated processing time resulted from the runtime limit of 20 seconds for each task.

The previous evaluations are the most relevant from an automated theorem-proving viewpoint. For parallelization research, it is further interesting to investigate the effect of the parallelization on the search process. This can be done by investigating the change in the number of search steps due to parallelism. It provides a measurement of the work spent in terms of elementary search operations, and therefore gives unambiguous feedback on the effect of the parallelization. However, it ignores the communication and input overheads, and therefore should not be taken as a direct indication of the overall performance, which needs to consider these overheads as well.

4.3.2. Speedup and Productivity Performance

This section describes the SPTHEO speedup and productivity performance.

4.3.2.1. The Speedup and Productivity Metrics.

The most common parallel performance metric is speedup. It is generally defined as sequential time / parallel time; that is, it measures the change in runtime obtained by using n instead of one processor.


Relative speedup is obtained if the parallel system itself, running on a single processor, is used to determine the sequential time. Absolute speedup is obtained if the best comparable sequential system is used to determine the sequential time. Absolute speedups are usually lower than relative speedups, because parallel systems are rarely optimized for the single-processor case. While relative speedup is easy to obtain and frequently used, it is known to favor slow processors and poor programming [8, 19, 26]. Absolute speedup has high pragmatic significance, but in general raises the question which sequential system to use for comparison. Here, SETHEO provides a well-suited reference point, since it is the sequential basis for the parallelization. In the following, both types of speedup are presented.

For the performance evaluation of SPTHEO, the runtime is approximated by the number of search steps performed by the task which requires the fewest search steps to find a proof. This increases the generality of the results, as it makes them largely machine-independent. It is possible to do this for SPTHEO because the communication overhead of the parallelization is static (the partitioning of the search space is done sequentially, before the parallel phase, and does not change during parallel processing). Reliable, direct timing measurements on a workstation network would require exclusive usage, which is not possible for the given network. Thus, in the speedup definition S(n) = Tseq / Tpe(n), the number of search steps performed by SPTHEO using n processors is used for Tpe(n). For the absolute speedup Sabs, Tseq is measured by the number of search steps performed by SETHEO. For the relative speedup Srel, Tseq is measured by the number of search steps performed by SPTHEO on one processor.

A second commonly used performance metric is efficiency, defined as E(n) = Srel(n) / n. The efficiency metric is directly proportional to speedup and is more appropriately viewed as scaled speedup. Since it does not incorporate new information, two algorithms that achieve equal relative speedup always have equal efficiency. A more interesting efficiency measure of a parallel computation needs to be based on the amount of processing capacity actually consumed. This is achieved by the productivity metric, defined as P(n) = Tseq / Tap(n), where Tap(n) is the accumulated processing time of all involved processors (Tap(n) = Σ_{i=1..n} Ti). In contrast to efficiency, productivity introduces new measurement information for performance evaluation that is not already inherent in relative speedup.

Speedup and productivity represent two different optimization criteria for a parallel computation. Adding processors for a computation may increase speedup, but may also reduce productivity. It is therefore interesting to investigate the trade-off between the two criteria. For this, the speedup and productivity metrics are combined into a new metric, called gain: G(n) = S(n) × P(n). Since both a high speedup and a high productivity are desirable, the product of the two aims at maximizing speedup for minimal computational cost. The gain is inversely proportional to the product of the runtime and the accumulated processing time of a computation.
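As an illustration of these definitions, the following minimal sketch (a hypothetical helper, not code from SPTHEO) computes speedup, efficiency, productivity, and gain from per-task search-step counts; for simplicity it assumes that every listed task finds a proof, whereas the evaluation above uses only successful tasks for Tpe(n):

    # Illustrative computation of the metrics defined above from search-step
    # counts. The function and the example numbers are invented; only the
    # formulas follow the text.

    def metrics(t_seq, task_steps, n):
        """t_seq: steps of the sequential reference (SETHEO for absolute,
        SPTHEO on one processor for relative speedup).
        task_steps: search steps spent by each parallel task; the parallel
        'runtime' Tpe(n) is approximated by the cheapest proof-finding task."""
        t_pe = min(task_steps)            # runtime proxy Tpe(n)
        t_ap = sum(task_steps)            # accumulated processing time Tap(n)
        speedup = t_seq / t_pe            # S(n) = Tseq / Tpe(n)
        efficiency = speedup / n          # E(n) = Srel(n) / n (relative base)
        productivity = t_seq / t_ap       # P(n) = Tseq / Tap(n)
        gain = speedup * productivity     # G(n) = S(n) * P(n)
        return speedup, efficiency, productivity, gain

    # Example: 4 processors, sequential run needs 40000 steps.
    print(metrics(40000, [9000, 12000, 15000, 8000], 4))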


G(n) = 1 denotes a balance between speedup and accumulated processing time, using the performance on a single processor as reference. The classical notion of ideal speedup (S(n) = n), together with the assumption that Tap(n) = n × Tpe(n), results in G(n) = 1. A computational gain above 1 denotes a positive gain from parallelization, in the sense that proportionally more speedup is produced than additional resources (i.e., accumulated processing time) are consumed. Computational gain values below 1 denote a performance loss due to parallelization, either because a slowdown occurred or because the speedup required an overproportional increase of resources.

4.3.2.2. Absolute Speedup, Productivity, and Gain.

A comprehensive performance comparison between SPTHEO and SETHEO is given in Figure 9. It consists of three rows of three plots each. The first row shows the speedup, the second the productivity, and the third the gain, all of SPTHEO compared with SETHEO. The first column of each row displays results for SPTHEO with 16 processors, using mdesired = 16 for task generation. The second and third columns of each row are for SPTHEO with 16 and 256 processors, respectively, using mdesired = 256 in both cases. Therefore, the first and third columns represent an average slackness between one and two (since slightly more tasks are generated on average than specified by mdesired), while the middle column represents an average slackness slightly above 16.

Each data point represents the result for a particular problem. On the horizontal axis, the number of search steps required by SETHEO for that problem is shown. On the vertical axis, the SPTHEO performance is given. For the speedup and productivity plots, a solid diagonal line indicates all points where both systems require the same number of search steps. Points above this line therefore represent cases where SPTHEO requires more steps than SETHEO, while for points below it SETHEO requires more. In the speedup plots, a dashed diagonal line additionally denotes points where SETHEO requires n times as many search steps as SPTHEO (linear speedup by SPTHEO). Points below this line denote problems for which SPTHEO requires less than 1/n of the search steps of SETHEO (‘superlinear’ speedup). Similarly, in the productivity plots a dashed diagonal line shows points where the sum of the search steps accumulated over all processors of SPTHEO is n times as large as the number of search steps spent by SETHEO. Points below this line denote problems for which SPTHEO requires less than n times as many accumulated search steps as SETHEO. Furthermore, in all plots a vertical line is drawn where the number of search steps spent by SETHEO is 10000. According to the measurements presented earlier, this number roughly corresponds to one second of search time (on one of the HP hosts). In all plots, problems that are solved by both SETHEO and SPTHEO are depicted by a ‘♦’, while problems solved by SPTHEO but not by SETHEO are depicted by a ‘+’. For all ‘+’ data points, the number of search steps required by SETHEO to solve the problem is unknown.


Figure 9. Absolute speedup, productivity, and gain of SPTHEO compared with SETHEO.


In those cases, the number of search steps performed by SETHEO until its termination by the runtime limit is used as a lower bound. Therefore, all ‘+’ data points denote lower bounds for the SPTHEO performance on those problems; for the speedup and productivity plots, the true location of those data points is some unknown distance to their right.

A number of interesting observations can be made in Figure 9. Let us begin with some general statements that apply to all plots. The point sets are bounded to the right by the runtime limit of 1000 seconds (i.e., the number of search steps that could be made within that time) used for the SETHEO experiments. Therefore, the rightmost data points refer to the problems which were solved close to the runtime limit by SETHEO, or not solved by SETHEO at all (in which case the number of search steps spent within the runtime limit is used). Also, a lower bound on speedup and productivity can be noticed. This is due to the search steps required for task generation, which are independent of the difficulty of the problem. The necessarily higher value of this bound for mdesired = 256 than for mdesired = 16 is clearly visible.

Let us now turn to the individual metrics. First, consider the speedup plots. Moving from the left to the middle plot (both are for 16 processors, but for different slackness values), two observations can be made. First, points are shifted upwards due to the additional search steps required for task generation, which reduces the achievable speedup. Second, a number of points to the left of the vertical line (problems requiring very few search steps by SETHEO) vanish. This is because 198 problems are additionally solved during the task generation phase when increasing mdesired from 16 to 256 (compare also Table I). Moving from the middle plot to the plot on the right, little changes for the points to the left of the vertical line, while many points to the right of the line are shifted downwards. This shift is due to the additional processing resources employed (256 processors), and shows that they become increasingly effective as the difficulty of a problem for SETHEO increases. In all three plots, there are a large number of problems for which SPTHEO requires less than 1/n of the search steps required by SETHEO. This shows that a significant advantage is obtained by parallel processing due to the modification of the search space exploration, rather than by the use of additional processing power alone.

Let us now turn to the productivity plots. When moving from the left to the middle plot, changes similar to those for speedup can be observed. When moving from the middle to the right plot, in contrast to the speedup plots, the distribution of points is mainly unaltered. This means that utilizing 256 instead of 16 processors has little effect on the total number of search steps performed. This is quite important, as it indicates that the cost for solving problems does not increase if the slackness is reduced (i.e., more processors are employed). In all plots, it is interesting to observe the large number of points below the solid diagonal line. For these problems, the accumulated number of search steps performed by the parallel system is less than the number of steps performed by the sequential system.


Finally, let us consider the gain plots. These plots provide a combination of the speedup and productivity plots. Moving from the left plot to the middle plot, most points are shifted downwards. This is due to the effort required for the extended task generation. Moving from the middle to the right plot, the gain for many problems to the right of the vertical line increases. This shows that large numbers of processors are particularly useful for problems that are difficult for the sequential system. Similarly, it can be observed that the points are more or less distributed along a diagonal line with slope 1 (especially noticeable in the right plot). This means that the gain obtained by SPTHEO increases approximately linearly with the number of search steps required by SETHEO. Most problems to the left of the vertical line have a gain below one (although many such problems showed speedup in the speedup plots), due to the increase in the number of accumulated search steps required. Most problems to the right of the vertical line have a gain larger than one, and for many of them the gain is larger than n. As before, this denotes the benefit caused by the altered search space exploration.

In summary, Figure 9 allows the following conclusions. The benefit from parallelization increases with the difficulty of the problem for the sequential system. The benefit goes beyond the advantage obtained by having additional resources: some of the improvement is due to the parallel search space exploration, which reduces the total amount of work necessary. With respect to the SPS-model, it has been found that it is not useful to employ large slackness values for this application. Large slackness increases the task generation overhead, and thereby reduces the possible speedup, and the time-sharing delay cancels out the advantage that is obtained from the increased degree of parallelism. Considering the input and task distribution overheads, it is preferable to keep the slackness low. This is acceptable because the largest benefit from slackness for reducing load imbalance occurs already for small values (compare Figure 4).

It is often argued that the advantage of any change to the search space exploration can be implemented in a sequential algorithm, and that parallel execution is therefore unnecessary. This argument, however, ignores two important issues. First, incorporating the search scheme of the parallel system in a sequential algorithm requires the maintenance of several search threads in parallel. In effect, the resulting sequential algorithm is a simulation of a parallel algorithm; the computation is still parallel in an abstract sense, but merely executed sequentially. Second, in practice this approach is quite limited. In the same way that slackness values are limited by the constraint that all processes need to fit into main memory in order to avoid performance degradation due to swapping, the degree of parallelism that can be achieved by a sequential algorithm is limited by the main memory. Therefore, only a few search threads will typically be possible in such a sequential system.

The evaluations presented so far are the most important, as they describe the problem-solving performance and the effect of parallelism on the search.


Figure 10. Geometric average of relative speedup of SPTHEO based on the number of search steps performed.

Traditionally, however, evaluations focus on relative speedup figures and absolute runtime speedups. These are investigated below.

4.3.2.3. Relative Speedup of SPTHEO.

Figure 10 shows the relative speedup for SPTHEO, based on the number of search steps, for mdesired = 16 and mdesired = 256. It shows nearly linear speedup up to 16 processors, and an increasing deviation from linear speedup for larger n. It should be noted that the plots display the geometric averages; the arithmetic averages are much closer to linear for large numbers of processors.

There are several reasons why the relative speedup does not remain linear for large n. First, and most important, the influence of the serial fraction given by the task generation overhead increases with decreasing spp. For large n, the number of search steps required to solve a task is frequently smaller than the number of search steps required for task generation, which limits the achievable speedup considerably. The relative speedup metric does not provide a useful judgment in these cases, because it ignores the absolute values. Reducing the task generation overhead (e.g., by further parallelization) would improve the relative speedup for large n significantly; however, since the task generation overhead is known to be quite small, the actual performance of the system in terms of theorems provable or absolute runtime would not improve much. This is an example of the misleading role that relative speedup can play in the evaluation of parallel systems. Second, since usually slightly more tasks are generated than specified by mdesired, there are cases where, for some n, the number of tasks at the processor which first terminates successfully is not decreased by using twice as many processors, due to an unfortunate task distribution. Inspection of some individual problems with low relative speedup showed that using slightly more processors, guaranteeing spp = 1, would yield better speedup. Third, for n = mdesired, the slackness is close to one, which means that a failing task usually causes load imbalance, because there is no other task running on the affected processor. However, since failures occur rarely on average (compare Table II), this has comparatively little influence on the average speedup.
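The difference between the two averaging schemes mentioned above can be illustrated with a small sketch (the speedup values are invented for illustration and are not measurements from the paper):

    # Geometric vs. arithmetic averaging of per-problem speedups. A few
    # 'superlinear' outliers dominate the arithmetic mean, while the geometric
    # mean stays close to the typical per-problem speedup.
    import math

    def arithmetic_mean(values):
        return sum(values) / len(values)

    def geometric_mean(values):
        return math.exp(sum(math.log(v) for v in values) / len(values))

    speedups = [2.0, 3.5, 4.0, 5.0, 120.0]   # invented example values
    print(arithmetic_mean(speedups))          # ~26.9
    print(geometric_mean(speedups))           # ~7.0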


Figure 11. Lower bounds on the geometric average of the runtime speedup of SPTHEO compared with SETHEO including input and task distribution times.

4.3.2.4. Estimated Absolute Runtime Speedup.

More useful than relative speedup measures, in particular for potential users of the system, is a comparison of the actual runtime improvement achieved by SPTHEO. Figure 11 shows lower bounds for the geometric average of the absolute speedup of SPTHEO compared with SETHEO. The runtime measurements for this include the input overhead. Furthermore, the task distribution overhead is included, based on the assumption that the task which terminates the computation is the m/2-th task distributed (where m is the number of generated tasks), and assuming a distribution time of 0.1 seconds per task. The m/2 estimate is pessimistic; measurements show that the successful task with the shortest runtime typically appears within the first 1/3 of the tasks distributed for mdesired = 16, and within the first 1/4 for mdesired = 256 (geometric averages). Figure 11 gives lower bounds on the average speedup because there is a large number of problems for which SETHEO did not find a solution (28%). The true speedup for these problems is therefore unknown, and the runtime limit of 1000 seconds is used as a lower bound on the SETHEO runtime. Therefore, the decreasing speedup slope noticeable for large n is not necessarily an indication of performance saturation, but is significantly influenced by the SETHEO runtime limit. Again, it may be noted that the arithmetic average speedup values are significantly larger than the shown geometric averages, in all cases.

Moving from the left to the middle to the right plot shows a strong dependency of the speedup on the problem difficulty for SETHEO. When many of the simple problems are included, only low absolute speedup is achieved. This is not surprising, since the overhead of generating and distributing tasks cannot pay off for problems that require only a few seconds sequentially (the estimate of the distribution overhead alone is 128 × 0.1 = 12.8 seconds for 256 processors). For increasingly difficult problems (middle and right plots), the absolute speedup becomes superlinear on average up to some n. This is in accordance with the previous speedup discussions based on the measurements of the number of search steps required, and shows that even with the inclusion of the relevant overheads, on average very good results are obtained.

The plots also reveal a significantly better performance for mdesired = 16 than for mdesired = 256, for an equivalent number of processors. This is mainly due to the task distribution overhead. For the applied estimate, the geometric averages are Tdist = 1.1 seconds for mdesired = 16 and Tdist = 13.3 seconds for mdesired = 256, regardless of n. These times are part of the serial fraction of the computation, and reduce the speedup noticeably.
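The pessimistic overhead estimate described above can be reproduced with a small sketch (the helper names and the example sequential, search, and input times are illustrative; only the m/2 assumption and the 0.1 seconds per distributed task come from the text, and the exact accounting used for Figure 11 may differ):

    # Illustrative sketch of the pessimistic distribution-overhead estimate.
    # Only the m/2 assumption and 0.1 s per task are taken from the text.

    def distribution_overhead(m, t_per_task=0.1):
        """Pessimistic estimate: the task that terminates the computation is
        the m/2-th task distributed."""
        return (m / 2) * t_per_task

    def speedup_lower_bound(t_seq, t_parallel_search, m, t_input=0.0):
        """Lower bound on the absolute runtime speedup: the parallel runtime
        is charged with the input and distribution overheads."""
        t_par_total = t_input + distribution_overhead(m) + t_parallel_search
        return t_seq / t_par_total

    print(distribution_overhead(256))                            # 12.8 s for m = 256
    print(speedup_lower_bound(1000.0, 5.0, 256, t_input=3.0))    # ~48 (illustrative values)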


Faster communication would lead to a significant upward shift of the curves for mdesired = 256, and a small upward shift for mdesired = 16. This shows how the net improvement for some number of processors depends on the communication performance of the hardware. For good absolute speedup on systems with low communication performance, it is sufficient to use a comparatively small number of processors. But even with low communication performance, large numbers of processors can be used successfully on sufficiently hard problems.

4.3.2.5. Comparison with PARTHEO.

An earlier parallelization of SETHEO, based on dynamic partitioning, is the PARTHEO system. A direct comparison of SPTHEO with PARTHEO is not possible because PARTHEO is no longer operational. A comparison with the performance data published for PARTHEO [20] is problematic for the following reasons. PARTHEO is based on SETHEO v2.63, which employs less effective pruning than SETHEO v3.1, the version used for SPTHEO and RCTHEO. Also, the clause sets used in the evaluation of PARTHEO can no longer be unambiguously identified. The best speedup results for PARTHEO were reported for the problems Wos4 and Wos16. In each case, the runtime for SETHEO v2.63 on one processor of the PARTHEO hardware platform (a transputer system) is more than a thousand times longer (namely, 3413 and 36,490 seconds, respectively) than for PARTHEO with 16 processors (which requires 1.84 and 5.89 seconds, excluding the input times). In the TPTP, the clause sets identified with Wos4 and Wos16 are named GRP008-1 and GRP036-3. For GRP008-1 and GRP036-3, SETHEO v3.1 on a single HP workstation requires only about 20 seconds for each problem. The large difference between the performance results for the SETHEO versions shows that a comparison of the parallel systems is questionable.

5. Conclusion

In this paper, the sequential theorem prover SETHEO, the SPS-model for parallelizing search-based systems, and the parallel theorem prover SPTHEO based on the SPS-model are presented. Extensive performance evaluations are given, in which SPTHEO is compared with SETHEO and RCTHEO. It is found that SPTHEO achieves a substantial performance improvement compared with SETHEO, in terms of speedup, productivity, and the number of problems solved for realistic runtime limits. SPTHEO also outperforms RCTHEO. As an indication of the achieved advance, SPTHEO solves 167 problems that were not solved by SETHEO (out of 2571 TPTP problems), and 150 more problems than RCTHEO, with a runtime limit of 20 seconds per task for SPTHEO and RCTHEO, and a runtime limit of 1000 seconds for SETHEO. For non-trivial problems (i.e., excluding those which are solved sequentially with a runtime of less than one second), this represents an increase of the number of solved


problems by 83% compared with SETHEO and 69% compared with RCTHEO. Up to the investigated number of 256 processors, good scalability of SPTHEO is found for sufficiently difficult problems.

The SPTHEO system is highly portable due to its reliance on the PVM message passing library, which is available for most parallel computing platforms. Since communication is minimized in the SPS-model, SPTHEO runs efficiently even on workstation networks.

The presented work shows the high potential of parallelism for ATP. A significant performance improvement has been achieved, while neither portability, functionality, nor the usable hardware platform has been compromised. A key element for obtaining this performance has been the design goal of minimizing the communication and parallelization overhead.

Acknowledgments

Many thanks to Geoff Sutcliffe for detailed comments improving the readability and correctness of the text.

References

1. Ali, K. A. M. and Karlsson, R.: The MUSE OR-parallel Prolog model and its performance, in Proceedings of the 1990 North American Conference on Logic Programming, MIT Press, 1990.
2. Astrachan, O. L. and Loveland, D. W.: METEORs: High performance theorem provers using model elimination, in R. S. Boyer (ed.), Automated Reasoning: Essays in Honor of Woody Bledsoe, Kluwer Acad. Publ., Dordrecht, 1991.
3. Bose, S., Clarke, E. M., Long, D. E. and Michaylov, S.: Parthenon: A parallel theorem prover for non-Horn clauses, J. Automated Reasoning 8 (1992), 153–181.
4. Bonacina, M. P. and Hsiang, J.: Parallelization of deduction strategies: An analytical study, J. Automated Reasoning 13 (1994), 1–33.
5. Clocksin, W. F. and Alshawi, H.: A method for efficiently executing Horn clause programs using multiple processors, New Generation Comput. 5 (1988), 361–376.
6. Ertel, W.: OR-parallel theorem proving with random competition, in Proceedings of LPAR'92, LNAI 624, Springer, St. Petersburg, Russia, 1992, pp. 226–237.
7. Ertel, W.: Parallele Suche mit randomisiertem Wettbewerb in Inferenzsystemen, DISKI 25, Infix-Verlag, 1993.
8. Hockney, R.: Performance parameters and benchmarking of supercomputers, Parallel Comput. 17 (1991), 1111–1130.
9. Holzinger, D.: Analysen zur Independent-AND Parallelisierung unter Berücksichtigung des SPS-Modells, Diplomarbeit, Institut für Informatik, Technische Universität München, 1993.
10. Huber, M.: Parallele Simulation des Theorembeweiser SETHEO unter Verwendung des Static Partitioning Konzepts, Diplomarbeit, Institut für Informatik, Technische Universität München, 1993.
11. Korf, R. E.: Depth-first iterative-deepening: An optimal admissible tree search, Artif. Intell. 27(1) (1985), 97–109.
12. Letz, R.: First-Order Calculi and Proof Procedures for Automated Deduction, Ph.D. Thesis, Technische Hochschule Darmstadt, 1993.
13. Letz, R., Mayr, K. and Goller, C.: Controlled integration of the cut rule into connection tableau calculi, J. Automated Reasoning 13(3) (1994), 297–337.
14. Loveland, D. W.: Mechanical theorem-proving by model elimination, J. ACM 15(2) (1968), 236–251.
15. Loveland, D. W.: Automated Theorem Proving: A Logical Basis, North-Holland, 1978.
16. Letz, R., Schumann, J., Bayerl, S. and Bibel, W.: SETHEO: A high-performance theorem prover, J. Automated Reasoning 8(2) (1992), 183–212.
17. Mayr, K.: Tableaubasierte Beweissysteme, Ph.D. Thesis, Technische Universität München, in preparation.
18. Schumann, J.: Efficient Theorem Provers based on an Abstract Machine, Ph.D. Thesis, Institut für Informatik, Technische Universität München, Germany, 1991.
19. Sun, X.-H. and Gustafson, J. L.: Toward a better parallel performance metric, Parallel Comput. 17 (1991), 1093–1109.
20. Schumann, J. and Letz, R.: PARTHEO: A high-performance parallel theorem prover, in Proceedings of the 10th International Conference on Automated Deduction (CADE), Springer LNAI 449, 1990, pp. 40–56.
21. Sturgill, D. B. and Segre, A. M.: A novel asynchronous parallelism scheme for first-order logic, in Proceedings of the Twelfth Conference on Automated Deduction, Springer LNAI 814, 1994.
22. Suttner, C. B. and Schumann, J.: Parallel automated theorem proving, in Parallel Processing for Artificial Intelligence 1, Machine Intelligence and Pattern Recognition 14, Elsevier, 1994, pp. 209–257.
23. Suttner, C. B. and Sutcliffe, G.: The TPTP Problem Library (TPTP v1.1.1 – TR Date 7.7.94), Technical Report AR-94-03, Institut für Informatik, Technische Universität München, Munich, Germany, and Technical Report 93/11, Department of Computer Science, James Cook University, Townsville, Australia, 1994. Electronic copy via http://wwwjessen.informatik.tumuenchen.de/ftp/Automated_Reasoning/Reports/AR-94-03.ps.gz.
24. Sutcliffe, G., Suttner, C. B. and Yemenis, T.: The TPTP Problem Library, in A. Bundy (ed.), Proceedings of the 12th International Conference on Automated Deduction, LNAI 814, Springer-Verlag, 1994, pp. 252–266.
25. Stickel, M. E. and Tyson, W. M.: An analysis of consecutively bounded depth-first search with applications in automated deduction, in Proceedings of the 9th International Joint Conference on Artificial Intelligence (IJCAI-85), 1985.
26. Suttner, C. B.: Parallelization of Search-based Systems by Static Partitioning with Slackness, Ph.D. Thesis, Institut für Informatik, Technische Universität München; published by Infix-Verlag, DISKI 101, Germany, 1995.
27. Suttner, C. B.: Static partitioning with slackness, in Parallel Processing for Artificial Intelligence 3, Elsevier, 1996, to appear.
28. Warren, D. H. D.: An abstract PROLOG instruction set, Technical Report, SRI AI Center, Menlo Park, CA, USA, 1983.
