SMOTL—A System to Construct Samples for Data Processing Program Debugging

Share Embed


Descrição do Produto

60

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-5, NO. 1, JANUARY 1979

SMOTL A System to Construct Samples Processing Program Debugging JANIS BICEVSKIS, JURIS BORZOVS, ULDIS STRAUJUMS, ANDRIS ZARINS,

AND

for

Data

EDWARD F. MILLER, JR.

only the possibility of automatic CSS construction without any substantiation of this principle. The SMOTL system constructs a CSS automatically. It differs from other similar systems in two features: 1) SMOTL is oriented to commercially practical programs, whereas other known systems are meant for scientific programs. Index Terms-Analysis of programs, program testing, program valida2) SMOTL is supplied with a strategy for choosing the protion, symbolic execution, test data generation. gram paths to analyze, whereas, in the majority of other systems, the task of selecting such paths is done manually. SMOTL constructs a CSS according to the program text without huI. INTRODUCTION man interference. I N RECENT YEARS more and more attention has been paid to program validation. One of the common approaches is A. System Survey to test a program on different sets of input data [1] - [6], [81 The SMOTL system is intended primarily for batch process[11]. of programs written in SMOD; SMOD is a Cobol-like laning Although absolutely complete program testing can be achievbut has no means of direct access to data on the exguage, able by executing it on all possible program inputs, most reternal storage devices. searchers agree that a natural criterion of program testing comBesides compiling and linkage-editing the SMOTL system pleteness is the execution of all program branches (or, in contains facilities of the following kinds: other words, traversing all exits of statements) for a finite detection 1) of certain run-time errors; number of cases [1 ] - [4], [8]. 2) constructing for tests program validation; A path P (a branch P) is called feasible if there exist input to 3) put out samples on the external storage devices; data which force the program to execute the path P (the to 4) execute the program repeatedly on the constructed branch P). Further on we shall consider only those tests on tests (regression testing); which the program terminates normally. These are said to 5) to print and compare program output data. be permissible. is organized logically into six phases, described next. SMOTL A finite test set S for the given program will be called a 1) Replacement of the source program by its internal reprecomplete test set (CSS), if it consists of permissible tests and sentation in the form of a directed graph. every program branch that can be executed is executable on 2) Static of the program with the objective of time analysis some test in S. and memory space optimization by subsequent SMOTL phases. The idea of program debugging by CSS construction arises Construction 3) of a set of paths, i.e., the test set covering mainly from applications programming experience. The pro- containing only feasible paths and covering all feasible branches. grammer always tries to select a set of tests such that the pro- 4) Minimization of the covering set. gram executes all its branches. In this paper, we shall discuss 5) Construction of test data for paths from a minimized covering set. Manuscript received June 29, 1977; revised April 7, 1978. The con6) Output of test data to an external storage device, compiltributions made by the last-named author have been purposely limited in scope to retain as much as possible the original flavor of this work. ing and linkage editing of the SMOD test, repeated execution, The objective was to clarify the presentation as much as possible, to and printing of generated results. This latter phase has nothing uniformize the terminology to that most likely to be understood by to do with CSS construction and it will not be discussed in readers in the United States, and to clean up a few minor technical details. The unique attitude and approach to problem-solving exhibited the paper. Abstract-The possibility of automatic construction of a complete set of program tests is considered. A test set system is said to be complete if every feasible program branch (segment) is executed by it. The complete test set construction algorithm for commercially oriented data processing programs is outlined, and the results of its functioning on real programs are analyzed.

by the original manuscript has been retained as much as possible. J. Bicevskis, J. Borzovs, U. Straujums, and A. Zarins are with the Department of Computer Science, Latvian State University, Riga, Latvia. E. F. Miller, Jr. is with Software Research Associates, San Francisco, CA.

B. Problem Solvability We shall demonstrate in the following that for every real programming language the problem of CSS construction is solvable

0098-5589/79/0100-0060$00.75 © 1979 IEEE

61

BICEVSKIS et al.: CONSTRUCTING SAMPLES FOR PROGRAM DEBUGGING

the theoretical point of view. The crucial point to obis that every program variable can be assigned only a finite number of different values. To construct a CSS we use an algorithm (described next) for exhaustive search of individual variables' values. We begin the description of the algorithm by indicating how to check the execution feasibility of an individual path. Subsequently, we shall discuss how the infmiite set of all different paths can be treated. We shall consider only initial paths in this section, i.e., paths starting from the first statement. We execute statements of the path beginning from the first one. If the values of operands are known before the execution of a statement then the execution of this statement is carried out in the usual way by the simulation of program operation. The values of operands may not be known in two cases: Case 1: When a data input statement is to be executed. In this area the corresponding values are to be assigned from input data, which certainly are not known. We evaluate these variables by assigning some concrete fixed values from the corresponding domains of possible values of the variables. These domains are uniquely determined by the specifications of the variables and this statement. It is important to note that these domains are finite because only numbers of limited length and precision, including strings of limited length, can be represented in the memory of a computer. Case 2: When there is a statement to be executed which does not read values from input data and nevertheless the values of some operands are not known. In this case the path is obviously infeasible and the further analysis is not needed. The described process terminates in two cases: Case A: When we have succeeded in executing a path from the beginning to the end. Consequently, there will be fixed values for all the input data. This fixation of the values determines the test data which force the program to execute this path. Fixations of this kind are called feasible fixations. As soon as we have a sample which forces execution of a path, this path (obviously) is feasible. Case B: When we have not succeeded in the path execution because of the fact that the selected fixation of input data values does not satisfy some conditional statement of the path. In this case, we must repeat execution of the path fixing another value for input data. As soon as the domains of possible values of all variables are finite and every path contains only a finite number of data input statements, the number of all different possible fixations then also becomes finite. The feasibilty of the path can be checked by effectively examining all of these fixations. We have proved that the feasibility of any individual program path can be determined. Now we shall demonstrate how the infinite set of all different initial paths can be treated. We can examine the paths in ascending order of their length and try to select a set of feasible paths having the program exit statement for the last statement of the path (e.g., a STOP path) which covers every program branch. CSS construction will be completed if we succeed in finding such a set. However, if there is a program branch which has no permissible samples for execution of it then the described profirom

serve

cedure does not terminate. This happens by having to consider paths of continually increasing length and waiting till some feasible sToP-path contains the chosen branch. Thus we are forced to analyze an infmite number of paths (except the case of trivial programs without loops). This apparent difficulty can be avoided, nevertheless. It is clear that at every instant the program's further behavior is uniquely determined by the values of program variables and does not depend on the path on which these values are assigned. More precisely: let feasible paths PA and PB end with the same statement and let the values of the variables at the end of the path PA for a feasible fixation A coincide with the values of the corresponding variables at the end of the path PB for a feasible fixation B. Then, if the path obtained by concatenation of path PA and its continuation PC is feasible, so is also the path obtained by concatenation of paths PB and PC. Note that if two paths differ only in the number of loop traversals and if the value of each variable at the loop exit statement is equal for both paths then there is no need to consider the longer path. Hence, let us consider only paths with different values of variables at the exit statements of loops. The number of such paths is of course finite; this fact permits us to construct a CSS in all cases. From the practical point of view, however, the algorithm described cannot be applied directly because of the possible need of an unending exhaustive search. To find more suitable methods for CSS construction we have to take into account that the domains of variables' values are practically infinite. In theoretical investigations [1], [2], [4], [10], [II] the infinity of the domain is assumed explicitly. Under this assumption, the CSS construction problem is theoretically unsolvable. However, the CSS construction algorithm discussed here applies to a wide class of practical data-processing-oriented programs. The algorithm implemented in SMOTL is similar to the one previously described, but instead of concrete values of variables a generalized description of possible values called a concept "state," is used. It is possible by using some standard optimization methods to reduce the number of different states to a small and manageable number for the case of realworld data-processimg-onented programs. This technique allows us to construct an acceptable CSS in realistic computation times. II. DETAILED SMOTL DESCRIPTION

A. Phase I The first phase of SMOTL operation replaces the source program by a directed graph. The nodes in the graph are the statements of the program and the edges represent the program flow. Every node in the graph contains not only the statement code but also references to the description of operands. Such a representation allows the SMOTL system to traverse the program paths easily and store the program in the core compactly. (About 4K bytes are needed for a program containing 300 statements, to cite a specific example.) In discussing the next phases of SMOTL operation we shall use the PL/l -style program EXAMPLE to demonstrate the sys-

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-5, NO. 1, JANUARY 1979

62

OPTIONS (MAIN); PAY FILE INPUT; NAME FILE INPUT; PAYBILL FILE OUTPUT; 1 P, 2 CODEP PICTURE '(4)9', 2 INCOME PICTURE '(4)9V99', 2 CREDIT PICTURE '(4)9V99'; DECLARE 1 N, 2 CODEN PICTURE '(4)9', 2 NAMEN PICTURE '(4)9V99'; DECLARE 1 PRINTLINE, 2 CODE PICTURE '(4)9', 2 FILL1 CHARACTER (10) INITIAL (' '), 2 NAMEP CHARACTER (12), 2 FILL2 CHARACTER (20) INITIAL (' '), 2 OUTCOME PICTURE '(4)9V99'; OPEN FILE (PAY), FILE (NAME), FILE (PAYBILL); ON*ENDFILE (PAY) GOTO L9; ON ENDFILE (NAME) GOTO L6; READ FILE (PAY) INTO (P); READ FILE (NAME) INTO (N); IF CODEP < CODEN THEN DO; DISPLAY ('NAME NOT FOUND'); GOTO L8; END; IF CODEP > CODEN THEN GOTO L2; OUTCOME = INCOME - CREDIT; CODE = CODEP; NAMEP = NAMEN; WRITE FILE (PAYBILL) FROM (PRINTLINE); GOTO L8; CODEN = +9999; GOTO L3; CODEP = +9999; READ FILE (PAY) INTO (P); GOTO L3; CLOSE FILE (PAY), FILE (NAME), FILE (PAYBILL); EXAMPLE;

EXAMPLE: PROC DECLARE DECLARE DECLARE DECLARE

LO:

Ll: L2: L3: L4: L5: L6: L7: L8: L9: END

Fig. 1. Text of the program EXAMPLE.

tem functioning. This program forms the file PAYBILL by processing the ordered files PAY and NAME. (See Fig. 1.) B. Phase II The reader may easily pass over the following definitions of essentially located statements (ELS's) and essential variables without influencing his understanding. One may consider all program statements as ELS's and all program variables as essential variables. For the sake of memory efficiency, however, we have tried to minimize the number of ELS's and essential

a conditional statement of this path. 2) The value that the variable x has immediately before execution of the ELS is not changed until it is analyzed in a conditional statement of the path. For instance, in the program EXAMPLE the statement LO has no essential variables because on every path which follows LO every variable is assigned a new value from the input file before it is analyzed in a conditional statement. L9 has no essential variables because there are no paths following this has the essential statement that conditional statement variables CODEP and CODEN assigned to it. Unreachable program statements can be detected along with selection of the ELS's; references to uninitialized variables can be found when identifying essential variables. EXAMPLE contains an unreachable statement L7 but it does not contain references to uninitialized variables. Finally, the second phase of SMOTL produces a list of all the possible paths from one ELS to another (further referenced as subpaths). In EXAMPLE such subpaths are:

expression which is analyzed in

(LO,Ll,L2,L3) (LO,Ll,L2,L6,L3) (L3,L8,L3) (L3,L8,L9) (L3,L4,L2,L3) (L3,L4,L2,L6,L3) (L3,L4,L5,L8,L3) (L3,L4,L5,L8,L9) (LO,Ll,L9) C. Phase III The third phase of SMOTL tries to construct a set of feasible paths ending with the program exit statement such that this set contains every feasible branch. The most important remaining problem with the method used in SMOTL is to choose a criterion for procedure termination in the case when some program branch is infeasible. This problem can be solved by the aid of a concept of the state-a generalized description of all possible values of a variable. Because of the fact that the construction of a state is highly complicated and depends on the features of concrete programming language we shall only illustrate the construction of a state by simple examples in this paper. States will be used for two purposes. 1) If there are two paths which differ only in the number of loop traversals we shall not consider the longer path in the case when the states associated with the last statements of the paths coincide. 2) When we investigate the feasibility of a path which is a concatenation of paths PA and PB, the concept of the state is used for the possibility of separate analysis of PA and PB. A state contains all the information that must be known in addition to the fact of feasibility of the path PA in order to recognize the feasibility of the path PB. The feasibility of

variables. The second phase of SMOTL operation selects a set of program statements such that every loop contains at least one statement from this set. We shall call the statement from this set essentially located statements (ELS's). Usually the ELS is a loop exit statement. In addition, the first program statement and every program EXIT statement are considered to be ELS's. If several loops have a common part, then the ELS's are selected from this common part in order to minimize the number of ELS. In the program EXAMPLE the statements labeled LO, L3, and L9 are the ELS's. Associated with every EILS there is a list of variables we call essential variables associated with the ELS. A variable x is said to be an essential variable for a certain ELS if there exists a path beginning with this ELS such that the following statements hold true: 1) The variable x is analyzed (referenced) in some condi- the tional statement of this path, or the variable x enters some the

path

PB can

be

recognized

from

the

fact of feasibility of

path PA and from the set of all states ascribed to the last

BICEVSKIS et al.: CONSTRUCTING SAMPLES FOR PROGRAM DEBUGGING

statement of PA. Moreover, it is guaranteed that if the paths PA and PB are feasible so is the path obtained by concatenation of PA and PB. The third phase of the system consists of STRATEGY and ANALYZER. STRATEGY chooses a subpath and passes it on to ANALYZER together with description of relationships of variables at the beginning of the subpath (i.e., its state). ANALYZER 's task is to decide if the given subpath is feasible with the given state. If the subpath is feasible, ANALYZER constructs a new state and STRATEGY ascribes it to the last statement of this subpath. Further on STRATEGY selects a new subpath and state and turns to ANALYZER again, etc. The end of the work of the third phase is determined by STRATEGY.

1) STRATEGY: STRATEGY begins its operation from the first program statement and the empty state. It selects a subpath to the adjacent ELS and turns to ANALYZER with the question: Is this subpath feasible with the empty state? If the subpath is feasible, then ANALYZER constructs a state which describes the values of essential variables after the execution of the analyzed subpath. Then STRATEGY ascribes the new state to the ELS which is the last statement of the analyzed subpath. Further, STRATEGY selects another- subpath which a) follows the subpath investigated before, and b) which has not yet been analyzed with the newly constructed state. ANALYZER investigates the selected subpath with the given state, etc. In that way STRATEGY constructs a feasible path adding one subpath to it on every step. If some subpath turns out to be infeasible then STRATEGY moves one subpath back and selects another subpath to analyze. The attempted construction of a feasible path in this manner terminates for one of two reasons: 1) the program exit statement is reached; 2) STRATEGY comes to an ELS traversed earlier and the newly constructed state coincides with one of those ascribed to this ELS before and, in addition, every subpath following the ELS has already been investigated (with the same state) in some earlier step. In the first case STRATEGY includes the constructed path in the covering set of paths. In the second case STRATEGY keeps the constructed path in mind. When construction of some other feasible path is completed STRATEGY examines whether it is possible to add some part of path ending with program exit statement to the prior path. If STRATEGY SUCceeds the feasible STOP-path obtained will be included into the covering set. After the construction of one path STRATEGY starts to construct a new feasible path. The starting point of the construction of the next path is an ELS on some path constructed before and chosen so that the number of not-investigated subpath/ state pairs is the greatest possible. Here we make use of the second property of states, which allows the process to start the investigation of every path not necessarily from the first program statement, and which analyzes every subpath with every state only once. 2) ANALYZER: ANALYZER'S task is to decide if the given subpath is feasible with respect to the given state. ANALYZER

63

performs a symbolic execution of a subpath. When a subpath is symbolically executed no values are assigned to the variables, as in the case of normal execution; however, the operations are performed on symbolic values. For instance, the READ statement usually assigns a value from the data set to a variable; in symbolic execution a variable receives a certain symbol for

its value. This symbol denotes the location of the corresponding value in the data set. Thus after the path (LO,L1,L2,L3) in the program EXAMPLE the variables CODEP and CODEN Will receive the symbolic values (PAY, 1-4), and (NAME, 1-4)1, respectively. (PAY, 1-4), represents the value located in the first four bytes of the first record in the file PAY; (NAME, 1-4), represents the value located in the first four bytes of the first record in the file NAME. Symbolic execution of a path results in a system of inequalities which represents the constraints over symbolic values. By checking the consistency of the constraints (solving the system of inequalities) ANA-

determines the path feasibility. In the given example the subpath (L0,L1,L2,L3) does not contain any conditional statement and the system of inequalities is empty. It has a solution; therefore, the subpath (LO,Ll ,L2,L3) is feasible. ANALYZER then constructs the state at the end of the subpath (L0,L1,L2,L3). A state is obtained by direct simplification of the inequality system. (Equalities expressing the values of essential variables at the end of the path are added automatically.) In our example, the program state after the subpath (LO, L1, L2,L3) is the following: LYZER

CODEP =

(PAY,1-4)1

CODEN = (NAME,1-4)1.

On the next step STRATEGY turns to ANALYZER with the subpath (L3,L8,L3) and with the state constructed before. ANALYZER constructs the system- of inequalities (PAY,1-4)1
Lihat lebih banyak...

Comentários

Copyright © 2017 DADOSPDF Inc.