Automated Test Generation
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 16, NO. 8, AUGUST 1990
BOGDAN KOREL, MEMBER, IEEE
Abstract: Test data generation in program testing is the process of identifying a set of test data which satisfies a given testing criterion. Most of the existing test data generators [6], [8], [10], [16], [30] use symbolic evaluation to derive test data. However, in practical programs this technique frequently requires complex algebraic manipulations, especially in the presence of arrays. In this paper we present an alternative approach to test data generation which is based on actual execution of the program under test, function minimization methods, and dynamic data flow analysis. Test data are developed for the program using actual values of input variables. When the program is executed, the program execution flow is monitored. If during program execution an undesirable execution flow is observed (e.g., the actual path does not correspond to the selected control path), then function minimization search algorithms are used to automatically locate the values of input variables for which the selected path is traversed. In addition, dynamic data flow analysis is used to determine those input variables responsible for the undesirable program behavior, leading to significant speed-up of the search process. The approach to generating test data is then extended to programs with dynamic data structures, and a search method based on dynamic data flow analysis and backtracking is presented. In the approach described in this paper, values of array indexes and pointers are known at each step of program execution, and this approach exploits this information to overcome difficulties of array and pointer handling; as a result, the effectiveness of test data generation can be significantly improved.
Index Terms: Automated test generation, dynamic data flow analysis, function minimization.
I. INTRODUCTION
SOFTWARE testing is very labor-intensive and expensive; it accounts for approximately 50% of the cost of a software system development [1], [28]. If the testing process could be automated, the cost of developing software should be reduced significantly. Of the problems involved in testing software, one is of particular relevance here: the problem of developing test data. Test data generation in software testing is the process of identifying program input data which satisfy a selected testing criterion. A test data generator is a tool which assists a programmer in the generation of test data for a program.
There are three types of test data generators: pathwise test data generators [6], [8], [10], [16], [30], data specification generators [3], [19], [24], [25], and random test data
generators [7]. This paper focuses on pathwise test data
generators which are tools that accept as input a computer
program and a testing criterion (e.g., total path coverage,
statement coverage, branch coverage, etc.) and then automatically generate test data that meet the selected criterion. The basic operation of the pathwise generator consists of the following steps: program control flow graph construction, path selection, and test data generation. The path selector automatically identifies a set of paths (e.g., a near-minimal set of paths) to satisfy the selected testing criterion. Once a set of test paths is determined, then for every path in this set the test generator derives input data that result in the execution of the selected path.
Manuscript received December 19, 1988; revised March 24, 1990. Recommended by W. Howden.
The author is with the Department of Computer Science, Wayne State University, Detroit, MI 48202.
IEEE Log Number 9036267.
0098-5589/90/0800-0870$01.00 © 1990 IEEE
Most of the pathwise test data generators [6], [8], [10],
[16], [30] use symbolic evaluation to derive input data.
Symbolic evaluation involves executing a program using
symbolic values of variables instead of actual values.
Once a path is selected, symbolic evaluation is used to
generate a path constraint, which consists of a set of
equalities and inequalities on the program's input variables; this path constraint must be satisfied for the path to
be traversed. A number of algorithms have been used for
solving the inequalities. As pointed out in [18], [27], symbolic evaluation is a promising approach; however, there
are still several problems which require additional research, e.g., the problem of array element determination.
This problem occurs when the index of an array depends
on input values; in this case, the array element that is
being referenced or defined is unknown. This problem occurs frequently during symbolic evaluation. Inefficient solutions exist, for in the worst case all possible index values can be enumerated. Though there has been some work
on this problem [30] and a related problem for record
structures [27], the results are still unsatisfactory.
In this paper we present an alternative approach to test data generation, referred to as a dynamic approach to test data generation, which is based on actual execution of a program under test, dynamic data flow analysis, and function minimization methods. Test data are developed using actual values of input variables. When the program is executed on some input data, the program execution flow is monitored. If, during program execution, an undesirable execution flow at some branch is observed, then a real-valued function is associated with this branch. This function is positive when the branch predicate is false and negative when the branch predicate is true. Function minimization search algorithms are used to automatically locate values of input variables for which the function becomes negative. In addition, dynamic data flow analysis is used to determine input variables which are responsible for the undesirable program behavior, leading to significant speed-up of the search process. In this approach, arrays and dynamic data structures can be handled precisely, because the values of array indexes and pointers are known at each step of program execution.
II. BASIC CONCEPTS
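var
  A: array[1..100] of integer;
  low, high, step: integer;
  min, max: integer;
  i: integer;
begin
1     input(low, high, step, A);
2     min := A[low];
3     max := A[low];
4     i := low + step;
5     while i < high do
      begin
6,7     if max < A[i] then max := A[i];
8,9     if min > A[i] then min := A[i];
10      i := i + step;
      end;
11    output(min, max);
end;
Fig. 1. A sample program.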
III. A TEST DATA GENERATION PROBLEM
Let $P = \langle n_{k_1}, n_{k_2}, \ldots, n_{k_q} \rangle$ be a path in the program. The goal of the test data generation problem is to
find a program input $x \in D$ on which $P$ will be traversed.
We shall show that this problem can be reduced to a sequence of subgoals where each subgoal is solved using
function minimization search techniques.
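Reducing the problem to subgoals requires knowing how far along the selected path an execution actually got, and which branch was missed. The following is a minimal sketch of that comparison, assuming paths are represented as Python lists of node identifiers; the function names and the example paths are hypothetical and are not taken from the paper.

```python
def successful_subpath(selected, traversed):
    """Longest prefix of the selected path that the execution followed."""
    prefix = []
    for wanted, actual in zip(selected, traversed):
        if wanted != actual:
            break
        prefix.append(wanted)
    return prefix


def violated_branch(selected, traversed):
    """Branch of the selected path that was not taken, or None if the
    whole selected path was traversed."""
    done = len(successful_subpath(selected, traversed))
    if done == len(selected):
        return None
    if done == 0:
        raise ValueError("paths do not share a start node")
    return (selected[done - 1], selected[done])


# Hypothetical paths through a program whose nodes are numbered like Fig. 1.
selected = ["s", 1, 2, 3, 4, 5, 6, 7, 10, 5, 11]
traversed = ["s", 1, 2, 3, 4, 5, 11]
print(successful_subpath(selected, traversed))
print(violated_branch(selected, traversed))
```

Each subgoal then asks for an input that keeps the successful subpath intact while changing the outcome at the violated branch.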
Without loss of generality, we assume that the branch
predicates are simple relational expressions (inequalities
and equalities). That is, all branch predicates are of the
following form:
$$E_1 \ \mathrm{op} \ E_2$$
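A branch predicate of this form can be turned into a real-valued function that is positive when the predicate is false and negative (or zero) when it is true. The sketch below shows one mapping with that property. The individual rows are an assumption chosen to satisfy this requirement; they are not a transcription of the paper's Table I, and the names `branch_function` and `holds` are hypothetical.

```python
def branch_function(op, e1, e2):
    """Return (F, rel) such that `e1 op e2` is true exactly when `F rel 0`.

    e1 and e2 are the numeric values of the two arithmetic expressions
    observed during one execution of the program.
    """
    table = {
        ">":  (e2 - e1, "<"),
        ">=": (e2 - e1, "<="),
        "<":  (e1 - e2, "<"),
        "<=": (e1 - e2, "<="),
        "=":  (abs(e1 - e2), "="),
        "<>": (-abs(e1 - e2), "<"),
    }
    return table[op]


def holds(f, rel):
    """Check the transformed predicate `F rel 0`."""
    if rel == "<":
        return f < 0
    if rel == "<=":
        return f <= 0
    return f == 0


# Example: the loop test `i < high` with i = 51 and high = 30 is false,
# so its branch function value is positive.
f, rel = branch_function("<", 51, 30)
print(f, rel, holds(f, rel))
```

The smaller the value of F, the closer the input is to making the predicate true, which is what lets a minimization search steer towards a solution.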
TABLE I
Branch Predicate | Branch Function F | rel

Let $x^0$ be the initial program input (selected randomly) on which the program is executed. If $P$ is traversed, $x^0$ is the solution to the test data generation problem; if not, we have to solve the first subgoal. Let $T = \langle t_{p_1}, t_{p_2}, \ldots, t_{p_z} \rangle$ be a program path traversed on $x^0$, and let $P_1 = \langle n_{k_1}, n_{k_2}, \ldots, n_{k_i} \rangle$ be the longest subpath of $P$, referred to as a successful subpath of $P$ on $x^0$, such that for all $j$, $1 \le j \le i$, $n_{k_j} = t_{p_j}$. $P_1$ represents a successfully traversed part of $P$ on input $x^0$; the branch violation occurs on execution of branch $(n_{k_i}, n_{k_{i+1}})$. Let $F_i(x)$ be a branch function of branch $(n_{k_i}, n_{k_{i+1}})$. The first subgoal, now, is to find a value of $x$ which will preserve the traversal of $P_1$ and cause $F_i(x)$ to be negative (or zero) at $n_{k_i}$; as a result, $(n_{k_i}, n_{k_{i+1}})$ will be successfully executed. More formally, we want to find a program input $x \in D$ satisfying
$$F_i(x) \ \mathrm{rel}_i \ 0$$
subject to the constraint:
$P_1$ is traversed on $x$,
where $\mathrm{rel}_i$ is one of $\{<, \le, =\}$.
This problem is similar to the minimization problem with constraints because the function $F_i(x)$ can be minimized using numerical techniques for constrained minimization [12], [13] until $F_i(x)$ becomes negative (or zero, depending on $\mathrm{rel}_i$). The search procedure for solving subgoals is presented in Section IV.
Let $x^1$ be the solution to the first subgoal. Now, either the selected path $P$ is traversed (as a consequence, $x^1$ is the solution to the main goal), or the second subgoal must be solved. In the latter case, let $P_2 = \langle n_{k_1}, n_{k_2}, \ldots, n_{k_i}, n_{k_{i+1}}, \ldots, n_{k_m} \rangle$ be the successful subpath of $P$ traversed on $x^1$. Let $F_m(x)$ be the branch function of branch $(n_{k_m}, n_{k_{m+1}})$. The second subgoal is to find a program input $x$ which satisfies $F_m(x) \ \mathrm{rel}_m \ 0$, subject to the constraint: $P_2$ is traversed on $x$. This process of solving subgoals is repeated until the solution to the main goal is found, or one of the subgoals cannot be solved. In the latter case, the search procedure fails to solve the test data generation problem.

IV. BASIC SEARCH PROCEDURE
We now turn our attention to the question of how to conduct a search to find the solution to a subgoal. Because of a lack of assumptions about the branch function and constraints, we have selected direct-search methods [12], [13], which progress towards the minimum using a strategy based on the comparison of branch function values only. The main advantage of these search methods is that they do not require regularity and continuity of the branch function and the existence of derivatives. The most simple strategy of this form is that known as the alternating variable method, which consists of minimizing with respect to each input variable in turn. We start searching for a minimum with the first input variable $x_1$ (using a one-dimensional search procedure) while keeping all the other input variables constant until the solution is found (the branch function becomes negative) or the positive minimum of the branch function is located. In the latter case, the search continues from this minimum with the next input variable $x_2$. The search proceeds in this manner until all input variables $x_1, \ldots, x_n$ are explored in turn. After completing such a cycle, the procedure continuously cycles around the input variables until the solution is found or no progress (decrement of the branch function) can be made for any input variable. In the latter case, the search process fails to find the solution even if the positive minimum of the branch function is located.
Now we shall briefly describe the one-dimensional search procedure for solving, for instance, the first subgoal. The one-dimensional search procedure [13] consists of two major phases, an exploratory search and a pattern search. In the exploratory search, the selected input variable $x_i$ is increased and decreased by a small amount, while the remaining input variables are held constant. These are called the exploratory moves. For each variable change, the program is executed and the constraint is checked for possible violation by comparing successful subpath $P_1$ with the path which is actually being traversed. If $P_1$ has been traversed, branch function $F_i(x)$ is evaluated for the new input. On the other hand, if $P_1$ has not been traversed, the constraint violation is reported. In these exploratory moves, the value of the branch function is compared to the value of the branch function for the previous input. In this way, it is possible to indicate a direction in which to proceed, that is, to make a larger move. If the branch function is improved (decreased) when $x_i$ is decreased, the search should proceed in the direction of decreasing $x_i$. If, on the other hand, the branch function is improved when $x_i$ is increased, the search should proceed in the direction of increasing $x_i$. If both the decrement and the increment of $x_i$ do not cause the improvement of the branch function, the exploratory search fails to determine the direction for the search; in this case, the next input variable is selected for consideration.
Assuming that the exploratory moves are able to indicate a direction in which to proceed, a larger move called a pattern move (pattern search) is made. After a pattern move, the program is executed and the constraint is checked for possible violation. If the constraint violation has not occurred and branch $(n_{k_i}, n_{k_{i+1}})$ has not been taken, the branch function is evaluated and its value is compared to the value of the branch function for the previous input.
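The alternating variable method with exploratory and pattern moves can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure: inputs are integers, the exploratory step is 1, pattern moves double the step while the branch function keeps improving, and a failed pattern move simply hands over to the next variable. The callback `evaluate` is hypothetical; it stands for "run the instrumented program" and returns the branch function value, or `None` when the successful subpath is no longer traversed (constraint violation).

```python
def solved(f, rel):
    """True when `F rel 0` holds; for rel '=' F is non-negative, so F <= 0 means F == 0."""
    return f < 0 if rel == "<" else f <= 0


def alternating_variable_search(evaluate, x0, rel="<", max_trials=1000):
    """Return (solution or None, number of program executions)."""
    x = list(x0)
    best = evaluate(x)          # x0 traverses the successful subpath by definition
    trials = 1
    if solved(best, rel):
        return x, trials
    progress = True
    while progress and trials < max_trials:
        progress = False
        for i in range(len(x)):
            direction = None
            for d in (-1, +1):                  # exploratory moves
                cand = x.copy()
                cand[i] += d
                f = evaluate(cand)
                trials += 1
                if f is not None and f < best:
                    direction, x, best = d, cand, f
                    break
            if direction is None:
                continue                        # try the next input variable
            progress = True
            if solved(best, rel):
                return x, trials
            step = 2 * direction
            while trials < max_trials:          # pattern moves
                cand = x.copy()
                cand[i] += step
                f = evaluate(cand)
                trials += 1
                if f is None or f >= best:
                    break
                x, best = cand, f
                if solved(best, rel):
                    return x, trials
                step *= 2
    return None, trials


def evaluate(x):
    """Hypothetical subgoal: make `i < high` true, where i = low + step."""
    low, high, step = x
    if not 1 <= low <= 100:     # assumed constraint: A[low] must exist
        return None
    return (low + step) - high


solution, trials = alternating_variable_search(evaluate, [39, 30, 12])
print(solution, trials)
```

Because only comparisons of branch function values are used, the sketch needs no derivatives and tolerates discontinuous branch functions, which is the property the text attributes to direct-search methods.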
:= b[j + 3] + v - a[k - 3];
1) $v \in U(n_{k_t})$,
2) $v \in D(n_{k_p})$, and
3) for all $j$, $p < j < t$, $v \notin D(n_{k_j})$.
This influence describes a situation where one instruction assigns a value to an item of data and the other instruction uses that value. The influences between instructions in T can be represented graphically as an influence
network, where each link between instructions represents
a direct influence between them. An example of an influence subnetwork is presented in Fig. 2. In this subnetwork, for instance, instruction 3 directly influences instruction 6 by variable max.
Definition 2: We say that an input variable $x_i$ influences instruction $n_{k_t}$ in $T$ iff there is a sequence $\langle n_{r_1}, n_{r_2}, \ldots, n_{r_w} \rangle$ of instructions from $T$ such that:
1) $n_{r_1}$ is an input instruction which defines $x_i$,
2) $n_{r_w} = n_{k_t}$,
3) $n_{r_1}$ directly influences $n_{r_2}$ by $x_i$, and
4) for all $j$, $1 < j < w$, there exists a variable $v$ such that $n_{r_j}$ directly influences $n_{r_{j+1}}$ by $v$.
From the influence subnetwork of Fig. 2 it is easy to determine that input variable A[39] influences instruction 6 because input instruction 1 directly influences instruction 3 by A[39] and instruction 3 directly influences instruction 6 by variable max. By the same token, we can determine that input variables A[51], low, and step influence test instruction 6. Consequently, these input variables influence branch function $F_1(x)$ in subgoal 1 of Example 1. In the same manner, we can determine that input variables A[39], A[51], low, and step influence $F_2(x)$ in subgoal 2; input variables A[39], A[63], low, and step influence $F_3(x)$ in subgoal 3; input variables high, low, and step influence branch function $F_4(x)$ in subgoal 4.
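The influence relation can be computed from a recorded execution trace with a single forward pass. The sketch below assumes each executed instruction is logged with the variables it defines and uses (array elements by their actual index, index variables counted as uses); the trace format and the function name are hypothetical. For every variable it keeps the set of input variables that reach its latest definition, which is equivalent to following chains of direct influences back to the input instruction.

```python
def influencing_inputs(trace, target):
    """Input variables that influence the instruction at position `target`."""
    reaching = {}                       # variable -> input variables behind its current value
    for position, step in enumerate(trace):
        sources = set()
        for var in step["uses"]:
            sources |= reaching.get(var, set())
        if position == target:
            return sources
        for var in step["defs"]:
            # An input instruction makes each variable its own source.
            reaching[var] = {var} if step.get("input") else set(sources)
    raise IndexError("target position is not in the trace")


# Trace of the sample program of Fig. 1 up to the first test at instruction 6,
# assuming low = 39 and step = 12 so that A[39] and A[51] are the elements touched.
trace = [
    {"defs": {"low", "high", "step"} | {f"A[{k}]" for k in range(1, 101)},
     "uses": set(), "input": True},                      # 1: input
    {"defs": {"min"}, "uses": {"low", "A[39]"}},         # 2: min := A[low]
    {"defs": {"max"}, "uses": {"low", "A[39]"}},         # 3: max := A[low]
    {"defs": {"i"}, "uses": {"low", "step"}},            # 4: i := low + step
    {"defs": set(), "uses": {"i", "high"}},              # 5: i < high
    {"defs": set(), "uses": {"max", "i", "A[51]"}},      # 6: max < A[i]
]
print(sorted(influencing_inputs(trace, 5)))
print(sorted(influencing_inputs(trace, 4)))
```

Only the variables in the returned set need to be explored by the search for the corresponding branch; the other array elements can be left alone.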
This information can be used to speed up the search
during the solution of subgoals by considering only those
input variables which have influence on a given branch
function. As a result, the possibility of a fruitless search
can be significantly reduced. For instance, while solving
the fourth subgoal, we are effectively removing two
hundred (2*100) evaluations of branch function F4(x); if
the number of elements in A had been, for example, one
Fig. 2. An influence subnetwork. Nodes: 1 input; 2 min:=A[low] /* min:=A[39] */; 3 max:=A[low] /* max:=A[39] */; 4 i:=low+step; 5 i < high; 6 max < A[i]. An arrow from i to j labeled v indicates that instruction i directly influences instruction j by v.
Table: input variables and risk factors for each subgoal (input variables: A[39], A[51], A[63], low, high, step).
This arrangement of input variables leads to the solution in 21 trials (program executions). On the other hand, the blind arrangement of variables from Example 1 requires 497 trials to find the solution.
During the search, special attention should be paid to input variables that influence array indexes, that is, those input variables that influence the selection of the array elements during program execution. For example, suppose that while solving the first subgoal in Example 1 input variables are ordered in the following way: low, A[39], A[51], step, and suppose that during the exploration of variable low its value has been modified, e.g., low = 20. If the search continues from this point with input variable A[39], then the search will fail because A[39] and A[51] do not influence $F_1(x)$ any more. It is easy to see that array elements A[20] and A[32] now influence branch function $F_1(x)$. Therefore, each time an input variable receives a new value and this variable influences the index variable, a new set of input variables influencing a branch function should be derived.
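Why the influencing set has to be re-derived can be seen by instrumenting array reads. The sketch below transliterates the first statements of the sample program of Fig. 1 into Python and records which elements of A are read up to the first evaluation of the test `max < A[i]`. The `TracedArray` wrapper, the value of `high`, and the array contents are hypothetical.

```python
class TracedArray:
    """List wrapper that records every index that is read."""

    def __init__(self, values):
        self.values = values
        self.reads = []

    def __getitem__(self, index):
        self.reads.append(index)
        return self.values[index]


def reads_before_first_test(low, high, step, values):
    """Indexes of A read by instructions 2-6 on this input."""
    A = TracedArray(values)
    smallest = A[low]          # 2: min := A[low]
    largest = A[low]           # 3: max := A[low]
    i = low + step             # 4: i := low + step
    if i < high:               # 5: while i < high
        _ = largest < A[i]     # 6: max < A[i]
    return sorted(set(A.reads))


values = list(range(101))      # index 0 unused, mimicking array[1..100]
print(reads_before_first_test(39, 93, 12, values))
print(reads_before_first_test(20, 93, 12, values))
```

Changing low from 39 to 20 replaces the elements that feed the branch function, so a search that kept exploring A[39] and A[51] would be working on variables that no longer matter.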
VI. DYNAMIC DATA STRUCTURES
The next extension of the automated test data generation involves records and pointers. Records provide a
grouping facility for data items. A record is a collection
of data items, each of which is said to occupy a field
of the record. A distinct name is associated with each
field, and access to individual items within a record is via
these field identifiers. Consequently, every field in a record can be treated as a separate variable.
Pointers, however, create unique problems since the
pointer variable actually represents two variables: the
pointer itself and the record pointed at. A nameless record
of a given type is created by calling the standard proce-
type
  NodePointer = ^Node;
  Node = record
    data : integer;
    left : NodePointer;
    right : NodePointer
  end;

procedure FIND(L: NodePointer; y: integer; var q: NodePointer);
var p: NodePointer;
begin
1     p := L;
2     q := nil;
3     while p <> nil do
      begin
4       if y = p^.data then
        begin
5         q := p;
6         p := nil;
        end
        else
7,8       if y < p^.data then p := p^.left
9         else p := p^.right;
      end {while};
end;
Fig. 3. A sample Pascal procedure.
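To experiment with paths through this procedure, it helps to have an executable version that reports the nodes it visits. The following is a Python transliteration offered as a sketch: the node numbers follow the figure, while the `trace` list, the `Node` class, and the linking of the three records in the example are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def find(L, y):
    """Search the structure rooted at L for y; return (q, visited node numbers)."""
    trace = ["s"]
    trace.append(1)
    p = L
    trace.append(2)
    q = None
    while True:
        trace.append(3)                 # while p <> nil
        if p is None:
            break
        trace.append(4)                 # if y = p^.data
        if y == p.data:
            trace.append(5)
            q = p
            trace.append(6)
            p = None
        else:
            trace.append(7)             # if y < p^.data
            if y < p.data:
                trace.append(8)
                p = p.left
            else:
                trace.append(9)
                p = p.right
    return q, trace


# Assumed input structure: rec1 (20) -> left -> rec2 (3) -> right -> rec3 (67).
rec3 = Node(67)
rec2 = Node(3, right=rec3)
rec1 = Node(20, left=rec2)
q, trace = find(rec1, 5)
print(q, trace)
```

Because the records and pointers are concrete objects, every dereference in the trace refers to a known record, which is the information the dynamic approach relies on for pointer handling.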
Fig. 4. The influence subnetwork for subpath $P_1$ from Example 2. Node labels: p := L; q := nil; p <> nil; y = p^.data /* y = rec1.data */; y < p^.data /* y < rec1.data */; p := p^.left /* p := rec1.left */; p <> nil.
Procedure FIND is executed on this input, and the following successful subpath Pi = < s, 1, 2, 3, 4, 7, 8, 3,
4, 7, 9, 3, 4 > of P is traversed. The violation occurs on
the execution of branch (4, 5). Since the branch predicate
of branch (4, 5) contains an arithmetic expression, we can
apply the search procedure described in the previous section to solve the third subgoal. However, this procedure
fails to find the solution to this subgoal. It should be obvious that the selected path P cannot be traversed for this
shape of the input data structure. For this reason, we have
to backtrack to the second subgoal and assign the second
possible value to rec2.right. Thus a new record rec3 is created and adr ( rec3) is assigned to rec2.right; rec3.data receives a random value 67. The following input data structure is created:
(Figure: input data structure with y = 5 and records rec1 (data = 20), rec2 (data = 3), and rec3 (data = 67).)
Procedure FIND is executed on this input, and the following successful subpath P2 = < s, 1, 2, 3, 4, 7, 8, 3,
4, 7, 9, 3 > of P is traversed. The violation occurs on the
(Figure: input data structure with records rec1 (data = 20), rec2 (data = 3), and rec3.)
VII. CONCLUSIONS
The test data generation approach described in this paper is based on program execution, dynamic data flow
analysis, and the function minimization methods. It has
been shown that the test data generation problem can be
reduced to a sequence of subgoals. Function minimization
methods are used to solve these subgoals. Moreover, dynamic data flow analysis is applied to speed up the search
process by identifying those input variables that influence
undesirable program behavior; as a result, the number of
fruitless tries can be significantly reduced. The potential
value of this approach is exhibited, for instance, by the
fact that the efficiency of the search does not depend upon
the size of input arrays. The approach of test data generation has then been extended to programs with dynamic
data structures, and the search method, which uses dynamic data flow analysis and backtracking to determine
the shape of the input dynamic data structure, has been
presented. In the approach described in this paper, values
of array indexes and pointers are known at each step of
program execution, and this approach exploits this information to overcome difficulties of array and pointer handling in the process of test data generation.
Several attempts to use actual program execution to
derive test data have been reported in the literature [4],
[26], [29]. One technique of test data generation for a
selected path was described by Miller and Spooner [26];
B. Symbolic Evaluation
One of the problems of the dynamic approach of test
data generation is its very limited ability to detect path
infeasibility. If the selected path is infeasible and the infeasibility is not detected, a large number of attempts can
be performed before the search procedure terminates and
a lot of effort can be wasted. Symbolic evaluation, on the
other hand, is capable of detecting, to a certain extent,
path infeasibility. We therefore believe that a combination of both techniques for test data generation can be advantageous. For example, symbolic evaluation can be
used to check path consistency before the dynamic approach
of test data generation is used. In addition, application of
symbolic evaluation over mixtures of actual and symbolic
data [17] in the test data generation process should be investigated.
C. Procedures
To be of practical value, the dynamic approach of test
data generation has to be extended to programs with procedures. This does not seem to pose difficulties because
it is possible by instrumentation to identify variables that
are used or defined in a procedure call for the actual execution; as a result, dynamic data flow analysis can precisely determine influencing input variables in the presence of procedure calls. On the other hand, static analysis,
in the general case, fails to identify the used/defined variables
in the procedure call.
D. Global Optimization
The function minimization algorithm applied in our approach of test data generation is based on the direct search
method published in [12], [13]. One of the problems of
this method is that it can find only a local minimum.
In many cases this can prevent solving subgoals, especially for branch functions with several local minima.
There is extensive research in the area of global
optimization, e.g., [31], and several techniques have been
developed to find a global optimum. Research is
needed to investigate the application of those techniques
in the dynamic approach of test data generation.
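The local-minimum problem can be illustrated with a small sketch. The branch function below is hypothetical: it has a positive local minimum at x = 10, while negative values (solutions) exist only for x < -5. The unit-step descent is a deliberately simplified stand-in for a direct search, not the procedure of [12], [13].

```python
def branch_value(x):
    """Hypothetical branch function with a positive local minimum at x = 10."""
    if x >= 0:
        return abs(x - 10) + 1      # local minimum: F(10) = 1 > 0
    return x + 5                    # negative for x < -5, so a solution exists


def unit_step_descent(f, x):
    """Greedy local search: step to a neighbour only while it lowers f."""
    while f(x) >= 0:
        best = min((x - 1, x + 1), key=f)
        if f(best) >= f(x):
            break                   # stuck: no neighbour improves f
        x = best
    return x


stuck_at = unit_step_descent(branch_value, 3)
print(stuck_at, branch_value(stuck_at))
found = unit_step_descent(branch_value, -1)
print(found, branch_value(found))
```

Starting from 3 the search settles at the positive local minimum and reports failure, although a different starting point reaches a solution; this dependence on the initial input is what a global optimization technique would aim to remove.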
ACKNOWLEDGMENT
REFERENCES
[6] ... program debugging, IEEE Trans. Software Eng., vol. SE-5, no. 1, pp. 60-66, Jan. 1979.
[7] D. Bird and C. Munoz, "Automatic generation of random self-checking test cases," IBM Syst. J., vol. 22, no. 3, pp. 229-245, 1983.
[8] R. Boyer, B. Elspas, and K. Levitt, "SELECT-A formal system for testing and debugging programs by symbolic execution," SIGPLAN Notices, vol. 10, no. 6, pp. 234-245, June 1975.
[9] F. Chan and T. Chen, "AIDA-A dynamic data flow anomaly detection system for Pascal programs," Software-Practice and Experience, vol. 17, no. 3, pp. 227-239, Mar. 1987.
[10] L. Clarke, "A system to generate test data and symbolically execute programs," IEEE Trans. Software Eng., vol. SE-2, no. 3, pp. 215-222, Sept. 1976.
[11] R. DeMillo, W. McCracken, R. Martin, and J. Passafiume, Software Testing and Evaluation. Benjamin/Cummings, 1987.
[12] P. Gill and W. Murray, Eds., Numerical Methods for Constrained Optimization. New York: Academic, 1974.
[13] H. Glass and L. Cooper, "Sequential search: A method for solving constrained optimization problems," J. ACM, vol. 12, no. 1, pp. 71-82, Jan. 1965.
[14] R. Fairly, "An experimental program-testing facility," IEEE Trans. Software Eng., vol. SE-1, no. 4, pp. 350-357, Dec. 1975.
[15] J. Ferrante, K. Ottenstein, and J. Warren, "The program dependence graph and its use in optimization," ACM Trans. Program. Lang. Syst., vol. 9, no. 3, pp. 319-349, July 1987.
[16] W. Howden, "Symbolic testing and the DISSECT symbolic evaluation system," IEEE Trans. Software Eng., vol. SE-4, no. 4, pp. 266-278, 1977.
[17] W. Howden, Functional Program Testing and Analysis. New York: McGraw-Hill, 1987.
[18] D. Ince, "The automatic generation of test data," Comput. J., vol. 30, no. 1, pp. 63-69, 1987.
[19] W. Jessop, J. Kanem, S. Roy, and J. Scanlon, "ATLAS-An automated software testing system," in Proc. 2nd Int. Conf. Software Engineering, 1976.
[20] B. Korel, "The program dependence graph in static program testing," Inform. Processing Lett., vol. 24, pp. 103-108, Jan. 1987.
[21] B. Korel, "PELAS-Program error locating assistant system," IEEE Trans. Software Eng., vol. 14, no. 9, pp. 1253-1260, Sept. 1988.
[22] B. Korel and J. Laski, "Dynamic program slicing," Inform. Processing Lett., vol. 29, no. 3, pp. 155-163, Oct. 1988.
[23] B. Korel, "TESTGEN-A structural test data generation system," Dep. Comput. Sci., Wayne State Univ., Detroit, MI, Tech. Rep. CSC-89-001, 1989.
[24] N. Lyons, "An automatic data generation system for data base simulation and testing," Data Base, vol. 8, no. 4, pp. 10-13, 1977.
[25] E. Miller, Jr. and R. Melton, "Automated generation of testcase datasets," SIGPLAN Notices, vol. 10, no. 6, pp. 51-58, June 1975.
[26] W. Miller and D. Spooner, "Automatic generation of floating-point test data," IEEE Trans. Software Eng., vol. SE-2, no. 3, pp. 223-226, Sept. 1976.
[27] S. Muchnick and N. Jones, Eds., Program Flow Analysis: Theory and Applications. Englewood Cliffs, NJ: Prentice-Hall International, 1981.
[28] G. Myers, The Art of Software Testing. New York: Wiley, 1979.
[29] M. Paige, "Data space testing," ACM Perform. Eval. Rev., vol. 10, no. 1, pp. 117-127, Spring 1981.
[30] C. Ramamoorthy, S. Ho, and W. Chen, "On the automated generation of program test data," IEEE Trans. Software Eng., vol. SE-2, no. 4, pp. 293-300, Dec. 1976.
[31] H. Ratschek, New Computer Methods for Global Optimization. New York: Halsted, 1988.
Bogdan Korel (M'87) was born in Poland. He received the M.S. degree in electrical engineering
from the Technical University of Kiev, USSR, and
the Ph.D. degree in systems engineering from
Oakland University, Rochester, MI, in 1986.
He is an Assistant Professor in the Department
of Computer Science at Wayne State University,
Detroit, MI. His research interests include automatic software testing and debugging, software
development environments, and distributed systems.
Dr. Korel is a member of the IEEE Computer Society and the Association for Computing Machinery.