Demand Interprocedural Dataflow Analysis


Susan Horwitz, Thomas Reps, and Mooly Sagiv
University of Wisconsin

Abstract. An exhaustive dataflow-analysis algorithm associates with each point in a program a set of dataflow facts that are guaranteed to hold whenever that point is reached during program execution. By contrast, a demand dataflow-analysis algorithm determines whether a single given dataflow fact holds at a single given point. This paper presents a new demand algorithm for interprocedural dataflow analysis. The algorithm has four important properties: It provides precise (meet-over-all-interprocedurally-valid-paths) solutions to a large class of problems. It has a polynomial worst-case cost for both a single demand and a sequence of all possible demands. The worst-case total cost of the sequence of all possible demands is no worse than the worst-case cost of a single run of the best known exhaustive algorithm for the same class of problems. Experimental results show that in many situations (e.g., when only a small number of demands are made, or when most demands are answered "yes") the demand algorithm is faster than the current best exhaustive algorithm.

CR Categories and Subject Descriptors: D.2.2 [Software Engineering]: Tools and Techniques; D.3.4 [Programming Languages]: Processors - compilers, optimization; E.1 [Data Structures] - graphs; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - computations on discrete structures; G.2.2 [Discrete Mathematics]: Graph Theory - graph algorithms

General Terms: Algorithms, Experimentation, Theory

Additional Key Words and Phrases: demand dataflow analysis, distributive dataflow framework, graph reachability, interprocedural dataflow analysis, interprocedurally realizable path, interprocedurally valid path, meet-over-all-valid-paths solution

Currently at the University of Chicago. This work was supported in part by a David and Lucile Packard Fellowship for Science and Engineering, by the National Science Foundation under grants CCR-8958530 and CCR-9100424, by the Defense Advanced Research Projects Agency under ARPA Order No. 8856 (monitored by the Office of Naval Research under contract N00014-92-J-1937), by the Air Force Office of Scientific Research under grant AFOSR-91-0308, and by a grant from Xerox Corporate Research. Part of this work was done while the authors were visiting the University of Copenhagen. A preliminary version of this paper appeared in SIGSOFT '95: Proceedings of the Third ACM SIGSOFT Symposium on Foundations of Software Engineering (Washington DC, October 10-13, 1995) [Hor95]. Authors' addresses: Computer Sciences Department, Univ. of Wisconsin, 1210 West Dayton Street, Madison, WI 53706, USA; Computer Science Department, University of Chicago, Ryerson Hall, 1100 E. 58th St., Chicago, IL, 60637 USA. Electronic mail: {horwitz, reps}@cs.wisc.edu; [email protected]


1. Introduction

An exhaustive dataflow-analysis algorithm associates with each point in a program a set of dataflow facts that are guaranteed to hold whenever that point is reached during program execution. This information can be used in a variety of software-engineering tools (for example, to provide feedback to the programmer about possible errors such as the use of an uninitialized variable, or to determine whether a restructuring transformation is meaning-preserving) or can be used by an optimizing compiler (in choosing valid optimizing transformations). It is not always necessary to compute complete dataflow information at all program points. A demand dataflow-analysis algorithm determines whether a given dataflow fact holds at a given point [Bab78,Due93,Rep94c,Rep94a,Rep94b,Due95]. Demand analysis can sometimes be preferable to exhaustive analysis for the following reasons:

Narrowing the focus to specific points of interest. Software-engineering tools that use dataflow analysis often require information only at a certain set of program points. Similarly, in program optimization, most of the gains are obtained from making improvements at a program's "hot spots", in particular, its innermost loops. However, current tools typically include a phase during which an exhaustive interprocedural dataflow-analysis algorithm is used. There is good reason to believe that the use of a demand algorithm will greatly reduce the amount of extraneous information computed.

Narrowing the focus to specific dataflow facts of interest. Even when dataflow information is desired for every program point p, the full set of dataflow facts at p may not be required. For example, it is probably only useful to determine whether the variables used at p might be uninitialized, rather than determining that information for all of the variables in the procedure.

Reducing work in preliminary phases.
In problems that can be decomposed into separate phases, not all of the information from one phase may be required by subsequent phases. For example, the MayMod problem determines, for each call site, which variables may be modified during the call [Ban79,Coo88]. This problem can be decomposed into two phases: computing side effects disregarding aliases (the so-called DMod problem), and computing alias information [Ban79,Coo89,Coo88]. Given a demand (e.g., "What is the MayMod set for a given call site c?"), a demand algorithm has the potential to reduce drastically the amount of work spent in earlier phases by propagating only relevant demands (e.g., "What are the alias pairs (x, y) such that x is in DMod(c)?").

Demand analysis as a user-level operation. It is desirable to have program-development tools in which the user can ask questions interactively about various aspects of a program

[Mas80,Wei84,Lin84,Hor86]. Such tools are particularly useful when debugging, when trying to understand complicated code, or when trying to transform a program to execute efficiently on a parallel machine. Because it is unlikely that a programmer will ask questions about all program points, solving just the user's sequence of demands is likely to be significantly less costly than an exhaustive analysis.


Of course, determining whether a given fact holds at a given point may require determining whether other, related facts hold at other points (and those other facts may not be "facts of interest" in the sense of the second bullet-point above). It is desirable, however, for a demand dataflow-analysis algorithm to minimize the amount of such auxiliary information computed. Certainly the worst-case cost of a demand dataflow-analysis algorithm (for one demand) should be no worse than the worst-case cost of the best exhaustive algorithm. Furthermore, it is desirable that the information computed in response to one demand be reusable, so as to minimize the cost of a sequence of demands; we call algorithms that are able to reuse information in this way caching demand algorithms. Ideally, the worst-case total cost of the sequence of demands that produces complete dataflow information should be no worse than the worst-case cost of a single run of the best possible exhaustive algorithm; we call this the same-worst-case-cost property. Since no non-trivial lower bounds (other than undecidability results) are currently known for dataflow analysis, it is not possible to determine whether a demand algorithm has the same-worst-case-cost property; however, it is possible to determine whether a demand algorithm has this property with respect to a particular exhaustive algorithm.

This paper presents a new caching demand algorithm for interprocedural dataflow analysis. The new algorithm, which is an improved version of the one reported in [Hor95], has four important properties: It provides precise (meet-over-all-interprocedurally-valid-paths) solutions to a large class of problems. It has a polynomial worst-case cost for both a single demand and a sequence of all possible demands. It has the same-worst-case-cost property with respect to the exhaustive algorithm given in [Rep95]. That algorithm is currently the best exhaustive algorithm for the class of dataflow problems that can be handled precisely by our demand algorithm: the IFDS problems defined in Section 2.1 (i.e., of the exhaustive algorithms that can handle all IFDS problems, the one given in [Rep95] has the best asymptotic worst-case running time). Experimental results show that in many situations (e.g., when only a small number of demands are made, or when most demands are answered "yes") the demand algorithm is faster than the algorithm from [Rep95].

The remainder of the paper is organized as follows: Section 2 provides background material. First, the class of dataflow-analysis problems that can be handled by our algorithm is defined. Second, we show how to transform a dataflow-analysis problem in this class into a special kind of graph-reachability problem. Section 3 presents our new algorithm, which solves demands for dataflow-analysis information by solving equivalent graph-reachability demands. Experimental results on C programs are reported in Section 4. Section 5 discusses related work.


2. Background

2.1. The IFDS Dataflow Framework

The algorithm given in Section 3 can be used to solve any interprocedural dataflow-analysis problem in which the dataflow facts form a finite set D, and the dataflow functions (which are of type 2^D → 2^D) distribute over the meet operator (either union or intersection). We call this class of problems the interprocedural, finite, distributive, subset problems, or IFDS problems, for short. The IFDS problems include all locally separable problems (the interprocedural versions of classical bit-vector or gen-kill problems, e.g., reaching definitions, available expressions, and live variables) as well as non-locally-separable problems such as truly-live variables [Gie81], copy-constant propagation [Fis88, pp. 660], and possibly-uninitialized variables. The IFDS framework was defined in [Rep95], where we presented an efficient exhaustive algorithm for solving IFDS problems. That definition is summarized below. The IFDS framework is a variant of Sharir and Pnueli's "functional approach" to interprocedural dataflow analysis [Sha81], with an extension similar to the one given by Knoop and Steffen in order to handle programs in which recursive procedures have local variables and parameters [Kno92]. These frameworks generalize Kildall's concept of the meet-over-all-paths solution of an intraprocedural dataflow-analysis problem [Kil73] to the meet-over-all-valid-paths solution of an interprocedural dataflow-analysis problem. In Kildall's framework, an instance of a dataflow-analysis problem consists of a bounded lower semilattice (the dataflow information) with meet operator ⊓, a flowgraph (representing the program), and an

assignment of dataflow functions to the edges of the flowgraph. If all of the dataflow functions are distributive, Kildall's algorithm computes the meet-over-all-paths solution to the problem instance. Similarly, in the IFDS framework, an instance of a dataflow-analysis problem (or IFDS problem, for short) consists of the following:

A finite set D (the dataflow information).

A meet operator ⊓. The algorithm given in Section 3 requires that the meet operator be union. However, the algorithm can still be used to solve problems for which the meet operator is intersection: such problems can always be transformed to a complementary problem in which the meet operator is union, and the algorithm can then be applied.

A supergraph G* (a collection of flowgraphs, one for each procedure). In supergraph G*, a procedure call is represented by two nodes, a call node and a return-site node. In addition to the ordinary intraprocedural edges that connect the nodes of the individual flowgraphs, for each procedure call, represented by call node c and return-site node r, G* has three edges: an intraprocedural call-to-return-site edge from c to r; an interprocedural call-to-start edge from c to the start node of the called procedure; an interprocedural exit-to-return-site edge from the exit node of the called procedure to r.
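The three kinds of edges that a supergraph adds at each call site can be sketched as follows (a hypothetical Python encoding, not code from the paper; the class and constant names are illustrative):

```python
# A sketch of a supergraph G*: per-procedure flowgraphs plus the three
# kinds of edges that each procedure call contributes. Names are illustrative.
CALL_TO_RETURN = "call-to-return-site"    # intraprocedural
CALL_TO_START = "call-to-start"           # interprocedural
EXIT_TO_RETURN = "exit-to-return-site"    # interprocedural

class Supergraph:
    def __init__(self):
        self.edges = set()  # (source, target, kind) triples

    def add_intra_edge(self, m, n):
        self.edges.add((m, n, "intra"))

    def add_call(self, c, r, start, exit_):
        # A call represented by call node c and return-site node r,
        # invoking the procedure whose start/exit nodes are given:
        self.edges.add((c, r, CALL_TO_RETURN))
        self.edges.add((c, start, CALL_TO_START))
        self.edges.add((exit_, r, EXIT_TO_RETURN))

# The call to P in main (call node n2, return-site node n3 of Figure 1):
g = Supergraph()
g.add_call("n2", "n3", "start_P", "exit_P")
assert ("n2", "start_P", CALL_TO_START) in g.edges
```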


(The call-to-return-site edges are included so that the IFDS framework can handle programs with local variables and parameters; the dataflow functions on call-to-return-site and exit-to-return-site edges permit the information about local variables and value parameters that holds at the call site to be combined with the information about global variables and reference parameters that holds at the end of the called procedure.)

An assignment of distributive dataflow functions (of type 2^D → 2^D) to the edges of the supergraph.

Given an instance of an IFDS problem, a dataflow fact d ∈ D, and a flowgraph node n, the demand algorithm given in Section 3 determines whether fact d is in the meet-over-all-valid-paths solution at node n. The distinction between meet-over-all-paths and meet-over-all-valid-paths is necessary to capture the idea that not all paths in G* represent potential execution paths. A valid path is one that respects the fact that a procedure always returns to the site of the most recent call. To understand the algorithm of Section 3, it is useful to distinguish further between a same-level valid path (a path in G* that starts and ends in the same procedure, and in which every call has a corresponding return) and a valid path (a path that may include one or more unmatched calls).

Example. Figure 1 shows an example program and its supergraph G*. In G*, the path start_main → n1 → n2 → start_P → n4 → exit_P → n3 is a (same-level) valid path; the path start_main → n1 → n2 → start_P → n4 is a (non-same-level) valid path (because the call-to-start edge n2 → start_P has no matching exit-to-return-site edge); the path start_main → n1 → n2 → start_P → n4 → exit_P → n8 is not a valid path, because the exit-to-return-site edge exit_P → n8 does not correspond to the preceding call-to-start edge n2 → start_P. In Figure 1, the supergraph is annotated with the dataflow functions for the possibly-uninitialized variables problem. The possibly-uninitialized variables problem is to determine, for each node n in G*, the set of program variables that may be uninitialized just before execution reaches n. A variable x is possibly uninitialized at n either if there is an x-definition-free valid path from the start of the program to n, or if there is a valid path from the start of the program to n on which the last definition of x uses some variable y that itself is possibly uninitialized. For example, the dataflow function associated with edge n6 → n7 shown in Figure 1 adds a to the set of possibly-uninitialized variables after node n6 if either a or g is in the set of possibly-uninitialized variables before node n6. The IFDS framework can be used for languages with a variety of features (including procedure calls, parameters, global and local variables, and pointers). Encoding a problem in the IFDS framework may in

(a) Example program:

    declare g: integer

    procedure main
    begin
        declare x: integer
        read(x)
        call P(x)
    end

    procedure P(value a: integer)
    begin
        if (a > 0) then
            read(g)
            a := a - g
            call P(a)
            print(a, g)
        fi
    end

(b) Its supergraph G* (the drawing cannot be reproduced in this text version). Its nodes are start_main (ENTER main), n1 (READ(x)), n2 (CALL P), n3 (RETURN FROM P), and exit_main (EXIT main) in main; and start_P (ENTER P), n4 (IF a > 0), n5 (READ(g)), n6 (a := a - g), n7 (CALL P), n8 (RETURN FROM P), n9 (PRINT(a, g)), and exit_P (EXIT P) in P. Edge annotations include λS.{x, g} on start_main → n1, λS.S - {x} on n1 → n2, the renaming function λS.S<x/a> on the call-to-start edge n2 → start_P, λS.S - {g} after READ(g), λS.if (a ∈ S) or (g ∈ S) then S ∪ {a} else S - {a} on n6 → n7, and the identity function λS.S on most remaining edges.

Figure 1. An example program and its supergraph G*. The supergraph is annotated with the dataflow functions for the possibly-uninitialized variables problem. The notation S<x/a> denotes the set S with x renamed to a.
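The edge annotations of Figure 1 behave like ordinary set-to-set functions; the sketch below (a hypothetical encoding, not the paper's code) renders three of them for the possibly-uninitialized variables problem:

```python
# Three of Figure 1's dataflow functions, written as functions on sets of
# possibly-uninitialized variables. (Illustrative encoding only.)
def enter_main(S):
    # start_main -> n1: lambda S.{x, g} -- x and g start out uninitialized.
    return {"x", "g"}

def after_read_x(S):
    # n1 -> n2: lambda S.(S - {x}) -- read(x) definitely initializes x.
    return S - {"x"}

def after_a_assign(S):
    # n6 -> n7 (a := a - g): a is possibly uninitialized afterwards
    # iff a or g was possibly uninitialized before.
    return S | {"a"} if ("a" in S or "g" in S) else S - {"a"}

assert enter_main(set()) == {"x", "g"}
assert after_read_x({"x", "g"}) == {"g"}
assert after_a_assign({"g"}) == {"a", "g"}   # g taints a
assert after_a_assign(set()) == set()        # a becomes definitely initialized
```

Each of these functions distributes over union, which is what places the problem in the IFDS class.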

some cases involve a loss of precision; for example, in languages with pointers there may be a loss of precision for problem instances in which there is aliasing. Once a problem has been encoded in the IFDS framework, the demand algorithm presented in this paper provides (with no further loss of precision) an efficient way to determine whether a particular dataflow fact is in the meet-over-all-valid-paths solution to the problem.

2.2. From Dataflow-Analysis Problems to Realizable-Path Reachability Problems

In this section, we show how to convert IFDS problems to realizable-path graph-reachability problems. This is done by transforming an instance of an IFDS problem (a supergraph G* in which each edge has an associated distributive function in 2^D → 2^D) into an exploded supergraph G#, in which each node ⟨n, d⟩ represents dataflow fact d ∈ D at supergraph node n, and each edge represents a dependence between dataflow facts at different supergraph nodes.

The key insight behind this explosion is that a distributive function f in 2^D → 2^D can be represented using a graph with 2|D| + 2 nodes; this graph is called f's representation relation. Half of the nodes in this graph represent f's input; the other half represent its output. |D| of these nodes represent the individual dataflow facts that form set D, and the remaining node (which we call 0) essentially represents the empty set. An edge 0 → d means that d is in f(S) regardless of the value of S (in particular, d is in f(∅)). An edge d1 → d2 means that d2 is not in f(∅), and is in f(S) whenever d1 is in S. Every graph includes the edge 0 → 0; this is so that functional composition corresponds to compositions of representation relations (this is explained below).

Example. The main procedure shown in Figure 1 has two variables, x and g. Therefore, the representation relations for the dataflow functions associated with this procedure will each have six nodes. The function associated with the edge from start_main to n1 is λS.{x, g}; that is, variables x and g are added to the set of possibly-uninitialized variables regardless of the value of S. The representation relation for this function is:

    [Diagram: input nodes 0, x, g above output nodes 0, x, g, with edges 0 → 0, 0 → x, and 0 → g.]

The representation relation for the function λS.S - {x} (which is associated with the edge from n1 to n2) is shown below. Note that x is never in the output set, and g is there iff it is in S.

    [Diagram: input nodes 0, x, g above output nodes 0, x, g, with edges 0 → 0 and g → g.]

A function's representation relation correctly captures the function's semantics in the sense that the representation relation can be used to evaluate the function. In particular, the result of applying function f to input S is the union of the values represented by the output nodes in f's representation relation that are the targets of edges from the input nodes that represent either 0 or a member of S. For example, consider applying the dataflow function λS.S - {x} to the set {x} using the representation relation shown above. There is no edge out of the initial x node, and the only edge out of the initial 0 node is to the final 0 node, so the result of this application is ∅. The result of applying the same function to the set {x, g} is {g}, because there is an edge from the initial g node to the final g node.
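This construction and evaluation scheme can be sketched directly from the definitions above (a hypothetical Python rendering; `representation_relation` and `apply_via_relation` are illustrative names):

```python
# Build the representation relation of a distributive function f over fact
# set D, with the node "0" standing for the empty set, then evaluate f via it.
def representation_relation(f, D):
    edges = {("0", "0")}                    # every relation has the 0 -> 0 edge
    base = f(set())
    for d in base:
        edges.add(("0", d))                 # d is in f(S) for every S
    for d1 in D:
        for d2 in f({d1}):
            if d2 not in base:
                edges.add((d1, d2))         # d2 is in f(S) whenever d1 is in S
    return edges

def apply_via_relation(edges, S):
    # Union of the output nodes reached from "0" or from a member of S.
    return {d2 for (d1, d2) in edges if (d1 == "0" or d1 in S) and d2 != "0"}

f = lambda S: S - {"x"}                     # the function on edge n1 -> n2
R = representation_relation(f, {"x", "g"})
assert R == {("0", "0"), ("g", "g")}        # x never survives; g passes through
assert apply_via_relation(R, {"x"}) == set()
assert apply_via_relation(R, {"x", "g"}) == {"g"}
```

The two assertions at the end reproduce the worked example in the text: applying λS.S - {x} to {x} yields ∅, and applying it to {x, g} yields {g}.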


The composition of two functions is represented by "pasting together" the graphs that represent the individual functions. For example, the composition of the two functions discussed above, λS.S - {x} ∘ λS.{x, g}, is represented as follows:

    [Diagram: three rows of nodes 0, x, g. The first stage (for λS.{x, g}) has edges 0 → 0, 0 → x, and 0 → g; the second stage (for λS.S - {x}) has edges 0 → 0 and g → g.]

Paths in a pasted-together graph represent the result of applying the composed functions. For example, there is a path in the graph shown above from the initial 0 node to the final g node. This means that g is in the final set regardless of the value of S to which the composed functions are applied. There is no path from an initial node to the final x node; this means that x is not in the final set, regardless of the value of S. To understand the need for the 0 → 0 edges in the representation relations, consider composing the two example functions in the opposite order: λS.{x, g} ∘ λS.S - {x}. This function composition is represented as follows:

    [Diagram: three rows of nodes 0, x, g. The first stage (for λS.S - {x}) has edges 0 → 0 and g → g; the second stage (for λS.{x, g}) has edges 0 → 0, 0 → x, and 0 → g.]

Note that both x and g are in the final set regardless of the value of S to which the composed functions are applied. This is reflected in the graph shown above by the paths from the initial 0 node to the final x and g nodes. However, if there were no edge from the initial 0 node to the intermediate 0 node, there would be no such paths, and the graph would not correctly represent the composed functions. Returning to the definition of the exploded supergraph G#: Each node n in supergraph G* is exploded into |D| + 1 nodes in G#, and each edge m → n in G* is exploded into the representation relation of the function associated with m → n. In particular:

(i) For every node n in G*, there is a node ⟨n, 0⟩ in G#.

(ii) For every node n in G*, and every dataflow fact d ∈ D, there is a node ⟨n, d⟩ in G#.

Given function f associated with edge m → n of G*:

(iii) There is an edge in G# from node ⟨m, 0⟩ to node ⟨n, d⟩ for every d ∈ f(∅).

(iv) There is an edge in G# from node ⟨m, d1⟩ to node ⟨n, d2⟩ for every d1, d2 such that d2 ∈ f({d1}) and d2 ∉ f(∅).

(v) There is an edge in G# from node ⟨m, 0⟩ to node ⟨n, 0⟩.

Because pasted-together representation relations correspond to function composition, a path in the exploded supergraph from node ⟨n, d1⟩ to node ⟨m, d2⟩ means that if dataflow fact d1 holds at supergraph node n, then dataflow fact d2 will hold at node m. By looking at paths that start from node ⟨start_main, 0⟩ (which represents the fact that no dataflow facts hold at the start of procedure main) we can determine which dataflow facts hold at each node. However, recall that we are not interested in all paths in the exploded supergraph; we are only interested in those that correspond to valid paths in the supergraph G*. We call those paths in G# its realizable paths; similarly, we call a path in G# that corresponds to a same-level valid path in G* a same-level realizable path. [Rep94a] includes a proof that dataflow fact d holds at supergraph node n iff there is a realizable path in G# from node ⟨start_main, 0⟩ to node ⟨n, d⟩.

Example. The exploded supergraph that corresponds to the instance of the possibly-uninitialized variables problem shown in Figure 1 is shown in Figure 2. The dataflow functions are replaced by their representation relations. In Figure 2, closed circles represent nodes that are reachable along realizable paths from ⟨start_main, 0⟩. Open circles represent nodes not reachable along realizable paths. (For example, note that nodes ⟨n8, g⟩ and ⟨n9, g⟩ are reachable only along non-realizable paths from ⟨start_main, 0⟩.) As stated above, this information indicates the nodes' values in the meet-over-all-valid-paths solution to the dataflow-analysis problem.
For instance, the meet-over-all-valid-paths solution at node exit_P is the set {g}. (That is, variable g is the only possibly-uninitialized variable just before execution reaches the exit node of procedure P.) In Figure 2, this information can be obtained by determining that there is a realizable path from ⟨start_main, 0⟩ to ⟨exit_P, g⟩, but not from ⟨start_main, 0⟩ to ⟨exit_P, a⟩.

3. A Demand Algorithm for IFDS Problems

In this section, we show how to solve demand IFDS problems by solving equivalent realizable-path reachability demands. The algorithm, called the Demand-Tabulation Algorithm, is presented in Figure 3. The top-level function of the algorithm is called IsMemberOfSolution. The call IsMemberOfSolution(⟨n, d⟩) returns true iff there is a realizable path from node ⟨start_main, 0⟩ to node ⟨n, d⟩ in G#. Such a path exists iff the meet-over-all-valid-paths solution to the dataflow-analysis problem at node n of G* includes dataflow fact d.
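Before turning to the details, the graph-reachability view can be made concrete with a small sketch (hypothetical code, not the paper's): ordinary reachability from ⟨start_main, 0⟩ over-approximates the solution because it ignores realizability, which is exactly what the Demand-Tabulation Algorithm corrects for.

```python
from collections import deque

# Plain breadth-first reachability over exploded-supergraph edges. This
# over-approximates the IFDS solution because it admits non-realizable
# paths; it is shown only to make the graph-reachability view concrete.
def reachable_from(edges, source):
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen, work = {source}, deque([source])
    while work:
        for b in succ.get(work.popleft(), []):
            if b not in seen:
                seen.add(b)
                work.append(b)
    return seen

# A tiny made-up fragment of an exploded supergraph (not Figure 2 itself):
E = {(("start_main", "0"), ("n1", "0")),
     (("start_main", "0"), ("n1", "x")),
     (("n1", "x"), ("n2", "x"))}
facts = reachable_from(E, ("start_main", "0"))
assert ("n2", "x") in facts    # x possibly uninitialized at n2 along some path
```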

Figure 2. The exploded supergraph that corresponds to the instance of the possibly-uninitialized variables problem shown in Figure 1. Closed circles represent nodes of G# that are reachable along realizable paths from ⟨start_main, 0⟩. Open circles represent nodes not reachable along such paths. [The drawing, which shows each supergraph node of Figure 1 together with its exploded nodes 0, x, g (in main) or 0, a, g (in P), cannot be reproduced in this text version.]

declare G# = (N#, E#): global exploded supergraph
declare PathEdge, SummaryEdge: global edge set, initially empty        /* These sets are preserved across calls */
declare ReachableNodes: global node set, initially {⟨n, 0⟩ | n ∈ N*}   /* This set is preserved across calls */
declare VisitedNodes: global node set, initially empty                 /* This set is preserved across calls */

function IsMemberOfSolution(⟨n, d⟩: exploded supergraph node) returns boolean
declare en: exploded supergraph node or Failure
begin
[1]   en := BackwardDFS(⟨n, d⟩)
[2]   if en = Failure then
[3]       return(false)
[4]   else
[5]       UpdateReachableNodes(en)
[6]       return(true)
[7]   fi
end

function BackwardDFS(⟨n, d⟩) returns exploded supergraph node or Failure
declare EdgeWorkList: edge set; NodeStack: node stack
begin
[8]   push ⟨n, d⟩ onto NodeStack
[9]   while NodeStack is not empty do
[10]      pop a node ⟨n, d⟩ from NodeStack
[11]      if ⟨n, d⟩ ∈ ReachableNodes then
[12]          return(⟨n, d⟩)
[13]      else if ⟨n, d⟩ ∉ VisitedNodes then
[14]          insert ⟨n, d⟩ into VisitedNodes
[15]          switch n
[16]          case n is a return-site node :
[17]              let c be the call node that corresponds to n, and let p be the procedure called at c
[18]                  EdgeWorkList := ∅
[19]                  for each d′ such that ⟨exit_p, d′⟩ → ⟨n, d⟩ ∈ E# do Propagate(⟨exit_p, d′⟩ → ⟨exit_p, d′⟩, EdgeWorkList) od
[20]                  BackwardTabulateSLRPs(EdgeWorkList)
[21]                  for each d′ such that ⟨c, d′⟩ → ⟨n, d⟩ ∈ (E# ∪ SummaryEdge) and ⟨c, d′⟩ ∉ VisitedNodes do push ⟨c, d′⟩ onto NodeStack od
[22]              end let
[23]          end case
[24]          case n is the start node of procedure p :
[25]              for each c ∈ Callers(p) do
[26]                  for each d′ such that ⟨c, d′⟩ → ⟨n, d⟩ ∈ E# and ⟨c, d′⟩ ∉ VisitedNodes do push ⟨c, d′⟩ onto NodeStack od
[27]              od
[28]          end case
[29]          default :
[30]              for each ⟨m, d′⟩ such that ⟨m, d′⟩ → ⟨n, d⟩ ∈ E# and ⟨m, d′⟩ ∉ VisitedNodes do push ⟨m, d′⟩ onto NodeStack od
[31]          end case
[32]          end switch
[33]      fi
[34]  od
[35]  return(Failure)
end

procedure UpdateReachableNodes(⟨n0, d0⟩: exploded supergraph node)
declare NodeWorkList: node set, initially ∅
declare ⟨n, d⟩, ⟨m, d′⟩: exploded supergraph node
begin
[36]  insert ⟨n0, d0⟩ into NodeWorkList
[37]  while NodeWorkList ≠ ∅ do
[38]      select and remove an exploded supergraph node ⟨n, d⟩ from NodeWorkList
[39]      insert ⟨n, d⟩ into ReachableNodes
[40]      remove ⟨n, d⟩ from VisitedNodes
[41]      for each ⟨m, d′⟩ such that ⟨n, d⟩ → ⟨m, d′⟩ ∈ (E# ∪ SummaryEdge), and ⟨n, d⟩ → ⟨m, d′⟩ is not an exit-to-return-site edge, and ⟨m, d′⟩ ∈ VisitedNodes, and ⟨m, d′⟩ ∉ ReachableNodes do
[42]          insert ⟨m, d′⟩ into NodeWorkList
[43]      od
[44]  od
end

Figure 3. The Demand-Tabulation Algorithm determines whether dataflow fact d holds at flowgraph node n. Procedures BackwardTabulateSLRPs and Propagate are given in Figure 4.


IsMemberOfSolution consists of a backward phase (performed by function BackwardDFS) followed by a forward phase (performed by procedure UpdateReachableNodes). BackwardDFS performs a backward depth-first search of G# starting from demand node ⟨n, d⟩, to determine whether the demand node can be reached via a realizable path from node ⟨start_main, 0⟩. UpdateReachableNodes is called only when BackwardDFS is successful. The purpose of UpdateReachableNodes is to update two sets, ReachableNodes and VisitedNodes, that are maintained across calls to IsMemberOfSolution in order to prevent repeating work done on a previous call. The ReachableNodes and VisitedNodes sets are used and maintained as follows:

ReachableNodes. An exploded-graph node ⟨n, d⟩ is placed in set ReachableNodes when it has been determined that there is a realizable path from ⟨start_main, 0⟩ to ⟨n, d⟩. Before the first call on IsMemberOfSolution is performed, ReachableNodes is initialized to {⟨n, 0⟩} for all supergraph nodes n. As soon as function BackwardDFS encounters a node ⟨n′, d′⟩ that is in ReachableNodes, the backward depth-first search is terminated, and node ⟨n′, d′⟩ is returned (line 12). (The fact that ⟨n′, d′⟩ is reachable via a realizable path from ⟨start_main, 0⟩, together with the fact that BackwardDFS only visits nodes from which there is a realizable path to the demand node ⟨n, d⟩, means that there is a realizable path from ⟨start_main, 0⟩ to ⟨n, d⟩.) Procedure UpdateReachableNodes starts at the node returned by BackwardDFS, and performs a forward traversal of G#, following only the edges that were traversed backwards by BackwardDFS. All of the nodes that it encounters are reachable from ⟨start_main, 0⟩ via a realizable path, so they are added to ReachableNodes. (This update of ReachableNodes is part of what makes the Demand-Tabulation Algorithm a caching algorithm. The algorithm would still be correct if UpdateReachableNodes followed all edges other than exit-to-return-site edges, but it would increase the time required for a single demand.)

VisitedNodes. Between invocations of IsMemberOfSolution, the exploded-graph nodes in set VisitedNodes are those for which it has been determined that there is no realizable path from ⟨start_main, 0⟩. During an invocation of IsMemberOfSolution, nodes visited for the first time are added to this set (line 14); those determined to be reachable from ⟨start_main, 0⟩ by a realizable path are transferred from this set to the ReachableNodes set by procedure UpdateReachableNodes (lines 39 and 40). Note that when BackwardDFS returns Failure, none of the nodes that have been added to VisitedNodes by BackwardDFS are reachable from ⟨start_main, 0⟩, so it is not necessary for IsMemberOfSolution to call UpdateReachableNodes.

An interesting aspect of BackwardDFS is how it ensures that only nodes from which there is a realizable path to demand node ⟨n, d⟩ are visited. This is accomplished by the call to BackwardTabulateSLRPs at

line 20, which occurs when the node ⟨n, d⟩ popped from the stack corresponds to a return-site node (i.e., n is a return-site node in G*). The purpose of BackwardTabulateSLRPs is to find summary edges, which represent transitive dependences due to procedure calls: A summary edge of the form ⟨c, d1⟩ → ⟨r, d2⟩ (where c is a call node and r is the matching return-site node) represents a same-level realizable path from ⟨c, d1⟩ to ⟨r, d2⟩. Summary edges are recorded in the (global) set named SummaryEdge. After calling BackwardTabulateSLRPs, BackwardDFS can continue its backward traversal across the newly discovered summary edges (line 21).
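The caching discipline carried by ReachableNodes and VisitedNodes can be sketched as follows (an assumed simplification, not the paper's code: only the cache-lookup step is shown, with the backward search elided):

```python
# Two persistent sets let later demands reuse earlier work: ReachableNodes
# caches "yes" answers, VisitedNodes caches "no" answers (nodes already
# shown to have no realizable path from <start_main, 0>).
class DemandCache:
    def __init__(self, all_zero_nodes):
        self.reachable = set(all_zero_nodes)  # initially {<n, 0> | n in N*}
        self.visited = set()

    def cached_answer(self, node):
        if node in self.reachable:
            return True       # a realizable path is already known
        if node in self.visited:
            return False      # already known to be unreachable
        return None           # a fresh backward search would be required

cache = DemandCache({("start_main", "0"), ("n1", "0")})
assert cache.cached_answer(("n1", "0")) is True
assert cache.cached_answer(("n9", "g")) is None   # triggers a real search
```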

declare G# = (N#, E#): global exploded supergraph
declare PathEdge, SummaryEdge: global edge set, initially empty        /* These sets are preserved across calls */

procedure BackwardTabulateSLRPs(EdgeWorkList: edge set)
begin
[45]  while EdgeWorkList ≠ ∅ do
[46]      select and remove an edge ⟨n, d1⟩ → ⟨exit_p, d⟩ from EdgeWorkList
[47]      if (⟨n, d1⟩ is ⟨start_p, 0⟩) or (⟨start_p, 0⟩ → ⟨exit_p, d⟩ ∉ PathEdge) then
[48]          switch n
[49]          case n a return-site node :
[50]              let c be the call node that corresponds to n, and q be the procedure called at c
[51]                  for each d2 such that ⟨exit_q, d2⟩ → ⟨n, d1⟩ ∈ E# do Propagate(⟨exit_q, d2⟩ → ⟨exit_q, d2⟩, EdgeWorkList) od
[52]                  for each d2 such that ⟨c, d2⟩ → ⟨n, d1⟩ ∈ (E# ∪ SummaryEdge) do Propagate(⟨c, d2⟩ → ⟨exit_p, d⟩, EdgeWorkList) od
[53]              end let
[54]          end case
[55]          case n the start node of procedure p :
[56]              for each c ∈ Callers(p) do
[57]                  let q be c's procedure, and r be the return-site node that corresponds to c
[58]                      for each d3, d4 such that ⟨c, d4⟩ → ⟨n, d1⟩ ∈ E# and ⟨exit_p, d⟩ → ⟨r, d3⟩ ∈ E# do
[59]                          if ⟨c, d4⟩ → ⟨r, d3⟩ ∉ SummaryEdge then
[60]                              insert ⟨c, d4⟩ → ⟨r, d3⟩ into SummaryEdge
[61]                              for each d2 such that ⟨r, d3⟩ → ⟨exit_q, d2⟩ ∈ PathEdge do Propagate(⟨c, d4⟩ → ⟨exit_q, d2⟩, EdgeWorkList) od
[62]                          fi
[63]                      od
[64]                  end let
[65]              od
[66]          end case
[67]          default :
[68]              for each ⟨m, d2⟩ such that ⟨m, d2⟩ → ⟨n, d1⟩ ∈ E# do Propagate(⟨m, d2⟩ → ⟨exit_p, d⟩, EdgeWorkList) od
[69]          end case
[70]          end switch
[71]      fi
[72]  od
end

procedure Propagate(⟨n, d1⟩ → ⟨exit_p, d⟩: edge, EdgeWorkList: edge set)
begin
[73]  if d1 is 0 then
[74]      n := start_p
[75]  fi
[76]  if ⟨n, d1⟩ → ⟨exit_p, d⟩ ∉ PathEdge then
[77]      insert ⟨n, d1⟩ → ⟨exit_p, d⟩ into PathEdge
[78]      insert ⟨n, d1⟩ → ⟨exit_p, d⟩ into EdgeWorkList
[79]  fi
end

Figure 4. Procedure BackwardTabulateSLRPs finds summary edges and records them in set SummaryEdge.


BackwardDFS calls three auxiliary subprograms: Callers, Propagate, and BackwardTabulateSLRPs. Function Callers(p) returns the set of call nodes that represent calls on p; procedures Propagate and BackwardTabulateSLRPs are shown in Figure 4. As discussed above, the purpose of BackwardTabulateSLRPs is to nd summary edges, and to record them in the set named SummaryEdge. In order to do this, BackwardTabulateSLRPs nds path edges (which represent same-level realizable paths in G # ) whose targets are nodes of the form exitp , d (i.e., nodes of G # that correspond to exit nodes of G * ). It records all such path edges in the (global) set named PathEdge. Procedure BackwardTabulateSLRPs is a worklist algorithm that starts with an initial worklist containing a set of zero-length path edges (edges of the form exitp , d exitp , d); on each iteration of the main loop it deduces the existence of additional path edges and summary edges. In terms of leading quickly to a yes answer to a demand, the best thing that can happen in BackwardTabulateSLRPs is to discover a path edge whose source is a 0 node (i.e., a path edge of the form n, 0 exitp , d). In this case, the answer to the current demand is guaranteed to be yes. However, BackwardTabulateSLRPs cannot simply quit, because it is vital that the PathEdge and SummaryEdge sets be left in a consistent state to ensure that subsequent calls to IsMemberOfSolution return the correct answer. In particular, BackwardTabulateSLRPs must nish nding all path edges whose targets are some node other than exitp , d. On the other hand, there is no need to process any more path edges to exitp , d. Therefore, on discovering such an edge, BackwardTabulateSLRPs inserts the path edge startp , 0 exitp , d into PathEdge and into the worklist (lines 74, 77 and 78). This situation is illustrated below. 
The solid bold arrow represents the path edge whose source is a 0 node, and the dotted bold arrow represents the new path edge that is inserted into PathEdge and the worklist.

[Diagram: a solid bold arrow from <n, 0> to <exit_p, d>, and a dotted bold arrow from <start_p, 0> to <exit_p, d>.]
Furthermore, when a path edge is taken off the worklist (line 46), it is processed only if it is itself of the form <start_p, 0> → <exit_p, d>, or if that path edge has not yet been discovered. The configurations that are used by BackwardTabulateSLRPs to deduce the existence of path edges and summary edges are depicted in Figure 5. The first two diagrams of Figure 5 correspond to the case where n is a return-site node; the next two diagrams correspond to the case where n is a start node; and the final diagram corresponds to the default case. In Figure 5, the bold dotted arrows represent edges that are inserted into sets PathEdge and SummaryEdge if they were not previously in those sets.
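The effect of procedure Propagate, including the shortcut just described, can be sketched in a few lines of Python. The (cfg_node, fact) node encoding and the proc_of/start_of maps below are illustrative assumptions, not data structures from the paper:

```python
from collections import deque

# Hypothetical encoding: an exploded node is a (cfg_node, fact) pair, where
# fact 0 is the special "0" fact; proc_of maps a CFG node to its procedure,
# and start_of maps a procedure to its start node (both illustrative).
proc_of = {"start_p": "p", "n1": "p", "n2": "p", "exit_p": "p"}
start_of = {"p": "start_p"}

PathEdge = set()

def propagate(edge, worklist):
    (n, d1), target = edge
    if d1 == 0:
        # Shortcut: a path edge whose source is a 0 node guarantees a "yes"
        # answer, so record <start_p, 0> -> <exit_p, d> instead; no further
        # path edges with this target then need to be processed.
        n = start_of[proc_of[n]]
    e = ((n, d1), target)
    if e not in PathEdge:
        PathEdge.add(e)
        worklist.append(e)

worklist = deque()
propagate((("n2", 0), ("exit_p", "d")), worklist)
```

After the call, the edge recorded in PathEdge has source ("start_p", 0) rather than ("n2", 0), which is exactly the substitution performed on lines 74, 77 and 78 of Figure 4.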


[Figure 5 diagrams: five edge-deduction configurations, labeled "Line 51", "Line 52", "Line 60", "Line 61", and "Line 68" after the lines of Figure 4 that implement them. Key (arrow styles distinguish): E# edges corresponding to control-flow-graph edges; E# edges corresponding to call-to-return-site edges or edges in SummaryEdge; E# edges corresponding to call-to-start or exit-to-return-site edges; edges in PathEdge (possibly new); edges in SummaryEdge (possibly new).]

Figure 5. These five diagrams show how procedure BackwardTabulateSLRPs deduces the existence of new path and summary edges.

Example. When IsMemberOfSolution is called with the exploded supergraph node <n9, g> from the example shown in Figure 2 (i.e., the demand "Might g be uninitialized at node n9?" is made), the following steps are performed (all line numbers refer to lines in Figure 3):
1. BackwardDFS is called with node <n9, g>.
2. Node <n9, g> is pushed onto NodeStack at line 8, and then popped off (into <n, d>) at line 10.
3. Node <n9, g> is inserted into VisitedNodes at line 14. The default case of the switch (line 29) is taken, and node <n8, g> is pushed onto NodeStack.
4. Node <n8, g> is popped from NodeStack; n8 is a return-site node, so the case on line 16 is selected, and BackwardTabulateSLRPs is called for the first time. This causes summary edge <n7, g> → <n8, g> to be inserted into SummaryEdge (and causes several other edges to be inserted into PathEdge).
5. Node <n7, g> is pushed onto NodeStack at line 21.
6. Node <n7, g> is popped from NodeStack; the default case is taken, and node <n6, g> is pushed onto NodeStack.
7. Node <n6, g> is popped from NodeStack; the default case is taken, but there are no edges that satisfy the for-loop condition (line 30).
8. NodeStack is now empty, so BackwardDFS returns Failure, and IsMemberOfSolution returns false.

3.1. Cost Of The Demand-Tabulation Algorithm

In this section we discuss the time and space requirements of the Demand-Tabulation Algorithm. To express these costs in terms of the size of the (unexploded) supergraph, we will use the following parameters:

N     the number of nodes in supergraph G*
E     the number of edges in supergraph G*
Call  the number of call nodes in supergraph G*
D     the size of set D

The maximum number of exploded supergraph and summary edges (and thus, the worst-case time and space requirements of the Demand-Tabulation Algorithm) varies depending on what class of dataflow-analysis problems is being solved. There are two interesting sub-classes of the distributive dataflow-analysis problems: the h-sparse problems and the locally separable problems.

Definition 3.1. A problem is h-sparse if all problem instances have the following property: For each function f on an ordinary intraprocedural edge or a call-to-return-site edge of G*, the number of edges in G# that represent function f, excluding edges that emanate from the 0 node, is at most hD.

In general, when the nodes of G* represent individual statements and predicates (rather than basic blocks), and when there is no aliasing, we expect most distributive problems to be h-sparse (with h < D): Each statement changes only a small portion of the execution state, and accesses only a small portion of the state as well. Therefore, the dataflow functions, which are abstractions of the statements' semantics, should be close to the identity function. The identity function is represented using D + 1 edges; thus, the number of edges needed to represent each dataflow function should be roughly D.

Example. When the nodes of G* represent individual statements and predicates, and there is no aliasing, every instance of the possibly-uninitialized variables problem is 2-sparse. The only non-identity dataflow functions are those associated with assignment statements. The outdegree of every non-0 node in the representation of such a function is at most two: a variable's initialization status can affect itself and at most one other variable, namely the variable assigned to.

Definition 3.2. A problem is locally separable if all problem instances have both of the following properties:

Intraprocedural dataflow functions have only component-wise dependences: For each function f on an ordinary intraprocedural edge or a call-to-return-site edge of G*, for each dataflow fact d, either d is not in f(S) for any S, or d is in f(S) for all S, or d is in f(S) iff d is in S. In other words, while there is no restriction on the number of out-going edges from the initial 0 node of a function's representation relation, every other initial node d must either have no out-going edges, or a single out-going edge to final node d.

Corresponding calls and returns have related dataflow functions: If the representation relation for the dataflow function associated with a call-to-start edge c → s includes the edge <c, d1> → <s, d2>, then the representation relation for the dataflow function associated with the corresponding exit-to-return-site edge e → r either includes the edge <e, d2> → <r, d1>, or exploded node <e, d2> has no outgoing edge.

The locally separable problems are the interprocedural versions of the classical separable problems from intraprocedural dataflow analysis (also known as gen/kill or bit-vector problems). All locally separable problems are 1-sparse, but not vice versa.
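The 2-sparseness bound from the Example can be checked on a small sketch. The encoding below, in which a representation relation is a set of (source fact, target fact) pairs and 0 stands for the special 0 fact, is an illustrative assumption:

```python
# Representation relation of the possibly-uninitialized-variables function
# for an assignment "target = f(rhs_vars)", as a set of edges.
def assignment_rep_relation(target, rhs_vars, D):
    edges = {(0, 0)}                  # the 0 -> 0 edge is always present
    for d in D:
        if d != target:
            edges.add((d, d))         # facts for other variables pass through
        if d in rhs_vars:
            edges.add((d, target))    # an uninit operand makes target uninit
    return edges

D = {"w", "x", "y"}
edges = assignment_rep_relation("x", {"y"}, D)   # models "x = y + 1"
```

Here every non-0 node has outdegree at most two (itself, plus possibly the assigned variable), matching the definition of a 2-sparse problem.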
Another parameter that affects the running time of the Demand-Tabulation Algorithm is the "bandwidth" for the transmission of dataflow information between procedures. In particular, the times given here rely on the fact that it is always possible to construct G# so that the maximum outdegree of a non-0 node in a call-to-start edge's representation relation, and the maximum indegree of a non-0 node in an exit-to-return-site edge's representation relation, are both 1. (See the Appendix of [Rep95] for a more complete discussion of this issue.) The implementation reported in Section 4 constructs G# so that these properties hold. The table in Figure 6 summarizes the worst-case size of the exploded supergraph G# (in terms of the number of exploded edges) as well as the worst-case number of summary edges that might be added by the Demand-Tabulation Algorithm for distributive, h-sparse, and locally-separable dataflow-analysis problems (the number of summary edges added in the worst case is the same for a single demand and for a sequence of all possible demands). In practice, we have found that the actual numbers are much smaller than those


Class of functions    Graph-theoretic characterization of the          Number of       Number of added
                      dataflow functions' properties                   edges in G#     summary edges
Distributive          Up to O(D²) edges per representation relation    O(ED²)          O(Call D²)
h-sparse              At most O(hD) edges per representation relation  O(hED)          O(Call D²)
Locally separable     O(D) edges per representation relation           O(ED)           O(Call D)

Figure 6. Worst-case space requirements for the exploded supergraph for three different classes of dataflow-analysis problems.

in this table (see Figure 9). The table in Figure 7 summarizes the worst-case times required by the Demand-Tabulation Algorithm for six different classes of problems. In each case, the time given is the worst-case time for a single demand. The details of the analysis of the running time of the Demand-Tabulation Algorithm can be found in [Rep94a]. The most efficient exhaustive algorithm known for the class of IFDS problems is the one given in [Rep95]. Its worst-case running times are almost identical to the times given in Figure 7; the only difference is that for an intraprocedural, locally separable problem, the bound for the exhaustive algorithm is O(ED), while the bound for the Demand-Tabulation Algorithm is O(E). The similarity in the worst-case running times of the two algorithms reflects the fact that (theoretically) a dataflow fact at one point might depend on all other facts at all other points. In practice, however, we have found that the Demand-Tabulation Algorithm (applied to a single demand) is much faster than the exhaustive algorithm (see Figure 11).

3.2. The Same-Worst-Case-Cost Property

We have designed the Demand-Tabulation Algorithm so that it has the same-worst-case-cost property with respect to the exhaustive algorithm of [Rep95]. In particular, a call to IsMemberOfSolution can re-use the sets ReachableNodes, VisitedNodes, PathEdge, and SummaryEdge, whose values are preserved across calls. When the Demand-Tabulation Algorithm is used with a request sequence that places demands on all

                      Asymptotic running time
Class of functions    Intraprocedural problems    Interprocedural problems
Distributive          O(ED²)                      O(ED³)
h-sparse              O(hED)                      O(Call D³ + hED²)
Locally separable     O(E)                        O(ED)

Figure 7. Asymptotic running time of the Demand-Tabulation Algorithm (for answering a single demand) for six different classes of dataflow-analysis problems.


nodes of G#, BackwardDFS and UpdateReachableNodes will each traverse a given edge in G# at most once during the processing of the request sequence. BackwardTabulateSLRPs will traverse a given summary edge or an edge of G# in procedure p at most D times: once for each node of the form <exit_p, d>. (The information accumulated in sets PathEdge and SummaryEdge prevents procedure BackwardTabulateSLRPs from performing additional work, and the information accumulated in ReachableNodes and VisitedNodes prevents BackwardDFS and UpdateReachableNodes from performing additional work.) In general, this is bounded by O(ED³), which is the same amount of work that could be performed in the worst case by the exhaustive algorithm given in [Rep95]. Thus, the Demand-Tabulation Algorithm has the same-worst-case-cost property with respect to the exhaustive algorithm.

While this is an important property, it does not, of course, mean that the Demand-Tabulation Algorithm will always outperform the exhaustive algorithm. First, the constant factors are different for the two algorithms. Second, there will be problem instances for which the exhaustive algorithm will not achieve its worst-case cost. Therefore, there will be times when the exhaustive algorithm will outperform the Demand-Tabulation Algorithm (see Figure 12).

4. Experimental Results

4.1. Background to the Experiments

We have carried out two experiments to compare the performance of the Demand-Tabulation Algorithm to that of the exhaustive algorithm of [Rep95], and two further experiments to study the trade-offs between the benefit and overhead of the caching performed by the Demand-Tabulation Algorithm. In all of our reported results, running times reflect the trimmed mean of five data points (i.e., all experiments were run five times, and the average running times were computed after discarding the high and low values).
Three different analysis algorithms were used in the study: (1) the Demand-Tabulation Algorithm, as described above; (2) a non-caching version of the Demand-Tabulation Algorithm (that returns true as soon as it visits a node of the form <n, 0>, reinitializes the set VisitedNodes to ∅ after each invocation of IsMemberOfSolution, does not maintain the set ReachableNodes, but does preserve the sets PathEdge and SummaryEdge across calls to IsMemberOfSolution);¹ and (3) the exhaustive algorithm reported in [Rep95]. The three algorithms were implemented in C and used with a front end that analyzes a C program, builds the program's control-flow graph, and then generates the corresponding exploded supergraph for five dataflow-analysis problems:

¹ The non-caching algorithm does not have the same-worst-case-cost property with respect to the exhaustive algorithm. In the worst case, on a request sequence that places demands on all nodes of G#, the non-caching algorithm could perform as much as Ω(NED³) work, which is worse than the O(ED³) bound on the work performed by the exhaustive algorithm.


Possibly-Uninitialized Variables
This is the problem that we have used as our running example.

Simple Uninitialized Variables
This is the locally separable version of the possibly-uninitialized variables problem, in which a variable is considered to be initialized whenever it is the target of an assignment, regardless of whether the right-hand-side expression includes possibly-uninitialized variables. (So every simple uninitialized variable is also possibly uninitialized, but not vice versa.)

Live Variables
This is the standard, locally separable problem in which variable x is considered to be live at supergraph node n iff there is a path from n to the end of the program on which x is used before being defined. It is useful to identify assignments to non-live variables: Programming tools might flag them as indicating possible logical errors, and optimizing compilers can use this information to perform dead-code elimination (i.e., by removing such assignments).

Truly Live Variables
This is a non-locally-separable (and more accurate) version of the live-variables problem in which variable x is considered to be truly live at supergraph node n iff there is a path from n to the end of the program on which x is used in a truly live context before being defined, where a "truly live context" means: in a predicate, or in a call to a library routine, or in an expression whose value is assigned to a truly live variable [Gie81]. Because it is non-locally-separable, the truly-live-variables problem is in some sense a harder problem than the live-variables problem; its results are also more accurate (every truly live variable is also live, but not vice versa) and thus, for example, can lead to more opportunities for dead-code elimination. Some assignments to variables that are live but not truly live can be discovered by repeatedly solving the live-variables problem and removing assignments to non-live variables until no more assignments to non-live variables are found.
However, there are two potential disadvantages to solving the live-variables problem rather than the truly-live-variables problem: the problem needs to be solved repeatedly, and in the presence of cycles in the control-flow graph, there may be assignments to non-truly-live variables that are never discovered (and thus cannot be reported as logical errors or removed).

Constant Predicates
This is a non-locally-separable problem that seeks to determine, for every predicate that consists of a single identifier, whether that predicate is guaranteed to have a constant value (either true (non-zero) or false (zero)). To do this, it performs a simple kind of copy-constant propagation, tracking, for every scalar variable x, whether x might be non-zero, zero, or ⊥ (an unknown value). Given a predicate that consists of just the identifier x, if the dataflow fact <x, non-zero> holds at that point, while neither fact <x, zero> nor fact <x, ⊥> holds at that point, then the predicate is guaranteed to be true (and similarly, it is possible to determine when the predicate is guaranteed to be false).
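For intuition, the gap between live and truly live can be reproduced with a toy backward analysis over straight-line code. The statement encoding below is an illustrative simplification (no procedures, predicates, or library calls), not the paper's formulation:

```python
# Backward liveness over straight-line code. A statement is a
# (target, used_vars) pair; end_uses lists the variables used (in truly
# live contexts) at the end of the code. With truly=True, a use
# contributes only if the assigned variable is itself truly live after
# the statement.
def liveness_before(stmts, end_uses, truly):
    live = set(end_uses)
    before = [None] * len(stmts)
    for i in range(len(stmts) - 1, -1, -1):
        target, uses = stmts[i]
        gen = uses if (not truly or target in live) else set()
        live = (live - {target}) | gen
        before[i] = set(live)
    return before

stmts = [("y", {"x"}),   # y = x + 1
         ("z", {"y"})]   # z = y   (z is never used afterwards)
plain = liveness_before(stmts, {"x"}, truly=False)
truly = liveness_before(stmts, {"x"}, truly=True)
```

Just before "z = y", y is live (it is used on that line) but not truly live, because its only use feeds z, which is dead; the plain analysis would need a second round to discover this.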


In our experiments, procedure calls via pointers to procedures, and aliasing due to pointers, were handled by our C front end as follows:

For all of the dataflow-analysis problems, every call via a pointer was considered to be a possible call to every procedure of an appropriate type that was passed as a parameter or whose value was assigned to a variable somewhere in the program.

For all of the dataflow-analysis problems, every memory write via a pointer was considered to be a possible write of every piece of heap-allocated storage and of every variable to which the address-of operator (&) was applied somewhere in the program.

For the live-variables, truly-live-variables, and constant-predicates problems, every memory read via a pointer was considered to be a possible read of every piece of heap-allocated storage and of every variable to which the address-of operator was applied somewhere in the program.

For the possibly-uninitialized-variables problem, memory reads were considered to read only the value of the pointer itself. This is because the results of this analysis are suitable for providing feedback to the programmer rather than for guiding an optimizing compiler; it is more important to avoid overwhelming the programmer by reporting hundreds of possibly-uninitialized variables than to be sure that absolutely every possibly-uninitialized variable has been reported. (For the simple uninitialized-variables problem, reads via pointers are irrelevant, since a variable that is the target of an assignment is considered to be initialized regardless of which variables are used to compute the assigned value.)

Of course, the results of the two live-variable analysis problems and of the constant-predicates problem might be improved if we first did a pointer analysis and then used the results of that analysis in setting up the dataflow functions (rather than treating pointers as described above).
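The write-through-pointer rule above amounts to a whole-program "address-taken" approximation, which can be sketched as follows (the expression encoding is an illustrative assumption):

```python
# Collect every variable to which the address-of operator is applied
# anywhere in the program; a write through a pointer is then treated as a
# possible write to each of these variables (plus all heap-allocated
# storage). Expressions are encoded as (operator, operand) pairs.
def address_taken(expressions):
    return {operand for op, operand in expressions if op == "&"}

exprs = [("&", "buf"), ("*", "p"), ("&", "count"), ("+", "i")]
may_write = address_taken(exprs)
```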
However, it is interesting to note that even with this very simple treatment of pointers we are able to identify a significant number of assignments to dead variables (see Figure 8) and some constant predicates. Furthermore, the goal of our experiments was to compare the performance of the caching-demand, non-caching-demand, and exhaustive algorithms. We were looking for insights about the characteristics of a dataflow problem that predict which algorithm will be best; these characteristics should be independent of the particular problems used, or how they were defined.

Tests were carried out on a Sun SPARCstation 20 Model 61 with 128 MB of RAM. The study used 53 C programs: some standard UNIX utilities, some programs from the SPEC integer benchmark suite [92], and some programs used for benchmarking in previous studies [Lan93, Aus94]. For each program, the table in Figure 8 gives the number of lines of preprocessed source code (with blank lines removed), the parameters that characterize the size of the control-flow graphs (number of procedures, number of call sites, number of control-flow-graph nodes), and, for each of the five dataflow-analysis problems, the number of interesting program-point/dataflow-fact pairs (see the next paragraph), and the number of these pairs that are in the meet-over-all-valid-paths solution (i.e., if demands are made for all interesting program-point/dataflow-fact pairs, this is the number of demands that would be answered yes).


Recall that demand analysis is potentially preferable to exhaustive analysis whenever the full set of all dataflow facts at all points is not required. In this case, it may be more efficient to use a demand algorithm, issuing demands only for the program-point/dataflow-fact pairs of interest. To test whether this is true in practice, one of our experiments compares the time required by the exhaustive algorithm to the time required by the Demand-Tabulation Algorithm to answer all "interesting" demands, where "interesting" is defined for each of our dataflow problems as follows: For the two versions of the uninitialized-variables problem, every use of a scalar variable x gives rise to the demand "might x be uninitialized here?"; for the two versions of the live-variables problem, every assignment to a scalar variable x gives rise to the demand "is x live here?"; and for the constant-predicates problem, every instance of a predicate that consists only of the identifier x gives rise to three demands: "might x be zero here?", "might x be non-zero here?", and "might x be ⊥ here?".²
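Generating this demand set can be sketched as follows; the input encodings and the demand labels are illustrative assumptions:

```python
# uses/assigns/predicates are (node, variable) pairs gathered from the
# program: scalar-variable uses, assignments to scalar variables, and
# single-identifier predicates, respectively.
def interesting_demands(uses, assigns, predicates):
    demands = [("uninit?", n, x) for n, x in uses]
    demands += [("live?", n, x) for n, x in assigns]
    for n, x in predicates:
        demands += [(q, n, x) for q in ("zero?", "nonzero?", "bottom?")]
    return demands

ds = interesting_demands([("n9", "g")], [("n3", "x")], [("n5", "p")])
```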

The table in Figure 9 provides information about the size of the dataflow domain for each dataflow problem for each test program, the sizes of the exploded supergraphs, and the number of summary edges added by the Demand-Tabulation Algorithm when processing all interesting demands.

In the following subsections, the times reported for the experiments include the time used to build the exploded supergraphs and to perform dataflow analysis on those graphs (they do not include the time used by the front end to build the test programs' control-flow graphs). Figure 10 shows the ratios of the times used to build the exploded supergraphs to the times used for analysis. Five graphs are shown: one for the exhaustive algorithm, one for the Demand-Tabulation Algorithm used to answer a single demand (using the average time for 20 randomly selected demands), one for the Demand-Tabulation Algorithm used to answer all interesting demands, one for the non-caching demand algorithm used to answer a single demand (using the average time for the same 20 demands used for the Demand-Tabulation Algorithm), and one for the non-caching algorithm used to answer all interesting demands.

It is interesting to compare these ratios with the ratios predicted by the table in Figure 7 (showing the asymptotic running times of the Demand-Tabulation Algorithm for distributive, h-sparse, and locally separable problems). For the h-sparse problems (truly live variables, possibly uninitialized variables, and constant predicates), analysis time is related to the size of the graph by factors of D² and D³. Therefore, one would expect that graph-construction time would tend to take less of the total time as the size of the graph increases. This expectation is borne out by our measurements. However, for the locally separable problems (live variables and simple uninitialized variables), analysis time is linear in the size of the graph.
Therefore, one would expect the ratio of graph-construction time to total time to be independent of the size of the
² Recall that a particular predicate is constant only if one of the first two demands is answered yes, while the other two demands are answered no. Therefore, the number of demands answered yes reported for this problem in Figure 8 is much greater than the number of predicates found to be constant.


[Table data: for each of the 53 test programs (ranging from xref, 68 lines, to bc, 6745 lines), the columns give lines of preprocessed source code; CFG statistics (number of procedures P, call sites Call, and CFG nodes N); uninit-vars statistics (number of demands, and number answered yes for the possibly-uninitialized and simple-uninitialized problems); live-vars statistics (number of demands, and number answered yes for the truly-live and live problems); and const-preds statistics (number of demands, and number answered yes).]
Figure 8. Test program information.


[Table data: for each test program and each dataflow problem, the columns give the size D of the dataflow domain, the number of G# edges, and the number of summary edges added while answering all interesting demands, for the possibly-uninitialized, simple-uninitialized, truly-live, live, and constant-predicates problems.]
Figure 9. Graph sizes and added summary edges.


graph. This expectation is not borne out by our measurements; instead, the ratio tends to decrease as the size of the graph increases.

4.2. Experiments

Experiment 1: Single demand vs. Exhaustive

Our first two experiments compared the Demand-Tabulation Algorithm with the exhaustive algorithm. Our first experiment reflects what might happen when dataflow analysis is used in the context of a tool that intersperses demands and program modifications (so if an exhaustive algorithm is used, it must be re-run whenever a demand is made following a modification). In this case, it is reasonable to compare the time required by the exhaustive algorithm with the time required by the demand algorithm to answer a single demand. Therefore, for this study, we recorded the following data for each dataflow-analysis problem and for each test program: (1) the time used by the exhaustive algorithm to build the exploded supergraph and to find the meet-over-all-valid-paths solution; (2) the average time used by the Demand-Tabulation Algorithm to build the exploded supergraph and to answer a single demand (using 20 randomly selected demands). This data is summarized in Figure 11. Although there are a few cases where the Demand-Tabulation Algorithm is slower than the exhaustive algorithm, they are all tests for which the running times are trivial (less than 3 seconds). It seems clear that overall, the Demand-Tabulation Algorithm is preferable to the exhaustive algorithm when the goal is to answer a single demand.

Experiment 2: Sequence of demands vs. Exhaustive

Our second comparison of the Demand-Tabulation Algorithm and the exhaustive algorithm reflects what happens when complete dataflow information is desired (i.e., when it is desired to know, for all interesting program-point/dataflow-fact pairs, whether that pair is in the meet-over-all-valid-paths solution).
Therefore, for this study we recorded the time used by the Demand-Tabulation Algorithm to answer the sequence of all interesting demands, for each dataflow-analysis problem and for each test program, and compared those times to the times required by the exhaustive algorithm. This data is summarized in Figure 12. The Demand-Tabulation Algorithm outperforms the exhaustive algorithm in all cases for the constant-predicates and live-variables problems, and in all but three cases for the truly-live-variables problem; it is clearly the algorithm of choice in these cases. For the two versions of the uninitialized-variables problem, the Demand-Tabulation Algorithm is almost always slower than the exhaustive algorithm, sometimes significantly so. The Demand-Tabulation Algorithm is clearly not the algorithm of choice for these problems.


[Figure 10 ("Graph-Construction Time vs Total Time") consists of five scatter plots, one each for the Demand-Tabulation Algorithm (1 demand), the Demand-Tabulation Algorithm (all demands), the Non-Caching-Demand Algorithm (1 demand), the Non-Caching-Demand Algorithm (all demands), and the Exhaustive Algorithm. Each plots the ratio Graph-Construction Time / Total Time against the number of CFG nodes (0 to 9000), with one data series per problem: possibly-uninitialized variables, simple uninitialized variables, truly live variables, live variables, and constant predicates.]

Figure 10. Ratios of times used to build the exploded supergraph to times used for analysis for the three different algorithms.


[Figure 11 ("Experiment 1: Single Demand vs Exhaustive") consists of five scatter plots, one per problem (truly live variables, live variables, uninitialized variables, simple uninitialized variables, constant predicates), each plotting the ratio Demand-Tabulation Alg. Time / Exhaustive Alg. Time against the exhaustive algorithm's running time in seconds.]

Figure 11. First comparison of the Demand-Tabulation Algorithm and the exhaustive algorithm. The exhaustive algorithm is used to find the entire meet-over-all-valid-paths solution, and the Demand-Tabulation Algorithm is used to answer a single demand.


[Figure 12 ("Experiment 2: Sequence of Demands vs Exhaustive") consists of five scatter plots, one per problem (truly live variables, live variables, uninitialized variables, simple uninitialized variables, constant predicates), each plotting the ratio Demand-Tabulation Alg. Time / Exhaustive Alg. Time against the exhaustive algorithm's running time in seconds.]

Figure 12. Second comparison of the Demand-Tabulation Algorithm and the exhaustive algorithm. The exhaustive algorithm is used to find the entire meet-over-all-valid-paths solution, and the Demand-Tabulation Algorithm is used to answer all interesting demands.


We believe that there are two characteristics of dataflow problems that are reasonable predictors of the relative speeds of the Demand-Tabulation Algorithm (applied to all interesting demands) and the exhaustive algorithm:

1. The number of demands, relative to the size of the exploded graph.
2. The percentage of demands with "yes" answers.

If the number of demands is very small, clearly the Demand-Tabulation Algorithm will visit many fewer nodes than the exhaustive algorithm, and so less time is likely to be required for the Demand-Tabulation Algorithm. If most demands are answered "yes", the nodes visited by the Demand-Tabulation Algorithm will also be visited by the exhaustive algorithm; however, since demands are not placed for all facts at all points, the Demand-Tabulation Algorithm should still be faster. However, if most demands are answered "no", the Demand-Tabulation Algorithm may visit many more nodes than the exhaustive algorithm: demands answered "no" correspond to unreachable exploded-supergraph nodes, so the exhaustive algorithm does not visit those nodes or any of their predecessors; the demand algorithm, by contrast, starts at those nodes and visits all predecessors, eventually discovering that none of them is in the ReachableNodes set. In the case of the two live-variables problems, most of the demands ("is x live at this assignment?") lead to a "yes" answer, while in the case of the two uninitialized-variables problems, most of the demands ("might x be uninitialized at this use?") lead to a "no" answer. The graph in Figure 13 plots the percentage of demands that are answered "yes" versus the ratio of the running times of the two algorithms for the five dataflow-analysis problems. Based on the results of our first two experiments, we hypothesize that when the goal is to answer demands at most program points, and it is expected that most demands will be answered "no", the exhaustive algorithm will be the algorithm of choice.
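The "no"-answer penalty described above can be made concrete with a small sketch of a demand answered by backward reachability, with caching of answers across demands. This is only an illustration in the spirit of the Demand-Tabulation Algorithm, not the algorithm itself (which must also restrict itself to interprocedurally valid paths); the graph encoding and the known_reachable/known_unreachable caches are assumptions made for the example:

```python
# Illustrative sketch only: answers a demand by backward search from the
# queried exploded-supergraph node toward a distinguished source node.
# A real implementation (e.g., BackwardDFS in the paper) must also respect
# interprocedural validity of paths; that machinery is omitted here.

def answer_demand(query, preds, sources, known_reachable, known_unreachable):
    """Return True ("yes") iff some source node can reach `query`."""
    stack, visited = [query], set()
    while stack:
        n = stack.pop()
        if n in visited or n in known_unreachable:
            continue  # already explored, or cached as a dead end
        visited.add(n)
        if n in sources or n in known_reachable:
            # Early cut-off: hitting a source (or a node already known to be
            # reachable from one) answers the demand immediately.
            known_reachable.add(query)
            return True
        stack.extend(preds.get(n, ()))
    # A "no" answer is expensive: the search had to exhaust every backward
    # path.  Every visited node is now provably unreachable, so cache it.
    known_unreachable |= visited
    return False
```

The sketch makes the cost asymmetry visible: a "yes" answer can stop at the first source or cache hit, while a "no" answer must visit the entire backward closure of the queried node, which is why problems whose demands are mostly answered "no" favor the exhaustive algorithm.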
However, when the expected number of demands is small (for example, in an interactive tool, or in a restructuring tool that is likely to demand dataflow information only for a small part of a program before performing a transformation, or for a problem like the constant-predicates problem), or it is expected that most demands will be answered "yes", then the Demand-Tabulation Algorithm will be the algorithm of choice.

Experiment 3: Caching vs Non-caching demand (single demand)

The goal of our third and fourth experiments was to study the tradeoffs between the benefit and overhead of caching, first on a single demand and then on a sequence of demands. For our third experiment we applied the non-caching demand algorithm to the same 20 randomly selected demands used in Experiment 1 (starting the algorithm from scratch for each demand, as was done for the Demand-Tabulation Algorithm), and we computed the average running time for a single demand. The results of this experiment are shown in Figure 14.


[Figure 13 ("Experiment 2: Time Ratios vs. Percentage of 'yes' Answers") plots the ratio Exhaustive Alg. Time / Demand-Tabulation Alg. Time (all demands) against the ratio of "yes" answers to the total number of demands, with one data series per problem: possibly-uninitialized variables, simple uninitialized variables, truly live variables, live variables, and constant predicates.]

Figure 13. When most demands are answered "yes", the Demand-Tabulation Algorithm is likely to outperform the exhaustive algorithm; the situation is reversed when most demands are answered "no".

As expected, caching involves some overhead (about 50% in the worst case for our experiments). This is due partly to the extra work required to maintain the ReachableNodes and VisitedNodes sets, and partly to the fact that more time is required to build the exploded supergraph. In particular, the Demand-Tabulation Algorithm must traverse edges in both directions (BackwardDFS traverses edges backwards and UpdateReachableNodes traverses edges forwards), while the non-caching demand algorithm only traverses edges backwards. Thus, the Demand-Tabulation Algorithm must include both predecessor and successor information in the exploded supergraph, while the non-caching demand algorithm only needs to


include predecessor information.

Experiment 4: Caching vs Non-caching demand (sequence of demands)

For our final experiment, we applied the non-caching demand algorithm to the same sequences of all interesting demands to which the Demand-Tabulation Algorithm was applied in Experiment 2. The results of this experiment are shown in Figure 15. For a sequence of demands, the benefits of caching outweigh its overhead in all cases for the possibly-uninitialized-variables and the simple uninitialized-variables problems (however, recall that we have already concluded that the exhaustive algorithm is superior to the Demand-Tabulation Algorithm for a sequence of demands for these two problems). For the truly-live-variables problem the caching algorithm is faster in all but five cases (and in those cases it is only about 5% slower). For the constant-predicates problem, caching is a win in about half of the cases, and for the live-variables problem, in about two-fifths of the cases. However, for both of these problems, it seems that the caching algorithm is still the algorithm of choice: in the worst case for the caching algorithm, it took about 1.4 times as long as the non-caching algorithm; in the worst case for the non-caching algorithm, it took about 1.8 times as long as the caching algorithm. There is only one case in which the time for the non-caching algorithm is at least 25% less than the time for the caching algorithm, while there are seven cases in which the time for the caching algorithm is at least 25% less than the time for the non-caching algorithm.

5. Relation to Previous Work

Until very recently, work on demand-driven dataflow analysis only considered the intraprocedural case (cf. [Bab78]) and work on interprocedural dataflow analysis only considered the exhaustive case (cf. [Sha81, Cal88, Cal86, Kno92]).
Because in intraprocedural dataflow analysis all paths in the control-flow graph are assumed to be valid execution paths, the work on demand-driven intraprocedural dataflow analysis does not extend to the interprocedural case, where the notion of realizable paths is important. One approach to obtaining demand algorithms for interprocedural dataflow-analysis problems was described by Reps [Rep94c, Rep94b]. Reps presented a way in which algorithms that solve demand versions of interprocedural analysis problems can be obtained automatically from their exhaustive counterparts (expressed as logic programs) by making use of the magic-sets transformation, a general transformation developed in the logic-programming and deductive-database communities for creating efficient demand versions of (bottom-up) logic programs [Roh86, Ban86, Bee87, Ull89]. Reps illustrated this approach by showing how to obtain a demand algorithm for the interprocedural locally separable problems. Subsequent work by Reps, Sagiv, and Horwitz extended the logic-programming approach to the class of


[Figure 14 ("Experiment 3: Caching Demand vs Non-Caching Demand (Single Demand)") consists of five scatter plots, one per problem (truly live variables, live variables, uninitialized variables, simple uninitialized variables, constant predicates), each plotting the ratio Demand-Tabulation Alg. Time / Non-Caching Alg. Time against the non-caching demand algorithm's running time in seconds.]

Figure 14. Comparison of the caching and non-caching demand algorithms for a single demand.


[Figure 15 ("Experiment 4: Caching Demand vs Non-Caching Demand (Sequence of Demands)") consists of five scatter plots, one per problem (truly live variables, live variables, uninitialized variables, simple uninitialized variables, constant predicates), each plotting the ratio Demand-Tabulation Alg. Time / Non-Caching Alg. Time against the non-caching demand algorithm's running time in seconds.]

Figure 15. Comparison of the caching and non-caching demand algorithms for a sequence of demands.


IFDS problems [Rep94a, Rep95]. (The latter papers do not make use of logic-programming terminology; however, the exhaustive algorithms described in the papers have straightforward implementations as logic programs. Demand algorithms can then be obtained by applying the magic-sets transformation.) Several people, leery of the (space, time, and conceptual) overheads involved in using logic databases, questioned whether the logic-programming approach to obtaining demand algorithms for interprocedural dataflow analysis can really produce implementations that are efficient enough to be used in real-world program-analysis tools. Although the jury is still out on this issue (waiting for improved logic-database implementations), it is natural to ask a related question: is there a way to adapt the ideas so that they can be used in program-analysis tools written in imperative programming languages? The present paper can be viewed as answering this question in the affirmative. The two basic ideas used in the magic-sets transformation are propagation of queries and caching of results, and it is fairly easy to transfer these notions over to demand algorithms written in an imperative programming language (such as C). The Demand-Tabulation Algorithm given in Section 3 can be viewed as an analog of the magic-sets-transformed exhaustive dataflow-analysis program: the operations that push nodes onto NodeStack in BackwardDFS and insert edges into EdgeWorkList in BackwardTabulateSLRPs for subsequent processing can be viewed as query-propagation operations; the sets ReachableNodes, VisitedNodes, PathEdge, and SummaryEdge, whose values are preserved across calls, can be viewed as caches of previously computed results (and previously computed intermediate values). On the other hand, there are a number of benefits obtained when an imperative programming language is used to implement these ideas.
The most important benefit is that the algorithm of Section 3 has a simple, low-overhead implementation in an imperative programming language. The implementation is based on array indexing and linked lists, and involves neither term-unification nor term-matching. In addition, an imperative implementation has the opportunity to exploit specific properties of the problem that are not present in all logic programs (and hence would not be exploited by either the magic-sets transformation or the bottom-up engine used for evaluating logic programs). The two forms of early cut-off employed by the algorithm given in Section 3 (neither of which was employed by the previous version of the Demand-Tabulation Algorithm presented in [Hor95]) provide two examples of how such properties can be exploited to improve performance: (1) the use of a depth-first-search strategy in BackwardDFS allows BackwardDFS to terminate as soon as a node in ReachableNodes is encountered; (2) in BackwardTabulateSLRPs, once it is known that there is a path from some node of the form ⟨m, 0⟩ to ⟨exit_p, d⟩, the path edge ⟨start_p, 0⟩ → ⟨exit_p, d⟩ is inserted into PathEdge, and subsequent tests for membership of this edge in PathEdge allow BackwardTabulateSLRPs to avoid processing more path edges with target ⟨exit_p, d⟩. Of course, these early cut-offs can only be taken when the answer to the current demand is "yes". Thus, these improvements to the algorithm are most significant for problems with a high ratio of "yes" answers.

A related approach to obtaining demand versions of dataflow-analysis algorithms has been investigated by Duesterwald, Gupta, and Soffa, first for intraprocedural problems [Due93] and subsequently for interprocedural problems [Due95]. In their approach, a set of dataflow equations is set up on the flow graph (but as if all edges were reversed). The flow functions on the reversed graph are the (approximate) inverses of the original forward functions. Their algorithm for solving such problems is a demand-driven algorithm that repeatedly propagates a query from a node in the control-flow graph to the node's predecessors. (The appropriate query is generated by applying the inverse dataflow function.) Caching also plays a role: values of summary functions are tabulated; these express how queries at return sites generate queries at call sites. The Duesterwald-Gupta-Soffa approach is more general than ours because it can handle distributive problems on any finite lattice, while the Demand-Tabulation Algorithm is limited to distributive problems on finite subset lattices. (They can also provide approximate information in cases where the flow functions are monotonic but not distributive.) However, this generality is achieved at some cost. When applied to an IFDS problem, the worst-case cost of the algorithm given in [Due95] is exponential, O(E D 2^D), while the worst-case cost of the Demand-Tabulation Algorithm is polynomial, O(E D^3). This is not the entire story, however, because the Duesterwald-Gupta-Soffa framework can be used as a conceptual framework for deriving particular algorithms for specific problems as special cases of their general methods.
For instance, this is done for copy-constant propagation in the second half of [Due95], and yields an algorithm with polynomial running time. Our algorithm can be viewed as the specialization of the Duesterwald-Gupta-Soffa framework to the entire class of IFDS problems. (But no further specialization for a particular problem is necessary to obtain an algorithm with polynomial running time. That is, from a specification of the edge transformers for a particular problem instance, our techniques automatically yield an algorithm with polynomial running time.) Another framework for demand analysis is given in [Sag]. That framework applies to a class of distributive problems that is strictly larger than the IFDS problems and that is incomparable to the class to which the Duesterwald-Gupta-Soffa framework applies. (The framework of [Sag] applies only to distributive problems, whereas the Duesterwald-Gupta-Soffa framework can be applied to some non-distributive problems. However, the Duesterwald-Gupta-Soffa framework requires that the lattice of dataflow values have a finite number of elements, whereas the framework of [Sag] requires only that the lattice have finite height.) The Demand-Tabulation Algorithm can be viewed as a specialization of the algorithm of [Sag] to IFDS problems, but with several improvements (e.g., the two forms of early cut-off discussed above, and the elimination of an entire phase of the more general algorithm).
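For the simplest case, the locally separable (gen/kill) problems, the query-propagation idea common to these frameworks can be sketched as follows. This intraprocedural, may-analysis sketch is a loose illustration under assumed helper names (gen, kill, and at_start stand in for the edge transformers), not the algorithm of any of the cited papers:

```python
def holds(node, fact, preds, gen, kill, at_start):
    """Demand query for a may-analysis: can `fact` hold at `node`?
    The query <node, fact> is repeatedly propagated to predecessors; it is
    answered "yes" if it reaches an edge that generates the fact, or a
    procedure entry at which the fact is assumed to hold."""
    worklist, seen = [(node, fact)], set()
    while worklist:
        n, f = worklist.pop()
        if (n, f) in seen:
            continue
        seen.add((n, f))
        if not preds.get(n) and at_start(n, f):
            return True                     # resolved at the entry node
        for p in preds.get(n, ()):
            if f in gen(p, n):
                return True                 # edge p -> n generates f
            if f not in kill(p, n):
                worklist.append((p, f))     # f survives the edge: ask at p
    return False
```

For instance, with an edge 1 -> 2 that generates the fact "x may be uninitialized" and an edge 2 -> 3 that kills it, the query at node 2 is answered "yes" and the query at node 3 is answered "no".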


At the other end of the spectrum, it is interesting to compare our work with Callahan's program-summary-graph algorithm for flow-sensitive side-effect analyses [Cal88]. As discussed in [Rep95], Callahan's problems fall into the class of locally separable IFDS problems, the subclass of the IFDS problems that corresponds to interprocedural versions of the classic gen/kill problems.[4] From the standpoint of asymptotic worst-case complexity, Callahan's problems can actually be solved more efficiently with the algorithm from [Rep95] than with the algorithm given by Callahan. Because the present paper provides a demand version of the exhaustive algorithm from [Rep95] (where the demand algorithm has the same worst-case complexity as the exhaustive algorithm), our work also provides a demand algorithm for solving all of Callahan's problems.

References
92. SPEC Component CPU Integer Release 2/1992 (CINT92), Standard Performance Evaluation Corporation (SPEC), Fairfax, VA (1992).

Aus94. Austin, T.M., Breach, S.E., and Sohi, G., "Efficient detection of all pointer and array access errors," Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation (Orlando, FL, June 20-24, 1994), ACM SIGPLAN Notices 29(6) pp. 290-301 (June 1994).

Bab78. Babich, W.A. and Jazayeri, M., "The method of attributes for data flow analysis: Part II. Demand analysis," Acta Informatica 10(3) pp. 265-272 (October 1978).

Ban86. Bancilhon, F., Maier, D., Sagiv, Y., and Ullman, J., "Magic sets and other strange ways to implement logic programs," in Proceedings of the Fifth ACM Symposium on Principles of Database Systems (Cambridge, MA, March 1986).

Ban79. Banning, J.P., "An efficient way to find the side effects of procedure calls and the aliases of variables," pp. 29-41 in Conference Record of the Sixth ACM Symposium on Principles of Programming Languages (San Antonio, TX, January 29-31, 1979).

Bee87. Beeri, C. and Ramakrishnan, R., "On the power of magic," pp. 269-293 in Proceedings of the Sixth ACM Symposium on Principles of Database Systems (San Diego, CA, March 1987).

Cal86. Callahan, D., Cooper, K.D., Kennedy, K., and Torczon, L., "Interprocedural constant propagation," Proceedings of the SIGPLAN '86 Symposium on Compiler Construction (Palo Alto, CA, June 25-27, 1986), ACM SIGPLAN Notices 21(7) pp. 152-161 (July 1986).

Cal88. Callahan, D., "The program summary graph and flow-sensitive interprocedural data flow analysis," Proceedings of the ACM SIGPLAN '88 Conference on Programming Language Design and Implementation (Atlanta, GA, June 22-24, 1988), ACM SIGPLAN Notices 23(7) pp. 47-56 (July 1988).

Coo88. Cooper, K.D. and Kennedy, K., "Interprocedural side-effect analysis in linear time," Proceedings of the ACM SIGPLAN '88 Conference on Programming Language Design and Implementation (Atlanta, GA, June 22-24, 1988), ACM SIGPLAN Notices 23(7) pp. 57-66 (July 1988).

Coo89. Cooper, K.D. and Kennedy, K., "Fast interprocedural alias analysis," pp. 49-59 in Conference Record of the Sixteenth ACM Symposium on Principles of Programming Languages (Austin, TX, January 11-13, 1989).

Due93. Duesterwald, E., Gupta, R., and Soffa, M.L., "Demand-driven program analysis," Technical Report TR-93-15, Department of Computer Science, University of Pittsburgh, Pittsburgh, PA (October 1993).

Due95. Duesterwald, E., Gupta, R., and Soffa, M.L., "Demand-driven computation of interprocedural data flow," in Conference Record of the Twenty-Second ACM Symposium on Principles of Programming Languages (San Francisco, CA, January 23-25, 1995).

Fis88. Fischer, C.N. and LeBlanc, R.J., Crafting a Compiler, Benjamin/Cummings Publishing Company, Inc., Menlo Park, CA (1988).

Gie81. Giegerich, R., Moncke, U., and Wilhelm, R., "Invariance of approximative semantics with respect to program transformations," pp. 1-10 in Informatik-Fachberichte 50, Springer-Verlag, Berlin Heidelberg New York (1981).

Hor86. Horwitz, S. and Teitelbaum, T., "Generating editing environments based on relations and attributes," ACM Transactions on Programming Languages and Systems 8(4) pp. 577-608 (October 1986).

Hor95. Horwitz, S., Reps, T., and Sagiv, M., "Demand interprocedural dataflow analysis," in Proceedings of the Third ACM SIGSOFT Symposium on the Foundations of Software Engineering (October 1995).

Kil73. Kildall, G., "A unified approach to global program optimization," pp. 194-206 in Conference Record of the First ACM Symposium on Principles of Programming Languages (Boston, MA, October 1-3, 1973).

Kno92. Knoop, J. and Steffen, B., "The interprocedural coincidence theorem," pp. 125-140 in Proceedings of the Fourth International Conference on Compiler Construction (Paderborn, FRG, October 5-7, 1992), Lecture Notes in Computer Science, Vol. 641, ed. U. Kastens and P. Pfahler, Springer-Verlag, New York, NY (1992).

Lan93. Landi, W., Ryder, B., and Zhang, S., "Interprocedural modification side effect analysis with pointer aliasing," pp. 56-67 in Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation (Albuquerque, NM, June 23-25, 1993).

Lin84. Linton, M.A., "Implementing relational views of programs," pp. 132-140 in Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments (Pittsburgh, PA, April 23-25, 1984).

Mas80. Masinter, L.M., "Global program analysis in an interactive environment," Tech. Rep. SSL-80-1, Xerox Palo Alto Research Center, Palo Alto, CA (January 1980).

Rep94a. Reps, T., Sagiv, M., and Horwitz, S., "Interprocedural dataflow analysis via graph reachability," Technical Report 94-14, Datalogisk Institut, University of Copenhagen, Copenhagen, Denmark (April 1994).

Rep94b. Reps, T., "Demand interprocedural program analysis using logic databases," in Applications of Logic Databases, ed. R. Ramakrishnan, Kluwer Academic Publishers, Boston, MA (1994).

Rep94c. Reps, T., "Solving demand versions of interprocedural analysis problems," pp. 389-403 in Proceedings of the Fifth International Conference on Compiler Construction (Edinburgh, Scotland, April 7-9, 1994), Lecture Notes in Computer Science, Vol. 786, ed. P. Fritzson, Springer-Verlag, New York, NY (1994).

Rep95. Reps, T., Sagiv, M., and Horwitz, S., "Precise interprocedural dataflow analysis via graph reachability," in Conference Record of the Twenty-Second ACM Symposium on Principles of Programming Languages (San Francisco, CA, January 23-25, 1995).

Roh86. Rohmer, R., Lescoeur, R., and Kersit, J.-M., "The Alexander method, a technique for the processing of recursive axioms in deductive databases," New Generation Computing 4(3) pp. 273-285 (1986).

Sag. Sagiv, M., Reps, T., and Horwitz, S., "Precise interprocedural dataflow analysis with applications to constant propagation," Theoretical Computer Science. (To appear.)

Sha81. Sharir, M. and Pnueli, A., "Two approaches to interprocedural data flow analysis," pp. 189-233 in Program Flow Analysis: Theory and Applications, ed. S.S. Muchnick and N.D. Jones, Prentice-Hall, Englewood Cliffs, NJ (1981).

Ull89. Ullman, J.D., Principles of Database and Knowledge-Base Systems, Volume II: The New Technologies, Computer Science Press, Rockville, MD (1989).

Wei84. Weiser, M., "Program slicing," IEEE Transactions on Software Engineering SE-10(4) pp. 352-357 (July 1984).

[4] Strictly speaking, Callahan's problems are not quite IFDS problems because they are concerned with computing information that summarizes the effects of a procedure, rather than what must be true at a program point in all calling contexts. This is only a minor technical difference, and does not invalidate the point being made above.
