Abstract
Loops are inductive constructs, which make them difficult to analyze and verify in general. One approach is to represent the inductive behaviors of the program variables in a loop by recurrences and try to solve them for closed-form solutions. These solutions can then be used to generate invariants or directly fed into an SMT-based verifier. One problem with this approach is that if a loop contains nondeterministic choices or complex operations such as non-linear assignments, then recurrences for program variables may not exist or may have no closed-form solutions. In such cases, an alternative is to generate recurrences for expressions, and there has been recent work along this line. In this paper, we further work in this direction and propose a template-based method for extracting polynomial expressions that satisfy some c-finite recurrences. While in general there are possibly infinitely many such polynomials for a given loop, we show that the desired polynomials form a finite union of vector spaces. We propose an algorithm for computing the bases of the vector spaces, and identify two cases where the bases can be computed efficiently. To demonstrate the usefulness of our results, we implemented a prototype system based on one of the special cases, and integrated it into an SMT-based verifier. Our experimental results show that the new verifier can now verify programs with non-linear properties.
1 Introduction
Loops in computer programs induce inductive behaviors that are difficult to analyze. One method is recurrence analysis: first extract recurrences from the loops and then solve them for closed-form solutions [11, 17, 19]. Once the solutions have been computed, they can be used in many downstream tasks such as invariant generation and program verification. So far most recurrence-based methods focus on individual program variables and their recurrences. In practice, due to complex control flow (e.g., nested branches in a loop) and operations (e.g., non-linear operations), individual variables may not have well-defined recurrences. For example, consider the program in Fig. 1. Due to the non-deterministic branches, there are no well-defined recurrences for the variables x and y, not to mention computing closed-form solutions to them. However, if we consider the expression \(x^2+y+1\), there is a simple c-finite recurrence for it: let \(q=x^2+y+1\), and \(q(k) = x(k)^2 + y(k) + 1\), where x(k) and y(k) denote the values of x and y, respectively, after the kth iteration of the while loop. It is easy to verify that q(k) satisfies the following c-finite recurrence:
$$\begin{aligned} q(k+1) = 4q(k), \end{aligned}$$
from which one can compute a closed-form solution \(q(k) = 4^k q(0)\). This by itself shows an interesting property about the program. It can also potentially be used to prove other properties that can be related to this expression. Furthermore, as shown by Kovács [16], c-finite recurrences and their closed-form solutions can be naturally used as a downstream tool to generate polynomial invariants.
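This kind of check can be reproduced mechanically. The sketch below is ours and uses sympy; since Fig. 1 is not reproduced here, the two branch transitions are assumed stand-ins that exhibit the stated behavior of \(x^2+y+1\), not necessarily the exact program of Fig. 1.

```python
# Symbolic check (illustrative): q = x^2 + y + 1 satisfies q(k+1) = 4*q(k)
# along every branch of an assumed two-branch loop.
import sympy as sp

x, y = sp.symbols('x y')
q = x**2 + y + 1

# Two assumed nondeterministic branches (hypothetical, standing in for Fig. 1).
branches = [
    {x: 2*x, y: 4*y + 3},
    {x: 2*x + 2, y: 4*y - 8*x - 1},
]

for subst in branches:
    q_next = sp.expand(q.subs(subst, simultaneous=True))
    assert sp.simplify(q_next - 4*q) == 0   # q is scaled by 4 on this branch
print("q = x^2 + y + 1 satisfies q(k+1) = 4 q(k) along both branches")
```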
This example shows that although individual program variables may not satisfy any recurrences, expressions made out of them may sometimes have recurrences that can be solved. As mentioned, finding expressions that have solvable recurrences is a useful exercise in itself in program analysis. It can also help program verification as we will see later.
While most works on loops and recurrences have been on individual program variables, there are two recent studies on recurrences arising from expressions: the work by Amrollahi et al. [1] and that by Cyphert and Kincaid [4]. In this paper, we extend the current work by considering a larger program model that allows nested nondeterministic branches:
[Program model (cf. Fig. 2): a loop whose body, possibly through nested nondeterministic branches, nondeterministically selects and executes one of the simultaneous assignments \(\textbf{x} = \textbf{p}_1(\textbf{x}), \dots , \textbf{x} = \textbf{p}_m(\textbf{x})\),]
where \(\textbf{x} = \textbf{p}_i(\textbf{x})\) is a simultaneous assignment of a tuple of variables \(\textbf{x}\) by a corresponding tuple of polynomials \(\textbf{p}_i(\textbf{x})\). Given such a program, we consider polynomial expressions and systematically explore all c-finite recurrences. In comparison, Amrollahi et al. [1] considered only simple loops without nested branches, and only limited forms of recurrences. As we shall see, our results in this paper strictly subsume theirs even for simple loops. While Cyphert and Kincaid [4] also considered nested branches, they handle them by a program abstraction, which is not guaranteed to be complete, that reduces them to loops without nested branches. However, their results on simple loops without nested branches are systematic; in fact, for these programs, our results are equivalent to theirs. We will have a more detailed discussion of related work later.
Briefly, our main contributions in this paper are as follows:
1. We propose a sound and semi-complete template-based method for finding polynomials that satisfy c-finite recurrences.
2. We show that the set of polynomials of bounded degree d that satisfy c-finite recurrences of order r, for any given \(d\ge 0\) and \(r\ge 0\), forms a finite union of vector spaces. Based on this finding, we propose an algorithm to compute the bases of the vector spaces, and their closed-form solutions.
3. We identify two special cases, (1) \(r=1\) and (2) all \(\textbf{p}_i\)’s are linear, where the bases of these vector spaces can be computed by solving linear equations.
4. We implemented a prototype system for finding closed-form solutions of polynomial expressions for the first special case and integrated it into a program verifier. Our experimental results show that with this tool, many programs with non-linear properties can now be proved.
The rest of this paper is structured as follows. Section 2 introduces notations and reviews some basic concepts used in this paper. Section 3 introduces the template-based method and shows that the problem of finding polynomials with c-finite recurrences can be reduced to solving a system of quadratic equations. Section 4 shows that the polynomials of bounded degree d satisfying c-finite recurrences of order r form a finite union of vector spaces. Section 5 shows that in some special settings, the computation of finite representative solutions can be easier than the general algorithm proposed in Sect. 4. Section 6 introduces the implemented system and summarizes the experimental results. Finally, Sect. 7 discusses related work.
2 Preliminaries
In this section, we introduce notations and briefly review some concepts used in this paper.
2.1 Polynomials
A monomial in \(\textbf{x} = \begin{bmatrix}x_1, \dots , x_n\end{bmatrix}^T\) is a product of the form
$$\begin{aligned} x_1^{\alpha _1}x_2^{\alpha _2}\cdots x_n^{\alpha _n}, \end{aligned}$$
where \(\alpha _i\)’s are nonnegative integers. We simplify the notation for monomials as follows: let \(\alpha = (\alpha _1, \dots , \alpha _n)\) be an n-tuple of nonnegative integers. Then we set
$$\begin{aligned} \textbf{x}^\alpha = x_1^{\alpha _1}x_2^{\alpha _2}\cdots x_n^{\alpha _n}. \end{aligned}$$
The total degree of the monomial \(x^\alpha \) is denoted \(|\alpha | = \alpha _1 + \dots + \alpha _n\). A polynomial p in \(\textbf{x}\) with coefficients in a field \(\mathbb {K}\) is a finite linear combination of monomials of the form
$$\begin{aligned} p = \sum _{\alpha } a_\alpha \textbf{x}^\alpha , \end{aligned}$$
where \(a_\alpha \in \mathbb {K}\) and the sum is over a finite number of n-tuples \(\alpha =(\alpha _1, \dots , \alpha _n)\). The set of all polynomials in \(\textbf{x}\) with coefficients in a field \(\mathbb {K}\) is denoted by \(\mathbb {K}[x_1, \dots , x_n]\) or \(\mathbb {K}[\textbf{x}]\) for short. The total degree of p, denoted \(\text {deg}(p)\), is the maximum \(|\alpha |\) such that the coefficient \(a_\alpha \) is nonzero. By a polynomial p of bounded degree d, we mean the total degree of it is less than or equal to d.
The set of all polynomials of bounded degree d (denoted by \(\mathbb {K}_d[\textbf{x}]\)) forms a vector space and all monomials of bounded degree d form a basis. By fixing the order on those monomials, a polynomial p can be represented using a coordinate vector whose elements are coefficients of p. For example, let \(\begin{bmatrix}1, x, y, x^2, xy, y^2\end{bmatrix}^T\) be the basis. Then the coordinate vector of \(p = 2 + 3x + 4y^2\) is \(\begin{bmatrix}2, 3, 0, 0, 0, 4 \end{bmatrix}^T\).
For a polynomial vector \(\textbf{q} = \begin{bmatrix}q_1, \dots , q_n \end{bmatrix}^T\), we have \(\textbf{q}(\textbf{x}) = \begin{bmatrix}q_1(\textbf{x}), \dots , q_n(\textbf{x})\end{bmatrix}^T\). The result of the polynomial composition \((p\circ \textbf{q})(\textbf{x}) = p(\textbf{q}(\textbf{x}))\) is a polynomial of bounded degree \(d_pd_{\textbf{q}}\), where \(d_p\) is the total degree of p and \(d_\textbf{q}\) is the maximum among the total degrees of the \(q_i\)’s. The polynomial composition distributes over addition. That is, \((p_1 + p_2)\circ \textbf{q} = (p_1 \circ \textbf{q}) + (p_2 \circ \textbf{q})\) for any \(p_1\), \(p_2\), and \(\textbf{q}\). Given another polynomial vector \(\textbf{p} = \begin{bmatrix}p_1, \dots , p_n\end{bmatrix}^T\), we have \(\textbf{p}\circ \textbf{q} = \begin{bmatrix}p_1\circ \textbf{q}, \dots , p_n\circ \textbf{q}\end{bmatrix}^T\).
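As a small illustration of these notions (ours, not from the paper), the following sympy sketch computes a coordinate vector in the fixed monomial basis and a polynomial composition; the helper name coords is hypothetical.

```python
import sympy as sp

x, y = sp.symbols('x y')
basis = [sp.Integer(1), x, y, x**2, x*y, y**2]    # the basis fixed in the text

def coords(poly):
    """Coordinate vector of poly w.r.t. the fixed monomial basis."""
    p = sp.Poly(sp.expand(poly), x, y)
    return [p.coeff_monomial(m) for m in basis]

print(coords(2 + 3*x + 4*y**2))      # [2, 3, 0, 0, 0, 4]

# Composition with a polynomial vector p(x, y) = (x + y, x*y):
q = x**2 + y
composed = sp.expand(q.subs({x: x + y, y: x*y}, simultaneous=True))
print(composed)                                   # x**2 + 3*x*y + y**2
print(sp.Poly(composed, x, y).total_degree())     # 2 <= deg(q) * max_i deg(p_i)
```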
2.2 Eigenvalues and Matrix Polynomials
Given an \(n \times n\) matrix M, if a scalar \(\lambda \) and a nonzero vector \(\textbf{a}\) satisfy the equation
$$\begin{aligned} M\textbf{a} = \lambda \textbf{a}, \end{aligned}$$
then \(\lambda \) is called an eigenvalue of M and \(\textbf{a}\) is called an eigenvector of M associated with \(\lambda \). The pair \((\lambda , \textbf{a})\) is an eigenpair for M.
Given a univariate polynomial \(p(x) = x^k + a_{k-1}x^{k-1} + \cdots + a_1x + a_0\) of degree k, the evaluation of it at a square matrix M is well-defined by
$$\begin{aligned} p(M) = M^k + a_{k-1}M^{k-1} + \cdots + a_1M + a_0I. \end{aligned}$$
Recall the following theorem from [10], which follows from the fundamental theorem of algebra and links the eigenpairs of p(M) to those of M in a simple way.
Theorem 1
Let p(x) be a univariate polynomial of degree k. If \((\lambda , \textbf{a})\) is an eigenpair of M, then \((p(\lambda ), \textbf{a})\) is an eigenpair of p(M). Conversely, if \(k \ge 1\) and if \(\mu \) is an eigenvalue of p(M), then there is some eigenvalue \(\lambda \) of M s.t. \(\mu = p(\lambda )\).
Example 1
Let \(p(x) = x^2 + 3x + 1\). The eigenvalues of \(M = \begin{bmatrix}2 &{} 3\\ 4 &{} 3\end{bmatrix}\) are 6 and \(-1\). The eigenvalues of \(p(M) = M^2 + 3M + 1 = \begin{bmatrix}23 &{} 24\\ 32 &{} 31\end{bmatrix}\) are \(p(6) = 6^2 + 3\cdot 6 + 1 = 55\) and \(p(-1) = (-1)^2 + 3\cdot (-1) +1 = -1\).
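Example 1 can be checked mechanically; the following sympy snippet (ours, for illustration only) reproduces the numbers.

```python
import sympy as sp

M = sp.Matrix([[2, 3], [4, 3]])
p = lambda t: t**2 + 3*t + 1

print(M.eigenvals())                         # {6: 1, -1: 1}
pM = M**2 + 3*M + sp.eye(2)
print(pM)                                    # Matrix([[23, 24], [32, 31]])
print(pM.eigenvals())                        # {55: 1, -1: 1}
print({p(lam) for lam in M.eigenvals()})     # {55, -1}, matching Theorem 1
```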
2.3 C-Finite Recurrences
A sequence \(\{a(k)\}_{k=0}^\infty \) is c-finite if it satisfies a c-finite recurrence of the following form for some constants \(c_i \in \mathbb {Q}\) and integer \(r \ge 1\):
$$\begin{aligned} a(k+r) = c_1a(k+r-1) + c_2a(k+r-2) + \cdots + c_ra(k), \end{aligned}$$(1)
where r is the order of this recurrence.
The following inhomogeneous c-finite recurrence with an extra constant term \(c_{r+1}\) is also considered in this paper:
$$\begin{aligned} a(k+r) = c_1a(k+r-1) + \cdots + c_ra(k) + c_{r+1}. \end{aligned}$$(2)
The constant term \(c_{r+1}\) is often discarded in the literature because a sequence \(\{a(k)\}_{k=0}^\infty \) satisfying an inhomogeneous c-finite recurrence (2) of order r must satisfy a homogeneous one (1) of order \(r+1\). We consider the inhomogeneous case because in a setting discussed later, the computation will be easier.
The characteristic polynomial p of a c-finite recurrence (1) is a univariate polynomial defined as:
$$\begin{aligned} p(x) = x^r - c_1x^{r-1} - c_2x^{r-2} - \cdots - c_r. \end{aligned}$$
Every c-finite recurrence (1) has a closed-form solution in the following exponential polynomial form [7]:
$$\begin{aligned} a(k) = \sum _{i=1}^{s}p_i(k)\lambda _i^k, \end{aligned}$$(3)
where s is the number of distinct roots of the characteristic polynomial, \(\lambda _i\)’s are those distinct roots, and \(p_i\)’s are polynomials whose degrees are one less than the multiplicities of the corresponding roots \(\lambda _i\) and whose coefficients are determined by the initial values of a(k). Conversely, any sequence admitting a closed-form solution of form (3) is c-finite.
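As an illustration of such closed forms (not taken from the paper), sympy's rsolve can solve a small c-finite recurrence directly:

```python
import sympy as sp

k = sp.symbols('k', integer=True)
a = sp.Function('a')

# a(k+2) = 5*a(k+1) - 6*a(k), with a(0) = 0 and a(1) = 1
closed = sp.rsolve(a(k + 2) - 5*a(k + 1) + 6*a(k), a(k), {a(0): 0, a(1): 1})
print(closed)   # 3**k - 2**k: a sum of p_i(k) * lambda_i**k terms as in (3)
```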
2.4 Program Model and Problem Statement
In this paper, we consider the program model illustrated in Fig. 2. In words, given a set of variables \(\textbf{x} = \begin{bmatrix}x_1, \dots , x_n\end{bmatrix}^T\), each iteration of the loop updates the values of these variables non-deterministically by some polynomial transitions \(\textbf{p}_i = \begin{bmatrix}p_{i1}, \dots , p_{in}\end{bmatrix}^T\), where \(p_{ij} \in \mathbb {Q}[\textbf{x}]\) and \(\textbf{p}_i(\textbf{x}) = [p_{i1}(\textbf{x}), \dots , p_{in}(\textbf{x})]^T\). Notice that this class of programs can model nested deterministic branches and nested loops in a natural way - see Sect. 6 for an example.
We denote the values of \(\textbf{x}\) after the kth iteration by \(\textbf{x}(k) = \begin{bmatrix}x_1(k), \dots , x_n(k)\end{bmatrix}^T\). Given a program in Fig. 2 and some integer \(r \ge 1\), we want to find some polynomial expressions \(q(\textbf{x})\) of bounded degree d satisfying the following c-finite recurrence for some \(c_i \in \mathbb {Q}\):
$$\begin{aligned} q(\textbf{x}(k+r)) = c_1q(\textbf{x}(k+r-1)) + c_2q(\textbf{x}(k+r-2)) + \cdots + c_rq(\textbf{x}(k)). \end{aligned}$$(4)
3 Reduction to Solving a System of Quadratic Equations
Given a program as described by Fig. 2, a bounded degree d, and the order r, we want to find polynomials \(q \in \mathbb {Q}[\textbf{x}]\) satisfying c-finite recurrences (4). To consider all possible interleavings of those non-deterministic branches, letting \(q'_i\) be the polynomial composition \(q'_i = q \circ \textbf{p}_{w[r-i]} \circ \dots \circ \textbf{p}_{w[1]}\), Eq. (4) is equivalent to the following formula:
where \(W_r\) is the set of all r-tuples over \(\{1, \dots , m\}\) and \(c_0 = 1\);
Intuitively, each \(w \in W_r\) denotes a possible execution path for any r consecutive iterations. So for any \(i < r\), the composition \(\textbf{p}_{w[r-i]} \circ \dots \circ \textbf{p}_{w[1]}\) denotes the transition after the first i iterations. That is, each \(q'_i(\textbf{x}(k))\) is \(q(\textbf{x}(k + i))\) in recurrence (4) for a possible execution path. Since in formula (5), w ranges over all r-tuples over \(\{1, \dots , m\}\), all possible interleavings of those branches are considered. Therefore, this formula is equivalent to the recurrence (4).
To find a polynomial q of bounded degree d satisfying this formula, we set up a template polynomial for it:
$$\begin{aligned} q(\textbf{x}) = \sum _{|\alpha | \le d}a_\alpha \textbf{x}^\alpha , \end{aligned}$$
where \(a_\alpha \)’s are unknown and \(\textbf{x}^\alpha \)’s are all monomials of bounded degree d. After plugging the template into formula (5), the left-hand side of each conjunct is a polynomial over \(\textbf{x}(k)\). To be zero, all coefficients of this polynomial must be zero. Note that those unknown values \(c_i\)’s are multiplied with q, whose coefficients are also unknown, so the coefficients of those polynomials on the left-hand side are quadratic expressions in these unknown values.
Example 2
Consider the loop in Fig. 1. If \(r = 1\) and we want to find a polynomial q of bounded degree 2 that satisfies Eq. (1), we set up a template for q as follows:
$$\begin{aligned} q(x, y) = a_0x^2 + a_1xy + a_2y^2 + a_3x + a_4y + a_5. \end{aligned}$$
There are two conjuncts in the resulting formula (5). One of them is as follows:
By setting all coefficients to be zero, this conjunct produces the following system of quadratic equations:
Together with the equations generated from the other conjunct, each solution to them corresponds to a required polynomial q and the recurrence it satisfies. For this example, \(a_0 = 1\), \(a_4=1\), \(a_5 = 1\), \(c_1 = 4\), with all other unknowns zero, is one of the solutions, which corresponds to \(q(k) = x(k)^2 + y(k) + 1\) and \(q(k+1) = 4q(k)\). Note that this is not the only solution to the quadratic equations. For example, \(a_0 = \lambda , a_4 = \lambda , a_5 = \lambda , c_1 = 4\) is a solution for any \(\lambda \not = 0\).
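The reduction of this section can be sketched in a few lines of sympy. The sketch below is ours: it uses the same assumed branch transitions as the earlier sketch (Fig. 1 itself is not reproduced), builds the quadratic constraints by equating coefficients, and checks the witness solution mentioned above.

```python
import sympy as sp

x, y, c1 = sp.symbols('x y c1')
a = sp.symbols('a0:6')                       # unknown template coefficients
q = a[0]*x**2 + a[1]*x*y + a[2]*y**2 + a[3]*x + a[4]*y + a[5]

branches = [{x: 2*x, y: 4*y + 3},            # assumed transitions (not the paper's Fig. 1)
            {x: 2*x + 2, y: 4*y - 8*x - 1}]

equations = []
for subst in branches:
    diff = sp.expand(q.subs(subst, simultaneous=True) - c1*q)
    # every coefficient of the polynomial in (x, y) must vanish:
    equations += sp.Poly(diff, x, y).coeffs()

# Each constraint is quadratic in the unknowns a0..a5 and c1.
witness = {a[0]: 1, a[1]: 0, a[2]: 0, a[3]: 0, a[4]: 1, a[5]: 1, c1: 4}
assert all(sp.expand(e.subs(witness)) == 0 for e in equations)
print(len(equations), "quadratic constraints; q = x^2 + y + 1 with c1 = 4 satisfies them all")
```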
4 Finding Finite Representative Solutions
In the previous section, we obtained a system of quadratic equations whose solutions correspond to the desired polynomials and the recurrences they satisfy. The number of solutions to such equations may be infinite, but most of them are redundant in the sense that some of them are linear combinations of others. For example, if \(q_1\) and \(q_2\) are polynomials satisfying recurrence (4) for the same \(c_i\)’s, then any linear combination of them also satisfies this recurrence with the same \(c_i\)’s. That is, given an assignment to the \(c_i\)’s, the polynomials q satisfying recurrence (4) form a vector space. Since different \(c_i\)’s may result in different vector spaces for those polynomials, the set of all polynomials satisfying recurrence (4) is a union of vector spaces.
Lemma 1
Given a loop, a bounded degree d, and an order r of the c-finite recurrence, the set of polynomials \(q \in \mathbb {Q}_d[\textbf{x}]\) satisfying recurrence (4) of order r is a union of vector spaces.
Proof
The zero polynomial must satisfy the recurrence (4) for any assignment to \(\{c_1, \dots , c_r\}\). Suppose both \(q_1, q_2 \in \mathbb {Q}_d[\textbf{x}]\) satisfy the recurrence (4) with the same \(\{c_1, \dots , c_r\}\). That is,
$$\begin{aligned} q_j(\textbf{x}(k+r)) = c_1q_j(\textbf{x}(k+r-1)) + \cdots + c_rq_j(\textbf{x}(k)) \quad \text {for } j = 1, 2. \end{aligned}$$
Then \(k_1q_1 + k_2q_2\) for any \(k_1\), \(k_2\) also satisfies the recurrence (4) with the same \(\{c_1, \dots , c_r\}\). So any assignment to \(\{c_1,\dots , c_r\}\) corresponds to a vector space of q. As a result, the set of polynomials q satisfying the recurrence (4) of order r is a union of vector spaces. \(\square \)
Lemma 1 shows that the desired polynomials constitute some vector spaces. But the number of distinct vector spaces among them may be infinite, making it impossible to produce finite representative solutions. The following theorem shows that these polynomials in fact form a finite union of vector spaces.
Theorem 2
Given a loop, a bounded degree d, and an order r, the set of polynomials \(q \in \mathbb {Q}_d[\textbf{x}]\) satisfying recurrence (4) of order r is a finite union of vector spaces.
Proof
By Lemma 1, the set of polynomials \(q \in \mathbb {Q}_d[\textbf{x}]\) is a union of vector spaces. Let B be the set of all basis vectors of these vector spaces. It is known that as a vector space, the dimension of \(\mathbb {Q}_d[\textbf{x}]\) is \({{n + d} \atopwithdelims ()d} + 1\), where n is the number of program variables. If \(|B| > {{n + d} \atopwithdelims ()d} + 1\), there is a vector \(b \in B\) s.t. b is a linear combination of other vectors in B. Keep removing such vectors from B and denote the resulting set as \(B'\). Then \(|B'| \le {{n + d} \atopwithdelims ()d} + 1\) and all vectors in it are linearly independent. In other words, the vector space spanned by \(B'\) is the smallest vector space that contains all polynomials in \(\mathbb {Q}_d[\textbf{x}]\) satisfying recurrence (4) of order r. Since each vector q in \(B'\) satisfies some recurrence (4) of order r, it must have a closed-form solution of the exponential polynomial form (3). That is, for each vector \(q \in B'\), we have
$$\begin{aligned} q(\textbf{x}(k)) = \sum _{i}p_{q, i}(k)\lambda _{q, i}^k, \end{aligned}$$
where \(p_{q, i}\) are polynomials and the sum of their total degrees is less than or equal to r. By the definition of \(B'\), any \(q' \in \mathbb {Q}_d[\textbf{x}]\) that satisfies some recurrence (4) of order r is a linear combination of vectors in \(B'\). Therefore, we have
$$\begin{aligned} q'(\textbf{x}(k)) = \sum _{q \in B'}a_q\sum _{i}p_{q, i}(k)\lambda _{q, i}^k, \end{aligned}$$
where \(a_q \in \mathbb {R}\). Since it is an exponential polynomial, we can establish the following characteristic polynomial
$$\begin{aligned} p(x) = \prod _{q \in B'}\prod _{i}(x - \lambda _{q, i})^{d_{q, i}}, \end{aligned}$$
where \(d_{q, i} \in \mathbb {N}\) and the sum of them is r. Any desired \(q' \in \mathbb {Q}_d[\textbf{x}]\) must satisfy some recurrence (4) whose characteristic polynomial is of the above form for some \(d_{q, i}\)’s. Since the sum of the \(d_{q, i}\)’s is r and the number of \(\lambda _{q, i}\)’s is finite (because \(|B'|\) is bounded), the number of possible characteristic polynomials and corresponding recurrences is finite. Each recurrence corresponds to a vector space, so the number of vector spaces is finite. \(\square \)
Note that Theorem 2 is applicable not only to the programs in Fig. 2, but also to general loops. In the proof of Theorem 2, a finite set \(B'\) is constructed and any polynomial that satisfies some recurrence (4) is a linear combination of vectors in \(B'\). So \(B'\) can be used as the representative solutions.
Given a program in Fig. 2, Algorithm 1 gives the process to compute all polynomial expressions that satisfy recurrence (4). Initially, given the inputs P, d, and r, it sets up a template for the desired polynomial (variable \(\texttt {p}\) is the template polynomial and \({\textbf {coeffs}}\) is a vector of the unknown coefficients) and establishes the quadratic equations mentioned in Sect. 3. \(\texttt {eqs}\) is the set of quadratic equations and \(\textbf{c}\) is a vector of the unknown coefficients \(c_i\)’s in recurrence (1). The variable \(\texttt {bases}\) is initialized as an empty set. \(\textbf{a}\) denotes an assignment to \(\textbf{c}\). \(\theta \) is a formula recording the remaining possible \(\textbf{c}\) that should be considered. In each iteration, the loop asks for a model of \(\theta \), which is an assignment to \(\textbf{c}\). After plugging it into the quadratic equations (Line 7), these equations are reduced to linear ones. A basis \(\textbf{B}\) of the solution space of these linear equations is then computed and added to \(\texttt {bases}\). Each element in \(\textbf{B}\) is a solution to \(\textbf{coeffs}\), so each corresponds to a desired polynomial. Since in each iteration a vector space of desired polynomials is computed and its basis is added to \(\texttt {bases}\), \(\texttt {bases}\) is a spanning set of the smallest vector space containing all vector spaces computed so far. In future iterations we do not want to consider those \(\textbf{c}\) whose corresponding vector space is contained in the vector space spanned by \(\texttt {bases}\). So the variable \(\texttt {constraint}\) is the formula that will be added to \(\theta \), saying that in future iterations the computed vector spaces should contain at least one polynomial outside the vector space spanned by \(\texttt {bases}\).
Algorithm 1: Finding polynomial expressions of bounded degree d satisfying recurrence (4) of order r
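The core step of each iteration of Algorithm 1, plugging a concrete assignment for \(\textbf{c}\) into the quadratic equations and extracting a basis of the resulting linear solution space, can be sketched as follows. The function name and the toy constraint are ours; this is an illustrative sketch, not the paper's implementation.

```python
import sympy as sp

def basis_for_fixed_c(equations, coeffs, c_syms, c_model):
    """Plug a model for c into the quadratic system; the result is linear and
    homogeneous in the template coefficients, so its solution space has a basis
    given by the nullspace of the coefficient matrix."""
    linear = [sp.expand(e.subs(dict(zip(c_syms, c_model)))) for e in equations]
    A = sp.Matrix(linear).jacobian(sp.Matrix(coeffs))   # coefficient matrix of A * coeffs = 0
    return A.nullspace()

# Toy illustration: one constraint a0*c - 2*a0 = 0 over unknowns a0, a1.
a0, a1, c = sp.symbols('a0 a1 c')
print(basis_for_fixed_c([a0*c - 2*a0], [a0, a1], [c], [2]))  # c = 2: every (a0, a1) works
print(basis_for_fixed_c([a0*c - 2*a0], [a0, a1], [c], [3]))  # c = 3: forces a0 = 0
```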
Theorem 3
Given a program P in Fig. 2, bounded degree d, and the order r, Algorithm 1 terminates with a set of polynomials s.t. all polynomials \(p \in \mathbb {Q}_d[\textbf{x}]\) satisfying recurrence (4) of order r are linear combinations of them.
Proof
In each iteration, the basis \(\textbf{B}\) is computed from the quadratic equations established in Sect. 3 with the unknown \(c_i\)’s replaced by some constants \(\textbf{a}\). Then it is added to \(\texttt {bases}\), so \(\texttt {bases}\) is a spanning set of the smallest vector space containing all vector spaces computed so far. In each iteration, after computing the corresponding basis \(\textbf{B}\), a constraint is set up in Line 10, which is then added to \(\theta \). Note that the value \(\textbf{a}\) used to compute a new vector space in each iteration is a valid assignment to \(\theta \), as shown in Line 6. So the constraint set up in Line 10 states that, since the desired polynomials in the vector space spanned by \(\texttt {bases}\) (the smallest vector space containing all computed vector spaces) have already been found, the vector spaces computed in later iterations should contain at least one vector outside this space. When the loop terminates, \(\theta \) is unsatisfiable, which means all desired polynomials have been computed. So if it terminates, all desired polynomials are linear combinations of vectors in \(\texttt {bases}\).
We next show termination. Since in each iteration the newly computed vector space contains at least one polynomial not in the vector space spanned by \(\texttt {bases}\), the dimension of the vector space spanned by \(\texttt {bases}\) is enlarged by at least 1. When the dimension of the space spanned by \(\texttt {bases}\) reaches the dimension of the smallest vector space containing all desired polynomials, which is upper bounded by \({{n + d} \atopwithdelims ()d} + 1\), there is no other vector outside the space, so \(\theta \) becomes unsatisfiable, which terminates the loop. \(\square \)
5 Special Cases Where the Computations Are Easier
In the previous section, we derived an algorithm that can find finite representative solutions for the quadratic equations set up in Sect. 3. But in each iteration, it asks for a model of a non-linear formula, which requires a powerful SMT solver or computer algebra system. In this section, we discuss two special cases in which the computation is much easier. The key observation is that given a polynomial vector \(\textbf{p}\), the polynomial composition \(T_{\textbf{p}}(q) = q \circ \textbf{p}\) is a linear transformation.
Lemma 2
Given a polynomial vector \(\textbf{p}\), the polynomial composition \(T_\textbf{p}(q) = q \circ \textbf{p}\) is a linear transformation from \(\mathbb {Q}_{d_q}[\textbf{x}]\) to \(\mathbb {Q}_{d_qd_\textbf{p}}[\textbf{x}]\), where \(d_q\) is the bounded degree of q and \(d_{\textbf{p}}\) is the maximum among the \(\deg (p_i)\)’s.
Proof
Given any scalars \(c_1, c_2 \in \mathbb {Q}\) and polynomials \(q_1, q_2\), we have
$$\begin{aligned} T_\textbf{p}(c_1q_1 + c_2q_2) = (c_1q_1 + c_2q_2)\circ \textbf{p} = c_1(q_1\circ \textbf{p}) + c_2(q_2\circ \textbf{p}) = c_1T_\textbf{p}(q_1) + c_2T_\textbf{p}(q_2). \end{aligned}$$
\(\square \)
It is known that a linear transformation can be represented by a transformation matrix, which is constructed by computing the image of each basis element under the transformation and putting coordinates of those images in order.
Example 3
Consider again the program in Fig. 1. Let \(\{x^2, xy, y^2, x, y, 1\}\) be the basis of \(\mathbb {Q}_2[x, y]\). As in Example 2, let the template polynomial be \(q(x, y) = a_0x^2 + a_1 xy + a_2y^2 + a_3x + a_4y + a_5\), whose coordinate is \(\textbf{a} = \begin{bmatrix}a_0, a_1, a_2, a_3, a_4, a_5\end{bmatrix}\). For the two transitions in Fig. 1, the transformation matrices are
Note that \(M_1\textbf{a}\) and \(M_2\textbf{a}\) are coordinates of \((q\circ \textbf{p}_1)(x, y)\) and \((q\circ \textbf{p}_2)(x, y)\).
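A transformation matrix such as \(M_1\) can be constructed mechanically by recording the image of each basis monomial. The following sympy sketch is ours; the branch used is the assumed transition from the earlier sketches, since Fig. 1 is not reproduced here.

```python
import sympy as sp

x, y = sp.symbols('x y')
basis = [x**2, x*y, y**2, x, y, sp.Integer(1)]   # ordering used in Example 3

def transformation_matrix(subst):
    """Column j holds the coordinates of the image of basis[j] under q -> q o p."""
    cols = []
    for m in basis:
        image = sp.Poly(sp.expand(m.subs(subst, simultaneous=True)), x, y)
        cols.append([image.coeff_monomial(b) for b in basis])
    return sp.Matrix(cols).T

M1 = transformation_matrix({x: 2*x, y: 4*y + 3})   # assumed branch, as above
a = sp.Matrix(sp.symbols('a0:6'))
q = (a.T * sp.Matrix(basis))[0]
# M1 * a is indeed the coordinate vector of q composed with the transition:
composed = sp.expand(q.subs({x: 2*x, y: 4*y + 3}, simultaneous=True))
assert sp.expand((sp.Matrix(basis).T * (M1 * a))[0] - composed) == 0
print(M1.eigenvals())    # {4: 2, 8: 1, 16: 1, 2: 1, 1: 1}
```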
5.1 Polynomials Satisfying First Order Inhomogeneous C-Finite Recurrences
In this subsection, given a \(d \ge 0\), we consider finding polynomial expressions \(q \in \mathbb {Q}_d[\textbf{x}]\) satisfying a first order inhomogeneous c-finite recurrence of the following form for some \(c_1, c_2\):
$$\begin{aligned} q(\textbf{x}(k+1)) = c_1q(\textbf{x}(k)) + c_2. \end{aligned}$$(6)
There are two cases to be considered:
1. If \(c_1 \not = 1\), then for any polynomial q that satisfies Eq. (6), we can construct a new polynomial \(q'(\textbf{x}) = q(\textbf{x}) + \frac{c_2}{c_1 - 1}\) s.t. the following equation holds:
$$\begin{aligned} q'(\textbf{x}(k+1)) = c_1q'(\textbf{x}(k)). \end{aligned}$$(7)
2. If \(c_1 = 1\), then Eq. (6) becomes
$$\begin{aligned} q(\textbf{x}(k+1)) - q(\textbf{x}(k)) - c_2 = 0, \end{aligned}$$
where the left-hand side is a zero polynomial. As a result, all its coefficients must be zero. This forms a system of linear equations, from which a basis can be computed to represent all polynomials that form the desired recurrence.
Since in the second case the computation of the desired polynomials reduces to solving a system of linear equations, which is much easier, in the rest of this subsection we focus on the first case.
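The shift used in case 1 can be verified symbolically; the following small sympy check (ours, purely illustrative) confirms that adding \(c_2/(c_1-1)\) removes the inhomogeneous term.

```python
import sympy as sp

qk, c1, c2 = sp.symbols('q_k c1 c2')
q_next = c1*qk + c2                      # value of q after one more iteration, by (6)
shift = c2/(c1 - 1)
assert sp.simplify((q_next + shift) - c1*(qk + shift)) == 0
print("q' = q + c2/(c1 - 1) satisfies q'(k+1) = c1 * q'(k)")
```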
Similar to Eq. (5), for a program as described by Fig. 2, recurrence (7) is equivalent to the following formula:
$$\begin{aligned} \bigwedge _{i=1}^{m} \big ((q\circ \textbf{p}_i)(\textbf{x}(k)) - c_1q(\textbf{x}(k)) = 0\big ), \end{aligned}$$(8)
Since polynomial composition is a linear transformation and can be represented by a matrix, assuming the basis is ordered by putting monomials with higher degrees in front, formula (8) is equivalent to the following one:
$$\begin{aligned} \bigwedge _{i=1}^{m} \begin{bmatrix}M_{i1}\\ M_{i2}\end{bmatrix}\textbf{a} = \begin{bmatrix}\textbf{0}\\ c_1\textbf{a}\end{bmatrix}, \end{aligned}$$(9)
where \(M_{i1} \in \mathbb {Q}^{(s - t) \times (t + 1)}\), \(M_{i2} \in \mathbb {Q}^{(t + 1)\times (t + 1)}\), \(s = {n+dd_\textbf{p} \atopwithdelims ()dd_\textbf{p}}\), \(t = {{n+d} \atopwithdelims ()d}\), \(d_{\textbf{p}}\) is the maximum among \(deg(p_i)\), \(\begin{bmatrix}M_{i1}\\ M_{i2}\end{bmatrix}\) is the transformation matrix of \(q \circ \textbf{p}_i\), and \(\textbf{a}\) is the coordinate vector for the template polynomial q. There are \((s - t)\) zeros on the right-hand side, because \(q \circ \textbf{p}_i\) may produce terms with degree higher than d and they should be zero to ensure formula (8) holds (because there are no terms with degree higher than d on the right-hand side).
Formula (9) can be further split as follows:
$$\begin{aligned} \bigwedge _{i=1}^{m} M_{i1}\textbf{a} = \textbf{0}, \end{aligned}$$(10)
$$\begin{aligned} \bigwedge _{i=1}^{m} M_{i2}\textbf{a} = c_1\textbf{a}. \end{aligned}$$(11)
Equations in (10) can be solved using Gaussian elimination. Each equation in formula (11) is the definition of eigenvalues and eigenvectors for a square matrix, so formula (11) says that \(c_1\) must be a common eigenvalue of the matrices \(M_{i2}\)’s. Because any \(n \times n\) matrix has at most n distinct eigenvalues, the number of solutions to \(c_1\) is finite. So to solve Eq. (11), we just need to enumerate all common eigenvalues of the \(M_{i2}\)’s and replace \(c_1\) with those eigenvalues. After that, the equations in (11) are reduced to linear ones, which can be solved easily.
Theorem 4
Given a program in Fig. 2 and an integer d, let \(d_{\textbf{p}}\) be the maximum among the \(\deg (p_i)\)’s. By ordering all monomials in \(\mathbb {Q}_{dd_{\textbf{p}}}[\textbf{x}]\) with monomials of higher degree in front, the possible values for \(c_1\) in Eq. (8) are the common eigenvalues of the lower square submatrices \(M_{i2}\) of the transformation matrices of \(q\circ \textbf{p}_i\).
Example 4
Consider the program in Fig. 1. In Example 3, we computed the transformation matrices of both branches as \(M_1\) and \(M_2\). To solve for \(c_1\) and q in Eq. (8), by Theorem 4, \(c_1\) must be a common eigenvalue of \(M_1\) and \(M_2\); the common eigenvalues are \(\{1, 2, 4, 8, 16\}\). For \(c_1 = 1\), the solution to q in Eq. (8) is the constant polynomial \(q = \lambda \) for all \(\lambda \in \mathbb {Q}\), which is trivial. For \(c_1 = 2, 8, 16\), the solution to q is zero. For \(c_1 = 4\), the solution to q is any multiple of \(q(x, y) = x^2 + y + 1\), which is a basis for the solution vector space and is used as the representative solution.
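Putting the pieces of this subsection together on the assumed transitions from the earlier sketches (linear here, so the transformation matrices are already square and there is no \(M_{i1}\) block): the following sympy sketch, which is ours and not the paper's implementation, enumerates the common eigenvalues and, for each, solves the resulting linear system. Under these assumptions it reproduces the outcome described in Example 4.

```python
import sympy as sp

x, y = sp.symbols('x y')
basis = [x**2, x*y, y**2, x, y, sp.Integer(1)]

def tmatrix(subst):
    cols = [[sp.Poly(sp.expand(m.subs(subst, simultaneous=True)), x, y).coeff_monomial(b)
             for b in basis] for m in basis]
    return sp.Matrix(cols).T

M1 = tmatrix({x: 2*x, y: 4*y + 3})            # assumed branches again (Fig. 1 not shown)
M2 = tmatrix({x: 2*x + 2, y: 4*y - 8*x - 1})

common = set(M1.eigenvals()) & set(M2.eigenvals())
print(sorted(common))                          # [1, 2, 4, 8, 16]

n = len(basis)
for c1 in sorted(common):
    # (M1 - c1*I) a = 0 and (M2 - c1*I) a = 0 must hold simultaneously
    stacked = sp.Matrix.vstack(M1 - c1*sp.eye(n), M2 - c1*sp.eye(n))
    for v in stacked.nullspace():
        q = sp.expand(sum(coef*mono for coef, mono in zip(v, basis)))
        print(c1, q)    # c1 = 1 gives the constant polynomial; c1 = 4 gives x**2 + y + 1
```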
5.2 Linear Transitions
In this subsection, we assume all \(\textbf{p}_i\)’s in Fig. 2 are linear and try to find all polynomials of bounded degree d that satisfy c-finite recurrences (4) of order r.
If all \(\textbf{p}_i\) are linear transitions, the transformation matrices of them are square. So Eq. (5) is equivalent to
$$\begin{aligned} \bigwedge _{w \in W_r}\Big (\prod _{i=1}^{r}M_{w[i]} - \sum _{j=1}^{r}c_j\prod _{i=1}^{r-j}M_{w[i]}\Big )\textbf{a} = \textbf{0}, \end{aligned}$$(12)
where \(W_r\) is the set of all r-tuples over \(\{1, \dots , m\}\), \(M_i\)’s are transformation matrices of \(q \circ \textbf{p}_i\), and \(\prod _{i=1}^sM_{w[i]} = M_{w[s]}M_{w[s-1]}\dots M_{w[1]}\).
Formula (12) is hard to solve, because \(\textbf{a}\) lies in the intersection of the nullspaces of the matrices in the parentheses. Different assignments to the \(c_i\)’s may result in different nullspaces, and thus different solutions to \(\textbf{a}\). Our solution is to transform formula (12) into formulas (13)–(14) below, where the \(c_i\)’s only appear in matrix polynomials \(p(M_1)\) for some univariate polynomials p:
$$\begin{aligned} \bigwedge _{l=0}^{r-1}\ \bigwedge _{i=2}^{m}\ \bigwedge _{w \in W_{r-l-1}} p_l(M_1)(M_i - M_1)\Big (\prod _{j=1}^{r-l-1}M_{w[j]}\Big )\textbf{a} = \textbf{0}, \end{aligned}$$(13)
$$\begin{aligned} p_r(M_1)\textbf{a} = \textbf{0}, \end{aligned}$$(14)
where \(p_k(M_1) = M_1^k - \sum _{j=1}^kc_jM_1^{k-j}\).
Theorem 5
Formula (12) is equivalent to formula (13) - (14).
The proof is given in Appendix A due to page limits. Intuitively, there is a one-to-one correspondence between conjuncts in formula (12) and those in formulas (13)–(14). Formula (14) is the conjunct in formula (12) whose \(w = [1]^r\). For any other conjunct in formula (12), whose w has l trailing 1’s, one can find another one, whose \(w'\) has \(l+1\) trailing 1’s and has the same prefix of length \(r-l-1\) as w. The difference between them can be simplified into a conjunct in formula (13). Transforming formulas (13)–(14) back to formula (12) is done in reverse.
Formulas (13) and (14) are simpler than formula (12) in the sense that all \(c_i\)’s appear in some matrix polynomials \(p_k(M_1)\), which makes them easier to analyze. In the following, we show how to solve for \(\textbf{a}\) in the following equation, which is a conjunct in formula (13):
$$\begin{aligned} p_l(M_1)(M_i - M_1)\Big (\prod _{j=1}^{r-l-1}M_{w[j]}\Big )\textbf{a} = \textbf{0}. \end{aligned}$$(15)
Recall that by the fundamental theorem of algebra, a polynomial \(p(t) = t^l + c_1t^{l-1} + \dots + c_l\) can be factored into \(p(t) = \prod _{i=1}^l(t - \alpha _i)\), where \(\alpha _i\)’s are roots of p(t). When this factorization is applied on \(p_l(M_1)\) [10], it becomes:
$$\begin{aligned} p_l(M_1) = \prod _{i=1}^{l}(M_1 - \alpha _iI). \end{aligned}$$
Each factor \((M_1 - \alpha _i I)\) is a matrix, whose singularity depends on the value of \(\alpha _i\). If \(\alpha _i\) is some eigenvalue of \(M_1\), then this matrix is singular, otherwise it is invertible. The factor \((M_1 - \alpha _iI)\) is called an eigenvalue factor of \(p_l(M_1)\) if \(\alpha _i\) is an eigenvalue of \(M_1\). After this factorization, formula (15) becomes
Note that the multiplication between those factors \((M_1 - \alpha _iI)\) are mutually commutative. So if some \(\alpha _i\)’s are not eigenvalues of \(M_1\), then the corresponding factors are invertible and can be canceled, which converts formula (16) into
where \(0 \le s \le l\) and \(\lambda _j\)’s are eigenvalues of \(M_1\).
This cancellation suggests that it is the eigenvalue factors that determine the solution set for \(\textbf{a}\). In other words, if \(p_l(M_1)\) and \(p_l'(M_1)\) have the same set of eigenvalue factors, then \(p_l(M_1)(M_i - M_1)(\prod _{j=1}^{r-l-1}M_{w[j]})\textbf{a} = 0\) and \(p_l'(M_1)(M_i - M_1)(\prod _{j=1}^{r-l-1}M_{w[j]})\textbf{a} = 0\) have the same solution set for \(\textbf{a}\). So solving Eq. (15) can be done by enumerating all possible eigenvalue factor combinations for \(p_l(M_1)\). That is, the solution to \(\textbf{a}\) of the following formula is equivalent to that of Eq. (15):
$$\begin{aligned} \bigvee _{\Lambda \in \mathbf {\Lambda }_l}\Big (\prod _{\lambda \in \Lambda }(M_1 - \lambda I)\Big )(M_i - M_1)\Big (\prod _{j=1}^{r-l-1}M_{w[j]}\Big )\textbf{a} = \textbf{0}, \end{aligned}$$(17)
where \(\mathbf {\Lambda }_l\) is the set of all subsets, of cardinality at most l, of the set of eigenvalues of \(M_1\) (i.e., \(\forall \Lambda \in \mathbf {\Lambda }_l.\ |\Lambda | \le l\)). Each disjunct in formula (17) is a linear equation, which can be solved easily. Formula (14) can be solved in a similar way.
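The role of eigenvalue factors can be seen on a tiny example of our own (not from the paper): a factor \(M - \lambda I\) with \(\lambda \) not an eigenvalue is invertible and does not change the solution space.

```python
import sympy as sp

M = sp.Matrix([[2, 1], [0, 3]])          # eigenvalues 2 and 3
# p(t) = (t - 2)(t - 4): one eigenvalue factor (t - 2), one non-eigenvalue factor (t - 4)
pM = (M - 2*sp.eye(2)) * (M - 4*sp.eye(2))

assert pM == M**2 - 6*M + 8*sp.eye(2)    # same as evaluating p at M directly
assert (M - 4*sp.eye(2)).det() != 0      # not an eigenvalue factor, hence invertible
print(pM.nullspace() == (M - 2*sp.eye(2)).nullspace())   # True: same solutions for a
```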
Note that the derivation above solves a single equation (15). But solving formulas (13)–(14) cannot be done by simply solving each equation with this approach and then intersecting the solution sets, because all the \(p_k(M_1)\)’s share the same coefficients \(c_i\)’s, which implicitly constrains the choice of eigenvalue-factor combinations. However, solving those equations by enumerating all possible eigenvalue-factor combinations for the \(p_k(M_1)\)’s without considering this implicit constraint does yield all possible solutions to \(\textbf{a}\), although some may be invalid because the constraint is ignored. So to solve formulas (13) and (14), we adopt the following “generate and check” procedure. Generate. For each \(p_k(M_1)\) in formulas (13)–(14), we enumerate all \(\Lambda \in \mathbf {\Lambda }_k\) and replace \(p_k(M_1)\) in formulas (13) and (14) with \(\prod _{\lambda \in \Lambda }(M_1-\lambda I)\). Solve the resulting linear equations for a basis, which is a candidate solution to formulas (13)–(14). Check. To validate whether a basis generated in the ‘generate’ phase represents one of the vector spaces of \(\textbf{a}\), we only need to replace \(\textbf{a}\) with the elements of the basis in formulas (13) and (14) and see whether the resulting formulas are satisfiable. If they are, the basis indeed corresponds to a vector space of \(\textbf{a}\). If any element of the basis makes the formulas unsatisfiable, the basis is not valid and is filtered out.
Note that in the ‘generate’ phase, we only need to compute eigenvalues for some known matrices and in the ‘check’ phase, the resulting formulas after the substitution are linear. So this procedure is computationally cheaper than Algorithm 1.
Candidate solutions computed in the ‘generate’ phase are solutions to the formula obtained by ignoring the fact that the \(p_k(M_1)\)’s share the same coefficients \(c_i\)’s, which is a weaker version of formulas (13) and (14). So each solution to formulas (13) and (14) is also a solution generated in the ‘generate’ phase, which guarantees completeness. Soundness is guaranteed by the ‘check’ phase, which plugs the candidate solutions back into formulas (13)–(14) and filters out invalid ones.
6 Experimental Evaluation
To evaluate the effectiveness of our proposed methods in program verification, we implemented a prototype system called PExpr for C-like programs based on the algorithm given in Sect. 5.1. The reasons why we chose to implement this algorithm instead of the others are as follows: (1) The algorithm in Sect. 5.1 allows polynomial assignments, while the one in Sect. 5.2 can only handle linear ones, and some programs in the benchmark we consider do have polynomial assignments. (2) As will be seen below, the programs that cannot be proved by the method in Sect. 5.1 cannot be proved even if the general Algorithm 1 were used, not to mention the one in Sect. 5.2, and the algorithm in Sect. 5.1 is more efficient than the others.
6.1 Implementation
Our system consists of two parts:
- The verifier is built on top of LLVM. C programs are first compiled into LLVM IR, and then the IR is translated into a first-order language using a technique similar to the one proposed in [17]. Typically, loops are translated as recurrences and solved using the recurrence solver (see below).
- The core algorithm proposed in this paper is integrated into the recurrence solver proposed in [21], which is capable of solving conditional recurrences. As mentioned above, we only implement the method proposed in Sect. 5.1, which is simple and efficient.
The verifier extracts recurrences and feeds them into the recurrence solver. The solver first tries to compute closed-form solutions for each individual variable using the technique proposed in [21]. If that fails, the method in Sect. 5.1 is applied. The first time this method is applied, the polynomial degree is set to 2. If no non-trivial result is returned, it is set to 3. The closed-form solutions, together with other axioms generated by the verifier, are directly fed into the SMT solver Z3 [6] to prove the correctness of the program. Nested loops are abstracted using the program model considered in this paper. For example, a loop whose body consists of two consecutive inner loops with loop-free bodies A and B is treated as a loop that, in each iteration, nondeterministically executes either A or B.
6.2 Benchmarks and Environment
Our 48 benchmark programs are adapted from the set of safe programs in the c/nla-digbench set of the Software Verification Competition (SV-COMP) [2]. The original c/nla-digbench consists of 26 classical algorithms. All are annotated with some assertions to be proved at the end and with loop invariants in each loop. To make the verification non-trivial, as done in [4], we removed all those loop invariants; otherwise, the verification would simply be to prove that the annotated invariants are indeed invariants and then use them to prove the assertions at the end. Further, since we want to see the effectiveness of our finite representative solutions in verification, programs with multiple assertions to be proved are split into several copies, each with one assertion to be proved. By doing so, we can see, for each program, which assertions can be proved by simply providing our representative solutions to SMT solvers. As a result, 48 programs are collected as the benchmark for the experiment, and we call it NLA.
All experiments were conducted on a virtual machine with a guest OS of Ubuntu 22.04 with 8 GB of RAM. The host machine is a MacBook Pro (16-inch, 2019) with 2.3GHz 8-core Intel Core i9. All tools were run with the BenchExec tool [22] using a time limit of 60 s on all benchmarks.
6.3 Comparison Tools
We compared PExpr with USP-Quad [4], VeriAbs [5], and ULTIMATE Automizer [9]. In addition, since we integrate the proposed method into the recurrence solver proposed in [21], to see its effectiveness, we also compared with PExpr with the proposed technique disabled (the resulting system is called PRS).
USP-Quad adopts a strategy in which, given a transition ideal, it computes a solvable one from it and then computes closed-form solutions to polynomial expressions based on the solvable transition ideal (see the related work section for a detailed comparison). As reported in [4], the refinement technique proposed in [3] improves the analysis of USP-Quad, so when running USP-Quad, we enabled the refinement. VeriAbs is the champion of the ReachSafety track in SV-COMP 2023. It is a reachability verifier for C programs that incorporates a portfolio of techniques (e.g., k-induction). ULTIMATE Automizer is the best tool for the c/nla-digbench set in SV-COMP 2023, which implements approaches based on automata [8]. PRS applies the technique proposed in [21] to solve conditional recurrences for individual program variables.
6.4 Experimental Results
Table 1 summarizes the comparison results. PExpr is the best among those tools, proving 34 programs, 3 of which can only be proved by PExpr. The programs that PExpr failed to prove are classified into 3 categories:
- Integer division. Some loops contain integer division without any guard to guarantee the effect is the same as real division (i.e., rounding occurs). There are 8 programs in this category.
- Path condition matters. Our program model ignores all guards of those nested if statements. But some programs’ correctness is guaranteed by those guards. PExpr cannot capture this semantics because these guards are discarded. There is 1 program in this category.
- Non-c-finite recurrences. PExpr only tries to find polynomial expressions among variables that satisfy c-finite recurrences. So if those expressions are not c-finite, PExpr is not able to find them. 5 programs are in this category.
USP-Quad ranks second. There are 3 programs that it can prove while PExpr failed. Two of them contain integer divisions that PExpr cannot prove equivalent to real divisions. The other is the program whose path condition matters for proving its correctness. When facing multi-path loops, USP-Quad tries to find a solvable transition ideal in a more semantic way, while PExpr simply discards those guards and treats them as non-deterministic branches.
Among the programs proved by VeriAbs, PExpr failed on only one, whose integer division cannot be handled by PExpr. Both ULTIMATE Automizer and PRS work well on simple loops (loops without nested branches). ULTIMATE Automizer can also prove four more loops with nested branches; one of them belongs to those programs whose expressions do not satisfy c-finite recurrences. Although PRS is able to solve conditional recurrences, the loops with nested branches in the benchmark do not have the periodic property, which is the key for PRS to find closed-form solutions. So PRS only works on simple loops or those whose assertions are entailed directly by the loop exit conditions.
7 Related Works
This work follows up on recurrence-based methods for program verification. The connection between loops and recurrences is widely known. Recently, it was used in Lin’s translation [17] from C-like programs to first-order logic. This led to the program verifier VIAP [19] that relies on off-the-shelf tools like Mathematica [23] for solving recurrences. Kincaid et al. [12] also treated loops as recurrences, and proposed algorithms for solving them. Their follow-up works [13, 14] consider finding closed-form solutions that can be used directly by SMT solvers. More capable recurrence solvers for multi-path loops have been studied [20, 21].
When recurrences for individual variables either do not exist or cannot be solved, an alternative is to consider expressions of the variables. Lin [18] considered this as an application of automated theorem discovery. Kincaid et al. [14] proposed a method for finding linear expressions with solvable recurrences. As their solvable recurrences are all c-finite, these linear expressions can also be generated by our algorithm. More closely related works are Amrollahi et al. [1] and Cyphert and Kincaid [4]. Below we discuss them in more details.
Amrollahi et al. [1] considered computing polynomial expressions q that satisfy the recurrence \(q(\textbf{x}(n+1)) = cq(\textbf{x}(n)) + p(\textbf{v}(n))\), where \(c\in \mathbb {R}\), \(\textbf{v}\) are variables that have exponential polynomials as their closed-form solutions, and p is a polynomial. Our method can also generate such polynomial expressions, because if q satisfies the above recurrence, then it has an exponential polynomial closed-form solution and thus satisfies a c-finite recurrence. However, there are polynomial expressions that satisfy a c-finite recurrence but not a recurrence of the above form, as pointed out in [4]. Furthermore, the method in [1] is also template-based and does not consider finite representations of all possible solutions.
Cyphert and Kincaid [4] considered loops as transition ideals and introduced the concept of solvable transition ideals to represent spaces of polynomials with solvable recurrences. Since their solvable polynomial recurrences are equivalent to c-finite recurrences, their results and ours are equivalent if the loop is simple, i.e., when there are no nested branches. For loops with nested branches, they adopt the same method as in [15] that looks for linear expressions whose values change by a polynomial loop invariant after an iteration of the loop. As such, they will miss other polynomial expressions that have c-finite recurrences. In comparison, our method is sound and semi-complete, as shown above.
8 Conclusion
Based on the observation that for loops with nondeterministic branches, recurrences for individual program variables may not exist, we have considered the possibility of finding recurrences for expressions. Specifically, for loops with nested nondeterministic branches, we have proposed a sound and semi-complete algorithm for finding polynomial expressions that satisfy some c-finite recurrences. We have also considered in detail two special cases, one on polynomials that satisfy first-order inhomogeneous c-finite recurrences, and the other on loops with linear transitions, and showed how to compute closed-form solutions more efficiently in these cases. To illustrate the effectiveness of the proposed method, we have implemented our algorithm for the first special case, and showed through experiments that the new technique can indeed be effective in verifying more benchmark programs.
References
Amrollahi, D., Bartocci, E., Kenison, G., Kovács, L., Moosbrugger, M., Stankovič, M.: Solving invariant generation for unsolvable loops. In: Singh, G., Urban, C. (eds.) SAS 2022. LNCS, pp. 19–43. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-22308-2_3
Beyer, D.: Competition on software verification and witness validation: SV-COMP 2023. In: Sankaranarayanan, S., Sharygina, N. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-30820-8_29
Cyphert, J., Breck, J., Kincaid, Z., Reps, T.W.: Refinement of path expressions for static analysis. Proc. ACM Program. Lang. 3(POPL), 45:1–45:29 (2019). https://doi.org/10.1145/3290358
Cyphert, J., Kincaid, Z.: Solvable polynomial ideals: the ideal reflection for program analysis. arXiv preprint arXiv:2311.04092 (2023)
Darke, P., Agrawal, S., Venkatesh, R.: VeriAbs: a tool for scalable verification by abstraction (competition contribution). In: Groote, J.F., Larsen, K.G. (eds.) Tools and Algorithms for the Construction and Analysis of Systems: 27th International Conference, TACAS 2021, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021, Luxembourg, pp. 458–462. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72013-1_32
De Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 337–340. Springer (2008)
Everest, G., van der Poorten, A.J., Shparlinski, I., Ward, T., et al.: Recurrence sequences, vol. 104. American Mathematical Society Providence, RI (2003)
Heizmann, M., et al.: Ultimate automizer with SMTInterpol: (competition contribution). In: Piterman, N., Smolka, S.A. (eds.) Tools and Algorithms for the Construction and Analysis of Systems: 19th International Conference, TACAS 2013, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2013, Rome, 16–24 March 2013, pp. 641–643. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36742-7_53
Heizmann, M., Hoenicke, J., Podelski, A.: Refinement of trace abstraction. In: Palsberg, J., Su, Z. (eds.) SAS 2009. LNCS, vol. 5673, pp. 69–85. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03237-0_7
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press (2012)
Kincaid, Z., Breck, J., Boroujeni, A.F., Reps, T.: Compositional recurrence analysis revisited. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017), pp. 248-262. Association for Computing Machinery, New York (2017). https://doi.org/10.1145/3062341.3062373
Kincaid, Z., Breck, J., Boroujeni, A.F., Reps, T.: Compositional recurrence analysis revisited. SIGPLAN Not. 52(6), 248–262 (2017). https://doi.org/10.1145/3140587.3062373
Kincaid, Z., Breck, J., Cyphert, J., Reps, T.: Closed forms for numerical loops. Proc. ACM Program. Lang. 3(POPL) (2019). https://doi.org/10.1145/3290368
Kincaid, Z., Cyphert, J., Breck, J., Reps, T.: Non-linear reasoning for invariant synthesis. Proc. ACM Program. Lang. 2(POPL), 1–33 (2017)
Kincaid, Z., Koh, N., Zhu, S.: When less is more: Consequence-finding in a weak theory of arithmetic. Proc. ACM Program. Lang. 7(POPL), 1275–1307 (2023)
Kovács, L.: Reasoning algebraically about P-solvable loops. In: Ramakrishnan, C.R., Rehof, J. (eds.) Tools and Algorithms for the Construction and Analysis of Systems, pp. 249–264. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_18
Lin, F.: A formalization of programs in first-order logic with a discrete linear order. Artif. Intell. 235, 1–25 (2016). https://doi.org/10.1016/j.artint.2016.01.014
Lin, F.: Machine theorem discovery. AI Magazine 39(2), 53–59 (2018). https://www.aaai.org/ojs/index.php/aimagazine/article/view/2794
Rajkhowa, P., Lin, F.: VIAP 1.1: (Competition Contribution). In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Tools and Algorithms for the Construction and Analysis of Systems: 25 Years of TACAS: TOOLympics, Held as Part of ETAPS 2019, Prague, 6–11 April 2019, Proceedings, Part III, pp. 250–255. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17502-3_23
Silverman, J., Kincaid, Z.: Loop summarization with rational vector addition systems. In: Dillig, I., Tasiran, S. (eds.) Computer Aided Verification: 31st International Conference, CAV 2019, New York City, 15–18 July 2019, Proceedings, Part II, pp. 97–115. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25543-5_7
Wang, C., Lin, F.: Solving conditional linear recurrences for program verification: the periodic case. Proc. ACM Program. Lang. 7(OOPSLA1), 28–55 (2023)
Wendler, P., Beyer, D.: BenchExec 3.16 (2023). https://github.com/sosy-lab/benchexec
Wolfram, S., et al.: The MATHEMATICA® Book, Version 4. Cambridge University Press (1999)
A Proof of Theorem 5
Proof
\(\Rightarrow :\) For a conjunct in formula (12), if the corresponding \(w = [1]^r\), then it is exactly formula (14). Otherwise, if w has l trailing ones (i.e., \(w = v + [1]^l\) for some \(v \in W_{r-l}\) and \(v[r-l] \not = 1\)), it can be transformed into a conjunct in formula (13) by finding another conjunct in formula (12) whose corresponding \(w' = w[:r-l-1] + [1]^{l+1}\). The difference between them is
Since w and \(w'\) have the same prefix of length \(r-l-1\), we have \(\prod _{i=1}^tM_{w[i]}-\prod _{i=1}^tM_{w'[i]} = 0\) for all \(t \le r-l-1\). Equation (18) is thus simplified into
Because w has l trailing 1’s, for \(s \ge r-l\), we have \(\prod _{i=1}^sM_{w[i]} = M_{1}^{s-r+l}\prod _{i=1}^{r-l}M_{w[i]}.\) Similarly, for \(w'\) we have \(\prod _{i=1}^sM_{w'[i]} = M_{1}^{s-r+l+1}\prod _{i=1}^{r-l-1}M_{w'[i]}.\) So the difference \(\prod _{i=1}^sM_{w[i]} - \prod _{i=1}^sM_{w'[i]}\) is
The second equality holds because w and \(w'\) have the same prefix of length \(r-l-1\). Applying this conversion on the left-hand side of Eq. (19), we have
After factorization, Eq. (19) becomes
$$\begin{aligned} p_l(M_1)(M_{w[r-l]} - M_1)\Big (\prod _{i=1}^{r-l-1}M_{w[i]}\Big )\textbf{a} = \textbf{0}, \end{aligned}$$(20)
where \(p_l(M_1) = M_1^l - \sum _{j=1}^lc_jM_1^{l-j}\). Since \(2 \le w[r-l] \le m\), formula (20) is one of the conjuncts in formula (13). This completes the proof that each conjunct in formula (12) can be transformed into a conjunct in formula (13)–(14).
\(\Leftarrow :\) For this direction, we show that each conjunct in formula (12) can be derived from formulas (13)–(14). There are two different w’s in formula (12) and formula (13); to distinguish them, we denote by \(w_{12}\) the one in formula (12) and by \(w_{13}\) the one in formula (13). We prove this by induction on the number l of trailing 1’s in \(w_{12}\). For the base case \(l = r\), there is only one conjunct whose \(w_{12}\) has r trailing 1’s in formula (12), which is formula (14). For the inductive case, we assume that all conjuncts in formula (12) whose \(w_{12}\) has more than l trailing 1’s can be derived from formulas (13)–(14). We need to prove that the conjuncts in formula (12) whose \(w_{12}\) has l trailing 1’s can be derived from formulas (13)–(14). For such a \(w_{12}\), we have \(w_{12} = w_{12}[:r-l] + [1]^l\). We can find another \(w_{12}' = w_{12}[:r-l-1] + [1]^{l+1}\), which has more than l trailing 1’s. So for \(w_{12}'\), its corresponding conjunct in formula (12) can be derived from formulas (13)–(14) by the inductive hypothesis. Let \(w_{13} = w_{12}[:r-l-1]\) and \(i = w_{12}[r-l]\). Adding up the conjunct represented by \(w_{12}'\) in formula (12) and the conjunct represented by (\(w_{13}\), i) in formula (13), the result is the conjunct represented by \(w_{12}\). This completes the proof. \(\square \)