# **Circuit Depth Reductions**

#### Alexander Golovnev

Georgetown University, Washington, DC, USA alex.golovnev@gmail.com

#### Alexander S. Kulikov

Steklov Institute of Mathematics at St. Petersburg, Russia St. Petersburg State University, Russia alexanderskulikov@gmail.com

#### R. Ryan Williams

CSAIL & EECS, MIT, Cambridge, MA, USA rrw@mit.edu

#### Abstract

The best known size lower bounds against unrestricted circuits have remained around 3n for several decades. Moreover, the only known technique for proving lower bounds in this model, gate elimination, is inherently limited to proving lower bounds of less than 5n. In this work, we propose a non-gate-elimination approach for obtaining circuit lower bounds, via certain depth-three lower bounds. We prove that every (unbounded-depth) circuit of size s can be expressed as an OR of  $2^{s/3.9}$  16-CNFs. For DeMorgan formulas, the best known size lower bounds have been stuck at around  $n^{3-o(1)}$  for decades. Under a plausible hypothesis about probabilistic polynomials, we show that  $n^{4-\varepsilon}$ -size DeMorgan formulas have  $2^{n^{1-\Omega(\varepsilon)}}$ -size depth-3 circuits which are approximate sums of  $n^{1-\Omega(\varepsilon)}$ -degree polynomials over  $\mathbb{F}_2$ . While these structural results do not immediately lead to new lower bounds, they do suggest new avenues of attack on these longstanding lower bound problems.

Our results complement the classical depth-3 reduction results of Valiant, which show that logarithmic-depth circuits of linear size can be computed by an OR of  $2^{\varepsilon n}$   $n^{\delta}$ -CNFs, and slightly stronger results for series-parallel circuits. It is known that no purely graph-theoretic reduction could yield interesting depth-3 circuits from circuits of super-logarithmic depth. We overcome this limitation (for small-size circuits) by taking into account both the graph-theoretic and functional properties of circuits and formulas.

We show that improvements of the following pseudorandom constructions imply super-linear circuit lower bounds for log-depth circuits via Valiant's reduction: dispersers for varieties, correlation with constant degree polynomials, matrix rigidity, and hardness for depth-3 circuits with constant bottom fan-in. On the other hand, our depth reductions show that even modest improvements of the known constructions give elementary proofs of improved (but still linear) circuit lower bounds.

**2012 ACM Subject Classification** Theory of computation $\rightarrow$ Circuit complexity

**Keywords and phrases** Circuit complexity, formula complexity, pseudorandomness, matrix rigidity

**Digital Object Identifier** 10.4230/LIPIcs.ITCS.2021.24

**Related Version** A full version of the paper [18] is available at https://arxiv.org/abs/1811.04828.

**Funding** *Alexander S. Kulikov*: Research presented in Section 4 is supported by the RNF grant 18-71-10042.

*R. Ryan Williams*: Supported by NSF CCF-1909429 and CCF-1741615.

© Alexander Golovnev, Alexander S. Kulikov, and R. Ryan Williams; licensed under Creative Commons License CC-BY. 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Editor: James R. Lee; Article No. 24; pp. 24:1–24:20

### 1 Introduction

The Boolean circuit model is natural for computing Boolean functions. A circuit corresponds to a simple straight line program where every instruction performs a binary operation on two operands, each of which is either an input or the result of a previous instruction. The structure of this program is extremely simple: no loops, no conditional statements. Still, we know of no function in P (or even NP, or even  $E^{NP}$ ) that requires even 3.1n binary instructions ("size") to compute on inputs of length n. This is in sharp contrast with the fact that it is easy to non-constructively find such functions: simple counting arguments show that a random function on n variables has circuit size  $\Omega(2^n/n)$  with probability 1 - o(1) [52].

The strongest known circuit size lower bound  $(3 + \frac{1}{86})n - o(n)$  was proved for affine dispersers for sublinear dimension [14]. This proof, as well as all previous proofs of general circuit lower bounds against explicit functions, is based on the method of gate elimination. The main idea is to find a substitution to an input variable that eliminates sufficiently many gates from the given circuit, and then proceed by induction. While this is the most successful method known so far for proving lower bounds for unrestricted circuits, the resulting case analysis becomes increasingly tedious: when eliminating (say) 3 or 4 gates, one must consider all possible cases when two of these gates coincide. It is difficult to imagine a proof of a 5n lower bound using these ideas. This intuition was recently made formal in [17], where it was shown that a certain formalization of the gate elimination technique is unable to obtain a lower bound stronger than 5n. Therefore we must find new approaches for proving lower bounds against circuits of unbounded depth. Let us review some of the prior results on various circuit models.

### **Linear Circuits**

Superlinear lower bounds are not known even for linear circuits, i.e., circuits consisting of only XOR gates (also known as  $\oplus$  gates). Note that every linear function with one output has a circuit of size at most n-1. For linear circuits, we therefore consider linear transformations, i.e., multi-output functions of the form f(x) = Ax where  $A \in \mathbb{F}_2^{m \times n}$ . For a random matrix  $A \in \{0,1\}^{n \times n}$ , the size of the smallest linear circuit computing Ax is  $\Theta(n^2/\log n)$  [33] with probability 1-o(1), but for explicitly constructed matrices the strongest known lower bound is 3n-o(n), due to Chashkin [6]. Interestingly, Chashkin's proof is not based on gate elimination: he first shows that the parity check matrix  $H \in \{0,1\}^{\log n \times n}$  of the Hamming code has circuit size 2n-o(n) by proving that every circuit for H has at least n-o(n) gates of out-degree at least 2, and then shows that n-o(n) additional gates are needed for a related matrix H'. Similarly, the best known lower bound on the complexity of linear circuits with  $\log n \le m < o(n^2)$  outputs is 2n+m-o(n) (this also follows from [6]).

### **Log-Depth Circuits**

Nothing stronger than a  $(3 + \frac{1}{86})n - o(n)$  size lower bound is known even for circuits of depth  $O(\log n)$ . It is straightforward to show that any function that depends on all of its n variables requires depth at least  $\log n$ . One can also present an explicit function that cannot be computed by a circuit of depth smaller than  $2 \log n - o(\log n)$  using Nechiporuk's lower bound of  $n^{2-o(1)}$  on formula size over the full binary basis [35]. Still, proving superlinear size lower bounds for circuits of depth  $O(\log n)$  remains a major open problem [56].

<sup>1</sup> All logarithms are base 2 unless noted otherwise.

### **Constant-Depth Circuits**

Another natural and simple model of computation is bounded-depth unbounded fan-in circuits, which correspond to highly parallelizable computation. In this paper, we focus on depth-2 circuits of the form AND  $\circ$  OR (i.e., CNFs) and depth-3 circuits of the form OR  $\circ$  AND  $\circ$  OR (i.e., ORs of CNFs), where the inputs of the circuit are variables and their negations, and the gates have unbounded fan-in. Such circuits are much more structured, and it is therefore easier to analyze them and to prove lower bounds against them. For example, it is easy to show that the minimal number of clauses in a CNF computing the parity of n bits is equal to  $2^{n-1}$ , which yields an optimal lower bound for depth-2 circuits. However, already for depth 3 there is a large gap between known lower and upper bounds: it is known [10, 50] that the minimum depth-3 circuit size of a random function on n variables is  $\Theta(2^{n/2})$ , but the best known lower bound for an explicit function is  $2^{\Omega(\sqrt{n})}$  [20, 22, 39, 3, 38, 34].
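The parity bound for CNFs mentioned above can be checked by brute force for small n. The sketch below is an illustration only and is not taken from the paper; the helper names `parity_cnf` and `eval_cnf` are hypothetical. It builds the canonical CNF with one full-width clause per even-weight assignment and confirms that it has $2^{n-1}$ clauses and computes parity. It verifies only the upper-bound side; the matching lower bound comes from the observation that a clause omitting some variable is falsified by inputs of both parities, so every clause must have width n and can rule out only a single input.

```python
from itertools import product

def parity_cnf(n):
    """Canonical CNF for the parity of n bits: one clause per even-weight input."""
    clauses = []
    for a in product([0, 1], repeat=n):
        if sum(a) % 2 == 0:  # parity(a) = 0, so the CNF must be false on a
            # Literal (i, sign) is true iff x[i] == sign; this clause is
            # false on exactly one input, namely a itself.
            clauses.append([(i, 1 - a[i]) for i in range(n)])
    return clauses

def eval_cnf(clauses, x):
    """Evaluate an AND of ORs of literals on the assignment x."""
    return int(all(any(x[i] == sign for i, sign in clause) for clause in clauses))

n = 4
cnf = parity_cnf(n)
assert len(cnf) == 2 ** (n - 1)
assert all(eval_cnf(cnf, x) == sum(x) % 2 for x in product([0, 1], repeat=n))
```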

Much stronger lower bounds are known for depth-3 circuits where the fan-in of the "bottom" gates (those closest to the inputs) is bounded by a parameter k. Namely, for any  $k \leq O(\sqrt{n})$ , Paturi, Saks, and Zane [39] proved a  $2^{n/k}$  lower bound for computing parity, Wolfovitz [60] proved a lower bound of  $(1+1/k)^{n+O(\log n)}$  for  $\mathrm{ETHR}_{\frac{n}{k+1}}$ ,<sup>2</sup> and a stronger lower bound of  $2^{\frac{\mu_k n}{k-1}}$  for  $k \geq 3$  and some constants  $\mu_k > 1$  was proven in [38] for a BCH code. For example, [38] gives a lower bound of  $2^{0.612n}$  when the bottom fan-in of the circuit is k=3, and a lower bound of  $2^{n/10}$  for the bottom fan-in k=16. For the case of bottom fan-in k=2, even a  $2^{n-o(n)}$  lower bound is known [40].

A simple counting argument shows that for any constant k = O(1), a random function requires depth-3 circuits of size  $2^{n-o(n)}$ . Calabro, Impagliazzo, and Paturi [5] construct a family of  $2^{O(n^2)}$  explicit functions, most of which require depth-3 circuits with k = O(1) of size  $2^{n-o(n)}$ . Santhanam and Srinivasan [46] improve on this by constructing such a family of functions of size  $2^{f(n)}$  for every  $f(n) = \omega(n \log n)$ .

### **DeMorgan Formulas**

While explicit super-linear lower bounds for *circuits* are not known, there are super-linear lower bounds for *formulas*. In this paper, we focus on the well-studied DeMorgan formulas, which are circuits where every intermediate computation is used exactly once: all gates have out-degree one, and the operations are fan-in two ANDs and ORs, with inputs being variables and their negations. The two most successful methods for proving lower bounds on DeMorgan formula size are random restrictions [54, 2, 24, 36, 21, 55] as well as Karchmer-Wigderson games and the Karchmer-Raz-Wigderson conjecture [29, 27, 26, 16, 12]. Both approaches have led to a lower bound of  $n^{3-o(1)}$ , but neither has so far yielded anything stronger.

### 1.1 Valiant's Depth Reduction

Remarkably, a classical result of Valiant from the 70's relates three of the four models above: linear, log-depth, and constant-depth circuits. Using a depth reduction for DAGs [13], Valiant [56] shows that for any circuit of size cn and depth d, and for every integer k, one can remove at most  $\frac{2ckn}{\log d}$  wires such that the resulting circuit has depth at most  $d/2^k$ . Letting k be a sufficiently large constant, this wire-removal lemma shows how any circuit of size O(n) and depth  $O(\log n)$  can be converted into an  $OR \circ AND \circ OR$  circuit where the OR output gate has fan-in  $2^{O(n/\log\log n)}$  and the lower OR gates have fan-in  $O(n^{\varepsilon})$  for any desired  $\varepsilon > 0$ . Hence, by exhibiting a function that has no depth-3 circuit with these restrictions, it follows that this function cannot be computed by circuits of linear size and logarithmic depth. Unfortunately, the best known lower bounds on depth-3 circuits (as mentioned earlier) are still too far from those required for this reduction.
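To see where these parameters come from, one can instantiate the wire-removal lemma with $d = c' \log n$. The constant $c'$ and the bookkeeping below are a rough sketch added for illustration, not the full argument of [56]:

$$
\begin{aligned}
\#\{\text{removed wires}\} &\le \frac{2ckn}{\log d} = \frac{2ckn}{\log\log n + \log c'} = O\!\left(\frac{n}{\log\log n}\right) && \text{for constant } k,\\
\text{remaining depth} &\le \frac{d}{2^k} = \frac{c'\log n}{2^k} \le \varepsilon \log n && \text{once } 2^k \ge c'/\varepsilon,\\
\#\{\text{values feeding one gate}\} &\le 2^{\varepsilon \log n} = n^{\varepsilon}.
\end{aligned}
$$

Enumerating all $2^{O(n/\log\log n)}$ assignments to the removed wires accounts for the fan-in of the top OR gate. Once those values are fixed, each remaining gate depends on at most $n^{\varepsilon}$ inputs and removed wires, so each consistency check can be written as a CNF whose clauses have width at most $n^{\varepsilon}$.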

<sup>&</sup>lt;sup>2</sup> ETHR  $\frac{n}{k+1}$  outputs 1 if and only if the sum of the *n* input bits over the integers equals  $\frac{n}{k+1}$ .


In the same paper, Valiant introduced the notion of matrix rigidity (a similar notion was independently introduced by Grigoriev [19]) and related it to the size of linear circuits of log-depth using ideas similar to those described above. Alas, the known lower bounds on matrix rigidity are also far from being able to give new lower bounds on the size of log-depth linear circuits.

### 1.2 Our Results: New Depth Reductions

The main contributions of this paper are new reductions to depth-3 circuits that work for unrestricted circuits and (conditionally) for super-cubic formulas, as well as new results connecting various pseudorandom objects to circuit lower bounds. In particular, we show how to express super-cubic DeMorgan formulas as subexponential-size depth-3 circuits of a certain form, under the hypothesis that DeMorgan formulas have probabilistic polynomials of non-trivial degree. This suggests an approach for improving formula size lower bounds, by proving strong lower bounds on depth-3 circuits.

### 1.2.1 Depth Reductions for Circuits

In Valiant's depth reduction, one can only have  $d/2^k < \log n$  (and < cn removed edges) for circuits of depth  $d \le O(\log n)$ . Thus, Valiant's depth reduction technique does not yield interesting results for circuits of super-logarithmic depth. Moreover, Schnitger and Klawe [47, 48, 30] construct an explicit family of DAGs showing that the parameters achieved by Valiant are essentially optimal. Their counterexamples convincingly show that a pure graph-theoretic approach to circuit depth reduction cannot give non-trivial results for unrestricted circuits.

In this paper, we overcome this difficulty by presenting a counterpart of Valiant's depth reduction that works for circuits of unrestricted depth. Our depth reduction takes into account not only the underlying graph of a circuit, but also the *functions* computed by the circuit gates.

Our first result shows that unbounded-depth circuits of size less than 3.9n can be converted into ORs of  $2^{\delta n}$  short 16-CNFs, for some  $\delta < 1$.

▶ Theorem 1. Every circuit of size s can be computed as an  $OR_{2^{\lceil \frac{s}{2} \rceil}} \circ AND_s \circ OR_2$  circuit and as an  $OR_{2^{\lceil \frac{s}{3.9} \rceil}} \circ AND_{2^{14} \cdot s} \circ OR_{16}$  circuit.

As a consequence, in order to prove a 3.9n - o(n) size lower bound on unrestricted circuits, it suffices to provide a function that cannot be computed by an OR of fewer than  $2^{n-o(n)}$  16-CNF's. To prove Theorem 1, we gradually transform the given circuit into an OR of CNF's by carefully picking a suitable internal gate and branching on its two possible output values. In contrast to Valiant's reduction, our transformation works for circuits of arbitrary depth. This is achieved by an argument that takes into account both the graph structure of the circuit and the functional properties of the gates involved. Since in this approach we can branch on internal gates (inside the circuit), we can avoid a massive case analysis. This also distinguishes our approach from known circuit lower bound proofs based on gate elimination, which must set input gates (or gates very close to the inputs) for the argument to work.

It should be noted that known satisfiability algorithms based on branching, as well as circuit lower bounds based on gate elimination [39, 38, 49, 45, 8] may be viewed as depth-reductions for small circuits: if at most k variables are set in any branch before the circuit has a "trivial" form, then the circuit can be expressed as an OR of  $2^k$  "trivial" forms. At the same time, the known techniques in this line of work appear stuck at lower bounds of around 3n, and provably cannot go beyond linear-size bounds [17].

On the way to proving Theorem 1, we study structural results about converting small circuits into disjunctions of k-CNFs, that have curious connections to properties of k-CNFs found in the Satisfiability Coding Lemma [39, 38] and Sparsification Lemma [25, 5]. In particular, we ask the following question.

▶ Open Problem 2. Prove or disprove: for any constant c, any circuit of size cn can be computed as an

$$OR_{2^{(1-\delta(c))n}} \circ AND \circ OR_{\gamma(c)}$$

circuit, for some  $\delta(c) > 0$  and integer  $\gamma(c) \geq 1$ .

If such depth-3 circuits always existed, this would constitute a new approach to proving superlinear circuit lower bounds. If no depth-3 circuit of this form exists for some linear-size circuits, then we would have a separation between linear-size circuits and (for example) superlinear-size series-parallel circuits (by Valiant's reduction for such circuits, see Theorem 9). Note that for the gate elimination method such limitations are known [17], and they do not apply to the approach presented in this work.

Our second result is a new "non-rigidity" result for matrices with small linear circuits: if a matrix M over  $\mathbb{F}_2$  can be computed by a linear circuit of size s, then it is possible to flip at most 16 bits in every row of M to drop its rank below s/4. This opens up an approach to proving linear circuit lower bounds on sizes up to 4n.

▶ Theorem 3. For every matrix  $M \in \mathbb{F}_2^{m \times n}$  of linear circuit complexity s,  $\mathbb{R}_M(\lfloor s/4 \rfloor) \le 16$ .

### 1.2.2 Pseudorandom Objects and Circuit Lower Bounds

The classical result by Valiant shows that improvements of known depth-3 circuit lower bounds and rigid matrices imply super-linear log-depth circuit lower bounds. Our depth reductions show that even modest improvements of the known constructions also give modest improvements of unrestricted circuit lower bounds.

In the full version of this paper [18], we show that Valiant's and our reduction are applicable to two more types of pseudorandom objects: dispersers for varieties, and functions having small correlation with low degree polynomials. These implications are briefly summarized<sup>3</sup> in Table 1.

### 1.2.3 Depth Reduction for Formulas

For DeMorgan formulas we give a conditional depth-reduction (stated informally, see Theorem 14 for a formal statement): if there is an  $\varepsilon > 0$  such that DeMorgan formulas of size s have probabilistic polynomials of degree  $s^{1-\varepsilon}$  and error 1/3 over  $\mathbb{F}_2$ , then for some  $\delta > 0$  every DeMorgan formula of size  $O(n^{3+\delta})$  can be written as an approximate sum of  $2^{n^{1-\gamma}}$  degree- $n^{1-\gamma}$   $\mathbb{F}_2$ -polynomials for a constant  $\gamma > 0$ .<sup>4</sup> Moreover, if there are probabilistic polynomials of degree  $O(\sqrt{s})$  for DeMorgan formulas of size s (which we conjecture is true), our depth reduction holds for DeMorgan formulas of size  $n^{3.99}$ .

<sup>&</sup>lt;sup>3</sup> In this table we only present strongest implications from the strongest premises. Our reductions would still give new circuit lower bounds even from weaker objects (see the full version [18] for complete statements of these results). For example, the second line of the table says that a lower bound of  $2^{n-o(n)}$  against depth-3 circuits would give a lower bound of 3.9n. On the other hand, a lower bound of  $2^{0.8n}$  would lead to an elementary proof of a lower bound of 3.1n.

<sup>&</sup>lt;sup>4</sup> Similar results can be stated for  $\mathbb{F}_p$  where p is any prime.


**Table 1** Comparing the depth reductions of this paper (labeled with \*) with the depth reduction of Valiant [56] (labeled with V). We use the following notation (all formal definitions are given in Section 2 and the full version of the paper [18]): s(f) is the smallest size of a circuit computing f,  $s_{\log}$  refers to circuits of depth  $O(\log n)$ ,  $s_3^k$  refers to circuits that are ORs of k-CNFs,  $s_{\oplus}$  refers to circuits consisting of ⊕ gates only; (d, m, s)-disp. stands for a (d, m, s)-disperser, a function that is not constant on any subset of the Boolean hypercube of size at least s that is defined as the set of common roots of at most m polynomials of degree at most d;  $\mathbb{R}_M(r)$  is the row-rigidity of M for the rank r over  $\mathbb{F}_2$ , i.e., the smallest row-sparsity of a matrix A such that  $\operatorname{rank}(M \oplus A) \leq r$ .

|   | improving known lower bound | to lower bound | implies lower bound |
|---|-----------------------------|----------------|---------------------|
| V | $s_3^{n^{\varepsilon}}(f) \ge 2^{n^{1-\varepsilon}}$ [39] | $s_3^{n^{\varepsilon}}(f) \ge 2^{\omega \left(\frac{n}{\log \log n}\right)}$ | $s_{\log}(f) = \omega(n)$ |
| * | $s_3^{16}(f) \ge 2^{\frac{n}{10}}$ [38] | $s_3^{16}(f) \ge 2^{n - o(n)}$ | $s(f) \ge 3.9n$ |
| V | $\left(n^{\varepsilon}, \infty, 2^{n-n^{1/2-\varepsilon}}\right)$-disp. [44] | $\left(n^{\varepsilon}, \infty, 2^{n-\omega\left(\frac{n}{\log\log n}\right)}\right)$-disp. | $s_{\log}(f) = \omega(n)$ |
| * | $(16, \infty, 2^{(1-\varepsilon)n})$-disp. [58] | $(16, 1.3n, 2^{o(n)})$-disp. | $s(f) \ge 3.9n$ |
| * | $\left(16, \frac{n}{(\log n)^c}, 2^{o(n)}\right)$-disp. [9] | $(16, 1.3n, 2^{o(n)})$-disp. | $s(f) \ge 3.9n$ |
| V | $\mathbb{R}_M\left(\omega\left(\frac{n}{\log\log n}\right)\right) > \log\log n$ [15] | $\mathbb{R}_M\left(\omega\left(\frac{n}{\log\log n}\right)\right) > n^{\varepsilon}$ | $s_{\oplus,\log}(M) = \omega(n)$ |
| * | $\mathbb{R}_M\left(\frac{n}{65}\right) > 16$ [41] | $\mathbb{R}_M(n - o(n)) > 16$ | $s_{\oplus}(M) \ge 4n$ |

Interestingly, the techniques used to express DeMorgan formulas as depth-3 circuits are totally different from those used in Theorems 1 and 3. Namely, we first balance a formula (without increasing its size too much), decompose it into a small top part and several small bottom formulas, approximate the top part by a real-valued low-degree polynomial, then rewrite the bottom parts as probabilistic polynomials (as hypothesized). Finally, we collapse these two polynomials into a depth-3 circuit.

The hypothesis that lower-degree probabilistic polynomials exist for every DeMorgan formula of size s looks very plausible. We have not found an example of a size-s formula that resists the construction of an  $O(\sqrt{s})$ -degree probabilistic polynomial. Note that such polynomials do exist in the real-approximation sense [43]. For example, every symmetric function (such as MAJORITY) has probabilistic polynomials of  $O(\sqrt{s})$  degree [1], and it is not hard to show<sup>5</sup> that the layered OR-AND tree of depth  $\log_2(s)$  has a probabilistic polynomial of  $O(\sqrt{s})$  degree as well; in fact, any layered tree of depth  $\log_2(s)$  with the same gate type at each layer (AND or OR) has such degree. It is possible that there are "nasty" formulas that resist lower-degree probabilistic polynomials, but given the examples we already know, we do not know what they might look like.

▶ Open Problem 4. Prove or disprove: every DeMorgan formula of size s has a probabilistic polynomial over  $\mathbb{F}_2$  of degree  $O(\sqrt{s})$  with constant error less than 1/2.

<sup>&</sup>lt;sup>5</sup> Briefly: we can always write such formulas as either an OR of ANDs of  $O(\sqrt{s})$  literals, or an AND of ORs of  $O(\sqrt{s})$  literals. From there, we can simply replace the output gate with an O(1)-degree probabilistic polynomial (as in Razborov [42]), and the other gates with exact polynomials of  $O(\sqrt{s})$  degree.

### 1.3 Motivating Example

Here we provide a simple example of a reduction of unbounded circuits to depth-3 circuits, to give an idea of what is possible.

A formula is a circuit where every internal gate (i.e. not the inputs and not the output) has out-degree exactly 1. In our simple example, we will show that a circuit of size, say, 2.7n can be computed by an OR of  $2^{0.9n}$  formulas of small size (2.7n). Since we know almost-quadratic lower bounds [35] on formula size, we may hope to find a function which is not computable by an OR of  $\ll 2^n$  linear-size formulas.

▶ **Lemma 5** (Toy Example). Every circuit of size s can be expressed as an OR of  $2^{\lceil s/3 \rceil}$  formulas, each of size less than s.

**Proof.** For a circuit C, let s(C) denote its size. For  $s \leq 3$ , we just transform a circuit into a single formula of the same size. For s > 3, we proceed by induction. If the given circuit C is a formula, no transformation is needed. Otherwise take the topologically first gate G of out-degree at least 2. Note that G is computed by a formula (all previous gates have out-degree 1); let t = s(G) be the size of this formula. Consider two minimum-size circuits  $C_0$  and  $C_1$  that compute the same function as C on the input sets  $\{x \in \{0,1\}^n : G(x) = 0\}$  and  $\{x \in \{0,1\}^n : G(x) = 1\}$ , respectively. We claim that  $s(C_0), s(C_1) \leq s - t - 2 \leq s - 3$ , since to compute  $C_0$  and  $C_1$  one can remove the subcircuit in C computing gate G as well as two successors of G. The successors can be removed because G outputs a constant on both parts of the considered partition of the Boolean hypercube, and all gates in the subcircuit of G are only needed to compute G (as G is computed by a formula). Now, note that

$$C(x) \equiv (\neg G(x) \land C_0(x)) \lor (G(x) \land C_1(x)).$$

Applying the induction hypothesis to  $C_0$  and  $C_1$ , we can rewrite C as an OR of at most  $2^{\lceil (s-3)/3+1 \rceil} \leq 2^{\lceil s/3 \rceil}$  formulas of size (s-t-2)+(t+1) < s.
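The branching step in the proof can be checked mechanically on a small circuit. The following is a minimal sketch under simplifying assumptions: the circuit encoding and the helper names `evaluate`, `first_shared_gate`, and `check_branching` are hypothetical, and instead of minimum-size circuits $C_0$ and $C_1$ it simply forces the gate G to a constant, which agrees with C on the corresponding part of the hypercube.

```python
from itertools import product

# A circuit on inputs x_0..x_{n-1}: gates[j] = (op, a, b), where a and b index
# earlier nodes (0..n-1 are inputs, n+j is gate j). The last gate is the output.
OPS = {"and": lambda p, q: p & q, "or": lambda p, q: p | q, "xor": lambda p, q: p ^ q}

def evaluate(n, gates, x, forced=None):
    """Return the values of all nodes; `forced` overrides chosen nodes by constants."""
    forced = forced or {}
    vals = list(x)
    for j, (op, a, b) in enumerate(gates):
        node = n + j
        vals.append(forced[node] if node in forced else OPS[op](vals[a], vals[b]))
    return vals

def first_shared_gate(n, gates):
    """Topologically first internal gate of out-degree at least 2, or None."""
    outdeg = {}
    for _, a, b in gates:
        for u in (a, b):
            outdeg[u] = outdeg.get(u, 0) + 1
    for j in range(len(gates)):
        if outdeg.get(n + j, 0) >= 2:
            return n + j
    return None

def check_branching(n, gates):
    """Check C(x) == (not G(x) and C_0(x)) or (G(x) and C_1(x)) on every input."""
    g = first_shared_gate(n, gates)
    if g is None:
        return True  # the circuit is already a formula
    for x in product([0, 1], repeat=n):
        vals = evaluate(n, gates, x)
        c, gv = vals[-1], vals[g]
        c0 = evaluate(n, gates, x, {g: 0})[-1]
        c1 = evaluate(n, gates, x, {g: 1})[-1]
        if c != (((1 - gv) & c0) | (gv & c1)):
            return False
    return True

# Node 3 = x0 xor x1 feeds both node 4 and node 5, so it has out-degree 2.
example = [("xor", 0, 1), ("and", 3, 2), ("or", 3, 2), ("xor", 4, 5)]
assert first_shared_gate(3, example) == 3
assert check_branching(3, example)
```

Forcing G to a constant makes the subcircuit computing G and the two successors of G removable, which is the source of the $s - t - 2$ bound in the proof.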

This result would imply a circuit lower bound of 3n - o(n) for any function that has correlation at most  $2^{-n+o(n)}$  with all formulas of linear size. While we do know functions that have exponentially small correlation  $2^{-\varepsilon n}$  with formulas of linear size [45, 28, 51, 31, 55, 23], none of them gives a bound of  $2^{-n+o(n)}$ . At any rate there is an inherent limitation for this toy approach. By Parseval's identity, every Boolean function has a Fourier coefficient  $\geq 2^{-n/2}$ . This implies that the correlation of this function with the corresponding parity function is at least  $2^{-n/2}$  (and this is essentially tight correlation with small formulas for a random function). Since every parity on a subset of inputs can be computed by a formula of size  $\leq n$ , Lemma 5 would only be able to prove circuit lower bounds of 1.5n.
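The Parseval limitation can be confirmed exhaustively for very small n. The sketch below is illustrative only, and the function name `max_fourier_coefficient` is hypothetical. It views each Boolean function as the $\pm 1$-valued function $(-1)^{f}$, computes its correlation with every parity, and checks that the largest correlation is at least $2^{-n/2}$ for all functions on 3 bits.

```python
from itertools import product

def max_fourier_coefficient(truth_table, n):
    """Largest absolute Fourier coefficient of (-1)^f for f: {0,1}^n -> {0,1}.

    The coefficient at S equals the correlation of f with the parity on S.
    """
    points = list(product([0, 1], repeat=n))
    best = 0.0
    for S in product([0, 1], repeat=n):  # S is the indicator vector of a subset
        total = 0
        for x, fx in zip(points, truth_table):
            chi = sum(s * xi for s, xi in zip(S, x)) % 2  # parity of x on S
            total += (-1) ** (fx ^ chi)
        best = max(best, abs(total) / 2 ** n)
    return best

n = 3
worst = min(max_fourier_coefficient(tt, n) for tt in product([0, 1], repeat=2 ** n))
assert worst >= 2 ** (-n / 2)
```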

In order to prove stronger circuit lower bounds, we need to improve both parameters: the constant 3 in the exponent, and the class of formulas we reduce circuits to. Our Theorem 1 achieves this: it reduces a circuit to an OR of  $2^{\lceil \frac{s}{3.9} \rceil}$  formulas, each of which is a 16-CNF. Therefore strong enough correlation bounds against 16-CNFs would yield new circuit lower bounds.

### 2 Definitions and Preliminaries

### 2.1 Unrestricted Circuits

Let  $B_{n,m}$  be the set of all Boolean functions  $f: \{0,1\}^n \to \{0,1\}^m$  and let  $B_2 = B_{2,1}$ . A circuit is a directed acyclic graph that has n nodes of in-degree 0 labeled with  $x_1, \ldots, x_n$  that are called *input gates*. All other nodes are called *internal gates*, have in-degree 2, and are labeled with operations from  $B_2$ . Some m gates are also marked as output gates. Such a circuit computes a function from  $B_{n,m}$  in a natural way. The *size*  $s(\mathcal{C})$  of a circuit  $\mathcal{C}$  is its number of *internal* gates. This definition extends naturally to functions: s(f) is the smallest size of a circuit computing the function f.

The depth of a gate G is the maximum number of edges (also called wires) on a path from an input gate to G. The depth of a circuit is the maximum depth of its gates. By  $s_{\log}(f)$  we denote the smallest size of a circuit of depth  $O(\log n)$  computing f.

A circuit is called *linear* if it consists of  $\oplus$  gates only. The corresponding circuit size measure is denoted by  $s_{\oplus}$ .

Our unrestricted circuits are usually drawn with input gates at the top, so by a top gate of a circuit we mean a gate that is fed by two variables.

### 2.2 Series-Parallel Circuits

A labeling of a directed acyclic graph G = (V, E) is a function  $\ell \colon V \to \mathbb{N}$  such that for every edge  $(u, v) \in E$  one has  $\ell(u) < \ell(v)$ . A graph/circuit G is called *series-parallel* if there exists a labeling  $\ell$  such that for no two edges  $(u, v), (u', v') \in E, \ell(u) < \ell(u') < \ell(v) < \ell(v')$ . The corresponding circuit complexity measure is  $s_{\text{Sp}}$ .
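The definition can be tested directly on small graphs. The following sketch is illustrative only, and the function names are hypothetical. It checks whether a given labeling is valid, whether it contains a forbidden pair of edges, and, by exhaustive search over labelings (which is feasible only for very small graphs), whether a graph is series-parallel. Since only the relative order of labels matters, it suffices to try labels from a range of size $|V|$.

```python
from itertools import product

def is_valid_labeling(edges, label):
    """Every edge (u, v) must satisfy label[u] < label[v]."""
    return all(label[u] < label[v] for u, v in edges)

def has_forbidden_pair(edges, label):
    """True if some edges (u, v), (u2, v2) satisfy l(u) < l(u2) < l(v) < l(v2)."""
    return any(
        label[u] < label[u2] < label[v] < label[v2]
        for (u, v) in edges
        for (u2, v2) in edges
    )

def is_series_parallel(nodes, edges):
    """Search all labelings with values in {0, ..., |V|-1} for a witness."""
    for values in product(range(len(nodes)), repeat=len(nodes)):
        label = dict(zip(nodes, values))
        if is_valid_labeling(edges, label) and not has_forbidden_pair(edges, label):
            return True
    return False

edges = [("a", "c"), ("b", "d")]
# Interleaving the two edges creates a forbidden pair ...
assert has_forbidden_pair(edges, {"a": 1, "b": 2, "c": 3, "d": 4})
# ... but placing one edge entirely before the other does not.
assert not has_forbidden_pair(edges, {"a": 1, "c": 2, "b": 3, "d": 4})
assert is_series_parallel(["a", "b", "c", "d"], edges)
```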

### 2.3 Depth-3 Circuits

Unlike unrestricted circuits, depth-3 circuits are usually drawn the other way around, i.e., with the output gate at the top. In this paper, we focus on  $OR \circ AND \circ OR$  circuits, i.e., ORs of CNFs. We will use subscripts to indicate the fact that the fan-in of a particular layer is bounded. Namely, an  $OR_p \circ AND_q \circ OR_r$  circuit is an OR of at most p CNFs each of which contains at most q clauses and at most r literals in every clause. Since the gates of a depth 3 circuit are allowed to have an unbounded fan-in, it is natural to define the size of such a circuit as its number of wires. It is not difficult to see that for k = O(1) the size of an  $OR \circ AND \circ OR_k$  circuit is equal to the fan-in of its output gate up to a polynomial factor in n. By  $s_3^k(f)$  we denote the smallest size of an  $OR \circ AND \circ OR_k$  circuit computing f.

### 2.4 Rigidity

We say that a matrix  $M \in \mathbb{F}_2^{m \times n}$  is *s-sparse* if each *row* of M contains at most s non-zero elements. The *rigidity* of a matrix  $M \in \mathbb{F}_2^{m \times n}$  for the rank parameter r is the minimum sparsity of a matrix  $A \in \{0,1\}^{m \times n}$  such that  $\operatorname{rank}_{\mathbb{F}_2}(M \oplus A) \leq r$ :

$$\mathbb{R}_M(r) = \min\{s : \operatorname{rank}_{\mathbb{F}_2}(M \oplus A) \le r, A \text{ is } s\text{-sparse}\}.$$
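For intuition, row rigidity can be computed by exhaustive search on tiny matrices. The sketch below is an illustration under that assumption (the search is exponential in the matrix size), and the names `rank_f2` and `row_rigidity` are hypothetical. Rows are stored as integer bitmasks.

```python
from itertools import product

def rank_f2(rows):
    """Rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)  # clears the leading bit of b in r if it is set
        if r:
            basis.append(r)
    return len(basis)

def row_rigidity(M, n_cols, r):
    """Smallest s such that some s-sparse A satisfies rank(M xor A) <= r.

    Here s-sparse means at most s ones in every row, as in the definition above.
    """
    best = None
    for A in product(range(2 ** n_cols), repeat=len(M)):
        if rank_f2([row ^ a for row, a in zip(M, A)]) <= r:
            s = max(bin(a).count("1") for a in A)
            best = s if best is None else min(best, s)
    return best

identity3 = [0b100, 0b010, 0b001]
assert row_rigidity(identity3, 3, 3) == 0  # the rank is already 3
assert row_rigidity(identity3, 3, 2) == 1  # zero out a single row
assert row_rigidity(identity3, 3, 0) == 1  # flip the single 1 in every row
```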

### 2.5 Probabilistic, Approximate, and Robust Polynomials

Since even functions of small circuit and formula complexity may only have large-degree polynomial representations, it often proves convenient to use randomized polynomials or polynomials which approximate (rather than exactly compute) a given function.

▶ **Definition 6** (Probabilistic polynomials). Let  $f: \{0,1\}^n \to \{0,1\}$  be a Boolean function. A distribution  $\mathcal{D}$  of n-variate degree-d polynomials over  $\mathbb{F}_2$  is a probabilistic polynomial for f with degree d and error  $\varepsilon$  if for every  $x \in \{0,1\}^n$ ,

$$\Pr_{p \sim \mathcal{D}}[f(x) = p(x)] \ge 1 - \varepsilon.$$
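As a concrete instance of this definition, consider the standard random-parity construction for OR in the style of Razborov [42]: sample $s_1, \ldots, s_t$ uniformly from $\{0,1\}^n$ and set $p(x) = 1 + \prod_{j=1}^{t} (1 + \langle s_j, x \rangle)$ over $\mathbb{F}_2$, which has degree at most t. The sketch below is an illustration, not a construction taken from this paper, and the function name `or_polynomial_error` is hypothetical. It enumerates the whole distribution for small n and t and computes the worst-case error exactly.

```python
from fractions import Fraction
from itertools import product

def or_polynomial_error(n, t):
    """Exact worst-case error of p(x) = 1 + prod_j (1 + <s_j, x>) over F_2
    as a probabilistic polynomial for OR on n bits, with s_1..s_t uniform."""
    vectors = list(product([0, 1], repeat=n))
    worst = Fraction(0)
    for x in vectors:
        target = int(any(x))
        wrong = 0
        for choice in product(vectors, repeat=t):
            # Over F_2 the product equals 1 iff every parity <s_j, x> is 0,
            # so p(x) = 0 exactly in that case and p(x) = 1 otherwise.
            all_zero = all(sum(a * b for a, b in zip(s, x)) % 2 == 0 for s in choice)
            p_x = 0 if all_zero else 1
            wrong += (p_x != target)
        worst = max(worst, Fraction(wrong, len(vectors) ** t))
    return worst

# Each nonzero x is missed only when all t random parities vanish on it.
assert or_polynomial_error(3, 2) == Fraction(1, 4)
```

In the terminology of Definition 6, this is a probabilistic polynomial for OR with degree t and error $2^{-t}$.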

▶ **Definition 7** (Approximate Polynomials). Let  $f: \{0,1\}^n \to \{0,1\}$  be a Boolean function. An n-variate multilinear degree-d polynomial p over  $\mathbb{R}$  is an approximate polynomial for f with degree d and error  $\varepsilon$  if for every  $x \in \{0,1\}^n$ ,

$$|p(x) - f(x)| \le \varepsilon.$$

▶ **Definition 8** (Robust Polynomials). Let  $f: \{0,1\}^n \to [0,1]$  be a polynomial over  $\mathbb{R}$ . Then a polynomial  $p: \mathbb{R}^n \to \mathbb{R}$  is δ-robust for f if for every  $x \in \{0,1\}^n$  and for every  $\varepsilon \in [-1/3,1/3]^n$ ,

$$|f(x) - p(x + \varepsilon)| \le \delta.$$

### 2.6 Valiant's Depth Reductions

Here we formally recall the classical depth reduction results by Valiant [56].

- ▶ **Theorem 9** ([56, 4, 57]). For every  $c \ge 1$  and  $\varepsilon > 0$  there exists a  $\delta > 0$  such that every circuit C of size cn and depth  $c \log n$  can be computed as
- 1. an  $OR_{2^{\frac{\delta n}{\log \log n}}} \circ AND \circ OR_{n^{\varepsilon}}$  circuit
- **2.** and as an  $OR_{2^{\varepsilon n}} \circ AND \circ OR_{2^{(\log n)^{1-\delta}}}$  circuit.

Furthermore, for every  $c \ge 1$  and  $\varepsilon > 0$  there is a  $k \ge 1$  such that every series-parallel circuit of size cn and unbounded depth can be computed as an  $OR_{2^{\varepsilon n}} \circ AND \circ OR_k$  circuit.

Theorem 9 applied to linear circuits yields the following.

- ▶ **Theorem 10** ([56, 4, 57]). Let  $M \in \mathbb{F}^{m \times n}$  be a matrix. For every  $c \geq 1$  and  $\varepsilon > 0$  there exists  $\delta > 0$  such that, if a linear circuit  $\mathcal{C}$  of size cn and depth  $c \log n$  computes Mx for every  $x \in \mathbb{F}^n$ , then
- 1.  $\mathbb{R}_M\left(\frac{\delta n}{\log\log n}\right) \leq n^{\varepsilon};$
- **2.** and  $\mathbb{R}_M(\varepsilon n) \leq 2^{(\log n)^{1-\delta}}$ .

Furthermore, for every  $c \ge 1$  and  $\varepsilon > 0$  there is a  $k \ge 1$  such that if  $\mathcal{C}$  is a series-parallel linear circuit of size cn and unbounded depth, then  $\mathbb{R}_M(\varepsilon n) \le k$ .

## 3 Formula Depth Reduction

In this section, we give a (conditional) depth reduction for DeMorgan formulas. We start by balancing a given formula. For this we use the following result due to Tal [55].

▶ Lemma 11 (Claim VI.2 in [55]). Let F be a DeMorgan formula of size s over the set of variables  $X = \{x_1, ..., x_n\}$ , and t be some parameter; then, there exist  $k \leq 36s/t$  formulas over X, denoted by  $T_1, ..., T_k$ , each of size at most t, and there exists a read-once formula F' of size k such that  $F'(T_1(x), ..., T_k(x)) = F(x)$  for all  $x \in \{0, 1\}^n$ .

Below we will also make use of the following results by Reichardt [43] and Sherstov [53].

- ▶ **Theorem 12** ([43]). If  $f: \{0,1\}^n \to \{0,1\}$  can be computed by a DeMorgan formula of size s, then f has an approximate polynomial of degree  $O(\sqrt{s})$  with error  $\varepsilon = 1/10$ .
- ▶ **Theorem 13** ([53]). If  $f: \{0,1\}^n \to [0,1]$  is a polynomial of degree d over  $\mathbb{R}$ , then there is a  $\delta$ -robust polynomial p for f of degree  $O(d + \log(1/\delta))$ .

Now we are ready to present the main result of this section: Assuming DeMorgan formulas of size s have probabilistic polynomials of degree  $O(s^{1-\delta})$  for some  $\delta > 0$ , we will obtain subexponential-size depth-3 circuits computing formulas of super-cubic size.

In the following, a SUM gate will compute an approximate sum: a (real-weighted) sum of the inputs such that, over all Boolean inputs, the sum is within  $\pm 1/3$  of the 0-1 value of a desired Boolean function.

▶ **Theorem 14.** Suppose for some  $\delta > 0$ , DeMorgan formulas of size  $\ell$  have probabilistic polynomials of degree  $\ell^{1-\delta}$  with error 1/3. Then for every  $\alpha < \delta/(1-\delta)$  there is a  $\gamma > 0$ , so that for every formula F of size  $s = O(n^{3+\alpha})$ , there is a  $2^{n^{1-\gamma}}$ -size approximate sum of degree- $n^{1-\gamma}$   $\mathbb{F}_2$ -polynomials computing F. That is, F can be computed by a

$$\mathsf{SUM}_{2^{n^{1-\gamma}}} \circ \mathsf{MOD2}_{2^{n^{1-\gamma}}} \circ \mathsf{AND}_{n^{1-\gamma}}.$$

**Proof.** First, we apply Lemma 11 to F for some parameter t to be defined later. We obtain a read-once formula F' of size k = O(s/t), and k formulas  $T_1, \ldots, T_k$  each of size  $\leq t$ .

Let p be an approximate polynomial (over the reals) for F' of degree  $d = O(\sqrt{k})$  with error 1/10, guaranteed by Theorem 12. Applying Theorem 13, we get a 1/10-robust polynomial p' for p of degree  $d' = O(\sqrt{k})$ .

By the hypothesis of the theorem, we know that each  $T_i$  has a probabilistic polynomial of degree  $O(t^{1-\delta})$  with error  $\varepsilon = 1/3$ . For each  $T_i$ , draw  $O(\log s)$  independent copies of this probabilistic polynomial, and take their majority vote with an  $O(\log s)$ -degree polynomial. For an appropriate leading constant in the big-O, we can obtain a probabilistic polynomial for  $T_i$  of degree  $O(t^{1-\delta} \cdot \log s)$  with error 1/(10s).
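The amplification step can be made concrete by computing the exact failure probability of a majority vote. The sketch below is illustrative only: it assumes an odd number m of independent copies, each wrong with probability `eps`, and the function name is not from the source.

```python
from fractions import Fraction
from math import comb

def majority_error(m, eps):
    """Probability that more than half of m independent copies are wrong,
    when each copy is wrong independently with probability eps."""
    return sum(comb(m, j) * eps ** j * (1 - eps) ** (m - j)
               for j in range(m // 2 + 1, m + 1))
```

With `eps = Fraction(1, 3)` the value decreases exponentially in m, which is why $O(\log s)$ copies are enough to reach error $1/(10s)$.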

Let  $\mathcal{D}_1, \ldots, \mathcal{D}_k$  be probabilistic polynomials of degree  $D = O(t^{1-\delta} \cdot \log s)$  with error  $\varepsilon = 1/(10s)$  for the formulas  $T_1, \ldots, T_k$ . The error bound  $\varepsilon = 1/(10s)$  guarantees that for every  $x \in \{0,1\}^n$ , all k polynomials compute the correct value with probability at least 9/10.

Now for every  $T_i$ , we compute the average  $A_i$  (over the reals) of O(n) independent samples from  $\mathcal{D}_i$ . By a Chernoff bound and union bound, each  $A_i$  is within  $\pm 1/10$  of the correct 0-1 value for  $T_i$ , over all  $2^n$  inputs x, with probability of error  $1/\exp(n)$ . By the properties of robust polynomials, p' fed the sums  $A_i$  will still output the correct value (within  $\pm 1/10$ ) for all inputs  $x \in \{0,1\}^n$ , for some choice of samples.

Therefore F can be computed by a

$$\mathsf{SUM}_{n^{d'}} \circ \mathsf{PRODUCT}_{d'} \circ \mathsf{SUM}_{O(n)} \circ \mathsf{MOD2} \circ \mathsf{AND}_D.$$

Applying distributivity to the PRODUCT of SUMs, we get

$$\mathsf{SUM}_{n^{d'}} \circ \mathsf{SUM}_{n^{O(d')}} \circ \mathsf{PRODUCT}_{d'} \circ \mathsf{MOD2} \circ \mathsf{AND}_D.$$

Noting the PRODUCTs now take 0/1 inputs, we can replace them with ANDs:

$$\mathsf{SUM}_{n^{d'}} \circ \mathsf{SUM}_{n^{O(d')}} \circ \mathsf{AND}_{d'} \circ \mathsf{MOD}2 \circ \mathsf{AND}_D.$$

Taking the Fourier expansion of the AND function, we can replace each AND gate with a SUM of  $2^{d'}$  MOD2s of fan-in  $\leq d'$ :

$$\mathsf{SUM}_{n^{d'}} \circ \mathsf{SUM}_{n^{O(d')}} \circ \mathsf{SUM}_{2^{d'}} \circ \mathsf{MOD2} \circ \mathsf{AND}_D.$$
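The replacement of AND by a sum of parities can be written out explicitly. Writing each input as $y_i = (1 - (-1)^{y_i})/2$ and expanding the product gives, for $d \ge 1$, $\mathrm{AND}(y) = 2^{1-d} \sum_{\emptyset \ne S \subseteq [d]} (-1)^{|S|+1} \bigoplus_{i \in S} y_i$. The following sketch (the function name is illustrative) evaluates the right-hand side so that the identity can be checked by brute force for small d.

```python
from fractions import Fraction
from itertools import product

def and_as_parity_sum(y):
    """Evaluate 2^(1-d) * sum over non-empty S of (-1)^(|S|+1) * parity_S(y)."""
    d = len(y)
    total = Fraction(0)
    for S in range(1, 2 ** d):  # non-empty subsets of [d] as bitmasks
        size = bin(S).count("1")
        parity = sum(y[i] for i in range(d) if (S >> i) & 1) % 2
        total += (-1) ** (size + 1) * parity
    return total / 2 ** (d - 1)
```

The sum has fewer than $2^{d}$ terms, each a MOD2 of fan-in at most d, matching the bound used in the proof.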

Merging the SUMs, our final expression has the form:

$$\mathsf{SUM}_{n^{O(d')}} \circ \mathsf{MOD2} \circ \mathsf{AND}_D.$$

Finally, we want to choose a value of t so that the fan-in of the SUM is subexponential and the fan-ins of the ANDs are sublinear (which will also imply that the fan-ins of the MOD2s are subexponential). Let  $t = n^{1+\beta}$ , where  $\beta$  is an arbitrary number satisfying  $\alpha < \beta < \delta/(1-\delta)$ . Note that

$$d' = O(\sqrt{k}) = O(\sqrt{s/t}) = O(n^{1 - \frac{\beta - \alpha}{2}}) = O(n^{1 - \gamma})$$

for every  $0 < \gamma < \frac{\beta - \alpha}{2}$ . Also, observe that

$$D = O(t^{1-\delta} \cdot \log s) = O(n^{1-(1-\delta)(\delta/(1-\delta)-\beta)} \log n) = O(n^{1-\gamma})$$

for every  $0 < \gamma < (1 - \delta)(\delta/(1 - \delta) - \beta)$ .

From the upper bounds on d' and D, we have that F can be computed by

$$\mathsf{SUM}_{2^{n^{1-\gamma}}} \circ \mathsf{MOD2}_{2^{n^{1-\gamma}}} \circ \mathsf{AND}_{n^{1-\gamma}}$$

for some  $\gamma > 0$ .
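The parameter choice at the end of the proof can be traced with exact arithmetic. The sketch below is only an illustration: the function name and the default choice of $\beta$ as the midpoint of its allowed interval are assumptions, and the values of $\alpha$ and $\delta$ passed to it are hypothetical.

```python
from fractions import Fraction

def depth3_exponents(alpha, delta, beta=None):
    """Return (e1, e2) with d' = O(n^e1) and D = O(n^e2 * log n)
    for s = n^(3 + alpha) and t = n^(1 + beta)."""
    limit = delta / (1 - delta)
    assert 0 < delta < 1 and alpha < limit
    if beta is None:
        beta = (alpha + limit) / 2  # any value strictly in between works
    assert alpha < beta < limit
    e_dprime = 1 - (beta - alpha) / 2  # from d' = O(sqrt(s / t))
    e_D = (1 + beta) * (1 - delta)     # from D = O(t^(1 - delta) * log s)
    return e_dprime, e_D
```

For the hypothetical values $\delta = 1/2$ and $\alpha = 1/2$, the midpoint is $\beta = 3/4$ and both exponents equal $7/8$, so any $\gamma < 1/8$ works in this instance.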

The above formula depth reduction shows that, if there are more efficient probabilistic polynomials for DeMorgan formulas (and we have no reason to doubt this), then supercubic formulas have interesting representations as approximate sums of sub-exponentially many sub-linear degree  $\mathbb{F}_2$ -polynomials. Recent work [59, 7] can already be applied to prove interesting lower bounds against approximate sums of  $2^{n^{\alpha}}$   $\mathbb{F}_2$ -polynomials of degree  $n^{\beta}$ , where  $\alpha + \beta < 1$ . The remaining challenge will be to prove lower bounds when  $\max\{\alpha, \beta\} < 1$ .

## 4 Circuit Depth Reductions

In this section, we present new depth reductions for circuits with unrestricted depth.

### 4.1 Linear Circuits

We start by considering linear circuits, i.e., circuits consisting of  $\oplus$  gates only. For technical reasons, we assume that there are n+1 input gates in a linear circuit:  $x_1, \ldots, x_n$  as well as the constant 0. For a matrix  $M \in \{0,1\}^{m \times n}$ , we say that a linear circuit  $\mathcal{C}$  with m outputs computes the linear transformation M if the i-th output of  $\mathcal{C}(x)$  equals the i-th row of Mx for all  $x \in \{0,1\}^n$ , treating  $\mathcal{C}(x)$  as the vector of output values. We say that a linear circuit  $\mathcal{C}$  computing M is optimal if no circuit of smaller size computes M.
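To make the definition concrete, the following sketch computes the matrix of a linear circuit by propagating characteristic vectors through the $\oplus$ gates. The encoding (node ids, bitmask rows) and the function names are assumptions made for illustration.

```python
def linear_circuit_rows(n, gates, outputs):
    """gates is a list of (u, v) pairs of node ids. Ids 0..n-1 are the inputs
    x_1..x_n, id n is the constant 0, and gate j receives id n + 1 + j.
    Returns one n-bit mask per output: bit i is set iff x_{i+1} occurs."""
    vec = [1 << i for i in range(n)] + [0]  # inputs, then the constant 0
    for u, v in gates:
        vec.append(vec[u] ^ vec[v])  # an XOR gate adds the two linear forms
    return [vec[o] for o in outputs]

def depths(n, gates):
    """Depth of every node, with inputs and the constant at depth 0."""
    d = [0] * (n + 1)
    for u, v in gates:
        d.append(1 + max(d[u], d[v]))
    return d
```

For example, with `n = 3` and `gates = [(0, 1), (4, 2)]` the two gates compute $x_1 \oplus x_2$ and $x_1 \oplus x_2 \oplus x_3$. A gate of depth d depends on at most $2^d$ inputs, which is the fact used repeatedly below.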

The main result of this subsection asserts that matrices computable by small linear circuits are not too rigid. The contrapositive says: to get an improved lower bound on the size of linear circuits, it suffices to construct a matrix with good rigidity parameters. Below, we restate the corresponding theorem formally and then prove it.

▶ Theorem 3. For every matrix  $M \in \mathbb{F}_2^{m \times n}$  of linear circuit complexity s,  $\mathbb{R}_M(\lfloor s/4 \rfloor) \leq 16$ .

**Proof.** Let  $\mathcal{C}$  be an optimal circuit of size s computing M. If s < 16 or the depth of  $\mathcal{C}$  is at most 4, then each output depends on at most 16 variables. Hence M is 16-sparse and the theorem statement holds. Consider this as the base case of an induction on s.

For the induction step, we "normalize"  $\mathcal{C}$ . Namely, we show how to express M as the (modulo 2) sum of two  $\mathbb{F}_2$ -matrices A and B, where A is 16-sparse (each row has  $\leq$  16 ones) and B has rank at most  $\lfloor s/4 \rfloor$ . Note that if  $\mathcal{C}$  has an output gate H of depth at most 4, then H depends on at most  $2^4 = 16$  inputs. Thus the corresponding row  $r_H$  of M has at most 16 ones. Consider the  $(m-1) \times n$  matrix  $M_{-H}$  obtained by removing  $r_H$  from M. We claim that  $\mathbb{R}_{M_{-H}}(\lfloor s/4 \rfloor) \leq 16$  implies  $\mathbb{R}_{M}(\lfloor s/4 \rfloor) \leq 16$ . Indeed, suppose  $M_{-H} = A_{-H} \oplus B_{-H}$  where  $A_{-H}$  is 16-sparse and rank $(B_{-H}) \leq \lfloor s/4 \rfloor$ . To get matrices A and B for M, we simply add the row  $r_{H}$  to  $A_{-H}$  and a corresponding all-zero row to  $B_{-H}$ . Clearly, the resulting matrix A is 16-sparse and the rank of the resulting matrix B does not change. Thus, in the following, we assume WLOG that  $\mathcal{C}$  has no output gates of depth at most 4. Our crucial step is the following claim.

- $\triangleright$  Claim 15. Let  $\mathcal{C}$  be an optimal linear circuit computing  $M \in \{0,1\}^{m \times n}$  such that  $s(\mathcal{C}) \ge 16$ , and no output gate of  $\mathcal{C}$  has depth smaller than 5. Then there is a gate G in  $\mathcal{C}$  and a linear circuit  $\mathcal{C}'$  computing a matrix  $M' \in \{0,1\}^{m \times n}$  with the properties:
- 1.  $s(\mathcal{C}') \leq s(\mathcal{C}) - 4$ , and
- **2.** for every  $x \in \{0,1\}^n$ , if G(x) = 0 then  $\mathcal{C}(x) = \mathcal{C}'(x)$ .

For now, suppose the claim is proved. Consider the circuit  $\mathcal{C}'$ , gate G in  $\mathcal{C}$ , and matrix M' provided by Claim 15. Let  $g \in \{0,1\}^{1 \times n}$  be the characteristic vector of the linear function computed by G, so that G(x) = gx. By the claim, gx = 0 implies  $(M \oplus M')x = 0$ . Hence  $(M \oplus M')$  is either the zero matrix, or it defines the same linear subspace as  $g: M \oplus M' = tg$  for a vector  $t \in \{0,1\}^{m \times 1}$ .

By the induction hypothesis,  $M' = A' \oplus B'$  where A' is 16-sparse, and  $\operatorname{rank}(B') \leq \lfloor \frac{s-4}{4} \rfloor = \lfloor \frac{s}{4} \rfloor - 1$ . Thus,  $M = A' \oplus B$ , where the matrix  $B = B' \oplus tg$  has rank at most  $\lfloor s/4 \rfloor$  by subadditivity of the rank function.

We now turn to proving the remaining claim.

Proof of Claim 15.

- Case 1: There is a gate G in  $\mathcal{C}$  of depth at least 2 and at most 4 that has out-degree at least 2. Let the predecessors of G be B and C, and call two of its successors D and E, see Figure 1 (in this and the following figures, we write the out-degrees of some of the gates near them). The circuit  $\mathcal{C}'$  is obtained from  $\mathcal{C}$  by "assigning" the output of G to be 0. Note that B(x) = C(x) for all  $x \in \{0,1\}^n$  where G(x) = 0. At least one of B and C must be an internal gate (otherwise G would have depth 1), let it be B. Since B computes the same function as C, it may be removed from  $\mathcal{C}'$ : we remove it, and replace every wire of the form  $B \to H$  by a new wire  $C \to H$ . Note that neither B nor G is an output gate. Now, we show that both D and E can be removed as well. Let us focus on the gate D (for E it is shown similarly) and call its other predecessor F. Since G = 0, the gate D computes the same function as F, so we remove D and replace every wire  $D \to H$  by a wire  $F \to H$ . If D happens to be an output gate, we move the corresponding output label from D to F. In total, four gates (G, B, D, and E) are removed.
- Case 2: All gates of depth at least 2 and at most 4 have out-degree exactly 1 in  $\mathcal{C}$ . Take a gate G of depth 4 and trace back its longest path to an input:  $x_i \to D \to C \to B \to G$ . Let also E be the successor of G (which exists because  $\mathcal{C}$  has no output gates of depth at most 4). By assumption, gates B and C have out-degree 1. This means that in  $\mathcal{C}$  they are only used for computing the gate G. This, in turn, means that assuming G = 0, we can remove G, B, and C (note none of them is an output). Finally, the gate E can be replaced by the other input F of E (note  $F \neq G$ , since  $\mathcal{C}$  is optimal).

This completes the proof.

▶ Remark 16. Extending the same ideas, one can show that any linear circuit  $\mathcal{C}$  of size s can be computed by an  $\operatorname{OR}_{2^{\lceil \frac{s}{4} \rceil}} \circ \operatorname{AND}_{s \cdot 2^{14}} \circ \operatorname{OR}_{16}$  circuit. For this, one considers two optimal circuits  $\mathcal{C}_0$  and  $\mathcal{C}_1$  resulting from  $\mathcal{C}$  by assuming G = 0 and G = 1, respectively. As shown in the proof, both  $\mathcal{C}_0$  and  $\mathcal{C}_1$  have size at most s - 4. One then proceeds by induction. We illustrate this approach in full detail in the next subsection.






Case 1: assuming G = 0, the gate G is removed, B is replaced by C, and D and E are replaced by their other predecessors.

Case 2: assuming G = 0, the gates B, C, and G are removed whereas E is replaced by F.

- **Figure 1** Cases in the proof of Claim 15.
- ▶ Remark 17. The proof of Theorem 3 gives a decomposition  $M = A \oplus B = A \oplus (C \cdot D)$ , where  $A \in \mathbb{F}^{m \times n}$  is 16-sparse,  $C \in \mathbb{F}^{m \times s/4}$  is composed of vectors t, and  $D \in \mathbb{F}^{s/4 \times n}$  is composed of vectors g. Since the chosen gate G always has depth at most four, the vector g is 16-sparse. Thus, we in fact have a decomposition  $M = A \oplus (C \cdot D)$ , where both A and D are 16-sparse. In particular, the row-space of M is spanned by the union of row-spaces of A and D. This implies that the row-space of M can be spanned by at most  $(m + \frac{s}{4})$  16-sparse vectors. The corresponding matrix property is called *outer dimension*, and it is studied in [37, 32]. While the current lower bounds on the outer dimension of explicit matrices do not lead to new circuit lower bounds, it would be interesting to study their applications in this context.

### 4.2 General Boolean Circuits

In this section, we study the following natural question: given a Boolean circuit<sup>6</sup> and given an integer  $k \geq 2$ , what is the smallest  $OR \circ AND \circ OR_k$  circuit computing the same function? To this end, we introduce the following notation. For an integer  $k \geq 2$ , we define  $\alpha(k)$  as the infimum of all values  $\alpha$  such that any circuit of size s can be rewritten as an  $OR_{2^{\alpha s}} \circ AND \circ OR_k$  circuit.

For proving upper bounds on  $\alpha(k)$  it will be convenient to consider the following class of circuits. Let  $OR_p \circ AND_q \circ C(r)$  be a class of circuits with an output OR that is fed by at most p AND's of at most q circuits of size at most r.

- ▶ **Theorem 18.** Every circuit of size s can be computed as:
- 1. an  $OR_{2^{\lceil \frac{s}{2} \rceil}} \circ AND_{\lceil \frac{s}{2} \rceil} \circ C(1)$  circuit;
- **2.** an  $OR_{2^{\lceil \frac{s}{3.9} \rceil}} \circ AND_{\lceil \frac{s}{3} \rceil} \circ C(15)$  circuit.

<sup>&</sup>lt;sup>6</sup> In this section we consider functions with one output, but these results can be trivially generalized to the multi-output case.

Note that any circuit of size r depends on at most r+1 variables, and hence can be written as an (r+1)-CNF with at most  $2^r$  clauses. Therefore every  $\operatorname{OR}_p \circ \operatorname{AND}_q \circ C(r)$  circuit can be easily converted into an  $\operatorname{OR}_p \circ \operatorname{AND}_{q2^r} \circ \operatorname{OR}_{r+1}$  circuit. Theorem 1, which we restate below, is then an immediate corollary of Theorem 18. In turn, it implies that  $\alpha(2) \leq \frac{1}{2}$  and  $\alpha(16) \leq \frac{1}{3.9}$ .

▶ Theorem 1. Every circuit of size s can be computed as an  $OR_{2^{\lceil \frac{s}{2} \rceil}} \circ AND_s \circ OR_2$  circuit and as an  $OR_{2^{\lceil \frac{s}{3.9} \rceil}} \circ AND_{2^{14} \cdot s} \circ OR_{16}$  circuit.

**Proof of Theorem 18.** Both parts are proven in a similar fashion. We proceed by induction on s. The base case is when s is small. We then just have an  $OR_1 \circ AND_1 \circ C(s)$  circuit.

For the induction step we take a gate G of C and consider two circuits  $C_0$  and  $C_1$  where  $C_i$  computes the same function as C on all inputs  $\{x \in \{0,1\}^n : G(x) = i\}$ . We may assume that both  $C_i$ 's have minimal size among all such circuits. Since  $C_i$  can be obtained from C by removing the gate G (as it computes the constant i on the corresponding subset of the Boolean hypercube), we conclude that  $s(C_i) < s$ . This allows us to proceed by induction. Assume that by the induction hypothesis  $C_i$  is guaranteed to be expressible as an  $OR_{p_i} \circ AND_{q_i} \circ C(r_i)$  circuit. We use the following identity to convert C into the required circuit:

$$C(x) \equiv ([G(x) = 0] \land C_0(x)) \lor ([G(x) = 1] \land C_1(x)). \tag{1}$$

Assume that the subcircuit of C computing the gate G has at most t gates. We claim that  $[G(x) = i] \wedge C_i$  can be written as an  $OR_{p_i} \circ AND_{q_i+1} \circ C(\max\{r_i, t\})$  circuit. For this, we just feed a new circuit computing G to every AND gate. Plugging this into (1), gives an

$$OR_{p_0+p_1} \circ AND_{\max\{q_0,q_1\}+1} \circ C(\max\{t,r_0,r_1\}) \tag{2}$$

circuit for computing C.
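Identity (1) can be checked by brute force on a small example. In the sketch below, the circuit encoding, the function names, and the three-gate example are all assumptions made for illustration. Note also that $C_i$ is modeled simply as C with the gate G replaced by the constant i, without the subsequent minimization used in the proof.

```python
from itertools import product
import operator

def evaluate(n, gates, x, forced=None):
    """Evaluate a circuit on the input tuple x. gates[j] = (op, u, v) reads
    nodes u and v (ids 0..n-1 are inputs, id n + j is gate j); the last gate
    is the output. forced = (node_id, bit) replaces that gate by a constant."""
    val = list(x)
    for j, (op, u, v) in enumerate(gates):
        out = op(val[u], val[v])
        if forced is not None and forced[0] == n + j:
            out = forced[1]
        val.append(out)
    return val

def identity_holds(n, gates, G):
    """Check C(x) = ([G(x)=0] and C_0(x)) or ([G(x)=1] and C_1(x)) for all x."""
    for x in product((0, 1), repeat=n):
        val = evaluate(n, gates, x)
        c0 = evaluate(n, gates, x, forced=(G, 0))[-1]
        c1 = evaluate(n, gates, x, forced=(G, 1))[-1]
        g = val[G]
        if val[-1] != ((1 - g) & c0) | (g & c1):
            return False
    return True

# Illustrative circuit on x0, x1, x2: gate 3 = x0 AND x1,
# gate 4 = gate 3 XOR x2, gate 5 (output) = gate 4 OR x0.
EXAMPLE = [(operator.and_, 0, 1), (operator.xor, 3, 2), (operator.or_, 4, 0)]
```

Calling `identity_holds(3, EXAMPLE, G)` for `G` in `{3, 4, 5}` splits on each of the three gates in turn.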

Below, we provide details specific to each of the two items from the theorem statement. In particular, we estimate the parameters  $p_i$ 's,  $q_i$ 's,  $r_i$ 's, and t and plug them into (2).

- 1. The base case is s=1. Then  $\mathcal{C}$  consists of a single gate and can be expressed as an  $\mathrm{OR}_1 \circ \mathrm{AND}_1 \circ C(1)$  circuit. For the induction step, assume that  $s \geq 2$  and take a gate A that depends on two variables. Let G=A, hence t=1. The gate A must have at least one successor (otherwise  $\mathcal{C}$  can be replaced by a circuit with fewer than s gates). Clearly, A and its successors are not needed in  $\mathcal{C}_i$ 's, so  $s(\mathcal{C}_i) \leq s-2$ . Hence, by the induction hypothesis  $p_i \leq 2^{\lceil \frac{s-2}{2} \rceil}, q_i \leq \lceil \frac{s-2}{2} \rceil, r_i \leq 1$ . Plugging this into (2) gives  $p_0 + p_1 \leq 2^{\lceil \frac{s}{2} \rceil}$  and  $\max\{q_0, q_1\} + 1 \leq \lceil \frac{s}{2} \rceil$ , which is the desired result.
- 2. Take a gate A that is fed by two variables x and z and has the maximum distance to an output. If its distance to output is at most 4, then  $s(\mathcal{C}) \leq 15$  and we just rewrite it as an  $\mathrm{OR}_1 \circ \mathrm{AND}_1 \circ C(15)$  circuit. This is the base case. Assume now that the distance from A to the output gate is at least 5. In the analysis below, we always "follow" the longest path from A to the output. This allows us to conclude that any such path is long enough and hence each gate considered has positive out-degree (i.e., is not an output). Moreover, each gate on this path cannot depend on too many variables. Let B be a successor of A on the longest path to the output.

In the five cases below, we show that we can always find a gate G such that  $s(G) \leq 15$  and both  $s(\mathcal{C}_0)$  and  $s(\mathcal{C}_1)$  are small enough. In particular,  $s(\mathcal{C}_0), s(\mathcal{C}_1) \leq s-4$  works for us:  $p_0 + p_1 \leq 2 \cdot 2^{\lceil \frac{s-4}{3.9} \rceil} \leq 2^{\lceil \frac{s}{3.9} \rceil}, \max\{q_0, q_1\} + 1 \leq \lceil \frac{s-4}{3} \rceil + 1 \leq \lceil \frac{s}{3} \rceil$ .

See Figure 2 for an illustration of the five cases. For a gate G, by  $\operatorname{out}(G)$  we denote the out-degree of G.



Case 1.1: when E is constant, one removes B, C, E, and the successors of E.

Case 1.2: when C is constant, one removes B, C, and the successors of C.

Case 2.1: when B is constant, one removes B and its successors, and replaces A by  $D \oplus c$ .

Case 2.2.1: when B is constant, one removes B and its successors, and A.

Case 2.2.2: when B is constant, one removes B and its successors; moreover, B = 1 forces A to be a constant, and one removes A and its successors.

**Figure 2** Cases in the proof of the second part of Theorem 18.

Case 1: out(B) = 1. Let C be the successor of B.

Case 1.1: out(C) = 1. Let E be the successor of C. Let G = E. In  $C_i$ 's, one removes B, C (as they were only needed to compute E that is now a constant), E, and the successors of E.

Case 1.2: out(C)  $\geq 2$ . Let G = C. In  $C_i$ 's, one removes B, C, and the successors of C.

Case 2:  $out(B) \geq 2$ . Let D be the other input of B. It may be a gate or an input variable. If B computes a constant Boolean binary operation or an operation that depends on A or D only, then  $\mathcal{C}$  is not optimal. Otherwise, B computes one of the following two types of functions (either linear or quadratic polynomial over  $\mathbb{F}_2$ ):

**Case 2.1:**  $B(A, D) = A \oplus D \oplus c$  where  $c \in \{0, 1\}$ . Let G = B. In  $C_i$ 's, one immediately removes B and its successors. Also, in  $C_i$ ,  $D \oplus A = i \oplus c$ . Hence, A may be replaced by  $D \oplus i \oplus c$ .

Case 2.2:  $B(A, D) = (A \oplus a) \cdot (D \oplus d) \oplus c$  where  $a, d, c \in \{0, 1\}$ .

Case 2.2.1: out(A) = 1. Let G = B. In  $C_i$ 's, one removes B, its successors, and A.

Case 2.2.2:  $out(A) \geq 2$ , so A has a successor other than B. Let G = B. In  $C_i$ 's, one removes B and its successors. Also,  $B = c \oplus 1$  forces  $A = a \oplus 1$  and  $D = d \oplus 1$ . Hence, in  $\mathcal{C}_{c \oplus 1}$  at least two additional gates are removed: A and its successors other than B (if a successor of B happens to be a successor of A also, then it is a function of A and D and the circuit can be simplified, which contradicts its optimality). Hence,  $p_0 + p_1 \le 2^{\left\lceil \frac{s-3}{3.9} \right\rceil} + 2^{\left\lceil \frac{s-5}{3.9} \right\rceil}$ . This is smaller than  $2^{\left\lceil \frac{s}{3.9} \right\rceil}$  since  $2^{-\frac{3}{3.9}} + 2^{-\frac{5}{3.9}} < 1$ .

This completes the proof.
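The arithmetic behind the constant 3.9 is easy to check numerically. The sketch below uses illustrative helper names and exact integer ceilings; it verifies that the generic step (both subcircuits of size at most s − 4) stays within the claimed bounds, and evaluates the weight $2^{-3/c} + 2^{-5/c}$ of the unbalanced split from Case 2.2.2.

```python
def ceil_div(a, b):
    """Ceiling of a / b for integers with b > 0."""
    return -(-a // b)

def or_exp(s):
    """ceil(s / 3.9), computed exactly as ceil(10 * s / 39)."""
    return ceil_div(10 * s, 39)

def generic_step_ok(s):
    """Bounds on p_0 + p_1 and max(q_0, q_1) + 1 when both subcircuits
    have size at most s - 4."""
    return (2 * 2 ** or_exp(s - 4) <= 2 ** or_exp(s)
            and ceil_div(s - 4, 3) + 1 <= ceil_div(s, 3))

def unbalanced_weight(c):
    """2^(-3/c) + 2^(-5/c): relative cost of the (s - 3, s - 5) split."""
    return 2 ** (-3 / c) + 2 ** (-5 / c)
```

The weight is below 1 at c = 3.9 but exceeds 1 at c = 4, which suggests that Case 2.2.2 is the case that keeps the constant in the second part of Theorem 18 just below 4.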

▶ Remark 19. It is not difficult to see that the output OR gate is a "disjoint OR", and can be replaced by a SUM gate over the integers. In other words, for every  $x \in \{0,1\}^n$ , at most one subcircuit feeding into the OR gate may evaluate to 1. This holds because we always consider two mutually exclusive cases: G = 0 or G = 1.

### 4.3 Properties of  $\alpha(k)$

We start by observing a lower bound on  $\alpha(k)$ .

▶ **Lemma 20.** For any integer  $k \ge 2$ ,  $\alpha(k) \ge 1/k$ .

**Proof.** Let  $\oplus_n$  denote the parity function of n inputs. It has  $2^{n-1}$  inputs where it is equal to 1 and all these inputs are isolated, that is, the Hamming distance between any pair of them is at least 2. As proven by Paturi, Pudlák, and Zane [39], every k-CNF has at most  $2^{n(1-1/k)}$  isolated satisfying assignments. This implies that  $\oplus_n$  cannot be computed by an OR of fewer than  $2^{n/k-1}$  k-CNFs. Since  $s(\oplus_n) = n-1$ , this implies that

$$\alpha(k) \ge \frac{\frac{n}{k} - 1}{n - 1}.$$

Since this must hold for arbitrarily large n,  $\alpha(k) \geq 1/k$ .

Thus, we know the exact value of  $\alpha(2) = \frac{1}{2}$ . This immediately implies a circuit lower bound of 2n - o(n) for BCH codes. Indeed, it was shown in [40] that when the bottom fan-in is restricted to k=2, then BCH codes require depth-3 circuits of size  $2^{n-o(n)}$ . And, since  $\alpha(2) = \frac{1}{2}$ , they must have circuit complexity at least 2n - o(n).

One can use techniques from Theorem 18 to prove an upper bound of  $\alpha(3) \leq \frac{\log_2 3}{4}$ . Thus, we know that

$$\frac{1}{3} \le \alpha(3) \le \frac{\log_2 3}{4} < 0.3963.$$

We conjecture that the upper bound on  $\alpha(3)$  is tight. One way to prove this would be to find the  $s_3^3$  complexity of the inner product function:  $IP(x_1, \ldots, x_n) = x_1x_2 \oplus x_3x_4 \oplus \cdots \oplus x_{n-1}x_n$ . In particular, if the upper bound shown in the next lemma is tight, then  $\alpha(3) = \frac{\log_2 3}{4}$ .

▶ **Lemma 21.**

- 1.  $2^{\frac{n}{4}} \leq s_3^2(\mathrm{IP}) \leq 2^{\frac{n}{2} - o(n)}$ .
- **2.**  $2^{\frac{n}{6}} \leq s_3^3(\mathrm{IP}) \leq 3^{\frac{n}{4}}$ .

**Proof.** Note that by substituting every other input of IP by 1, one gets the parity function  $\oplus_{\frac{n}{2}}$  on the remaining n/2 inputs. Now both lower bounds follow from the corresponding lower bounds for the parity function:  $s_3^2(\oplus_k) \geq 2^{\frac{k}{2}}$  and  $s_3^3(\oplus_k) \geq 2^{\frac{k}{3}}$ .

1. The first upper bound follows from the fact that  $IP(x_1,\ldots,x_n)=1$  iff there is an odd number of ones among

$$p_1 = x_1 x_2, p_2 = x_3 x_4, \dots, p_{\frac{n}{2}} = x_{n-1} x_n.$$

Hence,

$$\operatorname{IP}(x_1,\ldots,x_n) \equiv \bigvee_{S \subseteq \left[\frac{n}{2}\right] \colon |S| \bmod 2 = 1} \left( \bigwedge_{i \in S} [p_i = 1] \land \bigwedge_{i \notin S} [p_i = 0] \right).$$

It remains to note that each  $[p_i = c]$  can be expressed as a 2-CNF because  $p_i$  depends on two variables.

2. For the second upper bound, note that  $IP(x_1,\ldots,x_n)=1$  iff there is an odd number of 1's among

$$p_1 = x_1 x_2 \oplus x_3 x_4, p_2 = x_5 x_6 \oplus x_7 x_8, \dots, p_{\frac{n}{4}} = x_{n-3} x_{n-2} \oplus x_{n-1} x_n.$$

To compute IP by a depth 3 circuit, we go through all possible  $2^{\frac{n}{4}-1}$  values of  $p_1, \ldots, p_{\frac{n}{4}}$  such that an odd number of them is equal to 1:

$$\operatorname{IP}(x_1, \dots, x_n) \equiv \bigvee_{S \subseteq \left[\frac{n}{4}\right] \colon |S| \bmod 2 = 1} \left( \bigwedge_{i \in S} [p_i = 1] \land \bigwedge_{i \notin S} [p_i = 0] \right) \tag{3}$$

Now, we show that  $[p_i = 0]$  can be written as a single 3-CNF, whereas  $[p_i = 1]$  can be expressed as an OR of two 3-CNFs. W.l.o.g. assume that i = 1. The clauses of a 3-CNF expressing  $[p_1 = 0]$  should reject all assignments to  $x_1, x_2, x_3, x_4 \in \{0, 1\}$  where  $IP(x_1, x_2, x_3, x_4) = 1$ . In all such assignments, one of the two monomials  $(x_1x_2)$  and  $(x_3x_4)$  is equal to 0 whereas the other one is equal to 1. Hence, one needs to write down a set of clauses rejecting the following four partial assignments:  $\{x_1 = 0, x_3 = x_4 = 1\}$ ,  $\{x_2 = 0, x_3 = x_4 = 1\}$ ,  $\{x_1 = x_2 = 1, x_3 = 0\}$ ,  $\{x_1 = x_2 = 1, x_4 = 0\}$ . Thus,

$$[p_1(x_1, x_2, x_3, x_4) = 0] \equiv (x_1 \vee \neg x_3 \vee \neg x_4) \wedge (x_2 \vee \neg x_3 \vee \neg x_4) \wedge (\neg x_1 \vee \neg x_2 \vee x_3) \wedge (\neg x_1 \vee \neg x_2 \vee x_4).$$

In turn, to express  $[p_1 = 1]$  as an OR of two 3-CNFs we consider both assignments to  $x_1$ :

$$[p_1(x_1, x_2, x_3, x_4) = 1] \equiv ((x_1) \land [x_2 \oplus x_3 x_4 = 1]) \lor ((\neg x_1) \land [x_3 x_4 = 1]).$$

It remains to note that each of  $[x_2 \oplus x_3x_4 = 1]$  and  $[x_3x_4 = 1]$  can be written as a 3-CNF. Let  $[p_i = 0] \equiv P_i$  and  $[p_i = 1] \equiv ((x_i) \land Q_i) \lor ((\neg x_i) \land R_i)$  where  $P_i$ ,  $Q_i$ , and  $R_i$  are 3-CNFs. One may then expand (3) as follows:

$$\bigvee_{S\subseteq [\frac{n}{4}]\colon |S| \bmod 2 = 1} \left( \bigvee_{T\subseteq S} \left( \bigwedge_{i\in T} \left( (x_i) \land Q_i \right) \land \bigwedge_{i\in S\backslash T} \left( (\neg x_i) \land R_i \right) \land \bigwedge_{i\not\in S} P_i \right) \right)$$

The fan-in of the resulting OR-gate is

$$\sum_{S \subseteq [\frac{n}{4}]: \ |S| \bmod 2 = 1} 2^{|S|} \le \sum_{i=0}^{\frac{n}{4}} \binom{n/4}{i} 2^i = 3^{\frac{n}{4}}.$$
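Two ingredients of this construction can be verified mechanically for small parameters: the four-clause 3-CNF for $[p_1 = 0]$ and the count of terms feeding the output OR. The sketch below uses illustrative function names; the closed form $(3^m - (-1)^m)/2$ for the number of pairs (S, T) with |S| odd and $T \subseteq S$ follows from the binomial theorem and is used only as a cross-check.

```python
from itertools import product

def p0_cnf(x1, x2, x3, x4):
    """The four-clause 3-CNF for [p_1 = 0], where p_1 = x1*x2 xor x3*x4."""
    return bool((x1 or not x3 or not x4) and (x2 or not x3 or not x4)
                and (not x1 or not x2 or x3) and (not x1 or not x2 or x4))

def or_fan_in(m):
    """Number of pairs (S, T): S a subset of [m] of odd size, T a subset of S."""
    return sum(2 ** bin(S).count("1")
               for S in range(2 ** m) if bin(S).count("1") % 2 == 1)
```

With m = n/4 blocks, `or_fan_in(m)` never exceeds $3^m$, in line with the bound $s_3^3(\mathrm{IP}) \le 3^{n/4}$.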

▶ **Open Problem 22.** Determine  $s_3^3(\mathrm{IP})$ .

Besides finding the exact values of  $\alpha(k)$ , it would be interesting to find out whether every circuit of *linear size* can be computed by a non-trivial depth 3 circuit with constant bottom fan-in. We restate this open problem below.

▶ Open Problem 2. Prove or disprove: for any constant c, any circuit of size cn can be computed as an

$$OR_{2^{(1-\delta(c))n}} \circ AND \circ OR_{\gamma(c)}$$

circuit, for some  $\delta(c) > 0$  and integer  $\gamma(c) \geq 1$ .

This paper supports the conjecture by showing that it holds for small values of c. As another example, we can consider a class of functions where we know linear *upper bounds* on circuit complexity. For any *symmetric* function f (i.e., a function whose value depends only on the sum over integers of the input bits) we know that  $s(f) \leq 4.5n + o(n)$  [11]. It is also known [40, 60] that symmetric functions can be computed by relatively small depth-3 circuits:  $s_3^k(f) \leq \text{poly}(n) \cdot (1+1/k)^n$  (and this bound is tight [60]).

Since our depth reduction results always yield k-CNFs with a small linear number of clauses, it is interesting to study the expressiveness of an OR of an exponential number of such k-CNFs. Let us define  $\alpha(k,c)$  as the infimum of all values  $\alpha$  such that any circuit of size at most cn can be computed as an  $OR_{2^{\alpha n}} \circ AND_{cn} \circ OR_k$  circuit. We can upper bound the rate of convergence of  $\alpha(k,c)$  using the following width reduction result for CNF-formulas [49, 5].

▶ Theorem 23 ([49, 5]). For any constant  $0 < \varepsilon \le 1$  and a function  $C : \mathbb{N} \to \mathbb{N}$ , any CNF formula f with n variables and  $n \cdot C(n)$  clauses can be expressed as  $f = OR_{i=1}^t f_i$ , where  $t \le 2^{\varepsilon n}$  and each  $f_i$  is a k-CNF formula with at most  $n \cdot C(n)$  clauses, where  $k = O\left(\frac{1}{\varepsilon} \cdot \log\left(\frac{C(n)}{\varepsilon}\right)\right)$ .

For our applications, we are interested in  $\alpha(k,c)$  for small fixed c. Since for every c,  $\alpha(k,c)$  is a non-increasing bounded sequence, we let  $\alpha(\infty,c) = \lim_{k\to\infty} \alpha(k,c)$ . Then Theorem 23 implies that  $\alpha(k,c) \geq \alpha(\infty,c) \geq \alpha(k,c) - O(\frac{\log(ck)}{k})$ .

#### References -

- Josh Alman and Ryan Williams. Probabilistic polynomials and hamming nearest neighbors. In FOCS 2015, pages 136–150. IEEE, 2015.
- 2 Alexander E. Andreev. On a method for obtaining more than quadratic effective lower bounds for the complexity of  $\pi$ -schemes. *Moscow Univ. Math. Bull.*, 42(1):63–66, 1987.
- 3 Ravi B. Boppana. The average sensitivity of bounded-depth circuits. *Inf. Process. Lett.*, 63(5):257–261, 1997.
- 4 Chris Calabro. A lower bound on the size of series-parallel graphs dense in long paths. In ECCC, volume 15, 2008.
- 5 Chris Calabro, Russell Impagliazzo, and Ramamohan Paturi. A duality between clause width and clause density for SAT. In *CCC 2006*, pages 252–260, 2006.
- 6 Aleksandr V. Chashkin. On the complexity of Boolean matrices, graphs and their corresponding Boolean functions. *Discrete Math. and Appl.*, 4(3):229–257, 1994.
- 7 Lijie Chen and Ryan Williams. Circuit lower bounds from PCP of proximity. Unpublished manuscript, 2019.
- 8 Ruiwen Chen and Valentine Kabanets. Correlation bounds and #SAT algorithms for small linear-size circuits. In *COCOON 2015*, pages 211–222. Springer, 2015.
- 9 Gil Cohen and Avishay Tal. Two structural results for low degree polynomials and applications. In RANDOM 2015, pages 680–709, 2015.
- 10 Vlado Dančík. Complexity of Boolean functions over bases with unbounded fan-in gates. Inf. Process. Lett., 57(1):31–34, 1996.
- 11 Evgeny Demenkov, Arist Kojevnikov, Alexander S. Kulikov, and Grigory Yaroslavtsev. New upper bounds on the Boolean circuit complexity of symmetric functions. *Inf. Process. Lett.*, 110(7):264–267, 2010.
- 12 Irit Dinur and Or Meir. Toward the KRW Composition Conjecture: Cubic Formula Lower Bounds via Communication Complexity. In *CCC* 2016, pages 3:1–3:51, 2016.
- 13 Paul Erdős, Ronald L. Graham, and Endre Szemerédi. On sparse graphs with dense long paths. *Comp. and Math. with Appl.*, 1:145–161, 1975.
- 14 Magnus G. Find, Alexander Golovnev, Edward A. Hirsch, and Alexander S. Kulikov. A better-than-3n lower bound for the circuit complexity of an explicit function. In FOCS 2016, pages 89–98, 2016.
- 15 Joel Friedman. A note on matrix rigidity. Combinatorica, 13(2):235–239, 1993.
- 16 Dmitry Gavinsky, Or Meir, Omri Weinstein, and Avi Wigderson. Toward better formula lower bounds: An information complexity approach to the KRW composition conjecture. In STOC 2014, pages 213–222, 2014.
- 17 Alexander Golovnev, Edward A. Hirsch, Alexander Knop, and Alexander S. Kulikov. On the limits of gate elimination. J. Comput. Syst. Sci., 96:107–119, 2018.

- 18 Alexander Golovnev, Alexander S. Kulikov, and R. Ryan Williams. Circuit depth reductions. arXiv, 2018. arXiv:1811.04828.
- 19 Dmitrii Yu. Grigoriev. Application of separability and independence notions for proving lower bounds of circuit complexity. Zap. Nauch. Sem. POMI, 60:38–48, 1976.
- 20 Johan Håstad. Almost optimal lower bounds for small depth circuits. In STOC 1986, pages 6–20, 1986.
- 21 Johan Håstad. The shrinkage exponent of de Morgan formulas is 2. SIAM J. Comput., 27(1):48–64, 1998.
- 22 Johan Håstad, Stasys Jukna, and Pavel Pudlák. Top-down lower bounds for depth 3 circuits. In FOCS 1993, pages 124–129, 1993.
- 23 Russell Impagliazzo and Valentine Kabanets. Fourier concentration from shrinkage. *Comput. Complex.*, 26(1):275–321, 2017.
- 24 Russell Impagliazzo and Noam Nisan. The effect of random restrictions on formula size. Random Struct. Algorithms, 4(2):121–134, 1993.
- 25 Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? *J. Comput. Syst. Sci.*, 63(4):512–530, 2001.
- 26 Mauricio Karchmer, Ran Raz, and Avi Wigderson. Super-logarithmic depth lower bounds via the direct sum in communication complexity. *Comput. Complex.*, 5(3/4):191–204, 1995.
- 27 Mauricio Karchmer and Avi Wigderson. Monotone circuits for connectivity require superlogarithmic depth. SIAM J. Discrete Math., 3(2):255–265, 1990.
- 28 Tali Kaufman, Shachar Lovett, and Ely Porat. Weight distribution and list-decoding size of Reed-Muller codes. IEEE Trans. Inf. Theory, 58(5):2689–2696, 2012.
- 29 Valeriy M. Khrapchenko. A method of determining lower bounds for the complexity of  $\pi$ -schemes. Math. Notes of the Acad. of Sci. of the USSR, 10(1):474–479, 1971.
- 30 Maria M. Klawe. Shallow grates. *Theor. Comput. Sci.*, 123(2):389–395, 1994.
- 31 Ilan Komargodski, Ran Raz, and Avishay Tal. Improved average-case lower bounds for DeMorgan formula size. In *FOCS 2013*, pages 588–597, 2013.
- 32 Satyanarayana V. Lokam. Complexity lower bounds using linear algebra. Found. Trends Theor. Comput. Sci., 4(1-2):1–155, 2009.
- 33 Oleg B. Lupanov. On rectifier and switching-and-rectifier schemes. *Dokl. Akad. Nauk SSSR*, 111(6):1171–1174, 1956. In Russian.
- 34 Or Meir and Avi Wigderson. Prediction from partial information and hindsight, with application to circuit lower bounds. In *ECCC*, volume 24, 2017.
- 35 Edward I. Nechiporuk. On a Boolean function. Dokl. Akad. Nauk SSSR, 169(4):765–766, 1966.
- 36 Mike Paterson and Uri Zwick. Shrinkage of de Morgan formulae under restriction. *Random Struct. Algorithms*, 4(2):135–150, 1993.
- 37 Ramamohan Paturi and Pavel Pudlák. Circuit lower bounds and linear codes. *J. Math. Sci.*, 134(5):2425–2434, 2006.
- 38 Ramamohan Paturi, Pavel Pudlák, Michael E. Saks, and Francis Zane. An improved exponential-time algorithm for k-SAT. J. ACM, 52(3):337–364, 2005.
- 39 Ramamohan Paturi, Pavel Pudlák, and Francis Zane. Satisfiability coding lemma. In FOCS 1997, pages 566–574, 1997.
- 40 Ramamohan Paturi, Michael E. Saks, and Francis Zane. Exponential lower bounds for depth 3 Boolean circuits. In STOC 1997, pages 86–91, 1997.
- 41 Pavel Pudlák and Zdeněk Vavřín. Computation of rigidity of order  $\frac{n^2}{r}$  for one simple matrix. Comment. Math. Univ. Carolinae, 32(2):213–218, 1991.
- 42 Alexander A. Razborov. Lower bounds on the dimension of schemes of bounded depth in a complete basis containing the logical addition function. *Mat. Zametki*, 41(4):598–607, 1987.
- 43 Ben W. Reichardt. Reflections for quantum query algorithms. In SODA 2011, pages 560–569. SIAM, 2011.
- 44 Zachary Remscrim. The Hilbert function, algebraic extractors, and recursive Fourier sampling. In FOCS 2016, pages 197–208, 2016.

- 45 Rahul Santhanam. Fighting perebor: New and improved algorithms for formula and QBF satisfiability. In FOCS 2010, pages 183–192, 2010.
- 46 Rahul Santhanam and Srikanth Srinivasan. On the limits of sparsification. In ICALP 2012, pages 774–785, 2012.
- 47 Georg Schnitger. A family of graphs with expensive depth-reduction. Theor. Comput. Sci., 18(1):89–93, 1982.
- 48 Georg Schnitger. On depth-reduction and grates. In FOCS 1983, pages 323–328, 1983.
- 49 Rainer Schuler. An algorithm for the satisfiability problem of formulas in conjunctive normal form. *J. Algorithms*, 54(1):40–44, 2005.
- 50 Igor S. Sergeev. On complexity of circuits and formulas of bounded depth over unbounded fan-in bases. *Disc. Math. Appl.*, 30(2):120–137, 2018. In Russian.
- 51 Kazuhisa Seto and Suguru Tamaki. A satisfiability algorithm and average-case hardness for formulas over the full binary basis. *Comput. Complex.*, 22(2):245–274, 2013.
- 52 Claude E. Shannon. The synthesis of two-terminal switching circuits. Bell Syst. Tech. J., 28:59–98, 1949.
- 53 Alexander A. Sherstov. Making polynomials robust to noise. In STOC 2012, pages 747–758. ACM, 2012.
- 54 Bella A. Subbotovskaya. Realizations of linear functions by formulas using +, ·, −. Dokl. Akad. Nauk SSSR, 136(3):553–555, 1961.
- 55 Avishay Tal. Shrinkage of De Morgan formulae by spectral techniques. In FOCS 2014, pages 551–560. IEEE, 2014.
- 56 Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. In MFCS 1977, pages 162–176, 1977.
- 57 Emanuele Viola. On the power of small-depth computation. Found. Trends Theor. Comput. Sci., 5(1):1–72, 2009.
- 58 Emanuele Viola and Avi Wigderson. Norms, XOR lemmas, and lower bounds for polynomials and protocols. *Theory Comput.*, 4(1):137–168, 2008.
- 59 Richard Ryan Williams. Limits on representing Boolean functions by linear combinations of simple functions: Thresholds, ReLUs, and low-degree polynomials. In CCC 2018, pages 6:1–6:24, 2018.
- 60 Guy Wolfovitz. The complexity of depth-3 circuits computing symmetric Boolean functions. *Inf. Process. Lett.*, 100(2):41–46, 2006.