# Parity, Circuits, and the Polynomial-Time Hierarchy* 

Merrick Furst, ${ }^{1}$ James B. Saxe, ${ }^{2}$ and Michael Sipser ${ }^{3}$<br>${ }^{1}$ Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pa., 15213<br>${ }^{2}$ Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pa., 15213<br>${ }^{3}$ Department of Mathematics, Massachusetts Institute of Technology, Boston, Mass., 02139


#### Abstract

A super-polynomial lower bound is given for the size of circuits of fixed depth computing the parity function. Introducing the notion of poly-nomial-size, constant-depth reduction, similar results are shown for the majority, multiplication, and transitive closure functions. Connections are given to the theory of programmable logic arrays and to the relativization of the polynomial-time hierarchy.


## 1. Introduction

It is of both practical and theoretical interest to know the size of boolean circuits necessary to compute common boolean functions. Despite considerable effort, the techniques for proving combinatorial lower bounds on circuit size remain primitive. To date, even for circuits restricted to be monotone, linear lower bounds are the best that can be shown for naturally arising functions [14, 2, 16].

Lupanov studied bounded depth circuits in 1961 and showed that parity circuits of depth 2 must have an exponential number of gates [11]. Krapchenko, investigating the complexity of the parity function, proved an $n^{2}$ lower bound on the size of parity formulas $[9,10]$. Considering a different restriction of the circuit model we prove a superpolynomial lower bound: constant-depth circuits of $\neg$-gates and arbitrary fan-in $\wedge$ and $\vee$-gates must use more than a polynomial number of gates to compute parity, majority, multiplication, or transitive closure.

[^0]Constant-depth circuits elegantly model programmable logic arrays ( $P L A ' s$ ), a type of integrated circuit used inside microprocessors to compactly represent many functions [14]. According to folklore, some common functions, e.g., parity, multiplication, and transitive closure cannot be implemented with small PLA's. The new lower bound establishes a basis for this belief by showing that any PLA implementing these functions must be using more than a polynomial amount of chip area.

Lower bounds for constant-depth circuits relate to questions about the relativization of the polynomial-time hierarchy. We show that an $\Omega\left(n^{\text {poly }(\log n)}\right)$ lower bound on the size of constant-depth circuits computing parity would imply the existence of an oracle $A$ that separates $\cup_{i} \Sigma_{i}^{P, A}$ from PSPACE ${ }^{A}$. Readers not interested in this connection may wish to skip Section 2 and proceed directly to Section 3 where the circuit lower bound is proved.

## 2. The Polynomial-Time Hierarchy

The Meyer-Stockmeyer polynomial-time hierarchy is a ladder of language classes lying between $P$ and PSPACE, as depicted in Figure 1 [15, 20].

A language $L$ is in the class $\Sigma_{i}^{P}$ if and only if the strings $y$ in $L$ can be characterized as satisfying a sentence

$$
\exists^{P} x_{1} \forall^{P} x_{2} \cdots Q_{i}^{P} x_{i} R\left(y, x_{1}, \ldots, x_{i}\right)
$$

in which
(i) $\exists^{P} x_{j}$ quantify over strings of length polynomial in $|y|$,
(ii) $Q_{i}^{P}$ is $\forall^{P}$, or $\exists^{P}$ as $i$ is even or odd, and
(iii) $R$ is a deterministic polynomial-time predicate.


Fig. 1. The Polynomial-Time Hierarchy

The class $\Pi_{i}^{P}$ is defined to be co- $\Sigma_{i}^{P}$. Thus, $P=\Sigma_{0}^{P}=\Pi_{0}^{P}, N P=\Sigma_{1}^{P}$, and $\operatorname{co} N P=\Pi_{1}^{P}$.

Aside from the obvious trivial inclusions, nothing is known about the relationships among these classes. Separating $\Sigma_{i}^{P}$ from $\Sigma_{j}^{P}$, for some $i, j$ with $0<i<j$, would separate $P, N P$, and PSPACE. Baker, Gill and Solovay [3] posed the following question for language classes in the polynomial hierarchy relativized to oracles; does there exist an oracle $X$ such that, for some values of $i$, $\Sigma_{i}^{P, X} \subsetneq \sum_{i+1}^{P, X}$ ? The strongest separation theorem known gives an oracle $B$ such that, $\Sigma_{2}^{P, B} \neq \Pi_{2}^{P, B}$, which implies $\Sigma_{2}^{P, B} \subsetneq \Sigma_{3}^{P, B}$ [4]. The technique used to obtain this result is not strong enough to separate any higher levels.

## Parity and the Relativized Polynomial Hierarchy

There is more than one notion of relativized space bounded computation [12]. Permitting machines to make exponentially long oracle queries-by not accounting for the space used on the query tape-makes it quite easy to construct an $A$ that separates $U_{i} \Sigma_{i}^{P, A}$ and PSPACE ${ }^{A}$. The separation problem becomes interesting only if oracle queries must fit within the polynomial-space bound and we adopt this convention.

Definition. The polynomial hierarchy relativized by an oracle $A$ is $\mathrm{PH}^{A}=$ $\cup_{i} \Sigma_{i}^{P, A}$.

In what follows we prove that if parity of $n$ inputs cannot, for any $k$, be computed by constant-depth, $O\left(n^{\log ^{k} n}\right)$-size circuits, then there exists an oracle $A$ that separates PSPACE ${ }^{A}$ and PH $^{A}$. The proof requires several lemmas.

First, consider the oracle property,
$L^{A}=\left\{1^{n} \mid\right.$ the number of strings of length $n$ in $A$ is odd $\}$,
as defined by Angluin [1]. It is certainly in PSPACE ${ }^{A}$. Can it be in $\Sigma_{i}^{P, A}$, for some fixed $i$ ?

Lemma 2.1. Let

$$
\sigma(y)=\exists^{P} x_{1} \forall^{P} x_{2} \cdots Q_{i}^{P} x_{i} R^{A}\left(x_{1}, \ldots, x_{i}, y\right)
$$

be a sentence in $\Sigma_{i}^{P, A}$. There is an equivalent sentence

$$
\bar{\sigma}(y)=\exists \exists^{P} x_{1} \forall^{P} x_{2} \ldots Q_{i+2}^{P} x_{i+2} \bar{R}^{A}\left(x_{1}, \ldots, x_{i+2}, y\right)
$$

in $\Sigma_{i+2}^{P, A}$, where $\bar{R}^{A}\left(x_{1}, \ldots, x_{i+2}, y\right)$ is a predicate that can be computed by a machine that makes at most one oracle query and runs for a polynomial amount of time.
Proof. We show how $R^{A}\left(x_{1}, \ldots, x_{i}, y\right)$ can be expressed as

$$
\exists^{P} r \forall^{P} s \bar{R}^{A}\left(x_{1}, \ldots, x_{i}, r, s, y\right)
$$

where $\bar{R}^{A}\left(x_{1}, \ldots, x_{i}, r, s, y\right)$ can be computed by a machine that makes only one oracle query and runs for a polynomial amount of time. Then, substituting this two quantifier sentence for $R^{A}$ in $\sigma$ we obtain $\bar{\sigma}$.

Let $M_{R}^{A}\left(x_{1}, \ldots, x_{i}, y\right)$ be a polynomial-time machine that computes the predicate $R^{A}\left(x_{1}, \ldots, x_{i}, y\right)$. If specific values of $x_{1}, \ldots, x_{i}$, and $y$ are given, $M$ runs for a polynomial number of steps, makes some oracle query $q_{j}$, receives a response $r_{j}$ (" $q_{j} \in A$ ", or " $q_{j} \notin A$ "), branches one of two ways based on the outcome, and iterates this process a polynomial number of times before finally accepting or rejecting. This computation (whose value is really a function of the oracle $A$ ) may be viewed as a binary tree $T$ with internal nodes labeled with oracle queries $q_{j}$, edges labeled " $q_{j} \in A$ ", or " $q_{j} \notin A "$, and leaves labeled either "accept" or "reject". The sentence

$$
\exists P_{r} \forall^{P} s \bar{R}^{A}\left(x_{1}, \ldots, x_{i}, r, s, y\right)
$$

must characterize those oracles $A$ that lead $M$ to accept. Informally, the sentence should say: "there is an accepting path in $T$ along which every query response is consistent with $A$." More formally, let $r$ range over all possible polynomially long sequences of (query $q_{j}$, response $r_{j}$ ) pairs, and let $s$ range over all possible queries $q_{s}$. Let $\bar{R}^{A}\left(x_{1}, \ldots, x_{i}, r, s, y\right)$ be the predicate
$" r=\left(q_{j_{1}}, r_{j_{1}}\right) \ldots\left(q_{j_{1}}, r_{j_{i}}\right)$ is an accepting branch in $T$
$\quad$ and
if $q_{s}$ appears as query $q_{j_{t}}$ along branch $r$ then $r_{j_{1}}$ is the correct answer to the question $q_{j_{t}} \in{ }^{?} A^{\prime \prime}$.

The first part of this predicate can be computed in polynomial time by simply simulating $M$. The second part can be determined by making at most one oracle query.

Actually, a technically stronger result can be shown.
Corollary 2.2. Let

$$
\sigma(y)=\exists^{P} x_{1} \forall^{P} x_{2} \ldots Q_{i}^{P} x_{i} R^{A}\left(x_{1}, \ldots, x_{i}, y\right)
$$

be a sentence in $\Sigma_{i}^{P, A}$. There is an equivalent sentence

$$
\bar{\sigma}(y)=\exists^{P} x_{1} \forall^{P} x_{2} \ldots Q_{i+1}^{P} x_{i+1} \vec{R}^{A}\left(x_{1}, \ldots, x_{i+1}, y\right)
$$

in $\Sigma_{i+1}^{P, A}$, where $\bar{R}^{A}\left(x_{1}, \ldots, x_{i+1}, y\right)$ can be computed by a machine that makes at most one oracle query and runs for a polynomial amount of time.

Proof. In the proof of Lemma 2.1, the two quantifier sentence equivalent to $R$ could just as easily have been constructed to be a $\forall \exists$ sentence whose polynomialtime predicate requires only one oracle query to evaluate. If quantifier $Q_{i}$ had been existential (universal), the first quantifier of the two quantifier sentence used to replace $R$ could have been chosen to be existential (universal). This could have allowed a merging of two adjacent levels of like quantifiers, giving a $\Sigma_{i+1}^{P, A}$ sentence $\bar{\sigma}$.

Lemma 2.3. If the oracle property $L^{A}$, defined by (1) is in $\Sigma_{i}^{P, A}$ then, for some fixed $k$, there exists a depth $i+1, O\left(n^{\log ^{k} n}\right)$-size circuit to compute parity of $n$ variables.

Proof. Suppose $1^{n}$ is in $L^{A}$ if and only if $\exists^{P} x_{1} \forall^{P} x_{2} \ldots Q_{i}^{P} x_{i} R^{A}\left(x_{1}, \ldots, x_{i}, 1^{n}\right)$. Let $\bar{\sigma}$ be as in Corollary 2.2. We can think of the sentence

$$
\bar{\sigma}\left(1^{n}\right)=\exists^{P} x_{1} \forall^{P} x_{2} \ldots Q_{i+1}^{P} x_{i+1} \bar{R}^{A}\left(x_{1}, \ldots, x_{i+1}, 1^{n}\right)
$$

as describing a height $i+1$ tree with root labeled " 3 ", internal nodes labeled " 3 " or " $\forall$ ", and leaves labeled " $\bar{R}^{A}\left(x_{1}, \ldots, x_{i+1}, 1^{n}\right)$ ", for specific values of $x_{1}, \ldots, x_{i+1}$. Each internal node has $2^{\left(n^{\kappa}\right)}$ sons, one for each possible value of an $x_{i}$. The predicate $\bar{R}^{A}\left(x_{1}, \ldots, x_{i+1}, 1^{n}\right)$, for fixed values of $x_{1}, \ldots, x_{i+1}$, is determined by at most one oracle query $q_{i}$. Thinking of the $2^{n}$ length- $n$ oracle queries as variables $q_{1}, \ldots, q_{2^{n}}$, we see that $\vec{R}^{A}\left(x_{1}, \ldots, x_{i+1}, 1^{n}\right)$ is equivalent to either the literal $q_{j}$, or the literal $\bar{q}_{j}$, for some $j$. Replacing $\forall$-nodes by $\wedge$-gates, $\exists$-nodes by $\vee$-gates, and the $\bar{R}^{A}\left(x_{1}, \ldots, x_{i+1}, 1^{n}\right)$ leaves by $q_{j}$ or $\bar{q}_{j}$ as appropriate, we obtain, for some $c$, a depth $i+1, c^{\left(n^{k}\right)}$-size circuit computing parity of the $2^{n}$ variables $q_{j}$. Noticing that $c^{\left(n^{k}\right)}$ is $O\left(\left(2^{n}\right)^{\log ^{k}\left(2^{n}\right)}\right)$, the result follows.

Theorem 2.4. If the parity function on $n$ variables cannot be computed by constant-depth, $O\left(n^{\log ^{k} n}\right)$-size circuits, then there exists an oracle $A$ such that PSPACE ${ }^{A} \supsetneq \mathrm{PH}^{\mathrm{A}}$.

Proof. Using Lemma 2.3 and the assumption that parity cannot be computed by constant-depth, $\Omega\left(n^{\log ^{k} n}\right)$-size circuits, as in [4], we can diagonalize away from PH and construct an oracle $A^{\prime}$ such that $L^{A^{\prime}} \notin \mathrm{PH}^{A^{\prime}}$. On the other hand, for any $A$, $L^{A} \in$ PSPACE $^{A}$ since a machine need only run through all strings of length $n$ and count how many are in $A$ to determine " $1{ }^{n} \in L^{?} L^{A}$ ". Thus, $L^{A^{\prime}} \in \operatorname{PSPACE}^{A^{\prime}}-\mathrm{PH}^{A^{\prime}}$.

We conjecture that there is no $k$ such that parity can be computed by constant-depth, $\Omega\left(n^{\log ^{k} n}\right)$-size circuits.

## 3. The Parity Lower Bound

We begin by defining terms.
Definition. For each $n, X_{n}=\left\{x_{1}, \ldots, x_{n}\right\}$ is the set of inputs, and $\bar{X}_{n}=$ $\left\{x_{1}, \bar{x}_{1}, \ldots, x_{n}, \bar{x}_{n}\right\}$ is the set of literals.

Definition. A 0-circuit is a literal; an i-circuit is a nonempty collection of ( $i-1$ )-circuits. An $i$-circuit is an $\wedge$-circuit if $i$ is even and an $\vee$-circuit if $i$ is odd. There are two constant circuits, 0 and 1 . The function computed by an $\wedge$-circuit ( $\vee$-circuit) $C$ is the $\wedge(\vee)$ of the functions computed by $C$ 's members. The 0 -circuit $x_{j}\left(\bar{x}_{j}\right)$ computes the (negation of the) $j$ th projection function. The constant circuits compute the constant functions. For each circuit $C$, the function

$$
f_{C}:\{0,1\}^{n} \rightarrow\{0,1\}
$$

is the function that $C$ computes.

Definition. For circuits $B$ and $C, B$ belongs to $C$, if $B$ is hereditarily a member of $C$, i.e., "belongs to" is the transitive closure of "member of". The depth of an $i$-circuit is $i$, the size of an $i$-circuit is the number of circuits belonging to it. Constant and 0 -circuits have size 0 and depth 0 .

The circuits defined in this way are equivalent to conventional circuits except they are organized into levels of $\Lambda$ 's and $V$ 's that connect only to adjacent levels. An arbitrary circuit of depth $d$ and size $s$ on $n$ inputs can be converted to one of this form having depth $d$ and size at most $4 s+2 n$. Much of this increase comes from the necessary introduction of trivial gates having only one input. If these are counted as wires, the size of the new circuit is only $2 s$.

Definition. A restriction is a function $\rho: X_{n} \rightarrow\{0,1, *\}$. The natural extension of $\rho$ to $\bar{X}_{n}$ is made by assigning

$$
\rho\left(\bar{x}_{i}\right)= \begin{cases}*, & \text { if } \rho\left(x_{i}\right)=* \\ \neg \rho\left(x_{i}\right), & \text { if } \rho\left(x_{i}\right) \neq *\end{cases}
$$

Restrictions fix some of the inputs of a function while leaving others, those assigned $*$, variable. For any restriction $\rho$, and function $f:\{0,1\}^{n} \rightarrow\{0,1\}, \rho$ induces a function

$$
f^{\rho}:\{0,1\}^{k} \rightarrow\{0,1\}
$$

where $k$ is the number of literals assigned *. If $\alpha \in\{0,1\}^{k}, f^{\rho}(\alpha)=f\left(\alpha^{\rho}\right)$, where $\alpha^{\rho}$ is $\rho\left(x_{1}\right) \rho\left(x_{2}\right) \ldots \rho\left(x_{n}\right)$ with the $k *$ 's replaced by the $k$ elements of $\alpha$.

In the same spirit that restrictions induce new functions, they also induce new circuits. A restriction $\rho$ forces an $\wedge$-circuit ( $\vee$-circuit) to be 0 if $\rho$ forces any (all) of its members to be 0 . Dually, $\rho$ forces an $\wedge$-circuit ( $\vee$-circuit) to be 1 if $\rho$ forces all (any) of its members to be 1 . Also, $\rho$ forces a 0 -circuit $x \in \bar{X}_{n}$ to be a 0 (1) if $\rho(x)=0$ (1). A restriction $\rho$ induces a circuit $C^{\rho}$ from the circuit $C$ in the following sense. If $\rho$ forces $C$ to be $0(1)$, then $C^{\rho}$ is the constant circuit $0(1)$. If $C$ is not forced by $\rho$, then $C^{\rho}$ is recursively defined to be $\left\{B^{\rho} \mid B\right.$ is an unforced member of $C\}$.

Lemma 3.1. For any circuit $C$, function $f$, and restriction $\rho$, if $C$ computes $f$ then $C^{\rho}$ computes $f^{\rho}$.

Proof. Straightforward.
Definition. An $n$-parity function is $x_{1}+\cdots+x_{n}(\bmod 2)$, or the negation of this. A parity circuit is any circuit that computes a parity function.

Lemma 3.2. For any parity function $f$ and restriction $\rho, f^{\rho}$ is a parity function.
The following is the main theorem. Its proof occupies most of the rest of this section.

Theorem 3.3. Parity cannot be computed by constant-depth, polynomial-size circuits.

Proof. (By contradiction) Let $d$ be the smallest depth admitting polynomial size parity $d$-circuits. Lupanov proves that parity 2 -circuits are exponentially large [12], therefore, $d$ must be at least 3. We proceed in three steps. First we construct, using suitably chosen restrictions of the polynomial-size parity $d$-circuits, poly-nomial-size parity $d$-circuits all of whose 1 -circuits are of constant size. Then we apply a second set of restrictions to these circuits obtaining polynomial-size parity $d$-circuits all of whose 2 -circuits are of constant size. Finally, we modify these circuits to get polynomial-size parity $(d-1)$-circuits, thereby contradicting the minimality of $d$.

Step 1. Let $C_{1}, C_{2}, \ldots$ be parity $d$-circuits, where $C_{n}$ computes an $n$-parity function and has size $\leqslant n^{k}$. We show that, for each sufficiently large $n$, a restriction $\rho$ chosen at random from a suitable distribution has a non-zero probability of inducing from $C_{n}$ a circuit $C_{n}^{p}$ such that,
(i) $C_{n}^{\rho}$ computes an $m$-parity function, for some $m \geqslant \sqrt{n} / 2$, and
(ii) all the 1-circuits of $C_{n}^{\rho}$ are bounded in size, independently of $n$.

The random restriction $\rho: X^{n} \rightarrow\{0,1, *\}$ is chosen from a distribution that independently assigns, for each $i$, the probabilities:

$$
\begin{aligned}
& \operatorname{Pr}\left[\rho\left(x_{i}\right) \text { is a } *\right]=1 / \sqrt{n} \\
& \operatorname{Pr}\left[\rho\left(x_{i}\right) \text { is a } 1\right]=\operatorname{Pr}\left[\rho\left(x_{i}\right) \text { is a } 0\right]=\frac{1-1 / \sqrt{n}}{2} .
\end{aligned}
$$

Fixing a constant $c$, whose value we exhibit later, $\rho$ fails if either $C_{n}^{\rho}$ contains a 1 -circuit of size $>c$ or $\rho$ assigns * to fewer than $\sqrt{n} / 2$ inputs. Since $C_{n}$ has at most $n^{k} 1$-circuits,

$$
\begin{align*}
\operatorname{Pr}[\rho \text { fails }] \leqslant & \operatorname{Pr}[\rho \text { assigns }<\sqrt{n} / 2 * \text { 's }] \\
& +n^{k} \cdot \operatorname{Pr}\left[\text { a given induced } 1-\text { circuit in } C_{n}^{\rho} \text { has size }>c\right] \tag{2}
\end{align*}
$$

Chebyshev's inequality shows that the first term of (2) is bounded above by $O\left(n^{-1 / 2}\right)$. (Note that all our inequalities hold for sufficiently large $n$.) To bound the second term of (2), consider a 1 -circuit $B$ and the probability that $B^{\rho}$ has size $>c$. Either $B$ is wide, i.e, has size $\geqslant c \ln n$, or $B$ is narrow, i.e., has size $<c \ln n$.

Case 1. $B$ is wide.
$\operatorname{Pr}[B$ is not forced $] \leqslant \operatorname{Pr}[$ no member of $B$ is assigned 1$]$

$$
\begin{aligned}
& \leqslant(3 / 4)^{c \ln n}, \quad(\text { for } n \geqslant 4) \\
& \leqslant n^{c \ln (3 / 4)} \\
& =o\left(n^{-c / 4}\right) .
\end{aligned}
$$

Case 2. $B$ is narrow.

$$
\begin{aligned}
\operatorname{Pr}\left[\text { size of } B^{\rho}>c\right] & \leqslant \operatorname{Pr}[B \text { contains } \geqslant c * \text { 'ed inputs }] \\
& \leqslant\binom{ c \ln n}{c}(1 / \sqrt{n})^{c} \\
& \leqslant(c \ln n)^{c} n^{-c / 2} \\
& =o\left(n^{-c / 4}\right)
\end{aligned}
$$

If $c=8 k$, then

$$
\begin{aligned}
\operatorname{Pr}\left[C_{n}^{\rho} \text { contains a 1-circuit of size }>c\right] & \leqslant n^{k} \cdot o\left(n^{-c / 4}\right)=n^{k} \cdot o\left(n^{-k}\right) \\
& =o\left(n^{-k}\right)
\end{aligned}
$$

Hence,

$$
\operatorname{Pr}[\rho \text { fails }] \leqslant n^{-1 / 2}+o\left(n^{-k}\right)=o(1)
$$

Thus, for every sufficiently large $n$, there exists a restriction $\rho$ such that,
(i) $C_{n}^{\rho}$ computes an $m$-parity function, for some $m \geqslant \sqrt{n} / 2$,
(ii) the size of $C_{n}^{\rho} \leqslant n^{k}$ is polynomial in $m$, and
(iii) $C_{n}^{\rho}$ contains no 1-circuits of size greater than $c$.

From these induced circuits it is easy to obtain a sequence $D_{1}, D_{2}, \ldots$ of polynomial-size parity $d$-circuits in which every 1 -circuit has size $\leqslant c$.

Step 2. Let $D_{1}, D_{2}, \ldots$ be $d$-circuits, such that, for each $n, D_{n}$ computes an $n$-parity function, has at most $n^{k}$ gates, and contains no 1-circuits of size greater than $c$. We show that, for each sufficiently large $n$, a restriction $\rho$ chosen from the distribution used in Step 1 has a non-zero probability of inducing from $D_{n}$ a circuit equivalent to one which
(i) computes an $m$-parity function, for some $m \geqslant \sqrt{n} / 2$, and
(ii) $C_{n}^{\rho}$ has 2-circuits bounded in size independently of $n$.

Fixing a constant $b_{c}, \rho$ fails if either
(i) $\rho$ assigns fewer than $\sqrt{n} / 2 *$ 's, or
(ii) if some 2-circuit of $D_{n}^{\rho}$ depends upon $\geqslant b_{c}$ inputs.

As in Step 1, the probability that $\rho$ assigns too few *'s is very small. To show that there is some $b_{c}$ for which the second condition is also unlikely we make the following claim.

Claim. For every $c$, there is a constant $b_{c}$, such that, for any 2 -circuit $A$ all of whose 1 -circuits are of size at most $c, \operatorname{Pr}\left[A^{\rho}\right.$ depends upon $>b_{c}$ inputs $]=o\left(n^{-k}\right)$. The claim is proved by induction on $c$.

Basis. ( $c=1$ ) The 2 -circuit $A$ computes the $\wedge$ function and hence by an argument dual to that used in Step 1, we see that $b_{1}$ exists.

Induction. (Show $b_{c}$ exists given $b_{c-1}$.) Let $b=k \cdot 4^{c}$ and consider two cases, one in which $A$ is wide, i.e, has $\geqslant b \ln n$ disjoint 1 -circuits (no common inputs), and the other in which $A$ is narrow.

Case 1. $A$ is wide.

$$
\begin{aligned}
\operatorname{Pr}[A \text { is not forced }] & \leqslant \operatorname{Pr}[\text { no member of } A \text { is forced to } 0] \\
& \leqslant \operatorname{Pr}[\text { a } 1 \text {-circuit of size } \leqslant c \text { is not forced to } 0]^{b \ln n} \\
& \leqslant\left(1-4^{-c}\right)^{b \ln n} \\
& =n^{b \ln \left(1-4^{-c}\right)} \\
& \leqslant n^{-b 4^{-c}} .
\end{aligned}
$$

For $b=k \cdot 4^{c}$ this is $o\left(n^{-k}\right)$.
Case 2. $A$ is narrow. Choose a maximal collection of disjoint 1-circuits in $A$ and let $H$ be the set of inputs appearing in this collection. Since $A$ is narrow $|H| \leqslant b c \log n$, and $H$ hits (contains at least one input appearing in) each of $A$ 's 1 -circuits. Let $h$ be the number of $*^{\prime}$ d inputs in $H$, and let $l=2^{h}$. Let $\rho_{1}, \rho_{2}, \ldots, \rho_{l}$ be the $2^{h}$ restrictions obtained by setting these $*$ 's to 0 or 1 in all possible ways. The value computed by $A^{\rho}$ can be determined from the values of $A^{\rho_{1}}, \ldots, A^{\rho_{l}}$ and the $h *$ 'd inputs. Furthermore, since $H$ hits every 1 -circuit, the 1 -circuits in each $A^{\rho_{i}}$ are of size at most $c-1$. Thus, by the induction hypothesis, $\operatorname{Pr}\left[A^{\rho_{i}}\right.$ depends upon $>b_{c-1}$ inputs] $=o\left(n^{-k}\right)$. By an argument similar to that in Step 1, Case 2, $\operatorname{Pr}[h>4 k] \leqslant o\left(n^{-k}\right)$. If $A^{\rho}$ depends upon $>4 k+l \cdot b_{c-1}$ inputs, then either $h>4 k$ or one of the $A^{\rho_{i}}$ depends upon more than $b_{c-1}$ inputs. Since $l \leqslant 2^{4 k}$ whenever $h \leqslant 4 k$, letting $b_{c}=4 k+2^{4 k} \cdot b_{c-1}$, we have

$$
\begin{aligned}
\operatorname{Pr}\left[A^{\rho} \text { depends upon }>b_{c} \text { inputs }\right] & \leqslant o\left(n^{-k}\right)+2^{4 k} \cdot o\left(n^{-k}\right) \\
& =o\left(n^{-k}\right) .
\end{aligned}
$$

Claim proved. Hence, for the value of $b_{c}$ given by the claim, a non-failing $\rho$ exists. Since any 2 -circuit which depends upon only $b_{c}$ inputs is equivalent to a 2-circuit of size at most $b_{c} \cdot 2^{b_{c}}$, we can construct from $D_{n}^{\rho}$ an equivalent $d$-circuit that,
(i) computes an $m$-parity function, for some $m>\sqrt{n} / 2$, and
(ii) has 2-circuits that are of size at most $b_{c} \cdot 2^{b_{c}}$.

We thus obtain a sequence $E_{1}, E_{2}, \ldots$ of polynomial-size parity $d$-circuits all of whose 2-circuits are of constant size.

Step 3. The parity $d$-circuits $E_{i}$ can be converted to parity ( $d-1$ )-circuits by rewriting their 2 -circuits of size $a$, using the distributive law, as $\vee-\wedge$-circuits of size at most $a \cdot 2^{a}$. Replacing all 2-circuits in the $E_{i}$ by these, merging the two now adjacent levels of $V$ 's (recall that $d \geqslant 3$ ), and then applying DeMorgan's laws to exchange $\vee$ 's and $\wedge$ ' $s$, we obtain parity ( $d-1$ )-circuits at most a constant factor larger.

Thus, we conclude that there exists a sequence $F_{1}, F_{2}, \ldots$ of polynomial size parity ( $d-1$ )-circuits, which is impossible.

A more careful analysis slightly improves the lower bound. Instead of finding an equivalent $\vee-\wedge 2$-circuit at the end of Step 2, it is sufficient to proceed
directly to Step 3 and find an equivalent $\vee-\wedge$ circuit there. Parity $d$-circuits of size $O\left(n^{f(n)}\right)$ are converted in Step 1 to parity $d$-circuits of size $O\left(n^{2 f(n)}\right)$ whose 1 -circuits are of size $O(f(n))$. Each repetition of Step 2 at most triply exponentiates the size of the 1 -circuits. Denoting the $k$ th iterate of the $\log$ function by $\log ^{(k)}$, and the inverse of the function $g(0)=1, g(n)=2^{g(n-1)}$, by $\log ^{*} n$, we have:

Corollary 3.4. Parity d-circuits must be of size $\Omega\left(n^{\log ^{(k)} n}\right)$, where $k=3(d-2)$.
Corollary 3.5. Polynomial-size parity circuits must have depth $\Omega\left(\log ^{*} n\right)$.
Corollary 3.6. The boolean function " $\equiv k(\bmod p)$ ", for any $k$ and $p \geqslant 2$, cannot be computed by constant-depth, polynomial-size circuits.

Proof. An argument exactly analogous to that in the proof of the main theorem confirms this result.

## 4. Consequences

In the previous section we showed that parity cannot be computed by constantdepth, polynomial-size circuits. The parity function seems so simple that one immediately wonders if other, seemingly more complicated, functions also cannot be realized in constant depth and polynomial size. Here we prove that majority, multiplication of binary integers, and transitive closure share this property with parity.

Definition. A function $f$ is constant-depth, polynomial-size reducible to a function $g\left(f \leqslant_{c p} g\right)$ if $f$ can be realized with constant-depth, polynomial-size circuits on literals, made up of $\vee$-gates, $\wedge$-gates, $\neg$-gates, and gates computing the function g.

Lemma 4.1. If Parity is $\leqslant_{c p}$ reducible to a function $g$, then $g$ is not realizable in constant depth and polynomial size.

Proof. Straightforward.
Definition. The Majority predicate on $n$ binary variables is defined to be 1 if and only if more than half of the inputs are 1.

Lemma 4.2. Parity $\leqslant_{c p}$ Majority.
Proof. (outline) Let $x_{1}, \ldots, x_{n}$ be the variables for which we wish to construct a constant-depth, polynomial-size circuit using $\wedge$-gates, $\vee$-gates, and Majority gates. Using a Majority gate and a constant 0 -circuit ( 0 or 1 ) we can compute the predicate $P_{k}=$ "at least $k$ bits are on" in depth one, and polynomial size. We
compute parity as

$$
\left(P_{0} \wedge \overline{P_{1}}\right) \vee\left(P_{2} \wedge \overline{P_{3}}\right) \vee\left(P_{4} \wedge \overline{P_{5}}\right) \vee \cdots
$$

This lemma and Lemma 4.1 yield the following.
Theorem 4.3. Majority cannot by realized by constant-depth, polynomial-size circuits.

We now show that circuits performing multiplication of binary integers in constant depth require more than a polynomial number of gates. This proves that multiplication cannot be implemented by polynomial-size program logic arrays (PLA's.) We point out, for contrast, that addition can be realized in constant depth and polynomial size, and hence can be implemented by polynomial-size PLA's.

Definition. Let $a=\left(a_{i}\right), b=\left(b_{i}\right)$ be two $n$-bit binary numbers. A Multiplication gate has $2 n$ inputs ( $a_{i}$ ), $\left(b_{i}\right)$, and $2 n$ outputs, where the output bits are the $2 n$ bits of the product of $a$ and $b$.

Lemma 4.5. Parity $\leqslant_{c p}$ Multiplication.
Proof. (outline) Let $x_{0}, \ldots, x_{n-1}$ be the variables for which we wish to construct a constant-depth, polynomial-size circuit using $\vee$-gates, $\wedge$-gates, and multiplication gates. Let $k=[\log n]$. Define two $k n$-bit binary numbers $a$ and $b$ as follows:

$$
\begin{aligned}
& a=\sum_{i=0}^{n-1} a_{i} 2^{k i}, \text { and } \\
& b=\sum_{i=0}^{n-1} b_{i} 2^{k i},
\end{aligned}
$$

where for all $i, b_{i}=1$ and $a_{i}=x_{i}$. The $2 k n$ bits of these numbers can easily be computed from the $x_{j}$ with a circuit of depth 1 .

Consider the product

$$
a b=\sum_{i=0}^{2 n-1} c_{i} 2^{k i},
$$

where each $c_{i}$ is a $k$-bit binary number. The low order bit of the number

$$
c_{n-1}=\sum_{i=0}^{n-1} a_{i} b_{n-1-i}=\sum_{i=0}^{n-1} x_{i}
$$

is the parity of the $x_{i}$, Figure 2. Therefore, given a multiplication gate, we can compute parity with constant-depth, polynomial-size circuits

Theorem 4.6. Multiplication cannot be computed by constant-depth, polynomial-size circuits.


Fig. 2. Parity $\leqslant_{c p}$ Multiplication.

## Proof. From Lemmas 4.1 and 4.2. [6]

Definition. Let $A=\left(a_{i j}\right)$ be the $n$ by $n$ adjacency matrix for a graph $G$. A Transitive Closure gate has $n^{2}$ inputs $a_{i j}$ and $n^{2}$ outputs $a_{i j}^{*}$ such that $A^{*}=\left(a_{i j}^{*}\right)$ is the adjacency matrix for the transitive closure of $G$.

Lemma 4.7. (J. Byrd) Parity $\leqslant_{c p}$ Transitive Closure.
Proof. Let $x_{1}, \ldots, x_{n}$ be the variables for which we want to construct a constantdepth, polynomial-size parity circuit using $\vee$-gates, $\wedge$-gates, and transitive closure gates. Let $G$ be a graph with $n+2$ vertices defined by a $0-1$ assignment to the $x_{i}$ in the following way. The vertices of $G$ are labeled

$$
v_{\text {start }}, v_{x_{1}}, \ldots, v_{x_{n}}, v_{\text {end }}
$$

There is an edge between $v_{\text {start }}$ and the first $v_{x_{i}}$, for which $x_{i}=1$. There is an edge between $v_{\text {end }}$ and the last $v_{x_{j}}$ for which $x_{j}=1$. There is also an edge between $v_{x_{i}}$
and $v_{x_{j}}$ if $i<j$, and
(i) $x_{i}=1$,
(ii) $x_{i+1}=x_{i+2}=\cdots=x_{j-1}=0$, and
(iii) $x_{j}=1$.

Thus $G$ contains a single path linking $v_{\text {start }}$ to $v_{\text {end }}$ passing through the "on" variables from left to right, Figure 3.

From the literals $x_{1}, \bar{x}_{1}, \ldots, x_{n}, \bar{x}_{n}$ we can compute the $n^{2}$ bits of the adjacency matrix $A=\left(a_{i j}\right)$ for $G$ with $\wedge$-gates as follows.

$$
\begin{aligned}
a_{\text {start }, i} & =\bar{x}_{1} \wedge \cdots \wedge \bar{x}_{i-1} \wedge x_{i} \\
a_{i, \text { end }} & =x_{i} \wedge \bar{x}_{i+1} \wedge \cdots \wedge \bar{x}_{n} \\
a_{i j} & =x_{i} \wedge \bar{x}_{i+1} \wedge \cdots \wedge \bar{x}_{j-1} \wedge x_{j}
\end{aligned}
$$

Consider the graph $G^{2}$ on the vertices

$$
v_{\text {start }}^{\prime}, v_{x_{1}}^{\prime}, \ldots, v_{x_{n}}^{\prime}, v_{\text {end }}^{\prime}
$$

in which there is an edge from $v_{r}^{\prime}$ to $v_{s}^{\prime}$ if and only if there is a path of length 2 in $G$ from $v_{r}$ to $v_{s}$, Figure 3. The $n^{2}$ bits of the adjacency matrix $B=\left(b_{i j}\right)$ for $G^{2}$ can be computed from the $a_{i j}$ with $\vee$-gates and $\wedge$-gates as follows,

$$
b_{r s}=\left(a_{r 1} \wedge a_{1 s}\right) \vee\left(a_{r 2} \wedge a_{2 s}\right) \vee \cdots \vee\left(a_{r n} \wedge a_{n s}\right)
$$

Therefore, the $b_{i j}$ can be computed in constant depth and polynomial size from the $x_{i}$. Take the transitive closure $B^{*}=\left(b_{i j}^{*}\right)$ of $B$ with a transitive closure gate. The bit $b_{\text {start, end }}^{*}$ is a 1 if and only if the sum of $x_{1}, \ldots, x_{n}$ is odd. Thus Parity $\leqslant_{c p}$ Transitive Closure.

Skyum and Valiant have shown that any function computable by polynomial-size formulas can be reduced by projections to transitive closure [18]. This is an alternative proof of Lemma 4.7.

Theorem 4.8. Transitive Closure cannot be realized in constant depth and polynomial size.


Fig. 3. The graphs $G$ and $G^{2}$.

## 5. Areas for Additional Research

(1) Better lower bounds: It is straightforward to give $O\left(2^{n /(d-1)}\right)$ sized parity $d$-circuits. How can the gap between upper and lower bounds be tightened?
(2) Polynomial lower bounds: The bits of Addition can be computed by polynomial-size 3 -circuits. Is it possible to compute them with linear size $d$-circuits for some $d$ ? Can any size-depth trade-off be given?
(3) Polynomial-size, constant-depth reduction: Elucidate the structure of this reducibility. We conjecture that the majority function does not reduce to the parity function.
(4) Connections with the infinitary version: The proofs of the main theorem of this paper and the main theorem in [20] are structurally similar. Is there any formal connection between them? Is there a finitary version of the measure theoretic proof of the infinitary version of the theorem presented in [8]?
(5) More powerful models: Polynomial-size, bounded-width decision dags can simulate polynomial-size, bounded-depth circuits, as well as compute functions such as parity and " $\equiv 0(\bmod k)$ " [7]. Can they compute the majority function?

## 6. Acknowledgments

The authors would like to thank Jim Byrd, Steven Fortune, and Charles Leiserson for many helpful conversations. Steve Fortune first suggested that the depth 3 case might be tractable, Jim Byrd found the proof showing that parity is constant-depth, polynomial-size reducible to transitive closure, and Charles Leiserson pointed out the connection to PLA's. Our special thanks go to Hania Gajewska, Dan Hoey, Martin Tompa, and the referees for carefully reading earlier versions of this manuscript and suggesting many valuable improvements.

IBM, at Yorktown Heights and San Jose, provided two of the authors with congenial atmospheres while this research was in progress. The authors were partially funded by NSF grant MCS-81-05555, ONR grant N00014-76-C-0370, and an IBM Graduate Fellowship.

## References

1. D. Angluin, Counting problems and the polynomial-time hierarchy. Theoretical Computer Science, to appear.
2. N. Blum, A $2.75 n$ lower bound for the combinational complexity of boolean functions. University of Saarbrucken, Technical Report.
3. T. Baker, J. Gill, and R. Solovay, Relativizations of the $P={ }^{?} N P$ question. SIAM Journal of Computing, 4, 4, 1975.
4. T. Baker and A. Selman, A second step toward the polynomial hierarchy. Theoretical Computer Science, 8, 2, 1979, pp. 177-187.
5. A. Chandra, D. Kozen, and L. Stockmeyer, Alternation. Journal of the ACM, 28, 1, January 1981.
6. Digital Equipment Corporation, Decsystem 10 Assembly Language Handbook. Third Edition, 1973, pp. 51-52.
7. M. Furst, Bounded width computation DAG's. In preparation, 1982.
8. M. Furst, J. B. Saxe, M. Sipser, Parity, circuits and the polynomial-time hierarchy. 22ND Symposium on the Foundations of Computer Science, 1981, pp. 260-270.
9. M. Furst, J. B. Saxe, M. Sipser, Depth 3 circuits require $\Omega\left(n^{c \log n}\right)$ gates to compute parity: a geometric argument. In preparation.
10. V. Krapchenko, Complexity of the realization of a linear function in the class of II-circuits. English translation in Math. Notes Acad. Sci., USSR, 1971, pp. 21-23; orig. in Mat. Zamet, 9, 1, pp. 35-40.
11. V. Krapchenko, A method of obtaining lower bounds for the complexity of II-schemes. English translation in Math. Notes Acad. Sci USSR, 1972, pp. 474-479; orig. in Mat. Zamet, 10, 1, pp. 83-92.
12. O. Lupanov, Implementing the algebra of logic functions in terms of constant-depth formulas in the basis,$+^{*}$, - English translation in Sov. Phys.-Dokl., 6, 2, 1961; orig. in Dokla. Akad. Nauk SSSR, 136, 5.
13. R. Ladner and N. Lynch, Relativization of questions about $\log$ space computability. Mathematical Systems Theory, 10, 1, 1976.
14. C. Mead and L. Conway, Introduction to VLSI Systems. Addison-Wesley, Reading, Mass. 1980.
15. W. Paul, A 2.5 N lower bound for the combinational complexity of boolean functions. 7th Annual ACM Symposium on Theory of Computing, 1975, pp. 27-36.
16. J. Savage, The Complexity of Computing. John Wiley and Sons, New York, 1976, Sect. 2.4.
17. C. P. Schnorr, A $3 n$ lower bound on the network complexity of boolean functions. Theoretical Computer Science, 10, 1, 1980, p. 83.
18. L. J. Stockmeyer, The polynomial-time hierarchy. Theoretical Computer Science, 3, 1, 1976, pp. 1-22.
19. S. Skyum and L. G. Valiant, A complexity theory based on boolean algebra. 22nd Symposium on the Foundations of Computer Science, 1981, pp. 244-253.
20. M. Sipser, On polynomial vs exponential growth. In preparation.
21. L. Stockmeyer and A. Meyer, Word problems requiring exponential time, preliminary report. 5th Annual ACM Symposium on Theory of Computing, 1973.

Received May 1982 and in revised form July 13, 1983 and August 22, 1983.


[^0]:    *Research partially funded by NSF Grant MCS-81-05555 and ONR Grant N00014-76-C-0370.

