Cellular Automata Surviving $k$ Steps

When I started my PhD I had two big questions: (1) How can we explain the practical effectiveness of SAT solvers in the face of widespread belief that P $\neq$ NP? (2) What does Rice's theorem say about what we can prove about programs? This project, done during my first year, was a sortof meditation on these: Cellular automata are famously complex [36] despite their austere simplicity, and, as we'll get into, SAT solvers are weirdly good at synthesizing them.

Abstract Recent work by Wolfram composes pairs of elementary cellular automata to ``survive $k$ steps'' — maintaining at least one $\blacksquare$ cell for $k$ steps before transitioning to exclusively $\square$ . This paper formalizes and analyzes the problem of surviving $k$ steps. We develop novel analysis techniques for inhomogeneous cellular automata based on Boolean satisfiability and binary decision diagrams that classify 99% (of 32,640) pairs of elementary automata as able to solve Survive- $k$ -Steps for an infinite, or finite, number of $k$ . We measure the performance impact of various heuristics on our SAT-based analysis and find, while learned clauses slow down the solver, variable selection heuristics are critical for performance.

1 Introduction

Recent work by Wolfram [1] has synthesized inhomogeneous cellular automata [2] from pairs of elementary rules using an evolutionary algorithm. The main focus is automata that survive $k$ steps -- having a $\blacksquare$ cell in the first $k$ states and thereafter exclusively $\square$ cells -- henceforth the Survive- $k$ -Steps problem (see also [3], [4]).

This post empirically and analytically analyzes the Survive- $k$ -Steps problem with ideas from formal methods. We develop a novel reduction of inhomogeneous cellular automata to Boolean satisfiability. Then, using a SAT solver, binary decision diagrams, and purely analytical methods, classify 99% of pairs of elementary rules as able to survive $k$ steps for an infinite, or finite, number of $k$ . We also comprehensively solve, or prove the unsolvability of, Survive- $k$ -Steps for $0<k\leq 50$ for every pair of elementary rules with our SAT-based synthesis technique, finding there is rich structure to what pairs of rules can survive $k$ steps.

1.1 Example

Consider if elementary cellular automata rule 18 and 82 can be composed to survive $k$ steps. An illustration of the input/output pairs for the automata is given below.

Elementary cellular automata are functions that take three inputs and produce one output. being drawn in rule 18's box means on input rule 18 will output $\square$ . As the illustration shows, for the most part, rules 18 and 82 are the same, only differing on input . This will give them interesting behavior when surviving $k$ steps.

Elementary cellular automata "grow" from an initial state. To determine a cell's value in the next state, the value of the cell to the left, the cell's current value, and the value of the cell to the right are input to the automaton's function. The automata in this post are grown on finite-width states, so cells on the edge "wrap around": The cell to the left of the leftmost cell is the rightmost cell and the cell to the right of the rightmost cell is the leftmost cell.

For example, on the left side of the illustration below rule 18 is grown from an initial state with three $\square$ cells and two $\blacksquare$ cells. The second row of the drawing is the second state of the automaton. From left to right, the value of each cell in the second state is determined by the input/output pairs , , , , and .

With inhomogeneous elementary cellular automata, we can choose which rule determines the value of each cell. For example, on the right-hand side of the illustration above rule 82 is chosen to determine the value of the 4th cell in the second state, changing its value to $\blacksquare$ . In the illustration, patterns disambiguate which rule determines the value of each cell.

Two rules can survive $k$ steps if, starting from an initial state with exactly one $\blacksquare$ cell, there is a choice of what rule to use where, such that there is at least one $\blacksquare$ cell in the first $k$ states and only $\square$ cells thereafter. For example, in the illustration below, rules 18 and 82 are composed to survive $4$ steps.

Notice for values of $k<4$ , rules 18 and 82 cannot survive $k$ steps, as no matter what automaton is used where, the first three states will be the same. It is only in the third state, when, due to wrapping, the pattern appears. Because the rules differ on this input, rule 82 can be chosen, causing the automaton to "die".

The question this post asks is, for an arbitrary pair of rules, is there an infinite number of $k$ for which Survive- $k$ -Steps is solvable? For rules 18 and 82, we'll now show the answer is infinite. Consider the choice of rules in the illustration below and notice how the final state has a single $\blacksquare$ cell.

From this and our previous solution to Survive- $4$ -Steps, we know how to (1) start from a state with a single $\blacksquare$ cell and return to a state with a single $\blacksquare$ cell after $4$ steps, and (2) start from a state with a single $\blacksquare$ cell and "die" after $4$ steps. Even though the start/end position of the $\blacksquare$ cell in the drawing above is different, because states wrap at the edges, we can "rotate" a choice of what rule to use where to line up $\blacksquare$ cells (Lemma 2 will formalize this). Thus, we can compose our automata to survive $k$ steps for any $k$ expressible as $k=4n+4$ , an infinite number of $k$ . For example, for $n=1$ the illustration below gives a solution to Survive- $8$ -Steps.

Throughout this post, we'll formalize and automate this technique (c.f. §6.3) and several others for exactly characterizing the set of $k$ for which Survive- $k$ -Steps is solvable on a width- $w$ board. New ideas will be required to identify when Survive- $k$ -Steps is unsolvable and to automate the synthesis of the examples in this section.

1.2 Organization

In §3, we give a formalization of inhomogeneous elementary cellular automata, Survive- $k$ -Steps, and some basic lemmas about their behavior. Using this formalization, we reduce the Survive- $k$ -Steps problem to Boolean satisfiability in §4. This equips us with powerful analysis tools from formal methods: SAT solvers and binary decision diagrams. These are many orders of magnitude faster than Wolfram's evolutionary synthesis technique [1] (though, we emphasize, the focus of that work is not performance). For example, even having implemented a reasonably efficient version of Wolfram's technique in C (random-guessing.c in the associated code §8), it would run for hours and fail to synthesize a solution to Survive- $10$ -Steps for most rule tuples, whereas the SAT solver takes seconds. While the focus of our work is analysis, not synthesis, SAT-based synthesis of cellular automata is a promising area we discuss more in §2.1.

Equipped with an efficient analysis technique, §5 presents the results of an experiment synthesizing solutions to Survive- $k$ -Steps on a width- $31$ board for every rule tuple and value of $k$ from $1$ to $50$ . These experimental results show there is considerable structure to what pairs of rules can solve Survive- $k$ -Steps, and pairs of rules tend to solve Survive- $k$ -Steps for all values of $k$ or none.

Based on these experimental results, we develop analytical techniques for proving if a rule tuple can solve Survive- $k$ -Steps for an infinite, or finite, number of $k$ in §6. We find at least 61% of tuples are capable of solving Survive- $k$ -Steps for an infinite number of $k$ , with states of the same width used by Wolfram [1]. Of this 61%, for all but 78 we show there is a finite number of unsolvable $k$ . In total, we classify 99% of tuples as either having a finite or infinite number of solvable $k$ .

The performance of SAT solvers at synthesizing solutions to Survive- $k$ -Steps is interesting given how large the search space is. Even in the $5\times 6$ illustrations from the previous section, there are $2^{30}$ choices of what rule to use where. To quickly synthesize solutions, there must be structural properties of the problem exploited by the SAT solver. §7 explores what features in the SAT solver make it good at synthesis, finding much of the performance can be attributed to the core DPLL [5] algorithm and variable selection heuristics [6], [7], while learned clauses [44] slow it down. We give some empirical evidence variable selection heuristics may be picking up on interesting structural properties of the automata, and discuss this in lieu of the famous theoretical complexity of elementary cellular automata. We conclude in §8 and discuss directions for future work.

2 Related Work

The study of inhomogeneous cellular automata appears to date back to 1986 when Hartman [2] introduced them as a simplification of Boolean networks. Since then, inhomogeneous cellular automata have been studied for random number generation [8], traffic simulation [9], public-key cryptography [10], and Sections 5 and 6 of Bhattacharjee et. al.'s 2020 survey paper [11] gives many more examples.

Several recent papers have used SAT-based encodings of 2D cellular automata, but to our knowledge we are the first to give an encoding for inhomogeneous cellular automata. Bain [12] gives an encoding of Conway's Game of Life [13] and uses it to find the preimage of states (c.f. §6.4), and Nishimura and Hasebe [14] use a SAT-based encoding of Conway's game to synthesize patterns with particular behaviors. D'Antonio and Delzanno [15] also give a general encoding of an n-dimensional cellular automaton (CA) with an arbitrary number of states and use it to analyze Mazoyer's solution to the firing squad synchronization problem [16]. And finally Fatès [17] uses a similar SAT-based encoding to explore solutions to the density classification problem.

This post is the first to consider what features of SAT solvers are responsible for their performance synthesizing cellular automata. Building off our work in §7, future SAT-based analysis might consider disabling clause learning to speed up solve times. An interesting question for future work is if the same heuristics important for the inhomogeneous model are important for other types of cellular automata.

Outside of SAT-based encodings, there is a rich body of work synthesizing cellular automata via evolutionary algorithms [18], [19], [20], [21], [22], [23]. For inhomogeneous automata, in 1996 Sipper [24] evolved an inhomogeneous CA which solved the density classification problem better than the best possible homogeneous CA, and more recently in 2016 Grouchy and D'Eleuterio [25] evolved an inhomogeneous CA that computes 4-bit addition. Wolfram's work [3], [1], [4] is unique in that it focuses on the learning dynamics of these algorithms instead of a particular synthesis task.

Another promising approach to cellular automata synthesis comes from Mordvintsev et. al. [26] and Miotti et. al. [27] who have used ideas from machine learning. Miotti et. al., for example, synthesize a discrete, 2D cellular automaton that grows into a $20\times 20$ image of a lizard in twelve steps from a single, centered live cell. Though unlike the automata in this post where cells are either $\blacksquare$ or $\square$ , their solution has an incredible $2^{128}$ possible states per cell.

2.1 Our Method As a Synthesis Tool

Figure 1. Two inhomogeneous automata synthesized with a maximum satisfiability solver [28]. (Left) A solution to Survive- $k$ -Steps using rules 4 and 110, minimizes the number of $\blacksquare$ cells, and matches the simple' solution found by Wolfram **[1]**. (Right) An automaton that grows into an image using rules 126 and 225, maximizes similarity to a dithered Mona Lisa. See MinMax Synthesis.ipynbandImage Synthesis.ipynb` respectively in the associated code §8.

Figure 2. Successive states of a 2D cellular automaton synthesized by a SAT solver which grows into a $10\times 10$ image after 5 steps, starting from a single, centered $\square$ cell. See Stickman Synthesis.ipynb in the associated code §8.

Throughout this post, we use the SAT solver primarily for analysis, its ability to generate proofs of unsatisfiability being crucial. However, there is a wealth of literature about SAT-based synthesis [29], [30] outside the cellular automata field. This section briefly discusses our preliminary investigation in this area. SAT-based synthesis of cellular automata seems to be a promising area for future work.

SAT solvers have a few advantages over the synthesis techniques discussed so far. First, their ability to generate proofs of unsatisfiability allows for more precise insights into synthesis problems. In Grouchy and D'Eleuterio's [25] work, for example, the failure of their evolutionary method to find a solution lends itself to a probabilistic argument there is no solution, but a SAT-based approach could generate an actual proof of this. For example, with a naive encoding of a two-state 2D automaton, we were able to generate a proof there is no automaton which grows into Miotti et. al.'s [27] $20\times 20$ lizard image in twelve steps and has the lizard image as a fixed point. Moreover, maximum satisfiability solvers [28] can solve synthesis problems while (min/max)imizing particular features. Figure 1, for example, shows an inhomogeneous cellular automaton which survives $k$ steps while minimizing the number of $\blacksquare$ cells used and one that maximizes its similarity to an image.

While not the focus of this post, we agree with previous authors using SAT for cellular automata analysis/synthesis [12], [14], [15], [17], it seems like a promising area for future work. To this end, the code associated with this post §8 includes BitAdder.ipynb, FunctionSynthesis.ipynb, and Stickman Synthesis.ipynb which provide a naive SAT-based encoding of Grouchy and D'Eleuterio's [25] bit adder, Miotti et. al.'s [27] 2D automata synthesis (Figure 2), and the function synthesis problem considered by Wolfram [1] respectively. Based on our preliminary experiments, follow-up work on an improved Boolean encoding combined with a SAT-based synthesis technique like counterexample guided synthesis [29] seems promising.

2.2 Formal Characterization

In the 2025 ACL2 Workshop we gave a formal characterization of inhomogeneous cellular automata and the Survive- $k$ -Steps problem [31] using the ACL2s theorem prover [32]. We then formally proved that odd-numbered elementary cellular automaton rules (by Wolfram code [33]) cannot be composed to survive $k$ steps for any $k$ , and that inhomogeneous CA unable to transition from a state with at least one $\blacksquare$ cells to a state with exclusively $\square$ cells cannot survive $k$ steps for any $k$ . This post substantially builds on our previous work. We develop an analytical method to determine if an inhomogeneous automaton can transition from a state with at least one $\blacksquare$ cells to a state with exclusively $\square$ cells, we give a reduction of inhomogeneous cellular automata to Boolean satisfiability, and we describe seven additional techniques for analyzing the solvability of Survive- $k$ -Steps.

3 Formalization

An elementary cellular automaton is a function $\lambda_i:\{\square, \blacksquare\}^3\to \{\square, \blacksquare\}$ . There are $2^{2^3}=256$ such functions, which we'll enumerate by Wolfram code [33]. To denote $\lambda_i(\square,\blacksquare,\square)=\square$ , we'll write . From a state $x\in\{\blacksquare,\square\}^w$ of width $w$ , the next state of an elementary cellular automaton is computed via a wrapped convolution of the automata over $x$ . Inhomogeneous elementary cellular automata differ in that different automata can be applied at each position in the convolution.

We'll consider the case where a finite set of elementary rules, $E=\{\lambda_{i_1},\lambda_{i_2},\dots,\lambda_{i_t}\}$ , are available. Let

a_E:\{0,\dots,w-1\}\times\mathbb{N}\to E

be a rule array where $a_E(i,j)=\lambda$ indicates automaton $\lambda$ should be used to determine the value of the $i$ th cell in the $j$ th state of the inhomogeneous automaton. We'll denote inhomogeneous elementary cellular automata by a tuple $\mathfrak{A}=\langle a_E, x\rangle$ containing their initial state and rule array. To denote the $i$ th element of the $j$ th state of $\mathfrak{A}$ , we'll write $\mathfrak{A}(i,j)$ , or equivalently $\langle a_E, x\rangle(i,j)$ , the value of which is given by the formula below.

\tag{1} \mathfrak{A}(i,j) = \begin{cases} \begin{aligned} &x_i &&\text{if } j = 0 \quad &\text{(Case 1)} \\ &a_E(i,j)(p,q,r) &&\text{otherwise} \quad &\text{(Case 2)} \end{aligned} \end{cases}

where

\begin{aligned} p &= \mathfrak{A}(i^-, j-1), \\ q &= \mathfrak{A}(i, j-1), \\ r &= \mathfrak{A}(i^+, j-1), \end{aligned}

and

\tag{2} i^- = (i - 1) \bmod w, \quad i^+ = (i+1) \bmod w.

A formal definition of Survive- $k$ -Steps easily follows from this.

Definition 1. Let $x\in \{\blacksquare,\square\}^w$ be an initial state of width $w$ with exactly one $\blacksquare$ . For $k>0$ and set of rules $E$ , $a_E$ solves Survive- $k$ -Steps if $\mathfrak{A}=\langle a_E,x\rangle$ satisfies

\tag{3} \forall j < k, \exists i, \mathfrak{A}(i,j)=\blacksquare

and

\tag{4} \forall j \geq k,\forall i, \mathfrak{A}(i,j)=\square.

We write $\text{s} k \text{s} (E,w)$ to denote Survive- $k$ -Steps is solvable for $E$ and $w$ . The position of the single $\blacksquare$ cell in $x$ does not change if Survive- $k$ -Steps is solvable (c.f. Lemma 2).

We'll now give two useful facts about Survive- $k$ -Steps. First, Lemma 1 says, if Survive- $k$ -Steps is solvable with a set of automata, then there is a rule-array using those automata with finitely-many unique rows of output that solves Survive- $k$ -Steps. This lets us encode any Survive- $k$ -Steps problem as a finite Boolean formula.

Lemma 1. If $a_E$ is a rule array solving Survive- $k$ -Steps, then

a_E'(i,j)=\begin{cases} a_E(i,j) & \text{if } j\leq k \\ a_E(i,k+1) & \text{otherwise} \end{cases}

solves Survive- $k$ -Steps.

Proof. For $x\in\{\blacksquare,\square\}^w$ with exactly one $\blacksquare$ , because $a_E$ solves Survive- $k$ -Steps, $\mathfrak{A}=\langle a_E,x\rangle$ has exclusively $\square$ cells in states $\geq k$ . Thus, row- $(k+1)$ of $a_E$ describes what rules to use where to transition from an all- $\square$ state to an all- $\square$ state. In turn, $a_E'$ above can repeat $a_E$ 's $k+1$ th row for all rows $>k$ and the automata $\mathfrak{A}'=\langle a_E',x\rangle$ will have exclusively $\square$ cells in states $\geq k$ . Thus, by the construction of $a_E'$ , $\forall i,j: \mathfrak{A}'(i,j)=\mathfrak{A}(i,j)$ , so $a_E'$ also survives $k$ steps. See A.1 in the appendix for a more syntactic, inductive proof of this.

Lemma 2 formalizes that rotating $\mathfrak{A}$ 's input and rule array rotates $\mathfrak{A}$ 's output. This is almost trivial, but means the position of the $\blacksquare$ in the initial state of Survive- $k$ -Steps is unimportant, because if there is a solution with $\blacksquare$ in one position, then rotating the solution generates solutions for all positions. This also simplifies reachability analysis because if the all- $\square$ state is unreachable from any state with a single $\blacksquare$ , then the all- $\square$ state is unreachable from all states with a single $\blacksquare$ .

Lemma 2. For any $\mathfrak{A}=\langle a_E,x\rangle$ and $k\in\mathbb{Z}$ , let

a_E^k(i,j)= a_E((i+k)\bmod w,j)

and

x_i^k= x_{(i+k)\bmod w}

be versions of $a_E$ and $x$ rotated $k$ cells to the right, and let $\mathfrak{A}^k=\langle a_E^k,x^k\rangle$ . For any predicate $\varphi$ and $j\in\mathbb{N}$ ,

(\forall 0\leq i<w: \varphi(\mathfrak{A}(i,j))) \leftrightarrow (\forall 0\leq i<w: \varphi(\mathfrak{A}^k(i,j))).

So, if (3), (4) hold for $\mathfrak{A}$ , they also hold for $\mathfrak{A}^k$ .

Proof. By easy induction on $j$ , $\mathfrak{A}((i+k)\bmod w,j)=\mathfrak{A}^k(i,j)$ . For $0\leq i < w$ , $(i+k)\bmod w$ is a one-to-one mapping back onto $0\leq i < w$ , so the set of arguments to $\varphi$ in quantification over $\mathfrak{A}^k$ and $\mathfrak{A}$ are the same.

4 Reduction To Satisfiability

Equipped with a formal definition of inhomogeneous cellular automata, we can reduce Survive- $k$ -Steps to Boolean satisfiability which will allow synthesizing solutions with a SAT solver. We will also use the ideas here in §6.4 to analyze reachable states with binary decision diagrams.

The basic idea behind the reduction is to introduce a variable for each value of $\mathfrak{A}(i,j)$ , then constrain those variables so they adhere to (3) and (4). Of course, the values of $\mathfrak{A}(i,j)$ are determined by the values of $a_E(i,j)$ , so we also introduce a variable for each of the $|E|$ possible values of $a_E(i,j)$ , require exactly one of these variables be true, and interpret the true one as the value of $a_E(i,j)$ . Our scheme requires $O(|E|wk)$ variables per Lemma 1.

The formulas here are not in conjunctive normal form (CNF), the format expected by SAT solvers. We use the Tseitin [34] encoding provided by PySAT's [35] formula library to convert them to CNF.

First, we'll describe three formulas which, when satisfied, ensure $y_{i,j}$ corresponds to the outputs of $\mathfrak{A}$ , and $r_{i,j,\lambda}$ to the outputs of $a_E$ .

Theorem 1. For any $\mathfrak{A}=\langle a, x\rangle$ , if formulas (7), (8), and (9) are satisfied, then

\tag{5} y_{i,j} \leftrightarrow \mathfrak{A}(i,j)=\blacksquare

and

\tag{6} r_{i,j,\lambda} \leftrightarrow a_E(i,j)=\lambda.

Proof. For each cell, there is one variable for its color and $|E|$ variables for the possible values of $a_E$ . Formula (7) ensures only one of the $a_E$ variables is true at a time (i.e. $a_E(i,j)$ has a single output). For this and Formula (8) the $i$ and $j$ variables are quantified over all $0\leq i < w$ and $0\leq j \leq (k+1)$ when representing a Survive- $k$ -Steps instance. Quantifying $j$ up to $(k+1)$ being sufficient per Lemma 1.

\tag{7} \left(\bigvee_{\lambda \in E} r_{i,j,\lambda}\right) \wedge \bigwedge_{\substack{\lambda,\lambda' \in E \\ \lambda \neq \lambda'}} (\neg r_{i,j,\lambda} \lor \neg r_{i,j,\lambda'})

Next, let $p = y_{i^-, j-1}$ , $q = y_{i, j-1}$ , and $r = y_{i^+, j-1}$ , with $i^+,i^-$ as in (2). Formula (8) ensures $y_{i,j}$ adheres to Case 2 of (1) with $\lambda$ being quantified over all $\lambda \in E$ . Intuitively, this formula says that if $r_{i,j,\lambda}$ is true, then $y_{i,j}$ 's value should match $\lambda$ 's output. Notice, in line with our conversion of $\blacksquare$ to true and $\square$ to false, the constant values in this formula mean terms like $x_1=p$ will be translated into $(\neg y_{i^-, j-1})$ when $x_1$ is $\blacksquare$ and $(y_{i^-, j-1})$ otherwise.

\tag{8} \neg r_{i,j,\lambda}\lor\left(\bigwedge_{x\in\{\square, \blacksquare\}^3}\!\!\!\!\!(x_1=p\land x_2=q\land x_3=r)\rightarrow(y_{i,j}\leftrightarrow \lambda(x)=\blacksquare)\right)

Finally when $j=0$ , formula (9) ensures $y_{i,j}$ adheres to Case 1 of (1).

\tag{9} \bigwedge_{0\leq i< w}y_{i,0}\leftrightarrow x_i=\blacksquare

The next theorem gives straightforward constraints on $y_{i,j}$ which, when applied, make satisfying assignments correspond to solutions to Survive- $k$ -Steps.

Theorem 2. If formulas (7), (8), (9), (10), and (11) are satisfied, the assignment to the $r_{i,j.\lambda}$ variables corresponds to a rule array solving Survive- $k$ -Steps.

Proof. This is an extension of Theorem 1 to require Survive- $k$ -Steps as well as the rules of inhomogeneous automata are satisfied. (10), below, ensures requirement (3) of Survive- $k$ -Steps is satisfied

\tag{10} \bigwedge_{0\leq j<k}\left(\bigvee_{0\leq i < w}y_{i,j}\right)

and (11) below ensures requirement (4) of Survive- $k$ -Steps is satisfied. Notice, again, how Lemma 1 means quantifying up to $j\leq k+1$ is sufficent.

\tag{11} \bigwedge_{k\leq j\leq k+1}\left(\bigwedge_{0\leq i < w}\neg y_{i,j}\right)

The solving rule array can be extracted from a satisfying assignment per (6).

5 Experimental Results

Figure 3. Adjacency matrix for Survive- $50$ -Steps on a width 31 board. A $\blacksquare$ at position $(i,j)$ means Survive- $50$ -Steps is possible with elementary cellular automaton rules $i$ and $j$ . Top-left is $(0,0)$ and bottom right is $(255,255)$ .

Equipped with an efficient synthesis technique we set up an experiment to comprehensively determine what tuples of elementary rules can solve Survive- $k$ -Steps. For every rule tuple, for every $k\in\{1,\dots,50\}$ we determined if Survive- $k$ -Steps is solvable on a width 31 board, the same width used by Wolfram [1]. Figure 3 shows an adjacency matrix of compatible rules for solving Survive- $50$ -Steps.

At the least, the ability to solve Survive- $k$ -Steps appears to be non-random. Optimistically, upon seeing these results we hoped for a purely analytical solution capable of predicting if a given tuple could solve Survive- $k$ -Steps. As we will discuss in §6 we were somewhat successful in discovering one -- our purely analytical techniques completely describe the triangular shapes in the bottom-right quadrant of Figure 3 as well as the alternating pattern of $\square$ s; however, many of our techniques require a combination of analytical insight and SAT solver to analyze a given tuple.

The second takeaway is illustrated in Figure 4. The vast majority of tuples appear to either solve Survive- $k$ -Steps for all values of $k$ greater than a cutoff or never solve it. In our experimental results, there are few exceptions to this rule. This observation led us to attempt to classify tuples according to whether they could solve Survive- $k$ -Steps for an infinite, or finite, number of values of $k$ .

Finally, Figure 5 and 6 show a sampling of the most and least compatible rules in our experiment. It is unsurprising and notable that two of the three most compatible rules are well-studied. Rule 110 is famously capable of universal computation [36], and see "Rule 22: History" [37]. On the other hand, the least compatible rules are mostly $\blacksquare$ and don't exhibit much complexity.

Figure 4. Percentage of tuples capable of solving Survive- $k$ -Steps as a function of $k$ , on a board of width $31$ . $k=16$ and $k=31$ have discontinuities due to rules which require wrapping.

Figure 5. Rules 22, 94, and 110 grown from a random initial state. These rules solve Survive- $50$ -Steps with 100% of other rules on a width 31 board.

Figure 6. Rules 255, 207, and 221 grown from a random initial state. These rules solve Survive- $50$ -Steps on a width 31 board with the lowest number of other rules: 77, 79, and 79 respectively.

6 Automatic Proving

6.1 Analytical Impossible $k$

Figure 7. (Left) Pixel (i,j) is $\blacksquare$ if Technique 1 identifies $E=\{\lambda_i,\lambda_j\}$ as unable to solve Survive- $k$ -Steps. (Right) Same as left with Technique 2. Technique 1 identifies 8,128 tuples as unable to solve Survive- $k$ -Steps. Technique 2 identifies 2,931.

This section gives two analytical techniques for determining if Survive- $k$ -Steps is solvable. Figure 7 visualizes the tuples classified by these techniques.

Technique 1.

\forall \lambda_i\in E, i\text{ is odd} \rightarrow \nexists k, \text{s} k \text{s} (E,w)

Theorem 1 of our previous work [31] provides a formal proof of the correctness of this. The basic idea is, if the least significant bit of a rule's Wolfram code is 1 (i.e. the number is odd), then it outputs $\blacksquare$ on input . In turn, the all- $\square$ state will transition to the all- $\blacksquare$ state, so (4) cannot be satisfied.

6.1.1 Technique 2

Consider states $k-1$ and $k$ in a Survive- $k$ -Steps instance. Per the rules of Survive- $k$ -Steps (c.f. (3) and (4)), the $(k-1)$ th state has at least one $\blacksquare$ cell and the $k$ th state is all- $\square$ . This gives a necessary condition for solving Survive- $k$ -Steps with a set of rules: the rules must be able to transition from a state with a $\blacksquare$ to the all- $\square$ state. See Lemma 4 of our previous work [31] for a formal proof of this.

We now look for an analytical method to determine if an inhomogeneous cellular automaton can transition from a state with at least one $\blacksquare$ to a state with only $\square$ , hereafter the $\blacksquare$ -to- $\square$ transition. At a high level, we will construct a graph with a bijection between its cycles and elements of $\{\blacksquare,\square\}^w$ from which the automaton can reach the all- $\square$ state. Then, if a cycle corresponding to a state that is not the all- $\square$ state exists, the $\blacksquare$ -to- $\square$ transition is possible, otherwise Survive- $k$ -Steps is unsolvable.

Consider the graph $G$ below.

$G$ is an annotated De Bruijn graph [38]. In a particular way, every $x\in\{\blacksquare,\square\}^w$ corresponds to a cycle in $G$ and the labels of the edges traversed in $x$ 's cycle correspond to the set of inputs an automaton will see during a wrapped convolution over $x$ .

To see this, consider the following value for $x$ .

By moving a two-cell sliding window over $x$ (with wrapping) we get a sequence of values corresponding to a cycle in $G$ .

The set of labels of edges traversed in $x$ 's cycle is

and these labels are the set of inputs an inhomogeneous automaton will receive during a wrapped convolution over $x$ . Evidentially, every $x\in\{\blacksquare,\square\}^w$ corresponds to a cycle with this property and every length- $w$ cycle corresponds to an element of $\{\blacksquare,\square\}^w$ .

To see how $G$ relates to the $\blacksquare$ -to- $\square$ transition, consider an inhomogeneous automaton $\mathfrak{A}$ composed of the elementary rules $E$ . Consider what elements of $\{\blacksquare,\square\}^w$ $\mathfrak{A}$ can transition to the all- $\square$ state from. If, during a wrapped convolution over $x\in\{\blacksquare,\square\}^w$ , $\mathfrak{A}$ receives an input $p\in\{\blacksquare,\square\}^3$ for which $\forall \lambda \in E: \lambda(p)=\blacksquare$ , then regardless of the choice of what element of $E$ to use at $p$ 's position, $\mathfrak{A}$ will output a $\blacksquare$ , so $\mathfrak{A}$ cannot transition to all- $\square$ from $x$ . On the other hand, if there is no such $p$ , then $\mathfrak{A}$ will be able to choose an element of $E$ outputting $\square$ at each position in the wrapped convolution over $x$ and transition to the all- $\square$ state. Thus, the subset $X\subseteq\{\blacksquare,\square\}^w$ from which $\mathfrak{A}$ can transition to all- $\square$ is exactly those elements for which an element of $\{\blacksquare,\square\}^3$ on which every element of $E$ outputs $\blacksquare$ on does not appear during wrapped convolution. By the bijection between cycles and elements of $\{\blacksquare,\square\}^w$ , generating $X$ is simply a matter of removing edges from $G$ labeled with patterns every element of $E$ outputs $\blacksquare$ on and checking if a length- $w$ cycle passing through a vertex with a $\blacksquare$ remains.

Finding a cycle like this is a minor variation of the $k$ -cycle-problem (for fixed $k=w$ ), which is well-known to be solvable in time polynomial in the number of vertices in $G$ [39]. This is the approach taken by our implementation (§8).

For example, consider, below, the input output pairs for elementary rules 238 and 215 and if they can be composed to transition from a state with at least one $\blacksquare$ to one with only $\square$ .

Both output $\blacksquare$ on , , , and , so to determine if the $\blacksquare$ -to- $\square$ transition is possible we remove those edges from $G$ . The length- $w$ cycles in the resulting graph correspond to elements of $\{\blacksquare,\square\}^w$ containing no patterns rules 238 and 215 both output $\blacksquare$ on, i.e. states which can transition to the all- $\square$ state.

The only cycle in the resulting graph corresponds to the all- $\square$ state, so the only state that can transition to the all- $\square$ state is the all- $\square$ state, so the $\blacksquare$ -to- $\square$ transition is impossible using rules 238 and 215.

Figure 8. Visualization of what rule tuples each technique classifies. For each technique, pixel $(i,j)$ is black if that technique identifies $E=\{\lambda_i,\lambda_j\}$ as unable to solve Survive- $k$ -Steps for an infinite number of $k$ .

6.2 Computational Impossible $k$

Like the techniques of the previous section, the techniques here focus on a particular part of Survive- $k$ -Steps and try to prove a given tuple cannot solve it. Unlike the previous section, they use computational means.

To see the basic idea, recall Technique 2 (§6.1.1). To prove the $\blacksquare$ -to- $\square$ transition is impossible, we exploited patterns that appear in the inputs to inhomogeneous automata. For example, from the De Bruijn graph and bijection between inputs and cycles, it is clear appears in $x$ iff does. Relationships like this one create structure in the $\blacksquare$ -to- $\square$ transition which Technique 2 exploits.

Another way to prove the $\blacksquare$ -to- $\square$ transition is impossible is to brute-force all $2^{2w}$ rule array, initial state combinations and see if any did it. This seems computationally infeasible on the surface, but remarkably SAT solvers can easily produce proofs brute forcing will not work for values of $w$ like $31$ , used throughout this post. Computational proofs like this are used throughout this section.

Technique 2. The pattern $p\in\{\blacksquare,\square\}^3$ appears in $x$ , denoted $p\in x$ , if $\exists i:x_{i^-}=p_0\land x_i=p_1 \land x_{i^+}=p_2$ with $i^+,i^-$ as in (2). Letting $\langle a_E,x\rangle(\cdot,j)$ denote row $j$ of $\langle a_E,x\rangle$ (note 1), if

\forall a_E,x,j: p\in\langle a_E,x\rangle(\cdot,j)\rightarrow p\in\langle a_E,x\rangle(\cdot,j+1)

then we call $p$ a fixed-point pattern for $E$ . If $p$ is a fixed-point pattern for $E$ , $p$ has a $\blacksquare$ , $x$ has exactly one $\blacksquare$ cell, and for some $j$ , $\forall a_E: p\in\langle a_E,x\rangle(\cdot,j)$ , then Survive- $k$ -steps is unsolvable for $k>j$ .

Technique 3's correctness can be seen as follows: Because $p$ is a fixed-point pattern and $\blacksquare\in p$ , if $p$ appears in state $j$ , then all states $>j$ have a $\blacksquare$ . Survive- $k$ -Steps requires states $\geq k$ to have no $\blacksquare$ , so if starting from the initial state of Survive- $k$ -Steps, $p$ appears in state $j$ regardless of the rule array, then all states $>j$ have a $\blacksquare$ regardless of the rule array, so Survive- $k$ -Steps is unsolvable for all $k>j$ .

Lemma 3 shows an example of a fixed-point pattern.

Lemma 3. For $E=\{\lambda_{114},\lambda_{119}\}$ , if row $k$ contains , then all rows $\geq k$ contain , so is a fixed-point pattern. (note 2)

Proof. Consider the drawing below which shows the result of evaluating rules 114 and 119 on a row containing . The s denote cells whose values are undetermined. As rules 114 and 119 are similar, the center three cells are determined. On the second row, eventually occurs, as any cycle in $G$ which traverses the edge labeled will also traverse the edge labeled , so is a fixed-point pattern for $E=\{\lambda_{114},\lambda_{119}\}$ .

To check if Technique 3 applies for $j$ , a set of rules $E$ , and width $w$ , we construct a formula with clauses (7) and (8) (note 3) then let $\varphi_{p,j}$ be a formula that is true iff pattern $p$ appears in row $j$ of the automata.

\varphi_{p,j}=\bigvee_{0\leq i < w}\left(y_{i^-,j}=p_0\land y_{i,j}=p_1 \land y_{i^+,j}=p_2\right)

Next, for every $p\in\{\blacksquare,\square\}^3$ except we check if the formula $\varphi_{p,0}\land\neg\varphi_{p,1}$ is unsatisfiable. The set of unsatisfiable $p$ , denoted $U$ , are $E$ 's fixed-point patterns. To determine if a fixed-point pattern appears in state $j$ of all Survive- $k$ -Steps instances, for $x\in\{\blacksquare,\square\}^w$ with exactly one $\blacksquare$ , we construct a formula with clauses (7), (8), (9) and

\bigwedge_{p\in U}\neg\varphi_{p,j}.

If this formula is unsatisfiable, then a fixed point pattern with a $\blacksquare$ appears in state $j$ for all Survive- $k$ -Steps instances, so Technique 3 applies.

Technique 3. For any $E,w$ , let $G$ be a graph with verticies $\{0,1,\dots,w\}$ and an edge from $i$ to $j\neq i$ if it is possible to transition from a state with $i$ $\blacksquare$ cells to one with $j$ . If there is no path from $1$ to $0$ in $G$ , then $\neg\exists k: \text{s} k \text{s} (E,w)$ .

For example, the graph for $E=\{66, 106\},w=7$ is below.

Because there is no way to transition from a state with one $\blacksquare$ to zero, Survive- $k$ -Steps is unsolvable for all $k$ for $E=\{66, 106\},w=7$ .

Determining if it is possible to transition from a state with $n$ $\blacksquare$ to one with $m$ is done much like Technique 3. We construct a formula with clauses (7) and (8), let $\varphi_{n,j}$ be a formula that is true iff $n$ $\blacksquare$ appear in row $j$ of the automata, and check if $\varphi_{n,0}\land\varphi_{m,1}$ is satisfiable. We encode $\varphi_{n,j}$ using the sequential counter encoding of Sinz [40] provided by PySAT [35].

The example graph above contains redundant edges: Once it is known there are no outgoing edges from $1$ , the unreachability of $0$ follows, so there is no need to determine if there is an edge between $4$ and $5$ , for example. Technique 4's implementation performs breadth first search starting at $1$ and finishes early if $0$ is reached; Even so, search can be expensive on well-connected graphs. We use one more technique to catch some of these computationally difficult rules.

Technique 4. For a set of rules, $E$ , if state $k+1$ has a $\square$ cell if state $k$ does, and the only way to transition from a state with at least one $\blacksquare$ cell to one with all $\square$ cells is via a state with all $\blacksquare$ cells, then $w>1\rightarrow \nexists k, \text{s} k \text{s} (E,w)$ .

6.3 Infinitely Solvable $k$

We'll now formalize the technique for determining when the set of solvable $k$ is infinite from the introduction. We call rule arrays which cause the automaton to start and end in a state with a single $\blacksquare$ cell "grow gadgets".

Definition 2. For $k>0$ , a set of rules $E$ , and width $w$ , there is a Grow- $k$ -Gadget iff for $x\in\{\blacksquare, \square\}^w$ with a single $\blacksquare$ cell there exists $\mathfrak{A}=\langle a_E,x\rangle$ s.t.,

\tag{12} \forall j< k,\exists i, \mathfrak{A}(i,j)=\blacksquare

and the $(k+1)$ th state has a single $\blacksquare$ cell, namely

\tag{13} \exists i,\mathfrak{A}(i,k)=\blacksquare \land \forall i', (\mathfrak{A}(i',k)=\blacksquare\rightarrow i'=i).

When we have a grow gadget and a solution to Survive- $k$ -Steps, this means there is an infinite number of solvable $k$ . However, as we saw in the introduction, there may also be an infinite number of unsolvable $k$ . To determine if there is a finite number of unsolvable $k$ , we can use a classic result from number theory, which tells us if we synthesize a set of grow gadgets, the greatest common divisor of which is 1, then there is a finite number of unsolvable $k$ .

Lemma 4. Let $a_1, a_2, \ldots, a_n$ be positive integers such that $\gcd(a_1, a_2, \ldots, a_n) = 1$ . Then there exists a largest integer $N$ which cannot be expressed as a non-negative integer linear combination of $a_1, a_2, \ldots, a_n$ .

For details, see [41]. Lemma 5 describes precisely how to determine if the number of unsolvable $k$ is finite.

Lemma 5. For any width $w$ and set of automata $E$ , let

G= \{k\mid \text{Grow-$k$-Gadget($E,w$) is solvable}\},

and let

U= \{k\mid \neg \text{s} k \text{s} (E,w)\}.

If $\exists k, \text{s} k \text{s} (E,w)$ and $\gcd(G)=1$ , then $U$ has finitely-many elements.

Proof. Let $G_i$ denote the $i$ th element of $G$ , and suppose there is $k$ s.t. Survive- $k$ -Steps is solvable. By stacking grow gadgets, we can solve Survive- $j$ -Steps for any $j$ expressible as

j=k+\sum_{i=1}^{|G|} a_iG_i,

where $a_i\in\mathbb{Z}$ and $a_i\geq 0$ . Per Lemma 4, when $\gcd(G)=1$ , there is a largest number $N$ which cannot be expressed in terms of the above summation, so $|U|\leq N$ .

For every rule tuple, we tried to synthesize Grow- $k$ -Gadgets for every $k<51$ , stopping early if the GCD of synthesized gadgets was one. Of the 32,640 rule tuples, only 25 had at least one grow gadget and $\gcd(G)\neq 1$ , of the 25 all had either $\gcd(G)= 2$ or $\gcd(G)= 16$ . In every case we investigated this was due to wrapping. For example, rules 18 and 82 from the introduction §1.1 have a GCD of $16$ , because after this many steps on a width $31$ board wrapping causes a pattern their output differs on to appear.

6.4 Binary Decision Diagrams

Figure 9. Visualization of rule tuples classified by BDDs that are not classified by Techniques 1-6. In both images, pixel $(i,j)$ is black if $E=\{\lambda_i,\lambda_j\}$ is identified as solving Survive- $k$ -Steps for an (in)finite number of $k$ . See bdd_reachability.py in the associated code §8 for our implementation using the tulip-control/dd [42] package.

The techniques presented so far are sound but not complete. They tell us when Survive- $k$ -Steps is solvable for finite/infinite values of $k$ , but some tuples escape categorization. This section gives two computational techniques that are sound and complete. Within a limited computational budget they classify 72 tuples not classified by our previous techniques.

A binary decision diagram (BDD) is a data structure that provides a canonical, efficient representation of boolean functions. While their worst-case complexity is exponential, much like SAT solvers, BDDs work well on many common boolean functions. Using BDDs, we can compute a Boolean formula, the set of satisfying assignments to which corresponds to the set of states reachable by an inhomogeneous automaton. By manipulating this formula, we can determine when Survive- $k$ -Steps is solvable for a finite/infinite number of $k$ .

For a set of rules $E$ , let $R_E$ be any boolean formula over $x,x'\in\{\blacksquare,\square\}^w$ satisfying

R_E(x_0,x_1,\dots,x'_0,x'_1,\dots)\leftrightarrow\exists a_E,\forall 0\leq i < w, \langle a_E,x\rangle(i,1)=x'_i.

That is, $R_E(x,x')$ is true iff an inhomogeneous automaton using the rules in $E$ can transition from state $x$ to state $x'$ in one step. Concretely, $R_E$ could be,

\bigwedge_{0\leq i<w}\bigwedge_{c\in\{\blacksquare,\square\}^3} (x_{i^-}=c_0\land x_i=c_1\land x_{i^+}=c_2)\rightarrow(\bigvee_{\lambda\in E} x'_i=\lambda(c)).

Note that in the formula above, the $y_0,y_1,y_2$ values are literal values iterated over in $\bigwedge$ , not free variables. Thus, the expression $x_i=y_1$ becomes $\neg x_i$ in the end result if $y_i$ 's literal value is $\square$ and $x_i$ if $y_i$ 's literal value is $\blacksquare$ .

Given a formula for a set of states $Y_i$ , $R_E$ can be used to compute a formula for the set of states reachable in one step. Let $\varphi[x/y]$ denote the formula $\varphi$ with all free occurrences of the variable $x$ replaced by $y$ . For a length- $w$ vector of values $y$ , let $\varphi[x_i/y_i]$ be shorthand for $\varphi[x_1/y_1][x_2/y_2]\dots[x_w/y_w]$ . A formula(note 4) for the set of states reachable in one step from $Y_i$ is,

Y_{i+1}=\bigvee_{c\in\{\blacksquare,\square\}^w}(R_E\land Y_i)[x_i/c_i][x'_i/x_i].

In turn, a formula $A$ for the all the states reachable from $Y_0$ is given by, $A=\bigvee_{0\leq i}Y_i$ . Concretely, $A$ is a formula over the variables $x_0,x_1,\dots,x_w$ . An assignment to those variables makes $A$ true iff it corresponds to a state reachable from one of the states described by $Y_0$ .

To determine if $q\in\{\blacksquare,\square\}^w$ is an unreachable state, the formula

(\bigwedge_{0\leq i < w}(q_i=x_i))\land A

can be compared with the BDD canonical representation of False. If the all- $\square$ state is unreachable from an initial state, $Y_0$ , corresponding with one $\blacksquare$ cell, then Survive- $k$ -Steps is unsolvable.

As $A\subseteq\{\blacksquare,\square\}^w$ , $A$ is finite, and in turn can be computed by taking successive values of $Y_i$ until a fixed point is reached. Computing $A$ is not always feasible, so our implementation gives up after computing $Y_{10}$ if a fixed point had not been reached. Even so, we were able to determine an additional 19 tuples were unable to solve Survive- $k$ -Steps on a width- $31$ board which were not detected by previous techniques.

To determine when the set of solvable $k$ is infinite using a BDD, the basic idea is to compute a formula for the set of states that can reach the all- $\square$ state, are not the all- $\square$ state, and can be reached from an initial state with one $\blacksquare$ cell. Call this set $I$ . We look for a sequence of elements of $I$ , $i_0,i_1,\dots,i_{l-1},i_0$ , s.t. $\forall 0\leq j<l,R(i_j,i_{j+1})$ , i.e. a cycle of states. If a cycle exists, Survive- $k$ -Steps is solvable for an infinite number of $k$ , because a cycle in $I$ means there is a sequence of states which can be arbitrarily repeated and then transition to all- $\square$ . This is, in effect, a generalized version of grow gadgets (2).

To compute $I$ , start by computing $F$ , a formula for the set of states forward-reachable from an initial state with one $\blacksquare$ cell via the method above. Then let $\square^w$ denote the all- $\square$ state and compute $B$ the set of backward-reachable states from $\square^w$ . $I$ is $(F\cap B) \setminus \square^w$ .

Computing $B$ decomposes to computing $Y_i$ given a formula for $Y_{i+1}$ which is a slight variation on forward-reachability:

Y_i = \bigvee_{c\in\{\blacksquare,\square\}^w}(R_E\land Y_{i+1}[x_i/x'_i])[x'_i/c_i].

Now we'll prove our claim about cycles in $I$ and show how to determine if $I$ has a cycle.

Lemma 6. Let $G$ be the graph with vertices $I$ and edges

\{(x,y)\mid (x,y)\in (I\times I)\land R_E(x,y)\}.

There is a cycle in $G$ $\leftrightarrow$ Survive- $k$ -Steps is solvable for an infinite number of $k$ .

Proof. In the $\rightarrow$ direction, let $L$ be the set of vertices in $G$ 's cycle and for some $l\in L$ let $c_f$ be the number of transitions required to reach $l$ from the initial state and $c_b$ the number of transitions required to reach all- $\square$ from $l$ . By repeating $G$ 's loop $n$ times, one can solve Survive- $k$ -Steps for any $k$ expressible as $k=c_f+c_b+n|L|$ , an infinite number of $k$ .

In the $\leftarrow$ direction, let $k\geq2^w$ be a solvable value of $k$ and let $x_1,x_2,\dots,x_k$ be a sequence of states before reaching all- $\square$ and solving Survive- $k$ -Steps. As there are $2^w-1$ states containing a $\blacksquare$ cell, for some $i< i'$ $x_i=x_{i'}$ and $x_i,x_{i+1},\dots,x_{i'}$ is a cycle in $G$ .

Lemma 7. Let $G$ be defined as in Lemma 6, $I_0=I$ , and

I_{k+1}=\{i\mid i\in I\land \exists i'\in I_k, R(i',i)\}.

$\exists k, I_k\neq\varnothing\land I_k=I_{k+1}\leftrightarrow$ there is a cycle in $G$ .

Proof. In the $\rightarrow$ direction, let $G_k$ be defined like $G$ but containing only vertices in $I_k$ . We'll prove $G_k$ has a cycle, so as $G_k\subseteq G$ , $G$ does as well. Let $f:I^k\rightarrow I^k$ be any function s.t. $\forall i\in I^k,R(f(i),i)$ . The existence of $f$ is guaranteed as $I_k=I_{k+1}\rightarrow\forall i\in I_k,\exists i'\in I_k,R(i',i)$ . Starting from $i_0\in I_k$ , consider the sequence $i_0,f(i_0),f(f(i_0)),f(f(f(i_0))),\dots$ . Because $I_k$ is finite ( $I_k\subseteq\{\blacksquare,\square\}^w$ ) and non-empty, this sequence must eventually repeat a vertex, thus there is a cycle in $G_k$ .

In the $\leftarrow$ direction, if $G$ contains a cycle $L\subseteq I$ , then $\forall k, L\subseteq I_k$ as, for any $l\in L$ one can obtain $l'$ that transitions to $l$ in $k$ steps by following the cycle's edges in reverse for $k$ steps. Next, note $I_{k+1}\subseteq I_k$ as any state reachable via a sequence of $k+1$ elements of $I$ is reachable via a sequence of $k$ elements of $I$ . Thus, as $I$ is finite, there is some $k_0$ for which $I_{k_0}=I_{k_0+1}$ and because $L\subseteq I_{k_0}$ $I_{k_0}\neq\varnothing$ .

$I_{k+1}$ can be computed the same way as $Y_{i+1}$ above, with the added requirement the states remain in $I$ .

I_{k+1}=(\bigvee_{y\in\{\blacksquare,\square\}^w}(R_E\land I_k)[x_i/y_i][x'_i/x_i])\land I.

As with $Y_{i+1}$ , computing successive values of $I_k$ is not always computationally feasible. Even so, we were able to classify 53 tuples not classified in §6.3 as able to solve Survive- $k$ -Steps for an infinite number of $k$ on a width- $31$ board using this technique.

6.5 Experimental Results

Figure 10. Visualization of the number of tuples classified by each technique. If $n%$ of the tuples classified by technique $i$ are classified by technique $j$ , then $n%$ of technique $i$ 's bar is filled with technique $j$ 's color.

On a width- $31$ board (the same width used by Wolfram [1]), we ran Techniques 1 through 5 on every rule tuple and all $j\leq 17$ for Technique 3. Together they classified $12,140$ tuples as having a non-infinite number of solvable $k$ . Next, for each tuple, we attempted to synthesize Grow- $k$ -Gadgets until either their GCD was $1$ , or $k>50$ . $20,220$ tuples had at least one grow gadget, and all but 25 of these had a finite number of unsolvable $k$ (i.e. $\gcd(G)=1$ in Lemma 5). Together with our BDD techniques §6.4, we classified 99% of tuples as having a finite or infinite number of solvable $k$ . Figure 10 visualizes the overlap between techniques and compares the number of tuples each was able to classify.

We gave analytical methods for Techniques 1 and 2, but the remainder use a SAT solver. This being said every technique shows considerable structure in its visualization (c.f. Figure 8), so it seems possible analytical methods exist for Techniques 3-5 as well.

For the rule tuples which were not categorized by Techniques 1-5, we gave sound and complete algorithms using BDDs in §6.4. These trade runtime for completeness, so we were not able to classify all rule tuples, but having complete techniques is somewhat philosophically satisfying.

BDDs are also capable of analysis we did not explore. For example, using a BDD it is possible to count the number of satisfying assignments to a formula. Future work could consider if the number of satisfying assignments is correlated with the performance of evolutionary synthesis algorithms. Future work could also improve our cycle detection algorithm by looking for loops which end in a rotated version of the initial state (the kind of "rotation" discussed in Lemma 2). By the pigeonhole principle, these rotating loops will eventually result in a regular loop, detectable by our current analysis, but it is possible looking for rotations could find cycles more efficiently.

7 Why Are SAT Solvers Effective?

The effectiveness of SAT solvers at synthesis is interesting given how large the search space is. For example, even for the Survive- $4$ -Steps example in the intro to this post, there are $2^{20}$ possible choices of rule array, and in our experiments (§5) we synthesize solutions on $31\times 50$ boards with $2^{1550}$ possible choices of rule array. For these synthesis problems to be tractable in some instances, there must be structural properties of Survive- $k$ -Steps or elementary cellular automata interactions which are exploited by the solver.

To investigate this we used Biere's SATCH [43] solver which is specifically designed for instrumentation and (dis/en)abling solver features. We ran two experiments. First (§7.2) we disabled six popular features one by one and measured synthesis performance. We found the configuration that disabled learned clauses [44] but kept heuristics related to their metadata was the fastest. To understand this, our second experiment measured synthesis performance on all permutations of features disambiguating the fastest configuration and pure DPLL [5]. We found variable selection heuristics explained the difference as they cause the solver to focus in on the rule array, $r_{i,j}$ variables (c.f. §4) as opposed to the $y_{i,j}$ variables or intermediate ones introduced during Tseitin [34] encoding. For a small sampling of tuples, we analyzed the frequency different $r_{i,j}$ variables were selected during synthesis and found considerable structure that differed across tuples.

7.1 Background

A Boolean formula is in conjunctive normal form (CNF) if it is a conjunction of disjunctions of variables which may or may not be negated. SAT solvers take formulas in CNF and determine if there exists an assignment to the formula's variables that makes the formula true.

For example, consider the toy problem of placing three pigeons in two holes, subject to the constraint that no pigeons share a hole. Obviously this is not possible, and we will step through how a SAT solver will determine this. Let the variable $p_{i\rightarrow j}$ be true if pigeon $i$ is placed in hole $j$ . So every pigeon goes in some hole, the following CNF formula must be true

\tag{14} (p_{1\rightarrow 1}\lor p_{1\rightarrow 2})\land (p_{2\rightarrow 1}\lor p_{2\rightarrow 2})\land (p_{3\rightarrow 1}\lor p_{3\rightarrow 2}).

Notice how this is a conjunction (logical and, $\land$ ) of disjunctions (logical or, $\lor$ ). To ensure no pigeons share a hole:

\begin{aligned} &(\lnot p_{1\rightarrow 1}\lor \lnot p_{2\rightarrow 1}) \land (\lnot p_{1\rightarrow 1}\lor \lnot p_{3\rightarrow 1}) \land (\lnot p_{2\rightarrow 1}\lor \lnot p_{3\rightarrow 1}) \land \\ &(\lnot p_{1\rightarrow 2}\lor \lnot p_{2\rightarrow 2}) \land (\lnot p_{1\rightarrow 2}\lor \lnot p_{3\rightarrow 2}) \land (\lnot p_{2\rightarrow 2}\lor \lnot p_{3\rightarrow 2}). \end{aligned}

The most fundamental algorithm in SAT solving is called DPLL [5] and relies on the observation that if a disjunction in a CNF formula has a single element, then the corresponding variable must be assigned in a way that makes the disjunction true. For example, in the formula $(a)\land (\neg a\lor b\lor c)$ , because $a$ appears alone, $a$ must be true, and after propagating this fact the satisfiability of the formula reduces to the satisfiability of $(b\lor c)$ . Disjunctions with one element are called unit and the combined procedure is called unit propagation. DPLL is an exhaustive search for variable assignments combined with unit propagation.

For example, suppose DPLL begins by setting $p_{1\rightarrow 1}$ to true. After propagating this assignment, unit propagation sets $p_{2\rightarrow 1}$ and $p_{3\rightarrow 1}$ to false to avoid conflicts in the first hole. Propagating those assignments through (14) causes unit propagation to set $p_{2\rightarrow 2}$ and $p_{3\rightarrow 2}$ to true, but this makes the clause $(\lnot p_{2\rightarrow 2}\lor \lnot p_{3\rightarrow 2})$ false, so DPLL backtracks. DPLL then tries setting $p_{1\rightarrow 1}$ to false, which by symmetry causes the solver to explore the same structure as before, just with the first and second holes swapped.

This is a limitation of DPLL: it performs redundant search through symmetric configurations, since many subproblems differ only by permuting pigeons or holes. Modern SAT solvers use an algorithm called conflict driven clause learning (CDCL) [44] whereby they learn conflict clauses that can generalize from one failure to symmetric cases, pruning large portions of the search space.

In our pigeonhole example with CDCL, suppose, as before, the solver sets $p_{1\rightarrow1}$ to true, leading to propagated assignments that falsify $(\lnot p_{2\rightarrow2}\lor \lnot p_{3\rightarrow2})$ . A DPLL solver would backtrack and flip $p_{1\rightarrow1}$ to false, re-exploring an equivalent symmetric space, but CDCL performs conflict analysis and adds a new clause

(\lnot p_{1\rightarrow1}),

to the formula, learning that the first pigeon cannot be put in the first hole. This learned clause then immediately triggers unit propogation, resulting in the solver determining the CNF is unsatisfiable. In this example, the difference between CDCL and pure DPLL is fairly minor, but broadly speaking the invention of CDCL has been a breakthrough in SAT solving [48], [49], [6].

DPLL and CDCL are the foundation of modern SAT solvers. On top of these are variable selection heuristics which cause DPLL to pivot on variables occurring frequently in learned clauses, strategies for optimizing unit propagation, and many other techniques. The remainder of this section considers what techniques are responsible for SAT solver's remarkable efficiency synthesizing inhomogeneous cellular automata

7.2 Experiment 1

Figure 11. Performance synthesizing solutions to Survive- $31$ -Steps on a width $31$ board across 1,000 random rule tuples for six SAT solver configurations relative to the default configuration. A negative value means the configuration was faster. From left to right, these correspond to the --no-learn, --no-restart, --no-cdcl, --no-vsids, and --no-watch configurations of the Biere's SATCH [43] solver.

We followed the approach of a previous empirical study [45] of SAT solver features. First, 1,000 tuples of elementary rules were sampled. For each tuple, we generated a CNF formula for Survive- $31$ -Steps on a width $31$ board, then randomized the formula's clauses' contents and clause order ten times, giving ten semantically equivalent versions of each formula to average performance data over. We ran each SAT solver configuration on on all 10,000 formulas and took the average runtime across a tuple's randomized CNFs as the tuple's solve time.

This experiment investigated six SAT solver configurations, listed below. These are the same configurations tested by Katebi et. al. [45], chosen because they cover some of the most significant advances in SAT solving.

No Learn disables learned clauses, but keeps features like variable selection heuristics that use metadata from learned clauses.
No Restart stops the solver from periodically restarting the search. Restarts can be useful, for example, when the solver makes a poor decision early on causing it to explore a doomed part of the search space.
No CDCL disables all features related to learned clauses and uses the pure DPLL algorithm.
No VSIDS causes the solver to use the VMTF [7] variable selection heuristic instead of VSIDS [6].
No Watch disables the two-watched literals technique [6] for detecting unit propagation opportunities.

An illustration of the results is given by Figure 11, we ran each configuration with a ten-second timeout, and timed-out runs were given a value of ten seconds in the illustration. Interestingly, disabling clause learning consistently sped up synthesis, whereas the pure DPLL algorithm had high variance and frequently timed out. This suggests, that while the overhead from learned clauses slows down the solver, their metadata is useful.

7.3 Experiment 2

Figure 12. A comparison of the features disambiguating No Learn' and No CDCL' in Figure 11. For each permutation of features disambiguating the two, a solver was compiled and timed solving Survive- $31$ -Steps for 1,000 random rule tuples on a width 31 board. The runtime above is the average across configurations where a feature is (en/dis)abled.

To better understand the difference between pure DPLL and the No Learn configurations, we repeated the experiment described in the first paragraph of §7.2 using SAT solvers configured with every permutation of the features which disambiguate pure DPLL and No Learn. Concretely, when the five configuration options listed below are disabled along with clause learning, the resulting configuration is equivalent to the pure DPLL configuration. For every way to (en/dis)able these five options along with No Learn we compiled a solver, and measured the performance of each solver using the method described in the first paragraph of §7.2. Then, for each feature, we took the average runtime on the randomly sampled CNF formulas for configurations it was (en/dis)abled in. The results are illustrated in Figure 12.

Best trail rephrasing, corresponds to the –no-best configuration option for SATCH.
Chrono corresponds to non-chronological backtracking and the –no-chrono configuration option.
Focused (en/dis)ables focused mode [46] which causes the solver to alternate between periods of slow/rapid restarts. There is some empirical evidence [47] that periods of slow/rapid restarts are better for UNSAT/SAT formulas. Corresponds to the $\texttt{--no-focused}$ configuration option.
VMTF/VSIDS corresponds to variable selection heuristics as described in §7.2.

Of the features disambiguating pure DPLL and the No Learn configurations, variable selection heuristics had the largest impact. To understand this, we generated heatmaps of the frequency $r_{i,j}$ variables were selected using the VSIDS heuristic for a small, random selection of rule tuples. Three interesting examples are shown in Figure 13 alongside a visualization of a random assignment to the rule array. These examples, together with the effectiveness of variable selection heuristics shown in Figure 12, suggest there may be structural properties of the automata that are exploited by variable selection heuristics. More analysis of this could be interesting for future work.

Figure 13. Three rule tuples. (Left Column) Heatmap of the frequency variables $r_{i,j}$ are selected using the VSIDS [6] variable selection heuristic while synthesizing a solution to Survive- $31$ -Steps on a width $31$ board. A darker value at row $j$ column $i$ means $r_{i,j}$ was more frequently a decision variable. (Right Column) An inhomogeneous automaton using the rule tuple with a randomly initialized rule array. Visual similarities suggest VMTF could be exploring interesting properties of the automata.

8 Conclusion

Wolfram proposed the Survive- $k$ -Steps problem and explored its connections to biology and neural networks. We gave novel methods for synthesizing and analyzing inhomogeneous cellular automata using Boolean satisfiability and BDDs and analyzed the problem. First, we exhaustively characterized what tuples of elementary cellular automata could be composed to survive $k$ steps on a width-31 board for $0<k\leq 50$ . Then, we introduced several analytical and computational techniques to determine if the set of solvable $k$ is finite or infinite, demonstrating significant underlying structure within the problem. Finally, we studied what heuristics drive SAT solver performance, finding learned clauses are not effective, but variable selection heuristics are. Our work provides new analysis techniques for inhomogeneous cellular automata and suggests avenues for future work such as investigating the structure in our computational techniques to derive analytical versions (c.f. Figure 8), looking for theoretical insights that explain the effectiveness of variable selection heuristics, and extending our results on SAT solver performance to new synthesis algorithms and cellular automata classes.

Code Availability

Data and code to reproduce the experiments and figures in this post is available at: https://git.sr.ht/~ekez/survive-k/

Appendix A. Extended Proofs

A.1 Lemma 1

Proof. For $x\in\{\blacksquare,\square\}^w$ with exactly one $\blacksquare$ , Let $\mathfrak{A}=\langle a_E,x\rangle$ and $\mathfrak{A}'=\langle a_E',x\rangle$ . We'll show that because $a_E$ solves Survive- $k$ -Steps, $\forall i,j: \mathfrak{A}(i,j)=\mathfrak{A}'(i,j)$ , so $a_E'$ solves Survive- $k$ -Steps.

When $j\leq k$ this is trivially the case. When $j\geq k$ we'll proceed by induction using $j=k$ as our base case. Note that when $j\geq k$ , $\mathfrak{A}(i,j)=\square$ as $a_E$ solves Survive- $k$ -Steps, so for our inductive hypothesis we get $\mathfrak{A}'(i,j-1)=\mathfrak{A}(i,j-1)=\square$ .

\begin{aligned} \mathfrak{A}'(i,j) &= a_E'(i,j)(\mathfrak{A}'(i^-,j-1),\mathfrak{A}'(i,j-1),\mathfrak{A}'(i^+,j-1)) \\ &= a_E'(i,j)(\square,\square,\square) \\ &= a_E(i,k+1)(\square,\square,\square) \\ &= \square \\ &= \mathfrak{A}(i,j). \end{aligned}

Notes

$\langle a_E,x\rangle(\cdot,j)_i=\langle a_E,x\rangle(i,j)$
With Lemma 3, it is relatively straightforward to prove that when $w$ is odd, rules 114 and 119 can only survive $k$ steps for $k\leq \lceil w/2 \rceil$ . The basic idea being, at this height, on odd-width boards, due to the similarity of rules 114 and 119, wrapping forces to occur.
Notice, the omission of (9) leaves the initial state unconstrained.
Obviously, when $w=31$ constructing this is intractable, so we use the logically-equivalent exist method provided by tulip-control/dd [42].

References

What's Really Going On in Machine Learning? Some Minimal Models. Wolfram, Stephen. Stephen Wolfram Writings. 2024.
Inhomogeneous Cellular Automata (INCA). Hartman, H. and Vichniac, G. Y.. Disordered Systems and Biological Organization. 1986.
Why Does Biological Evolution Work? A Minimal Model for Biological Evolution and Other Adaptive Processes. Wolfram, Stephen. Stephen Wolfram Writings. 2024.
Foundations of Biological Evolution: More Results & More Surprises. Wolfram, Stephen. Stephen Wolfram Writings. 2024.
A machine program for theorem-proving. Davis, Martin, Logemann, George, Loveland, Donald. Communications of the ACM. 1962.
Chaff: Engineering an Efficient SAT Solver. Matthew W. Moskewicz and Conor F. Madigan and Ying Zhao and Lintao Zhang and Sharad Malik. Proceedings of the 38th Design Automation Conference, DAC 2001, Las Vegas, NV, USA, June 18-22, 2001. 2001.
Efficient algorithms for clause-learning SAT solvers. L. Ryan. 2004.
On the generation of high-quality random numbers by two-dimensional cellular automata. Tomassini, M., Sipper, M., Perrenoud, M.. IEEE Transactions on Computers. 2000.
Inhomogeneous cellular automata modeling for mixed traffic with cars and motorcycles. Lan, Lawrence W., Chang, Chiung-Wen. Journal of Advanced Transportation. 2005.
Cellular automaton public-key cryptosystem. Rio Piedras, PR. Complex Systems. 1987.
A survey of cellular automata: types, dynamics, non-uniformity and applications. Bhattacharjee, Kamalika, Naskar, Nazma, Roy, Souvik, Das, Sukanta. Natural Computing. 2020.
Time-Reversal in Conway's Life as SAT. Stuart Bain. AI 2007: Advances in Artificial Intelligence, 20th Australian Joint Conference on Artificial Intelligence, Gold Coast, Australia, December 2-6, 2007, Proceedings. 2007.
Mathematical Games: The fantastic combinations of John Conway's new solitaire game. Gardner, Martin. Scientific American. 1970.
Compositional Object Synthesis in Game of Life Cellular Automata Using SAT Solver. Nishimura, Haruki, Hasebe, Koji. Advances in Swarm Intelligence: 12th International Conference, ICSI 2021, Qingdao, China, July 17–21, 2021, Proceedings, Part II. 2021.
SAT-Based Analysis of Cellular Automata. Massimo D'Antonio and Giorgio Delzanno. Cellular Automata, 6th International Conference on Cellular Automata for Research and Industry, ACRI 2004, Amsterdam, The Netherlands, October 25-28, 2004, Proceedings. 2004.
A Six-State Minimal Time Solution to the Firing Squad Synchronization Problem. Jacques Mazoyer. Theor. Comput. Sci.. 1987.
Remarks on the cellular automaton global synchronisation problem: deterministic versus stochastic models. Fates, Nazim. Natural Computing. 2019.
Discovery by Genetic Programming of a Cellular Automata Rule that is Better than any Known Rule for the Majority Classification Problem. David Andre, Forrest H. Bennett III, John R. Koza. Genetic Programming 1996: Proceedings of the First Annual Conference, July 28-31, 1996, Stanford University. 1996.
Research of Complex Forms in Cellular Automata by Evolutionary Algorithms. Sapin, Emmanuel and Bailleux, Olivier and Chabrier, Jean-Jacques. Artificial Evolution. 2004.
Synthesis of Desired Binary Cellular Automata Through the Genetic Algorithm. Suzuki, Satoshi and Saito, Toshimichi. Neural Information Processing. 2006.
Evolving Cellular Automata. Cenek, Martin and Mitchell, Melanie. Computational Complexity: Theory, Techniques, and Applications. 2012.
Evolving S-boxes based on cellular automata with genetic programming. Picek, Stjepan, Mariot, Luca, Leporati, Alberto, Jakobovic, Domagoj. Proceedings of the Genetic and Evolutionary Computation Conference Companion. 2017.
Evolutionary algorithms for designing reversible cellular automata. Mariot, Luca, Picek, Stjepan, Jakobovic, Domagoj, Leporati, Alberto. Genetic Programming and Evolvable Machines. 2021.
Co-evolving non-uniform cellular automata to perform computations. Moshe Sipper. Physica D: Nonlinear Phenomena. 1996.
Evolving Cellular Automata to Perform User-Defined Computations. Grouchy, Paul, D'Eleuterio, Gabriele M.T.. 2016.
Thread: Differentiable Self-organizing Systems. Mordvintsev, Alexander, Randazzo, Ettore, Niklasson, Eyvind, Levin, Michael, Greydanus, Sam. Distill. 2020.
Differentiable Logic CA: from Game of Life to Pattern Generation. Pietro Miotti, Eyvind Niklasson, Ettore Randazzo, Alexander Mordvintsev. Paradigms of Intelligence. 2025.
RC2: an Efficient MaxSAT Solver. Alexey Ignatiev and Antonio Morgado and Joao Marques-Silva. J. Satisf. Boolean Model. Comput.. 2019.
Program Synthesis. Sumit Gulwani, Oleksandr Polozov, Rishabh Singh. Foundations and Trends® in Programming Languages. 2017.
Syntax-Guided Synthesis. Rajeev Alur and Rastislav Bodik and Eric Dallal and Dana Fisman and Pranav Garg and Garvit Juniwal and Hadas Kress-Gazit and P. Madhusudan and Milo M. K. Martin and Mukund Raghothaman and Shambwaditya Saha and Sanjit A. Seshia and Rishabh Singh and Armando Solar-Lezama and Emina Torlak and Abhishek Udupa. Dependable Software Systems Engineering. 2015.
Cellular Automata Surviving $k$ Steps. Zeke Medley and Panagiotis Manolios. Proceedings of the 19th International Workshop on the ACL2 Theorem Prover and Its Applications, Austin, TX, USA and online, May 12-13, 2025. 2025.
ACL2s: "The ACL2 Sedan". Peter C. Dillinger, Panagiotis Manolios, Daron Vroon, J. Strother Moore. Proceedings of the 7th Workshop on User Interfaces for Theorem Provers (UITP 2006). 2007.
A New Kind of Science. Wolfram, Stephen. 2002.
On the Complexity of Derivation in Propositional Calculus. Tseitin, G. S.. Automation of Reasoning. 1968.
PySAT: A Python Toolkit for Prototyping with SAT Oracles. Alexey Ignatiev and Antonio Morgado and Joao Marques-Silva. Theory and Applications of Satisfiability Testing - SAT 2018 - 21st International Conference, SAT 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 9-12, 2018, Proceedings. 2018.
Universality in Elementary Cellular Automata. Matthew Cook. Complex Syst.. 2004.
On Patterns and Dynamics of Rule 22 Cellular Automaton. Genaro J. Martinez and Andrew Adamatzky and Rolf Hoffmann and Dominique Deserable and Ivan Zelinka. Complex Syst.. 2019.
A Combinatorial Problem. De Bruijn, N. G.. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen. 1946.
Finding and counting given length cycles. Noga Alon, Raphael Yuster, Uri Zwick. Algorithmica. 1997.
Towards an Optimal CNF Encoding of Boolean Cardinality Constraints. Carsten Sinz. Principles and Practice of Constraint Programming -- CP 2005. 2005.
Coin Problem. Weisstein, Eric W.. From MathWorld--A Wolfram Web Resource.
dd: Binary Decision Diagrams in pure Python and Cython wrappers of CUDD, Sylvan, and BuDDy. Filippidis, Ioannis, Haesaert, Sofie, Livingston, Scott C., Wenzel, Mario. Accessed: 2025-03-25. 2025.
CNF Encodings of Complete Pairwise Combinatorial Testing of our SAT Solver Satch. Armin Biere. Proc. of SAT Competition 2021 -- Solver and Benchmark Descriptions. 2021.
Conflict-Driven Clause Learning SAT Solvers. Joao Marques-Silva and Ines Lynce and Sharad Malik. Handbook of Satisfiability - Second Edition. 2021.
Empirical Study of the Anatomy of Modern Sat Solvers. Hadi Katebi and Karem A. Sakallah and Joao P. Marques Silva. Theory and Applications of Satisfiability Testing - SAT 2011 - 14th International Conference, SAT 2011, Ann Arbor, MI, USA, June 19-22, 2011. Proceedings. 2011.
Cadical, kissat, paracooba, plingeling and treengeling entering the sat competition 2020. Fleury, ABKFM, Heisinger, Maximilian. SAT COMPETITION. 2020.
Between SAT and UNSAT: The Fundamental Difference in CDCL SAT. Chanseok Oh. Theory and Applications of Satisfiability Testing - SAT 2015 - 18th International Conference, Austin, TX, USA, September 24-27, 2015, Proceedings. 2015.
The Silent (R)evolution of SAT. Johannes K. Fichte, Daniel Le Berre, Markus Hecher, Stefan Szeider. Communications of the ACM. 2023.
GRASP: a search algorithm for propositional satisfiability. Marques-Silva, J.P., Sakallah, K.A.. IEEE Transactions on Computers. 1999.

Cellular Automata Surviving kkk Steps