Impossibility of Distributed Consensus with One Faulty Process - Paper Review

This is a review of Impossibility of Distributed Consensus with One Faulty Process by M. J. Fischer, N. A. Lynch, and M. S. Patterson (referred to as the FLP result). I restate the paper's proof no consensus algorithm running in an asynchronous network can be totally correct. I've filled in the use of "by easy induction" with induction, use slightly different notation, and you can hover or click underlined terms to show their definition. I likely guild the lily.

Introduction

We are presented with a set of processes, $P$ , which implement a consensus algorithm. Each $p \in P$ may communicate with other processes with messages. Pending messages are stored in a message buffer. When $p$ attempts to receive a message, it receives a random pending message addressed to it, or $\varnothing$ . A process receiving infinitely many times is guaranteed to receive all pending messages. Though, for a single receive, it is valid for the system to return $\varnothing$ despite a pending message being available.

At any time, $p$ may take a step where it attempts to receive a message. A step by $p$ is an event, denoted $e_p \in E$ . A run, denoted $\sigma \in \Sigma$ , is a sequence of events. We assume processes don't have clocks (no timeouts), and thus only change their state upon receiving a message.

The set of possible configurations of the system is denoted $C$ . Each configuration contains a possible system state. The configuration after applying an event is denoted $e(C)$ , and after applying a run $\sigma(C)$ . A configuration $C'$ is reachable from $C$ if $\exists \sigma \in \Sigma: C'=\sigma(C)$ . Accessible configurations are reachable from an initial configuration.

Lemma 0: If an event is applicable to $C$ it is applicable to all $C'$ reachable from $C$ .

We say $P$ correctly implements a consensus algorithm, insofar as all messages sent can be received. Messages may be delated infinitely and are ordered randomly; thus, messages are applicable to all configurations reachable from the one in which they are sent.

Lemma 1: If the set of processes taking steps in $\sigma_1$ and $\sigma_2$ are disjoint, $\sigma_1(\sigma_2(C)) = \sigma_2(\sigma_1(C))$ .

This follows, as the runs modify disjoint parts of the system's state.

Consensus

Each $p \in P$ has a decision value $y_p \in \{\text{undecided}, 0, 1\}$ . As $P$ is correct, if any $p\in P$ has a decision value, all $p\in P$ will eventually have the same one. The goal of consensus is for all processes to agree on a decision value.

An initial configuration is a configuration with an empty message buffer, for which no process has a decision value, i.e. $\forall p \in P: y_p = \text{undecided}$ .

Partially correct consensus algorithms satisfy two requirements. (1) No accessible configuration has more than one decision value. (2) Both 0 and 1 are accessible decision values.

These correctness requirements seem reasonable. (1) requires processes come to agreement, and (2) they can come to agreement on all options.

A configuration is deciding if a process has a decision value. A consensus algorithm is totally correct if it is partially correct, and every admissible run is deciding. The difference between totally and partially correct consensus algorithms is, partially correct ones may never make a decision.

A configuration with a decision value of 1 is 1-valent, and 0 is 0-valent. If there are 1-valent and 0-valent configurations reachable from $C$ , $C$ is bivalent. If there are only 1-valent or only 0-valent configurations reachable from $C$ , $C$ is univalent.

Lemma 2: All totally correct consensus algorithms have a bivalent initial configuration.

Imagine each initial configuration arranged in a graph. Two configurations are connected if they differ by the state of only one process. As we've placed no constraints on initial configurations, other than the message buffer be empty and no decision value be reached, each configuration is connected to every other configuration by a chain of configurations.

We prove Lemma 2 by contradiction. Assume there is no bivalent initial configuration. For a totally correct consensus algorithm, each configuration must be deciding, and, as there are no bivalent configurations, must be one of 0-valent or 1-valent. Thus, our graph must contain a 0-valent configuration connected to a 1-valent one. Call these two configurations $C_0$ and $C_1$ respectively.

Let $p$ be the process who's state differs between $C_0$ and $C_1$ , and $\sigma$ be a run from $C_0$ in which $p$ takes no steps. As $C_0$ and $C_1$ differ only by the state of $p$ , and $p$ takes no steps in $\sigma$ , $C_{\sigma} = \sigma(C_0) = \sigma(C_1)$ . As $C_{\sigma}$ is reachable from $C_0$ , it must be 0-valent. As $C_{\sigma}$ is reachable from $C_1$ , it must be 1-valent. Thus, we have arrived at a contradiction and one of $C_1$ or $C_0$ is bivalent, proving Lemma 2.

Lemma 3: Let $C$ be a bivalent configuration of a partially correct consensus algorithm, and $e_p$ be an event applicable to $C$ . Let $\Psi$ denote the set of configurations reachable from $C$ without applying $e_p$ , and $\Omega$ the set of configurations reachable from $\Psi$ by applying $e_p$ : $\Omega = \{e_(\psi) \mid \psi \in \Psi\}$ . $\Omega$ contains a bivalent configuration.

By Lemma 0, as $e_p$ is applicable to $C$ , $e_p$ is applicable to all configurations in $\Psi$ . This validates our construction of $\Omega$ .

We prove Lemma 3 by contradiction. Assume there is no bivalent configuration in $\Omega$ . Let $E_i$ be an $i$ -valent configuration reachable from $C$ , as $C$ is bivalent $i \in \{0, 1\}$ . If $E_i \in \Psi$ , $e_p(E_i)$ is in $\Omega$ and $i$ -valent, as $E_i$ is $i$ -valent. Otherwise, $E_i \in \Omega$ . In either case $i$ can be $0$ or $1$ , so $\Omega$ contains both 0-valent and 1-valent configurations.

Call two configurations neighbors if one is reachable from the other via a single event. Let $e_p(C)$ be an $i$ -valent configuration $\in \Omega$ . As $\Omega$ contains both 0 and 1 valent configurations, $\exists C' \in \Psi:$ $e_p(C')$ is a $(1-i)$ -valent configuration. Let $\sigma$ be a run where $C'=\sigma(C)$ . As the valence of $e_p(C)$ and $e_p(C')$ is different, there must exist neighbor configurations $C_i$ and $C_{1-i}$ and an event $e_{p'} \in \sigma$ , such that $C_{1-i} = e_{p'}(C_i)$ , $e_p(C_{1-i})$ is $(1-i)$ -valent, and $e_p(C_i)$ is $i$ -valent.

Case 1, $p' \neq p$ . By Lemma 1 $e_p(e_{p'}(C_i)) = e_{p'}(e_p(C_i))$ ; however, this is a contradiction, as $e_p(e_{p'}(C_i)) = e_p(C_{1-i})$ is $(1-i)$ -valent, and $e_{p'}(e_p(C_i))$ is $i$ -valent (as $\Omega$ is assumed to contain only univalent configurations, and $e_p(C_i) \in \Omega$ is $i$ -valent).

Case 2, $p = p'$ . Consider any deciding run from $C_i$ in which $p$ takes no steps, denoted $\sigma$ . By Lemma 1, $e_p(\sigma(C_i)) = \sigma(e_p(C_i))$ , which is $i$ valent per the definition of $C_i$ . Note also, $e_p(e_{p'}(\sigma(C_i))) = \sigma(e_p(e_{p'}(C_i)))$ , which is $(1-i)$ -valent, by the definition of $e_{p'}$ and $C_i$ . Thus, $\sigma(C_i)$ is bivalent; however, this is a contradiction, as $\sigma$ is deciding (and thus univalent).

Consensus Is Impossible

Lemma 3 tells us, for any event $e_p$ , and bivalent configuration $C$ , there is an admissible run, $\sigma$ , such that $e_p(\sigma(C))$ is bivalent. Lemma 1 tells us there is always a bivalent initial configuration. We can use these to construct an algorithm for advancing a partially correct consensus algorithm, such that it never reaches a decision value.

Starting from a bivalent configuration $C$ , when a process sends a message to be received in $e_p$ , apply $\sigma$ , then apply $e_p$ . $e_p(\sigma(C))$ is bivalent, so this can be repeated indefinitely.

A previous essay described RAFT (a contemporary consensus algorithm), even RAFT is not totally correct. If the network is partitioned, and no single partition contains the majority of nodes, RAFT will stop advancing.