Time, Clocks, and the Ordering of Events in a Distributed System - Paper Review
This is a review of Leslie Lamport's paper Time, Clocks, and the Ordering of Events in a Distributed System. I focus on the paper's proof of the conditions under which a distributed system of physical clocks can produce an ordering of events that does not violate causality. The proof lies in the appendix of the paper, and is extremely brief: not unlike reading a mystery novel. I expand on the paper by presenting the proof with all the math worked out, explained at a less mysterious level of brevity.
The Happened-Before Relation
The happened-before relation $\to$ is defined by two rules (plus transitivity): if $a$ and $b$ are events in the same process and $a$ comes before $b$, then $a \to b$; and if $a$ is the sending of a message and $b$ its receipt, then $a \to b$.
A nice property of $\to$ is that $a$ can causally affect $b$ if and only if $a \to b$. For events $a$ and $b$ occurring on $P_i$ and $P_j$, $a$ can only affect $b$ if $P_j$ has learned about $a$ by receiving a message from $P_i$.
The clock conditions mirror these rules. C1: if $a$ and $b$ are events in process $P_i$, and $a$ comes before $b$, then $C_i\langle a \rangle < C_i\langle b \rangle$. C2: if $a$ is the sending of a message by $P_i$ and $b$ its receipt by $P_j$, then $C_i\langle a \rangle < C_j\langle b \rangle$. C1 ensures the first rule of $\to$ is respected, and C2 the second. Thus, a system of clocks satisfying the clock conditions will order events consistently with the $\to$ relation.
Logical Clocks
Before considering physical clocks, we consider a simpler clock. Logical clocks satisfy the clock conditions, but don't order events in a way compatible with our usual notion of time.
To order events with a logical clock, processes follow two rules.
IR1: Each process $P_i$ increments $C_i$ between any two successive events.
IR2: (a) If $P_i$ sends a message $m$, then $m$ contains a timestamp $T_m$ equal to $C_i$ at the time of sending. (b) Upon receiving $m$, process $P_j$ sets $C_j$ greater than or equal to its present value and greater than $T_m$.
Processes implementing these rules satisfy the clock conditions C1 and C2, by IR1 and IR2 respectively.
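The two rules can be sketched as a tiny Python simulation. This is a minimal sketch of IR1 and IR2, assuming a `Process` class of my own design; only the rules themselves come from the paper.

```python
class Process:
    """A process carrying a Lamport logical clock (IR1 and IR2)."""

    def __init__(self):
        self.clock = 0

    def local_event(self):
        # IR1: increment the clock between any two successive events.
        self.clock += 1

    def send(self):
        # IR2(a): the message carries the sender's timestamp T_m.
        self.clock += 1
        return self.clock

    def receive(self, t_m):
        # IR2(b): advance past both the local clock and T_m.
        self.clock = max(self.clock, t_m) + 1


p1, p2 = Process(), Process()
p1.local_event()           # C_1 = 1
t_m = p1.send()            # C_1 = 2; T_m = 2
p2.receive(t_m)            # C_2 jumps from 0 to 3
print(p1.clock, p2.clock)  # 2 3
```

Because the receiver's clock always lands past $T_m$, the send is stamped before the receive, which is exactly condition C2.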
Time, and Logical Clock Time
In a sense, logical clock time and the time we are used to line up. The total order $\Rightarrow$, obtained by sorting events by their clock values, guarantees that events happening before a message is sent are ordered before events on the receiving process after the message is received. This preserves causality: for an event on $P_i$ to cause an event on $P_j$, a message must be sent between the processes.
When events cannot cause each other, our logical clocks order them according to the number of events preceding them. This has no basis in our typical notion of time. To address this, we turn to physical clocks.
Physical Clocks
Physical clocks track time as we perceive it. A watch is a physical clock, and like a watch, physical clocks can become out of sync. This can break causality: if one clock is so far ahead of another that the receiving timestamp is smaller than the sending one, then receive $\Rightarrow$ send.
This section models a physical clock, derives the conditions under which causality can break, and proves which conditions ensure it does not.
Physical Clock Conditions
To model a physical clock, we put two conditions on its behavior.
PC1: There exists a constant $k \ll 1$ such that for all $i$:
$$\left| \frac{dC_i(t)}{dt} - 1 \right| < k$$
PC2: There exists a constant $\epsilon$ such that for all $P_i$, $P_j$:
$$|C_i(t) - C_j(t)| < \epsilon$$
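To make the two conditions concrete, here is a small sanity check against idealized linear clocks $C_i(t) = r_i t + o_i$; the rates, offsets, and bounds are made-up numbers, not values from the paper.

```python
# Idealized linear clocks C_i(t) = rate * t + offset (made-up values).
clocks = [
    (1.0001, 0.000),   # runs slightly fast
    (0.9999, 0.002),   # runs slightly slow, starts slightly ahead
]

k = 1e-3        # PC1: bound on drift rate
epsilon = 5e-3  # PC2: bound on mutual skew

# PC1: |dC_i/dt - 1| < k. A linear clock's derivative is its rate.
pc1 = all(abs(rate - 1) < k for rate, _ in clocks)

# PC2: |C_i(t) - C_j(t)| < epsilon, spot-checked at a few times t.
readings = [[rate * t + offset for rate, offset in clocks] for t in range(10)]
pc2 = all(abs(a - b) < epsilon for row in readings for a in row for b in row)

print(pc1, pc2)  # True True
```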
There are two important things to note about these conditions. First, physical clocks are not naturally imbued with them. In a real system, a process might turn off indefinitely, in which case its clock ceases advancing and no finite $\epsilon$ satisfies PC2. In this sense, the math in this paper isn't easily mapped to reality. Second, while physical clocks will generally run at rates close to correct (making PC1 reasonable), they will not automatically synchronize (making PC2 unreasonable without a synchronization algorithm).
Anomalous Clocks
Clocks become out of sync when they run at the wrong rate. Mathematically speaking, we look to the derivative. A clock running at the correct speed has $\frac{dC(t)}{dt} = 1$; one running too fast, $\frac{dC(t)}{dt} > 1$; too slow, $\frac{dC(t)}{dt} < 1$.
Breaking Causality
For any two processes communicating, there is a lower bound on how quickly they may exchange messages. At the limit, this is the distance between the processes divided by the speed of light. We call this lower bound $\mu$.
Our goal is to find values of $\epsilon$ for which, for any message sent by $P_i$ at $t$ and received by $P_j$ at $t'$, $C_i(t) < C_j(t')$. If this is true, $\Rightarrow$ will satisfy C2. As $\mu$ is a lower bound on message transmission time, $t' > t + \mu$. Thus, to satisfy C2:
$$C_i(t) \le C_j(t + \mu) \tag{2}$$
By PC1, the slowest possible clock advances at a rate $\frac{dC_j(t)}{dt} > 1 - k$. Thus, $C_j(t + \mu) > C_j(t) + (1 - k)\mu$. As we are interested in satisfying the inequality for all possible values of $C_j(t + \mu)$, we plug this lower bound into the right side of (2). By PC2, $C_i(t)$ may exceed $C_j(t)$ by as much as $\epsilon$, yielding a bound on $\epsilon$:
$$\epsilon \le (1 - k)\mu \tag{3}$$
If this inequality is true, $\Rightarrow$ will satisfy C2. And thus, for software satisfying C1, $\Rightarrow$ will preserve causality. If the inequality is false, it will be possible to order the receipt of a message before its sending, causing $\Rightarrow$ to break causality.
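A quick numeric reading of the bound: the helper below and its sample numbers are my own illustration, not from the paper.

```python
def causality_safe(epsilon, k, mu):
    """True when epsilon <= (1 - k) * mu, i.e. even the slowest allowed
    clock advances past the sender's timestamp during transmission."""
    return epsilon <= (1 - k) * mu


# Made-up system: drift bound k = 1e-6, minimum transit time mu = 1 ms.
print(causality_safe(0.0005, k=1e-6, mu=1e-3))  # True: skew is under the bound
print(causality_safe(0.0020, k=1e-6, mu=1e-3))  # False: a receipt could be ordered before its send
```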
Preserving Causality
Having discovered conditions under which causality breaks, we set about
describing an algorithm which preserves causality, and prove it works.
Our algorithm is the same one used for logical clocks, slightly modified
for physical time.
IRP1: The physical clock of each process is always advancing: $\frac{dC_i(t)}{dt} > 0$.
IRP2: (a) For each $i$, if $P_i$ sends a message $m$ at physical time $t$, then $m$ contains a timestamp $T_m = C_i(t)$. (b) Upon receiving a message $m$ at $t'$, process $P_j$ sets $C_j(t')$ to $\max(C_j(t'), T_m + \mu_m)$.
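The modified rules can be sketched as follows; the `PhysicalClock` class, its rates, and the delays are illustrative assumptions, with only the two rules taken from the text.

```python
class PhysicalClock:
    """A continuously advancing clock obeying IRP1 and IRP2."""

    def __init__(self, rate):
        self.rate = rate   # IRP1: rate > 0, the clock always advances
        self.value = 0.0

    def tick(self, dt):
        self.value += self.rate * dt

    def timestamp(self):
        # IRP2(a): a message carries T_m = C_i(t) at its send time.
        return self.value

    def receive(self, t_m, mu_m):
        # IRP2(b): never read below T_m plus the minimum delay mu_m.
        self.value = max(self.value, t_m + mu_m)


sender, receiver = PhysicalClock(rate=1.0), PhysicalClock(rate=0.98)
sender.tick(10.0)
receiver.tick(10.0)              # the slow clock reads 9.8
t_m = sender.timestamp()         # T_m = 10.0
receiver.tick(0.1)               # message in flight; receiver reads 9.898
receiver.receive(t_m, mu_m=0.05)
print(receiver.value)            # 10.05: the receipt is stamped after the send
```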
We model our processes as being arranged in a graph whose edges represent communication links. We make three assumptions about our graph and the behavior of its processes.
(1) The diameter $d$ of the graph is the largest number of links on a shortest path between any two processes; that is, any two processes are connected by a path of at most $d$ links.
(2) For a message sent at physical time $t$ and received at $t'$, we define the total delay $\nu_m = t' - t$, with $\nu_m < \xi_m + \mu_m$, where $\xi_m$ is the maximum unpredictable delay of the message, and $\mu_m$ the minimum predictable delay for the message. For speaking about the entire graph, we define $\nu = \max(\nu_m)$, $\xi = \max(\xi_m)$, and $\mu = \max(\mu_m)$, over all possible $m$.
(3) Every $\tau$ seconds a message is sent along every communication link.
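Assumption (1)'s diameter can be computed by breadth-first search; the five-process line topology below is a made-up example.

```python
from collections import deque

# Made-up topology: five processes joined in a line by communication links.
links = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4"],
}


def link_distances(graph, start):
    """Fewest links from start to every reachable process (BFS)."""
    dist, queue = {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        for peer in graph[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist


# The diameter d: the largest fewest-link distance over all pairs.
d = max(max(link_distances(links, p).values()) for p in links)
print(d)  # 4: P1 to P5 takes four links
```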
With these assumptions, we set about finding a value for $\epsilon$. To do so, we will find lower and upper bounds on $C_i(t)$, then subtract them to find a value for $\epsilon$.
The Slowest Clock
Say process $P_1$ sends a message to $P_2$ at time $t_1$, which is received at time $t_2$. Per PC1, for $t \ge t_2$, $C_2(t)$ must advance at least as fast as the slowest permitted clock:
$$C_2(t) \ge C_2(t_2) + (1 - k)(t - t_2) \tag{4}$$
Per IRP2, after the message has been received, the clock of the receiving process must satisfy $C_2(t_2) \ge C_1(t_1) + \mu_m$. Since $\mu_m > \nu_m - \xi_m \ge (t_2 - t_1) - \xi$, combining this with (4):
$$C_2(t) \ge C_1(t_1) + (1 - k)(t - t_1) - \xi \tag{9}$$
(9) sets a lower bound on the receiving clock's time, in terms of the sending clock. Now, imagine a series of processes $[P_1, \ldots, P_N]$. At time $t_i$, process $P_i$ receives a message from process $P_{i-1}$, sent at time $t'_{i-1}$, where $t'_{i-1} \le t_i \le t'_i$. Consider what happens when we repeatedly apply (9) to $C_3$:
$$C_3(t) \ge C_2(t'_2) + (1 - k)(t - t'_2) - \xi \ge C_1(t'_1) + (1 - k)(t - t'_1) - 2\xi$$
Every time we apply the inequality, it subtracts $\xi$ and decreases the time and clock subscripts by one. For a process $P_i$, there is a chain of $(i - 1)$ messages leading up to it, so (9) can be applied $(i - 1)$ times to find a lower bound on $P_i$'s clock: $C_i(t) \ge C_1(t'_1) + (1 - k)(t - t'_1) - (i - 1)\xi$.
Now imagine this scenario applied to our graph of processes. Recall assumptions (1) and (3), which say our graph has diameter $d$, and every process sends a message along each of its links every $\tau$ units of time. If we imagine timestamps propagating through the graph as messages are sent, every $d(\tau + \nu)$ units of time a timestamp from every process will have had the opportunity to propagate to every other process.
We can thus apply our earlier inequality for the minimum value of a clock in a series of processes, for a chain of length $d$. For any two processes $P_i$ and $P_j$, when $t \ge t_1 + d(\tau + \nu)$:
$$C_i(t) \ge C_j(t_1) + (1 - k)(t - t_1) - d\xi \tag{13}$$
This sets a lower bound on the value of a clock, in terms of some known value of another clock and the amount of time that has passed. We are now halfway to finding $\epsilon$. In the next section, we look for an upper bound.
The Fastest Clock
Say at time $t_x$, $C_x$ is the clock with the highest value. By PC1, no clock advances at a rate faster than $1 + k$, so for all $i$ and $t \ge t_x$:
$$C_i(t) \le C_x(t_x) + (1 + k)(t - t_x) \tag{14}$$
However, this argument is subtly incomplete. When a process receives a message, it sets its clock's value to be $\ge T_m + \mu_m$ (IRP2). Imagine $P_x$ sends a message at $t_x$, setting $T_m = C_x(t_x)$. Upon receiving the message at time $t'$, the receiver sets their clock to $T_m + \mu_m$. Is it possible for $T_m + \mu_m > C_x(t_x) + (1 + k)(t' - t_x)$? If so, (14) would be incorrect.
Substituting $T_m = C_x(t_x)$, the question becomes whether $\mu_m > (1 + k)(t' - t_x)$ is possible. Since $k \ge 0$, this would require $t' - t_x < \mu_m$, which is false: by definition $\mu_m$ is at most the smallest possible transmission time for the message, so $t' - t_x \ge \mu_m$. Hence, (14) remains true, even when considering clocks setting themselves ahead upon receiving messages from the fastest clock.
Synchronicity
Equation (13) holds for any $i, j$, so setting $j = x$, $t_1 = t_x$, and combining (13) and (14):
$$C_x(t_x) + (1 - k)(t - t_x) - d\xi \le C_i(t) \le C_x(t_x) + (1 + k)(t - t_x) \tag{16}$$
This holds for all $i$, and thus bounds the value of every clock. The farthest apart two clocks can be is the gap between the fastest and slowest possible clocks; that is, the difference between the upper and lower bounds of (16):
$$|C_i(t) - C_j(t)| \le 2k(t - t_x) + d\xi \tag{18}$$
Notice how (18) applies to all $t$, $t_x$ where $t \ge t_x + d(\tau + \nu)$. This means that for any $t$, we can find a $t_x$ such that $t - t_x = d(\tau + \nu)$. This is the lowest value for $t - t_x$, so we can replace the $t - t_x$ term in (18):
$$\epsilon \le d(2k(\tau + \nu) + \xi) \tag{20}$$
This concludes our proof of a bound on $\epsilon$. Our algorithm takes $d(\tau + \nu)$ units of time to synchronize, after which the amount two clocks may be out of sync is bounded.
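Plugging made-up system parameters into the bound gives a feel for its size; the function and numbers are my own illustration.

```python
def epsilon_bound(d, k, tau, nu, xi):
    """Skew bound after synchronization: d * (2k(tau + nu) + xi)."""
    return d * (2 * k * (tau + nu) + xi)


# Made-up system: diameter 3, drift 1e-6, a message on each link every
# 10 s, worst total delay 0.2 s, worst unpredictable delay 0.01 s.
print(epsilon_bound(d=3, k=1e-6, tau=10.0, nu=0.2, xi=0.01))
```

For numbers like these the unpredictable-delay term $d\xi$ dominates; the drift term only matters when messages are very infrequent.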
Conclusion
In equation (3), we found the maximum $\epsilon$ that preserves causality. We can combine (3) and (20) to investigate this further:
$$\epsilon \le (1 - k)\mu$$
$$d(2k(\tau + \nu) + \xi) \le (1 - k)\mu$$
All of these variables except $\tau$ describe physical properties of the system. $\tau$, the interval at which messages are sent, is a property of the software. So, we rearrange the inequality in terms of $\tau$:
$$\tau \le \frac{(1 - k)\mu/d - \xi}{2k} - \nu$$
Selecting a value for $\tau$ which satisfies this inequality guarantees ordering events such that, if $a$ caused $b$, $a \Rightarrow b$.
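To close the loop, the rearranged inequality can be evaluated for concrete numbers; the helper and its sample parameters are illustrative assumptions, not values from the paper.

```python
def tau_max(k, mu, d, xi, nu):
    """Largest message interval tau that still preserves causality:
    tau <= ((1 - k) * mu / d - xi) / (2 * k) - nu."""
    return ((1 - k) * mu / d - xi) / (2 * k) - nu


# Made-up system: drift 1e-6, minimum transit 1 ms, diameter 3,
# unpredictable delay 0.1 ms, total delay 2 ms.
limit = tau_max(k=1e-6, mu=1e-3, d=3, xi=1e-4, nu=2e-3)
print(limit)  # about 116.7 seconds between messages suffices here
```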