
%!TEX root = ../main.tex

In this section we describe three case studies: the Transmission Control Protocol (TCP) and the Stream Control Transmission Protocol (SCTP), both data transfer protocols, and Raft, a state machine replication protocol.

\subsection{TCP}%
\label{sub:TCP}

Transmission Control Protocol (TCP) is a transport-layer protocol designed to establish reliable, ordered communications between two peers. TCP is ubiquitous in today's internet, and therefore has seen ample formal verification efforts \cite{Cluzel_Georgiou_Moy_Zeller_2021, Smith_1997, Pacheco2022}, including using \promela and \spin \cite{Pacheco2022}.
%A previous version of \korg has been applied TCP in \cite{Pacheco2022, Hippel2022};
%in particular, we study our \korg extensions using the hand-written TCP \promela model from \cite{Pacheco2022}.
We construct a TCP \promela model referencing the set of TCP RFCs.
For our analysis, we borrow the four LTL properties used in \cite{Pacheco2022}, as detailed below:
%we study our \korg extensions using the \promela models from Pacheco et al., which includes a "gold" model whose underlying state machine is derived via an NLP-based algorithm applied to the SCTP RFC \cite{rfc9260} and a "canonical" model hand-written by domain experts \cite{Pacheco2022}.
\[
\begin{aligned}
\phi_1 &= \text{\parbox[t]{20em}{No half-open connections.}} \\
\phi_2 &= \text{\parbox[t]{20em}{Passive/active establishment eventually succeeds.}} \\
\phi_3 &= \text{\parbox[t]{20em}{Peers don't get stuck.}} \\
\phi_4 &= \text{\parbox[t]{20em}{\texttt{SYN\_RECEIVED} is eventually followed by \texttt{ESTABLISHED}, \texttt{FIN\_WAIT\_1}, or \texttt{CLOSED}.}}
\end{aligned}
\]
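As an illustration of how such a property reads in LTL, $\phi_4$ could be sketched as follows. The atomic propositions $\mathit{synRcvd}$, $\mathit{estab}$, $\mathit{finWait1}$, and $\mathit{closed}$ are names assumed here for a peer occupying the corresponding state; this is one plausible reading of the informal statement, and the encoding actually used is the one from \cite{Pacheco2022}.
\[
\mathbf{G}\,\bigl(\mathit{synRcvd} \rightarrow \mathbf{F}\,(\mathit{estab} \lor \mathit{finWait1} \lor \mathit{closed})\bigr)
\]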

We evaluated the TCP \promela model against \korg's drop, replay, and reordering attacker models on a single uni-directional communication channel. The resulting breakdown of attacks discovered is shown in Figure \ref{res:tcp-table}.

%Evaluating the canonical TCP model using \korg led us to identify edge-cases in the connection establishment routine that weren't accounted for, leading us to construct a "revised" TCP model accounting for these missing edge cases.
\begin{figure}[h!]
\centering
\begin{scriptsize}
\begin{tabular}{|c|c|c|c|}
\hline
                & Drop Attacker & Replay Attacker & Reorder Attacker\\\hline
$\phi_1$  &                          &                          &\\
$\phi_2$  &                      x & x                       &  \\
$\phi_3$  &                         &                          &\\
$\phi_4$  &                         &                           &\\
\hline
\end{tabular}
\end{scriptsize}

\caption{Automatically discovered attacks against
%the hand-written TCP model from Pacheco et al.  and our own,
our TCP model for $\phi_1$ through $\phi_4$. An ``x'' indicates that an attack was discovered; an empty cell indicates that \korg proved the absence of an attack via exhaustive search. These experiments were run on a laptop with an eighth-generation Intel Core i7 and 16\,GB of memory. Full attack traces are available in the artifact.}
\label{res:tcp-table}
\end{figure}

\begin{comment}
\begin{figure}[h!]
\centering
\begin{scriptsize}
\begin{tabular}{|@{}c@{}|@{}c@{}|@{}c@{}|@{}c@{}|@{}c@{}|@{}c@{}|@{}c@{}|@{}c@{}|@{}c@{}|@{}c@{}|}
\hline
& \multicolumn{3}{c|}{\footnotesize \raisebox{-0.15ex}{Drop Attacker} } & \multicolumn{3}{c|}{\footnotesize \raisebox{-0.15ex}{Replay Attacker}} & \multicolumn{3}{c|}{\footnotesize \raisebox{-0.15ex}{Reorder Attacker}} \\
\hline
& \: Gold \: & \: Expert \: & \: Revised \: & \: Gold \: & \: Expert \: & \: Revised \: & \: Gold \: & \: Expert \: & \: Revised \: \\
\hline
$\phi_1$ & \rule{0pt}{8pt} & & & & & & & & \\
$\phi_2$ & \rule{0pt}{8pt} & x & x & & x & x & & x & \\
$\phi_3$ & \rule{0pt}{8pt} & & & & & & & & \\
$\phi_4$ & \rule{0pt}{8pt} x & & & & & & x & & \\
\hline
\end{tabular}
\end{scriptsize}

\label{res:tcp-table}
\caption{Automatically discovered attacks against the gold, canonical (labeled "expert"), and revised TCP models for $\phi_1$ through $\phi_4$. "x" indicates an attack was discovered, and no "x" indicates \korg proved the absence of an attack via an exhaustive search. Full attack traces are available in the artifact.}
\end{figure}

\end{comment}

\subsection{SCTP}%
\label{sub:SCTP}
The Stream Control Transmission Protocol (SCTP) is a transport-layer protocol proposed as an alternative to TCP, featuring a four-way handshake, multi-homing, and multi-streaming. Among other use cases, SCTP is the data transfer protocol for various telecom signaling protocols as well as WebRTC. For our analysis, we borrow the ten LTL properties and \promela models derived from the SCTP RFCs as described in \cite{Ginesin2024}. We evaluated the SCTP \promela model against \korg's drop, replay, and reordering attacker models on a single uni-directional communication channel. The drop attacker model was limited to at most three dropped packets, while the replay and reordering attacker models were each limited to at most two packets. SCTP is designed to resist drop, replay, and reordering attackers \cite{rfc9260}, and we employ \korg to exhaustively demonstrate that this is the case.

\subsection{Raft}%
\label{sub:Raft}
Raft is a consensus algorithm designed to replicate a state machine across distributed peers, and sees broad usage in distributed databases, key-value stores, distributed file systems, distributed load-balancers, and container orchestration. Historically, efforts to verify Raft, using both constructive, mechanized proving techniques \cite{Woos_Wilcox_Anton_Tatlock_Ernst_Anderson_2016, Wilcox_Woos_Panchekha_Tatlock_Wang_Ernst_Anderson, Ongaro} and automated verification \cite{Ongaro}, have reasoned about the protocol under certain assumptions about the stability of the communication channels. Previously, Raft has been proven constructively, using Rocq\footnote{Previously known as Coq.}, to maintain properties of interest with respect to volatile, attacker-controlled channels \cite{Wilcox_Woos_Panchekha_Tatlock_Wang_Ernst_Anderson}. However, no previous approach to Raft verification has reasoned explicitly about a coordinated, arbitrary on-channel attacker \textit{external} to the protocol itself. Uniquely, \korg enables us to study Raft in this context.

Referencing the original Raft thesis \cite{Ongaro} and other Raft models \cite{Woos_Wilcox_Anton_Tatlock_Ernst_Anderson_2016}, we constructed a \promela model of the Raft protocol. Additionally, we derived and formalized the following properties, which our \promela model satisfies:
\[
\begin{aligned}
\phi_1 &= \text{\parbox[t]{20em}{No two servers can be leaders in the same term.}} \\
\phi_2 &= \text{\parbox[t]{20em}{Entries committed to the log at the same index must be equivalent.}} \\
\phi_3 &= \text{\parbox[t]{20em}{Only leaders may append entries to the log.}} \\
\phi_4 &= \text{\parbox[t]{20em}{If a leader commits at an index, any server that becomes leader afterwards must follow that commit.}} \\
\phi_5 &= \text{\parbox[t]{20em}{If any two servers commit the same log entry, the log entry at the previous index must be equivalent.}}
\end{aligned}
\]
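For instance, $\phi_1$ admits the following temporal reading. This is a sketch in notation introduced here, not the formula used in the model: $\mathit{leader}_i$ is assumed to hold when server $i$ is in the leader state, and $\mathit{term}_i$ is assumed to denote its current term.
\[
\mathbf{G}\,\bigl(\forall i \neq j.\; \neg(\mathit{leader}_i \land \mathit{leader}_j \land \mathit{term}_i = \mathit{term}_j)\bigr)
\]
For a fixed number of peers, the quantifier unfolds into a finite conjunction over pairs of servers. Note that this reading only constrains leaders that coexist at the same instant; ruling out two leaders elected at different moments of the same term would require additional bookkeeping.
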
We construct our Raft model such that we can model-check an arbitrary number of peers. We also designed the model so that each peer maintains separate channels for receiving AppendEntries requests, AppendEntries responses, RequestVote requests, and RequestVote responses. This separation lets \korg target each message type independently. In particular, we study Raft in the presence of drop and replay attackers on all four channel types, attacking both a minority and a majority of peers.

To test \korg, we altered our original Raft model to introduce a subtle bug in the Raft consensus mechanism: the model no longer ensures that votes come from unique peers. We refer to our original, correct Raft model as \texttt{raft.pml}, and to our buggy Raft model as \texttt{raft-bug.pml}. In the absence of an attacker (that is, assuming perfect channels), both \texttt{raft.pml} and \texttt{raft-bug.pml} satisfy $\phi_1$--$\phi_5$. We assess \texttt{raft-bug.pml} with \korg, and a breakdown of our findings is shown in Figure \ref{res:raft_table}.

\begin{figure}[h!]
\centering
\begin{scriptsize}
\begin{tabular}{|c|c|}
\hline
Scenario & Attack found? \\
\hline
Dropping AppendEntries messages & no \\
Dropping RequestVote messages & no \\
Replaying RequestVote messages & yes ($\phi_1, \phi_4$ violated) \\
Replaying AppendEntries messages & no \\
Dropping RequestVoteResponse messages & no \\
Dropping AppendEntriesResponse messages & no \\
\hline
\end{tabular}
\end{scriptsize}
\caption{Breakdown of the attacker scenarios assessed with \korg against our buggy Raft \promela model, \texttt{raft-bug.pml}. In all experiments, the Raft model was set to five peers and the drop/replay limits of the gadgets \korg synthesized were set to two. We conducted our experiments on a research computing cluster, allocating 250\,GB of memory to each verification run. The full models and attack traces are included in the artifact.}
\label{res:raft_table}
\end{figure}
In our experiments, we found just one attack on our \texttt{raft-bug.pml} \promela model, which violates election safety ($\phi_1$). In this scenario, Peer A and Peer B are candidates for election. Peer A receives three votes, one from itself and two from other peers, and Peer B receives two votes, one from itself and one from another peer. The replay attacker simply replays the vote sent to Peer B. Both Peer A and Peer B are then convinced that they won the election and change their state to leader. Leader completeness ($\phi_4$) is subsequently violated as a consequence. This scenario demonstrates \korg's ability to discover subtle bugs in protocol logic, here by exploiting the missing uniqueness check in \texttt{raft-bug.pml}.
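
To make the vote arithmetic concrete, assume that a candidate wins with a strict majority, $q = \lfloor n/2 \rfloor + 1$, and write $v_A$ and $v_B$ for the number of vote messages counted by Peer A and Peer B. These symbols are notation introduced for this illustration, not identifiers from the model. With $n = 5$ peers, and with the last term accounting for the single replayed vote:
\[
q = \lfloor 5/2 \rfloor + 1 = 3, \qquad v_A = 3 \geq q, \qquad v_B = 2 < q, \qquad v_B + 1 = 3 \geq q .
\]
Counting distinct voters rather than vote messages would leave Peer B at two votes, below the quorum, which is why the uniqueness check matters.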

% these attacker models, and we employ \korg to exhaustively demonstrate this is the case.

%our Raft model satisfies $\phi_1$-$\phi_5$ assuming perfect channels, and \korg allowed us to reason precisely about the effect of imperfect, vulnerable channels.


%To be clear, this is not an attack on the general Raft protocol, but rather an attack on our specific Raft implementation: in this case, the bug \korg exploits involves our Raft model not ensuring votes received are from unique peers\footnote{Naturally, this requires cryptography and therefore is challenging to express in the semantics of \promela.}. In general, the complete Raft protocol has been proven to resist drop and replay attackers \cite{Woos_Wilcox_Anton_Tatlock_Ernst_Anderson_2016}.
% We note our analysis is in no