Varol Cagdas Tok

Personal notes and articles.

Protocol State Machines as Attack Surfaces

Protocol implementations are state machines. A server receiving network input transitions through defined states based on what it receives, in what sequence, and under what conditions. Each state represents a commitment: memory allocated, timers registered, CPU spent on processing. An attacker who understands the state machine can find states that are expensive to enter and cheap to maintain, transitions that are asymmetric in cost, and terminal conditions that are hard to reach from hostile inputs.

The SYN flood is the most widely known example, but it is only the most prominent instance of a broader pattern. TLS handshake exhaustion, BGP session reset attacks, TCP connection table overflow, and application session exhaustion all follow the same logic. The protocol's own design, applied adversarially, becomes the attack vector.


TCP: The Canonical Case

The TCP state machine is defined in RFC 793. A connection transitions through CLOSED → LISTEN → SYN-RECEIVED → ESTABLISHED → various termination states. Each state has specific processing requirements and resource implications.

The SYN flood targets the SYN-RECEIVED state. When a server in LISTEN receives a SYN, it allocates a connection record (a Transmission Control Block, or TCB), sends a SYN-ACK, and transitions to SYN-RECEIVED. It then waits for the final ACK that would complete the three-way handshake and move the connection to ESTABLISHED. If the ACK never arrives, the connection remains in SYN-RECEIVED until the retransmission sequence is exhausted: on Linux, the tcp_synack_retries parameter (default 5) governs how many times the SYN-ACK is resent, with the wait doubling after each attempt, for a total of roughly 63 seconds under default settings.
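The arithmetic behind that figure is worth making explicit. A small sketch, assuming the Linux defaults just mentioned (initial retransmission timeout of 1 second, tcp_synack_retries = 5):

```python
def synack_hold_time(initial_rto=1.0, retries=5):
    """Seconds a half-open connection occupies the SYN backlog:
    the initial wait plus `retries` retransmission waits, each
    double the last (1 + 2 + 4 + 8 + 16 + 32)."""
    return sum(initial_rto * 2 ** i for i in range(retries + 1))

print(synack_hold_time())  # 63.0 seconds with the defaults above
```

Each spoofed SYN therefore buys the attacker about a minute of occupancy in a finite table for the cost of a single 40-byte packet.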

The resource being exhausted is the SYN backlog queue, a finite table of SYN-RECEIVED connections. Implementations vary, but the kernel must allocate enough memory for each pending half-open connection to store the full TCB or at minimum a hash table entry representing it. When the queue fills, new SYN packets are silently dropped. Legitimate clients attempting to connect receive no response.

RFC 793 was written in 1981 for a cooperative internet with a known, small population of hosts. The three-way handshake's design reflects an environment where source address spoofing was not an operational concern. The first party to commit resources, the server, does so before verifying the client's reachability, because the scenario being designed for did not include hostile participants systematically spoofing source addresses.

SYN Cookies: The Stateless Handshake

SYN cookies, proposed by Dan Bernstein and Eric Schenk in the mid-1990s, address the SYN flood by deferring state allocation until the handshake completes. The design encodes connection parameters into the initial sequence number (ISN) of the SYN-ACK:

ISN = t (5 bits, a slowly incrementing counter derived from the clock) | MSS index (3 bits) | hash(src_ip, src_port, dst_ip, dst_port, t, secret) (24 bits)

The hash function uses a server secret, making it computationally infeasible for an attacker to construct a valid ACK without having received the SYN-ACK. When the ACK arrives, the server verifies the ACK number equals ISN + 1, reconstructs the connection parameters from the encoded fields, and allocates the TCB only at that point.
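A minimal sketch of the encode/verify cycle, using the layout above. The secret, the MSS table, and the use of SHA-256 here are illustrative assumptions; real implementations differ in hash choice and also reject stale timestamps:

```python
import hashlib
import struct
import time

SECRET = b"per-boot-server-secret"        # hypothetical key
MSS_TABLE = [536, 1220, 1440, 1460]       # illustrative encodable values

def _tag24(src, sport, dst, dport, t):
    # 24-bit keyed hash over the four-tuple and the coarse timestamp
    data = SECRET + struct.pack("!4sH4sHI", src, sport, dst, dport, t)
    return int.from_bytes(hashlib.sha256(data).digest()[:3], "big")

def make_cookie(src, sport, dst, dport, mss_index, now=None):
    t = (int(now if now is not None else time.time()) >> 6) & 0x1F
    return (t << 27) | (mss_index << 24) | _tag24(src, sport, dst, dport, t)

def check_cookie(isn, src, sport, dst, dport):
    t, mss_index = isn >> 27, (isn >> 24) & 0x7
    if (isn & 0xFFFFFF) != _tag24(src, sport, dst, dport, t):
        return None                        # forged ACK: allocate nothing
    return MSS_TABLE[mss_index % len(MSS_TABLE)]   # recovered parameter
```

Only a cookie the server itself issued for this four-tuple passes verification, and only at that point is the TCB allocated.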

The consequence is that no TCB exists between SYN and ACK. A flood of SYN packets from spoofed addresses consumes no persistent state on the server: each SYN is answered with a SYN-ACK, and if no ACK ever arrives, nothing was allocated. The server can continue accepting connections regardless of the SYN arrival rate (up to CPU limits for computing hashes and sending SYN-ACKs).

The TCP options problem with SYN cookies is worth understanding precisely. Options negotiated in the SYN/SYN-ACK exchange (SACK permission, window scaling, timestamps) must be recoverable when the ACK arrives for the server to know what was agreed. Without state, the server cannot recover them. The 3-bit MSS field in the ISN can encode one of eight MSS values, allowing MSS to survive. Window scaling and SACK negotiation are handled by a separate mechanism introduced later: TSOPT-based SYN cookies encode option information in the TCP timestamp that the client echoes back in its ACK, when the client supports timestamps. Without this, connections established through SYN cookies fall back to smaller windows and no selective acknowledgment, which hurts throughput on high-bandwidth paths.

This degradation is acceptable under attack conditions and invisible under normal conditions (when SYN cookies are activated only when the backlog fills). The Linux implementation activates SYN cookies dynamically when the SYN backlog overflows, allowing normal option negotiation for connections that arrive when the queue has capacity.

TCP Connection Table Overflow Beyond SYN Flooding

SYN flooding is not the only way to exhaust TCP connection state. A fully ESTABLISHED connection also occupies a socket entry. An attacker who can complete the three-way handshake and hold connections open exhausts the server's file descriptor table and the kernel memory backing each socket.

For a server handling connections on port 443, the connection is identified by the four-tuple (server_ip, 443, client_ip, client_port). The server's side is fixed, so the number of simultaneous connections is bounded by the number of distinct (client_ip, client_port) combinations. An attacker controlling many IP addresses (a botnet, or a cloud provider with a large IP pool) can exhaust this by holding connections open without sending data or by sending data slowly enough to prevent the server from closing the connection.
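The per-source bound can be computed directly. A sketch assuming the common Linux ephemeral port default of 32768–60999 (the exact range is tunable):

```python
# Each connection to (server_ip, 443) is distinguished only by the
# client's (ip, port); one client IP is limited by its ephemeral range.
EPHEMERAL_PORTS = range(32768, 60999 + 1)   # common Linux default

def attacker_connection_bound(n_client_ips):
    return n_client_ips * len(EPHEMERAL_PORTS)

print(attacker_connection_bound(1))      # 28232 per client IP
print(attacker_connection_bound(1000))   # 28232000 across 1000 IPs
```

A single source is capped in the tens of thousands, which is why this attack scales with the number of addresses the attacker controls rather than with bandwidth.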

Teardown states are a related exhaustion surface, though the label "CLOSE_WAIT exhaustion" properly belongs to the opposite case: a server that never closes sockets after receiving a client's FIN accumulates CLOSE_WAIT entries. A server that calls close() on a socket moves to FIN_WAIT_1 and, once its FIN is acknowledged, to FIN_WAIT_2; after the client's FIN arrives, it enters TIME_WAIT. A hostile client that never acknowledges the server's FIN, or never sends its own, leaves connections lingering in the FIN_WAIT states. TIME_WAIT entries, meanwhile, pin the four-tuple for 2*MSL (Maximum Segment Lifetime), typically 60 to 120 seconds, before it can be reused. Under conditions of high connection churn to hostile clients, these teardown states accumulate.


TLS Handshake Exhaustion

TLS introduces an asymmetric computation cost at connection establishment that makes it a natural target for state exhaustion attacks. The server performs asymmetric cryptographic operations (RSA or elliptic curve Diffie-Hellman) during the handshake, before the client has authenticated. These operations are orders of magnitude more expensive than symmetric ones.

For RSA key exchange with a 2048-bit key, a server performs a private key operation (approximately 3–10 milliseconds on a general-purpose CPU) for each new connection. An attacker sending many TLS ClientHello messages forces the server to perform this operation repeatedly. The attacker's cost is sending a ClientHello (one round trip, minimal computation); the server's cost is a private key operation per connection.

TLS 1.3 reduces handshake latency through its 1-RTT and 0-RTT modes, but the fundamental asymmetry remains: the server still performs expensive public-key operations for each new connection before the client has demonstrated legitimacy.

Session resumption mechanisms (TLS session tickets, session IDs) address repeated connections from known clients by allowing the server to skip the key exchange for resumed sessions. Against an attacker who never resumes, who sends a fresh ClientHello each time, resumption provides no benefit.

TLS ClientHello flooding is distinct from SYN flooding in that it requires completing the TCP handshake, which prevents source address spoofing. This makes the attacker population smaller (it requires real IP addresses that can complete connections) and enables source-based rate limiting and reputation filtering. However, distributed attacks from botnets or cloud infrastructure circumvent source-based controls.

The amplification factor is more extreme at the TLS layer than at the TCP layer. Each TLS ClientHello consumes far more server CPU than a SYN. A server that handles 10,000 new TCP connections per second may only handle a few hundred new TLS connections per second due to the asymmetric cryptographic cost. This makes TLS-level exhaustion accessible to attackers with modest traffic volumes.
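The gap can be put into a rough capacity model. The per-operation costs below are illustrative assumptions, not measurements:

```python
# Handshakes/sec a server can absorb if each new connection costs
# `op_ms` milliseconds of CPU, spread across `cores` cores.
def max_handshakes_per_sec(cores, op_ms):
    return cores * 1000.0 / op_ms

tls = max_handshakes_per_sec(cores=8, op_ms=3.0)    # RSA private-key op
tcp = max_handshakes_per_sec(cores=8, op_ms=0.01)   # bare SYN processing
print(round(tls), round(tcp), round(tcp / tls))     # amplification ratio
```

Under these assumed costs the same hardware absorbs hundreds of times more TCP handshakes than TLS handshakes, which is the asymmetry the attacker exploits.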


BGP Session Reset

Border Gateway Protocol maintains persistent TCP sessions between peers. These sessions carry routing advertisements that determine how traffic flows across the internet. A BGP session that resets causes the peers to withdraw routes from the affected session and reconverge, which may cause routing instability and traffic blackholing.

The BGP session reset attack targets the TCP state machine underlying the BGP session. BGP runs over TCP (port 179). If an attacker can cause a RST packet to arrive at a BGP peer connection, the TCP connection terminates, the BGP session fails, and the peers must re-establish the session and re-exchange full routing tables.

The classical form of this attack exploits the width of the TCP receive window: a RST is accepted if its sequence number falls anywhere within the current window, so the attacker need not guess the exact sequence number, only land inside the window. The attack requires source address spoofing (the packet carries the source address of one BGP peer and targets the other), but because the TCP session is already established, the attacker does not need to complete a handshake.

RFC 4271 (BGP-4) and RFC 4272 (BGP Security Vulnerabilities Analysis) documented this. Because any in-window sequence number is accepted, a blind attacker needs roughly 2^32 divided by the window size guesses rather than the full 2^32, which makes brute-force RST injection feasible on a fast link. GTSM (Generalized TTL Security Mechanism, RFC 5082) mitigates this by requiring BGP packets to arrive with a TTL of 255, which spoofed packets from non-directly-connected attackers cannot satisfy. MD5 authentication (RFC 2385) for BGP TCP sessions adds a per-packet authentication tag that prevents injection. Both mechanisms are now standard practice.
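The window arithmetic that makes blind RST injection feasible can be sketched directly:

```python
# A RST is accepted if its sequence number lands anywhere in the
# receive window, so a blind attacker covers the 32-bit sequence
# space in window-sized strides rather than guessing one exact value.
def rst_guesses_needed(window_bytes):
    return 2 ** 32 // window_bytes

for window in (16_384, 65_536):
    print(window, rst_guesses_needed(window))
# 16384 -> 262144 guesses; 65536 -> 65536 guesses
```

The attacker must also know or guess the session's ephemeral port, which multiplies the search; long-lived BGP sessions on the well-known port 179 reduce that uncertainty, which is part of why BGP was the motivating case.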

A related attack is the BGP route injection attack, which is not a DoS in the traditional sense but achieves a denial of service effect: an attacker advertising more-specific prefixes can attract traffic to their infrastructure, causing a blackhole or redirection. This blurs the line between routing manipulation and denial of service.


SCTP: Designed for Resilience, Not Invulnerable

Stream Control Transmission Protocol was designed in part to address TCP's denial of service vulnerabilities. SCTP's INIT/INIT-ACK/COOKIE-ECHO/COOKIE-ACK four-way handshake defers state allocation to the third message, similar in principle to SYN cookies. The server includes a state cookie in the INIT-ACK that the client must echo back in the COOKIE-ECHO. Only upon receiving the valid COOKIE-ECHO does the server allocate the association state.

This design eliminates the half-open connection vulnerability that plagues TCP. An INIT flood does not fill a state table because no state is created at the INIT-ACK step; only a stateless response is generated.
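The state-cookie idea can be sketched as an HMAC over the serialized association parameters. The key name and JSON serialization here are illustrative, not SCTP's wire format:

```python
import hashlib
import hmac
import json

SECRET = b"association-secret"   # hypothetical per-server key

def make_state_cookie(params):
    """Serialize pending-association parameters and MAC them; the
    blob travels to the client in INIT-ACK and comes back in
    COOKIE-ECHO. No server-side state exists in between."""
    body = json.dumps(params, sort_keys=True).encode()
    mac = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return body + b"." + mac

def verify_cookie(cookie):
    body, _, mac = cookie.rpartition(b".")   # MAC hex contains no dots
    expect = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(mac, expect):
        return None                  # forged COOKIE-ECHO: allocate nothing
    return json.loads(body)          # reconstruct and allocate the state
```

The server carries its own state in the client's hands, protected by the MAC, and allocates only when the state comes back intact.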

SCTP has its own attack surfaces, however. Multi-homing support, the ability to associate multiple IP addresses with a single connection, introduces a path validation mechanism that can be abused. The ASCONF mechanism for adding new addresses to an existing association was found to have vulnerabilities related to address validation timing. SCTP chunk flooding using SACK and similar control chunks creates CPU exhaustion through protocol processing overhead. The protocol's complexity, in part a consequence of its resilience features, expands the attack surface in other dimensions.


Application Session State Exhaustion

The pattern repeats at the application layer. Any application protocol that allocates state before verifying the legitimacy of the requesting party is vulnerable to state exhaustion.

HTTP session management is the most common application-layer case. A web application that creates a database session, allocates application memory, or acquires a connection from a pool when a request arrives, before authentication or any rate limiting, provides an attacker with the ability to exhaust these resources by flooding requests. The server allocates a session for each request; the attacker's requests are never completed legitimately; the sessions accumulate until they time out.

The slowloris attack, examined in more detail in a later post, is a pure example of state exhaustion at the HTTP layer. An attacker opens many connections and sends HTTP headers slowly, one byte at a time, at intervals just below the server's header timeout. The server holds each connection open waiting for the complete request headers. The attacker accumulates connections until the server's connection limit is reached, blocking legitimate clients.

The key property of slowloris is that the attacker never completes a request; it only occupies the state of a connection waiting to receive one. The resource being exhausted is the server's thread pool (in threaded server models) or the event loop's file descriptor table (in event-driven models). Both are finite.
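The mechanism can be illustrated without any network at all, as a schedule of (delay, bytes) pairs per connection; the header names are arbitrary, and the point is only that the terminating blank line never arrives. (Sending this to infrastructure you do not own is, of course, an attack.)

```python
def slowloris_schedule(n_headers=5, interval=9.0):
    """Yield (seconds_to_wait, bytes_to_send) pairs that keep a
    connection parked in the server's 'reading headers' state:
    one header line just under each timeout, never the final CRLF."""
    yield (0.0, b"GET / HTTP/1.1\r\nHost: example.com\r\n")
    for i in range(n_headers):
        yield (interval, f"X-Pad-{i}: a\r\n".encode())
    # deliberately no terminating b"\r\n" -- the request never completes
```

Each trickled header resets the server's header-read timer, so a single connection can be held open indefinitely at a cost of a few bytes per interval.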


The General Pattern

Across all these cases, the attack structure is the same. The target protocol has states that represent resource commitments. These states can be entered in response to traffic the target receives. The conditions under which these states are exited, what causes the resource to be released, involve either successful completion of the protocol (the legitimate case) or timeout expiration. The attacker's strategy is to fill these states at a rate faster than timeouts expire them.
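The rate condition is Little's law: the steady-state number of entries in a state equals the arrival rate into it times the time spent there. A toy model, reusing the 63-second figure from the SYN backlog discussion:

```python
# entries held = arrivals_per_sec * hold_seconds (Little's law);
# the state table overflows when this exceeds its capacity.
def steady_state_entries(arrivals_per_sec, hold_seconds):
    return arrivals_per_sec * hold_seconds

def flood_rate_to_fill(capacity, hold_seconds):
    return capacity / hold_seconds

print(steady_state_entries(500, 63))   # 31500 half-open entries
print(flood_rate_to_fill(1024, 63))    # ~16 SYNs/sec fills a 1024 queue
```

The same arithmetic applies to every state discussed here; only the hold time and the table capacity change.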

The defense, in the abstract, is to move resource commitment later in the protocol, to require the remote party to demonstrate something (reachability at the claimed address, possession of a secret, ability to perform some computation) before the server commits significant resources. SYN cookies demonstrate reachability. DTLS cookies demonstrate reachability over UDP. Proof-of-work challenges in HTTP push computational cost to the client. Each mechanism shifts the resource commitment point later and raises the attacker's cost per unit of server resource consumed.

The trade-off is functionality. Moving commitment later eliminates some information about what was negotiated before commitment, or adds round trips, or excludes clients who cannot perform the required computation. These costs are acceptable for some applications and unacceptable for others.

Legacy protocols designed without this awareness still run the bulk of internet traffic.