Varol Cagdas Tok

Personal notes and articles.

Botnet Architecture and DDoS Coordination

A single host cannot generate the traffic volume required to saturate modern network infrastructure. The terabit-scale attacks documented in recent years (Mirai in 2016, reported at roughly 1 Tbps against OVH; the memcached amplification attack in 2018, which reached 1.35 Tbps against GitHub; and several campaigns in 2020–2023 exceeding 2 Tbps) depend on coordinating traffic from thousands to millions of compromised or misconfigured hosts. The coordination infrastructure is the botnet.
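
The scale argument can be made concrete with a back-of-envelope calculation. The sketch below is only arithmetic: the per-host uplink rates are assumptions chosen for illustration, not measurements of any real botnet, and `hosts_needed` is a hypothetical helper name.

```python
import math

def hosts_needed(target_bps: float, per_host_bps: float) -> int:
    """Minimum number of hosts whose combined uplink reaches target_bps."""
    if per_host_bps <= 0:
        raise ValueError("per_host_bps must be positive")
    return math.ceil(target_bps / per_host_bps)

TBPS = 1e12
MBPS = 1e6

# Assumed sustained uplink per host (illustrative values only).
for uplink_mbps in (1, 10, 100):
    n = hosts_needed(1 * TBPS, uplink_mbps * MBPS)
    print(f"{uplink_mbps:>3} Mbps per host -> {n:,} hosts for 1 Tbps")
```

Even at an optimistic 100 Mbps per host, a terabit of traffic needs on the order of ten thousand sources, which is why coordination rather than raw host capability is the limiting factor.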


The Botnet as a Distributed System

A botnet is a distributed system with adversarial design constraints. Like any distributed system, it must solve the problems of membership management, command propagation, state synchronization, and fault tolerance. Unlike most distributed systems, it must do this while evading detection by network defenders, law enforcement, and competing botnet operators, and while operating on heterogeneous hardware across arbitrary network conditions.

The design constraints imposed by the adversarial context shape the architecture significantly:

Detection evasion: command and control (C2) communications must be indistinguishable from legitimate traffic, or at least not easily filterable. C2 communications that use distinctive protocols or connect to known-bad infrastructure are detectable and disruptable.

Resilience: taking down a small number of servers should not disable the botnet. Centralized infrastructure is fragile: if the C2 server is seized, the botnet loses coordination capability.

Scalability: the C2 infrastructure must reach hundreds of thousands or millions of bots efficiently. Polling architectures where each bot checks in independently do not scale; push architectures require infrastructure that can push commands to large populations simultaneously.

Operator security: the botnet operator communicates with the C2 infrastructure. The infrastructure must not trivially reveal the operator's identity or location.
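
The scalability constraint is partly a trade-off between server load and command latency. As a rough sketch, assuming check-ins are spread evenly over the polling interval (an idealization; `polling_load` and its field names are made up for this note), the two quantities are tied together like this:

```python
def polling_load(n_clients: int, interval_s: float) -> dict:
    """Steady-state load and delay when n_clients each poll once per interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        # Requests arriving at the server per second, averaged over time.
        "requests_per_s": n_clients / interval_s,
        # A new command waits, on average, half an interval before a client sees it.
        "mean_delay_s": interval_s / 2,
        "worst_delay_s": interval_s,
    }

for interval in (10, 100, 1000):
    print(interval, polling_load(1_000_000, interval))
```

Shortening the interval makes commands arrive sooner but multiplies the request rate the server must absorb, and that steady stream of check-ins is also what a network defender gets to observe.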


Centralized C2

The earliest botnets used IRC channels as command and control. The bot, once installed on a compromised host, connected to an IRC server and joined a specific channel. The operator posted commands in the channel; bots read the channel and executed commands. Attack commands specified targets, attack types, and duration.

IRC C2 has several properties that made it operationally convenient and technically straightforward: IRC is a well-understood protocol with broad support, channel membership provides implicit bot enumeration, and posting to a channel distributes a command to all members simultaneously.

The vulnerabilities are equally obvious. IRC servers are a single point of failure and a single point of detection. Identifying the C2 server requires only one compromised bot to reveal its IRC connection. Blocking the IP address of the IRC server takes the botnet offline immediately, at least until the operator moves to a new server.

Later centralized C2 used HTTP for communications, with bots polling a web server for commands. HTTP is far more common than IRC on most networks, so bot polling is harder to distinguish from legitimate web traffic. The vulnerability structure is the same: the HTTP server is a single point of failure and a single point of attribution.

Resilience improvements in centralized architectures focused on redundancy: multiple C2 servers, fast-flux DNS pointing to the C2 IP addresses (rotating the DNS records rapidly to avoid IP-based blocking), and domain generation algorithms (DGA) that make the C2 domain difficult to preemptively block.
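
From the defender's side, fast-flux shows up in passive DNS data as one name resolving to many addresses with short TTLs. The following is a minimal detection sketch under assumed thresholds; `DnsAnswer`, `fast_flux_suspects`, `min_ips`, and `max_ttl` are hypothetical names and values, and legitimate CDNs can look similar, so this is a first-pass filter rather than a verdict.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DnsAnswer:
    domain: str
    ip: str
    ttl: int  # seconds

def fast_flux_suspects(answers, min_ips=10, max_ttl=300):
    """Return domains seen with many distinct IPs, all served at short TTLs."""
    ips = {}
    longest_ttl = {}
    for a in answers:
        ips.setdefault(a.domain, set()).add(a.ip)
        longest_ttl[a.domain] = max(longest_ttl.get(a.domain, 0), a.ttl)
    return sorted(
        d for d in ips
        if len(ips[d]) >= min_ips and longest_ttl[d] <= max_ttl
    )

# Sample observations using documentation address ranges.
observed = [DnsAnswer("flux.example", f"192.0.2.{i}", 60) for i in range(12)]
observed += [DnsAnswer("stable.example", "198.51.100.7", 3600)] * 12
print(fast_flux_suspects(observed))
```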


Domain Generation Algorithms

Domain Generation Algorithms are a technique for generating a large number of potential C2 domain names from a seed, typically derived from a date, a pseudo-random function, or a combination of external inputs. Both the bot and the operator run the same algorithm. The bot generates a list of domains and attempts to connect to each. The operator registers a small subset of those domains and runs C2 servers at them. The bot connects to whichever domain resolves.
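
The core idea is just a deterministic function from a shared seed to a list of names. The toy below is not the algorithm of any real malware family: the hashing scheme, the label length, and the reserved `.example` suffix are all assumptions chosen to show the determinism that both sides rely on.

```python
import hashlib
from datetime import date

def toy_dga(day: date, count: int = 5, tld: str = ".example") -> list[str]:
    """Derive `count` pseudo-random labels from the date; same date, same list."""
    domains = []
    for i in range(count):
        seed = f"{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(seed).hexdigest()
        # Map the first 12 hex digits to the letters a-p for a DNS-safe label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains

print(toy_dga(date(2024, 1, 1)))
```

Anyone who knows the function and the date can reproduce the list, which is exactly what makes the technique workable for the operator and, once reverse engineered, for defenders.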

A defender attempting to disrupt the botnet by preemptively registering or sinkholing the domains must cover all of the domains generated by the algorithm, which may number in the thousands per day. This is economically feasible for law enforcement with knowledge of the specific DGA (obtained through botnet reverse engineering) but requires significant advance work.

DGA designs vary in how predictable they are. A time-based DGA that generates the same domains on the same calendar day is predictable once the algorithm is known: researchers and law enforcement can precompute the domain list and sinkhole proactively. A DGA that incorporates external entropy (e.g., values scraped from social media posts, headlines, or cryptocurrency blockchain data) is less predictable, since the external entropy is not known until the domain lookup occurs.
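
For a purely time-based design, the defender's precomputation can be sketched as follows. `known_dga` stands in for a reverse-engineered algorithm and is entirely hypothetical, as are `sinkhole_list` and `flag_queries`; the point is only that a date window maps to a finite set that DNS logs can be matched against.

```python
import hashlib
from datetime import date, timedelta

def known_dga(day: date, count: int = 3) -> list[str]:
    """Stand-in for a reverse-engineered, date-seeded DGA (hypothetical)."""
    return [
        hashlib.sha256(f"{day.isoformat()}:{i}".encode()).hexdigest()[:10] + ".example"
        for i in range(count)
    ]

def sinkhole_list(start: date, days: int) -> set[str]:
    """Every domain the algorithm yields in the window [start, start + days)."""
    return {d for n in range(days) for d in known_dga(start + timedelta(days=n))}

def flag_queries(queries, blocklist):
    """Return (client, domain) pairs whose queried name is on the blocklist."""
    return [(client, name) for client, name in queries if name.lower() in blocklist]

blocklist = sinkhole_list(date(2024, 1, 1), days=7)
log = [
    ("10.0.0.5", known_dga(date(2024, 1, 3))[0]),
    ("10.0.0.6", "www.example.org"),
]
print(flag_queries(log, blocklist))
```

With external entropy in the seed, `sinkhole_list` could not be computed ahead of time, because the input for a future day does not exist yet.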

The Conficker worm, first observed in 2008, used a DGA generating 250 domains per day from a seed based on the current date and a pseudo-random function. The Conficker Working Group, a coordinated effort by security researchers and domain registrars, preemptively registered or blocked the generated domains. This substantially disrupted Conficker's C2, though the worm's peer-to-peer component (added in later variants) provided an alternative.


Peer-to-Peer C2

Peer-to-peer botnet architectures eliminate the central server entirely. Bots communicate with each other rather than with a central authority. Commands propagate through the network via peer communication, similar to how a distributed hash table propagates data.

P2P architectures are significantly more resilient than centralized ones. There is no single point that, if removed, disables coordination. Seizing one node, or even thousands, does not prevent command propagation through the remaining nodes. Identifying and seizing a node requires infiltrating the botnet P2P network and enumerating the peer list, which is technically more complex than simply tracing DNS lookups.
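
The resilience difference is a graph property and can be checked on toy topologies. The sketch below compares a hub-and-spoke graph with a ring lattice using plain breadth-first search; both topologies and all function names are illustrative assumptions, not models of any specific botnet.

```python
from collections import deque

def reachable(adj, source, removed=frozenset()):
    """Count nodes a message from `source` reaches, skipping removed nodes."""
    if source in removed:
        return 0
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for peer in adj[node]:
            if peer not in removed and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return len(seen)

def star(n):
    """Node 0 is the hub; every other node links only to it."""
    adj = {i: {0} for i in range(1, n)}
    adj[0] = set(range(1, n))
    return adj

def ring_lattice(n, k):
    """Each node links to its k nearest neighbours on either side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

n = 100
print("star, hub removed:       ", reachable(star(n), 1, removed={0}))
print("lattice, one node removed:", reachable(ring_lattice(n, 3), 1, removed={0}))
```

Removing the hub isolates every remaining node in the star, while the lattice routes around the missing peer; the same holds when a scattered tenth of the lattice is removed.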

The Storm worm (2007) was an early prominent P2P botnet using an eDonkey-inspired DHT protocol for peer discovery and communication. Waledac used a layered P2P architecture with fast-flux infrastructure. ZeroAccess used a P2P network to distribute binary updates.

P2P botnets have their own operational challenges. Command propagation in a DHT is slower than pushing to a centralized channel. The operator must maintain some entry points into the network (nodes they control directly), and these can be targeted. Authentication of commands is important: if the botnet accepts commands from any peer, a defender who infiltrates the network can issue countermands. The Kelihos botnet was partially disrupted by Microsoft and researchers who flooded the P2P network with peer entries pointing to sinkhole servers, causing bots to connect to the sinkhole rather than legitimate peers.

Cryptographic command signing addresses the countermanding problem: commands are signed with a private key known only to the operator, and bots verify signatures before executing. A defender who does not possess the private key cannot issue countermands.
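
The asymmetry that matters here is that verification needs only public values while signing needs the private key. The sketch below uses textbook RSA with no padding and two small Mersenne primes, purely so it runs on the standard library. It is insecure by construction and is not how any real system should sign anything; every constant and function name is an assumption made for illustration.

```python
import hashlib

# Toy parameters: two Mersenne primes, chosen only because they are easy to
# write down. Real systems use vetted libraries, padding, and much larger keys.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)

def _digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Signer side: requires the private exponent D."""
    return pow(_digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Verifier side: needs only the public values (N, E)."""
    return pow(signature, E, N) == _digest(message)

sig = sign(b"example message")
print(verify(b"example message", sig))   # signature matches the message
print(verify(b"altered message", sig))   # same signature, different message
```

A party holding only `N` and `E` can check signatures but cannot produce one for a new message, which is the property the paragraph above relies on.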


IoT Botnets: Scale Through Compromise at Scale

The Mirai botnet, first documented in August 2016, established a new scale ceiling for DDoS attacks by exploiting the IoT device population. Mirai's botnet construction methodology was straightforward: scan the internet for devices with Telnet accessible, attempt to authenticate using a hardcoded list of default credentials (admin/admin, root/password, and similar), install the bot software, and add the device to the C2.
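
The same two conditions (exposed Telnet, unchanged default credentials) are what a device owner can audit for. A minimal inventory check might look like the following; the `Device` record, the `at_risk` helper, and the sample inventory are hypothetical, and the credential set contains only the two pairs named above rather than any real list.

```python
from dataclasses import dataclass

# Only the two example pairs mentioned in the text; a real audit would use
# the vendor defaults relevant to the devices actually deployed.
DEFAULT_CREDENTIALS = {("admin", "admin"), ("root", "password")}

@dataclass(frozen=True)
class Device:
    name: str
    username: str
    password: str
    telnet_exposed: bool

def at_risk(devices):
    """Names of devices that expose Telnet and still use a listed default pair."""
    return [
        d.name for d in devices
        if d.telnet_exposed and (d.username, d.password) in DEFAULT_CREDENTIALS
    ]

inventory = [
    Device("cam-1", "admin", "admin", telnet_exposed=True),
    Device("dvr-1", "root", "password", telnet_exposed=False),
    Device("router-1", "admin", "S3cure-unique", telnet_exposed=True),
]
print(at_risk(inventory))
```

Either closing the Telnet port or changing the credential removes a device from the result, which mirrors why both conditions had to hold for the scan-and-login approach to succeed.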

The attack surface Mirai targeted, consumer devices with default credentials and no automatic update mechanism, was enormous. IP cameras, DVR systems, home routers, and similar devices numbered in the tens of millions. Many were configured with credentials their owners had never changed. Many were running embedded Linux with no meaningful security monitoring.

Mirai's scanner component ran continuously on infected devices, propagating to new devices. The worm-like spread, combined with a large and easily compromised target population, produced a botnet estimated at over 600,000 devices at peak. Because those devices were spread across hundreds of thousands of residential IP addresses, source-based filtering was impractical: the traffic came from address space belonging to legitimate ISPs.

The attack against Dyn's DNS infrastructure on October 21, 2016 demonstrated the second-order effects of large IoT botnets. Rather than targeting a single service directly, the attack targeted the DNS infrastructure that many services depended on, causing outages at Twitter, Reddit, GitHub, CNN, and other major services simultaneously. The DNS infrastructure itself was not the ultimate target; it was the dependency that, when broken, cascaded failure to many targets at once.
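
The cascade is a transitive-dependency computation. The sketch below assumes the harshest case, where a service fails if any one of its dependencies fails (no secondary DNS provider, no caching); the service names and the `affected_by` helper are invented for illustration.

```python
def affected_by(failed, depends_on):
    """All services that transitively depend on `failed`, including itself."""
    down = {failed}
    changed = True
    while changed:
        changed = False
        for service, deps in depends_on.items():
            # A service goes down as soon as any dependency is down.
            if service not in down and down & set(deps):
                down.add(service)
                changed = True
    return down

depends_on = {
    "site-a": ["dns-provider-1"],
    "site-a-api": ["site-a"],
    "site-b": ["dns-provider-1", "cdn-x"],
    "site-c": ["dns-provider-2"],
}
print(sorted(affected_by("dns-provider-1", depends_on)))
```

Only the services behind the second provider survive, which is the practical argument for spreading authoritative DNS across more than one operator.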

Subsequent IoT botnet families (Hajime, BrickerBot, Satori, Torii, and others) refined the Mirai approach. Hajime used a P2P architecture rather than centralized C2, making it more resilient to takedowns. BrickerBot deliberately bricked the devices it compromised by overwriting storage, in a vigilante attempt to remove vulnerable devices from the network. These developments illustrated that the IoT compromise model was not specific to Mirai but was a general class of attack on a permanently exposed and insufficiently defended device population.


Botnet-as-a-Service and the Commercial Layer

The commodification of DDoS capability has transformed the attacker population. Booter services (also called stresser services) provide DDoS attack capability for purchase through web interfaces, without requiring the buyer to operate any botnet infrastructure. A customer selects a target, an attack duration, and an attack method from a menu, makes a payment (typically in cryptocurrency), and the service launches the attack.

The existence of this commercial layer separates attack capability from technical sophistication. The operators of the booter services maintain the botnet and attack infrastructure; the buyers need only a target and payment. DDoS attacks are available for prices measured in tens to low hundreds of dollars per attack hour, accessible to individuals with no technical background.

This commercialization affects the threat landscape in several ways. The attacker population becomes much larger. Attribution becomes harder because the nominal buyer may have a limited technical footprint. The botnet operators are distinct from the buyers, creating a multi-layer structure that requires dismantling multiple organizations to neutralize.

Law enforcement takedowns of booter services (WebStresser.org in 2018, and various operations against Lizard Squad and other booter services over the years) have had temporary effects, but the services recur. The underlying infrastructure (compromised IoT devices, cloud instances, vulnerable amplification reflectors) remains available to new operators.


Attack Orchestration

Once C2 is established and the botnet is operational, launching an attack requires commanding bots to begin transmitting traffic to a target at a specified rate using a specified method. The sophistication of this orchestration has increased substantially.

Attack method selection: modern DDoS toolkits include multiple attack methods selectable per campaign, such as SYN flood, UDP flood, HTTP flood, DNS amplification, NTP amplification, and ICMP flood. The operator selects based on target characteristics: whether it is behind a CDN, whether it has bandwidth limitations, and whether it runs application servers that are vulnerable to application-layer attacks.

Ramp rate: attacks that begin at full volume may trigger rate-based detection systems. Some attack orchestration includes a ramp phase that increases traffic gradually, staying below alerting thresholds during the ramp, then escalating to full volume.

Target rotation: rotating the attack target among multiple IPs or subdomains makes upstream filtering less effective. A filter rule targeting one IP does not protect adjacent IPs. Rotating also consumes mitigator response capacity.

Multi-vector coordination: deploying a volumetric flood simultaneously with an application-layer attack requires the defender to address multiple attack classes at the same time, each of which may require different countermeasures. The volumetric component may be a diversionary attack intended to consume mitigation capacity while the application-layer component bypasses it.

Pulse wave attacks: rather than a sustained constant-volume attack, pulse attacks send bursts at full volume for short intervals, then cease, then burst again. The pauses may cause automated mitigation to reduce scrubbing capacity (to avoid filtering legitimate traffic), and the bursts then overwhelm the reduced capacity. The on-off pattern also complicates traffic characterization.
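
The interaction between pulse timing and automated mitigation can be explored with a small defender-side model. Everything here is an assumption: time moves in one-second steps, scrubbing switches on after `engage_delay` seconds of sustained burst and off after `hold` quiet seconds, and the model ignores volume entirely. It is a sketch for reasoning about the hold-down timer, not a description of any real mitigation product.

```python
def unmitigated_seconds(duration, on, off, engage_delay, hold):
    """Count seconds of burst traffic that arrive while scrubbing is inactive.

    Traffic alternates `on` seconds of burst with `off` seconds of silence.
    Scrubbing activates after `engage_delay` consecutive burst seconds and
    deactivates after `hold` consecutive quiet seconds.
    """
    active = False
    burst_run = quiet_run = leaked = 0
    for t in range(duration):
        bursting = (t % (on + off)) < on
        if bursting:
            burst_run += 1
            quiet_run = 0
            if not active:
                leaked += 1
                if burst_run >= engage_delay:
                    active = True
        else:
            quiet_run += 1
            burst_run = 0
            if active and quiet_run >= hold:
                active = False
    return leaked

# Four 30-second cycles: 10 s of burst followed by 20 s of silence.
print(unmitigated_seconds(120, on=10, off=20, engage_delay=3, hold=10))
print(unmitigated_seconds(120, on=10, off=20, engage_delay=3, hold=30))
```

When the hold timer is shorter than the quiet gap, every burst gets a fresh unprotected window; when it is longer, only the first burst does. In this model, keeping mitigation engaged through the pauses is what defeats the pattern, at the cost of scrubbing traffic for longer.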


Attribution and Infrastructure

The forensic chain for attributing a DDoS attack to its operator runs through botnet infrastructure that is designed to obscure it. Traffic arrives at the target from reflectors, bot devices, or cloud instances. These are not the operator's infrastructure. Identifying the operator requires:

  1. Obtaining a bot sample and reverse engineering the C2 protocol
  2. Infiltrating the C2 network or identifying a C2 server
  3. Obtaining logs or operational data from the C2 infrastructure
  4. Correlating with payment records if a commercial service is involved
  5. Correlating the operator's C2 interaction with network access logs that might reveal their location

Each step crosses administrative boundaries: the bot device is operated by an unwitting third party; the C2 server may be in a different jurisdiction; payment records may be in cryptocurrency with no easily identified owner. Attribution that reaches the actual operator typically requires law enforcement action across multiple jurisdictions and cooperation among multiple entities.

The operational security failures that have led to botnet operator identification include: C2 infrastructure registered with real contact information, cryptocurrency transactions that could be traced to exchanges with KYC requirements, operational mistakes that associated operator nicknames with real identities, and informants within criminal organizations. Technical attribution alone is rarely sufficient; human factors are consistently the most productive path.