Algorithmic Complexity Attacks
The worst-case complexity of an algorithm describes the maximum cost it can incur across all possible inputs of a given size. In security, worst-case complexity is an attack surface. If an algorithm has a worst case significantly worse than its average case, and if an attacker can construct inputs that trigger the worst case, they can exhaust server resources with inputs that appear reasonable in size.
This category is distinct from volumetric flooding and protocol exhaustion. The attack does not require high traffic volume. It requires knowledge of the algorithm being executed on user-supplied data and the ability to construct inputs that force the worst-case execution path. A single request, or a small number of requests, may be sufficient to consume server resources for an extended period.
Regular Expression Denial of Service (ReDoS)
The regular expression engines in most languages are backtracking implementations built on nondeterministic finite automata (NFAs). These engines support backreferences and lookahead assertions that pure DFA-based engines cannot handle, but they have a pathological failure mode: for certain combinations of pattern and input, backtracking takes exponential time.
The Backtracking Explosion
Consider the regex `^(a+)+$` applied to the string `aaaaaaaaaaaaaaaaaab`. The outer group `(a+)+` can be satisfied in multiple ways: the inner `a+` might consume all the 'a' characters, or it might consume subsets, with the outer `+` iterating over multiple inner matches. When the final 'b' fails to match the `$` anchor, the engine backtracks and tries a different assignment of 'a' characters to inner vs. outer repetitions.
The number of different ways to partition n characters across a nested repetition is exponential in n. For a string of 30 'a' characters followed by a non-matching character, the number of backtracking paths is approximately 2^30. The engine is working as specified: exploring all possible match paths to determine definitively that no match exists. The problem is the combination of the pattern structure (exponential ambiguity) with the backtracking evaluation strategy.
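The exponential growth is easy to observe directly. The short sketch below times the vulnerable pattern against inputs that almost match; each additional 'a' roughly doubles the running time. The input lengths are chosen only to keep the demonstration to a few seconds.

```python
import re
import time

# Vulnerable pattern from the discussion above: a nested quantifier anchored
# so that the trailing 'b' forces exhaustive backtracking.
pattern = re.compile(r'^(a+)+$')

for n in range(18, 27):
    s = 'a' * n + 'b'          # almost matches, then fails at the final character
    start = time.perf_counter()
    pattern.match(s)            # returns None, but only after ~2**n backtracking paths
    elapsed = time.perf_counter() - start
    print(f"n={n:2d}  {elapsed:.3f}s")
```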
Vulnerable Pattern Classes
Several regex pattern structures reliably produce catastrophic backtracking on adversarial inputs:
- Nested quantifiers: `(a+)+`, `(a*)*`, `(a|aa)+`, and other patterns where an outer repetition contains an inner repetition over overlapping character classes.
- Alternation with overlap: `(a|a)+`, `(ab|a)+b`, patterns where alternatives can match the same input characters, creating ambiguity about which alternative applies.
- Quantified group followed by a failing suffix: `(a|b|c)*d` applied to a long string of 'a', 'b', 'c' characters followed by 'e'; the engine must retry the repetition at every possible length (and, for an unanchored search, from every starting offset) before concluding that 'd' never matches. The blowup here is polynomial rather than exponential, but still sufficient for denial of service on long inputs.
The key diagnostic is whether the pattern can match a prefix of the input in multiple different ways. If it can, and if the suffix fails to match, the engine must try all of those ways, which may be exponential in the prefix length.
Real-World Occurrences
ReDoS vulnerabilities have been found in production systems across most major programming languages and widely-used libraries. The OWASP list of documented ReDoS vulnerabilities includes findings in:
- Email validation regexes in multiple web frameworks (common because email validation involves complex patterns with overlapping structure)
- Markdown parsing libraries where heading and formatting pattern matching has recursive ambiguity
- HTTP header parsing in web servers and proxies
- Log file parsing and analysis tools
The Node.js ecosystem has been particularly affected because of the language's single-threaded event loop: a single ReDoS payload that occupies the event loop blocks all other request handling for the duration of the match. In multi-threaded environments, a single ReDoS request ties up only one thread.
Detection and Mitigation
Static analysis of regex patterns for catastrophic backtracking potential is an active research area. Tools like regexploit, vuln-regex-detector, and academic implementations of the Weideman et al. vulnerability detection algorithm can identify patterns with exponential or polynomial worst-case complexity.
The implementation mitigations are:
Timeout enforcement: wrap regex evaluation in a timeout and abort if it exceeds a threshold. This bounds the time any single match can consume, but adversarial inputs can still burn CPU up to the threshold on every request, and the threshold must be generous enough that legitimate slow matches are not aborted. Node.js provides worker_threads for running regex in a separate thread that can be terminated on timeout.
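As a rough sketch of the same idea in Python, where a match already running inside the C-level `re` engine cannot easily be interrupted in-process, the helper below runs the match in a child process and kills it on expiry. The function names and the one-second default are illustrative.

```python
import multiprocessing
import re

def _match_worker(pattern, text, conn):
    # Runs in a child process so the parent can kill it if it takes too long.
    conn.send(bool(re.match(pattern, text)))

def match_with_timeout(pattern, text, timeout=1.0):
    parent_conn, child_conn = multiprocessing.Pipe()
    proc = multiprocessing.Process(target=_match_worker,
                                   args=(pattern, text, child_conn))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()        # abandon the runaway match
        proc.join()
        raise TimeoutError("regex evaluation exceeded time budget")
    return parent_conn.recv()

if __name__ == "__main__":
    try:
        print(match_with_timeout(r'^(a+)+$', 'a' * 40 + 'b'))
    except TimeoutError as exc:
        print("aborted:", exc)
```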
Non-backtracking engines: RE2 (from Google) implements a DFA/NFA hybrid that guarantees linear time in the length of the input string. It cannot support backreferences or lookahead assertions, which limits its applicability, but for patterns not requiring these features it is immune to catastrophic backtracking. RE2 bindings are available for most major languages.
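As an illustrative sketch, assuming the `google-re2` Python binding, which exposes an `re`-compatible module named `re2` (the exact module name and install command depend on which binding you use):

```python
import re2  # assumption: installed via `pip install google-re2`

# The same adversarial input that stalls a backtracking engine returns
# immediately here, because RE2 guarantees time linear in the input length.
pattern = re2.compile(r'^(a+)+$')
print(pattern.match('a' * 100 + 'b'))   # None, in microseconds

# Patterns that need backreferences or lookaround are not supported by RE2
# and fail to compile, so code may need a fallback path for those cases.
```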
Pattern rewriting: in many cases, a pattern with exponential worst-case behavior can be rewritten to an equivalent pattern with linear worst-case behavior. Atomic groups and possessive quantifiers (supported by PCRE but not all engines) prevent backtracking out of a successful match, eliminating the ambiguity that causes exponential behavior.
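A sketch of rewriting with atomic groups and possessive quantifiers, here using the third-party `regex` module since the standard library `re` supports neither construct; the pattern is the toy example from earlier, and the simplest fix for it is of course `^a+$`, but the constructs generalize to patterns that cannot be flattened so easily.

```python
import regex  # third-party module: pip install regex

text = 'a' * 40 + 'b'

# Ambiguous original: the engine can split the 'a's between the inner and
# outer repetition in exponentially many ways before giving up.
# vulnerable = regex.compile(r'^(a+)+$')   # do not run this on adversarial input

# Possessive quantifier: once the inner a++ has consumed characters, the
# engine never gives them back, so failure is detected in linear time.
safe_possessive = regex.compile(r'^(a++)+$')
print(safe_possessive.match(text))   # None, immediately

# Equivalent atomic-group form: (?>...) forbids backtracking into the group.
safe_atomic = regex.compile(r'^(?>a+)+$')
print(safe_atomic.match(text))       # None, immediately
```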
Hash Collision Attacks in Depth
The HashDoS vulnerability discovered by Klink and Wälde in 2011 exploited a specific property of commonly deployed hash functions: their determinism. If the hash function used by a language's runtime hash table is known and deterministic (no random seed), an attacker can precompute a large set of keys that all hash to the same bucket.
How Hash Collisions Exhaust CPU
A hash table with a good hash function distributes n entries approximately uniformly across k buckets, yielding O(1) amortized lookup. With all n entries in the same bucket, the table degenerates to a linked list: lookup time becomes O(n), and inserting all n entries becomes O(n^2) in total, since each insertion must traverse the existing entries in the bucket.
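The degeneration is easy to reproduce without reference to any particular runtime's hash function: a key class whose hash is deliberately constant forces every entry into the same bucket. The class and sizes below are illustrative, not an attack on Python's seeded string hash.

```python
import time

class CollidingKey:
    """Every instance hashes to the same value, so dict insertion degrades
    from amortized O(1) to O(n) per insert, O(n^2) in total."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42                       # constant hash: all keys collide
    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.value == other.value

def time_insert(keys):
    start = time.perf_counter()
    table = {}
    for k in keys:
        table[k] = True
    return time.perf_counter() - start

n = 5_000
print("distinct hashes:", time_insert(range(n)))                             # milliseconds
print("colliding keys: ", time_insert([CollidingKey(i) for i in range(n)]))  # seconds
```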
HTTP POST body parsing in PHP, Java, Python, Ruby, and .NET (all of which used unseeded or known-seeded hash functions in 2011) involved inserting all form field names into a hash map as part of parsing. A POST request with 100,000 form fields, all with names precomputed to collide in the target's hash function, required on the order of 10^10 operations to parse, several minutes of CPU time per request.
The Murmur Hash and Seeding
The hash functions in use at the time of the Klink/Wälde disclosure were designed for performance, not unpredictability. Once the algorithm is known, generating collisions is a combinatorial problem solvable offline.
The universally adopted fix was hash randomization: seeding the hash function with a value chosen at process startup, unknown to the attacker. With an unknown seed, precomputed collision sets do not work.
In practice:
- Python's `PYTHONHASHSEED` (Python 3.3+): randomized by default
- Java's `java.util.HashMap` (JDK 7u6+): uses a random seed in the string hash
- Perl's `PERL_HASH_SEED`
- Ruby's `RUBY_HASH_SEED`
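A quick way to observe the randomization described above is to hash the same string in separate interpreter processes: with default settings the value changes on every run, and pinning `PYTHONHASHSEED` (shown here only for illustration, not recommended in production) makes it stable.

```python
import os
import subprocess
import sys

cmd = [sys.executable, "-c", "print(hash('collide-me'))"]

# Two fresh interpreters draw two different seeds, so the hashes differ.
print(subprocess.check_output(cmd).decode().strip())
print(subprocess.check_output(cmd).decode().strip())

# Pinning the seed makes the value repeatable across runs, which is exactly
# what the HashDoS fix is designed to prevent in production processes.
env = dict(os.environ, PYTHONHASHSEED="0")
print(subprocess.check_output(cmd, env=env).decode().strip())
print(subprocess.check_output(cmd, env=env).decode().strip())
```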
Residual Vulnerabilities
Hash randomization addresses known-function attacks but does not address all hash-based denial of service. Hash tables using weak randomization (e.g., a predictable seed derived from process start time) can still be attacked if the attacker can determine the seed. Timing side channels in hash table operations can leak information about the number of collisions, potentially allowing adaptive attacks.
Java 8's HashMap was updated to use balanced BSTs for buckets that exceed a threshold occupancy (eight entries by default), replacing O(n) linked list traversal with O(log n) tree traversal. This does not eliminate the attack but bounds its worst case.
Decompression and Serialization Bombs
Any operation that transforms input of size n into output of size f(n), where f can be made arbitrarily large by input choice, is a potential amplification vector for algorithmic complexity attacks.
Zip and Gzip Bombs
Compression algorithms exploit redundancy in the input. A file consisting entirely of a single repeated byte has very high redundancy and compresses to a very small fraction of its original size.
A zip bomb compresses highly redundant data. The simplest form: `dd if=/dev/zero bs=1M count=10240 | gzip > bomb.gz` produces a gzip file of approximately 10 MB that decompresses to 10 GB of null bytes. A more extreme form uses zip's support for multiple files and nesting: 10 files, each containing 10 files, each containing a 1 GB compressed null file, produces a multi-level bomb with extreme expansion at multiple decompression steps.
Applications that accept user-submitted archives and decompress them without size limits are vulnerable. The defense is straightforward: enforce a maximum decompression output size and abort if it is exceeded. Most compression libraries provide a means to read decompressed output incrementally, allowing size checks before full expansion.
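A minimal sketch of such a size cap using the standard library's `zlib` incrementally; the `safe_decompress` helper, the chunk size, and the 100 MB cap are illustrative choices.

```python
import zlib

CHUNK = 64 * 1024  # emit at most this much decompressed output per call

def safe_decompress(data: bytes, max_output: int) -> bytes:
    """Decompress gzip/zlib data, aborting once output exceeds max_output bytes."""
    # wbits = MAX_WBITS | 32 lets zlib auto-detect a gzip or zlib header.
    d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 32)
    out = bytearray()
    buf = data
    while buf and not d.eof:
        out += d.decompress(buf, CHUNK)      # capped, incremental output
        if len(out) > max_output:
            raise ValueError("decompression bomb: output limit exceeded")
        buf = d.unconsumed_tail              # compressed input not yet processed
    out += d.flush()
    if len(out) > max_output:
        raise ValueError("decompression bomb: output limit exceeded")
    return bytes(out)

# Example: cap expansion at 100 MB regardless of how small the archive looks.
# safe_decompress(open("upload.gz", "rb").read(), max_output=100 * 1024 * 1024)
```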
XML Entity Expansion (Billion Laughs)
XML entities allow a string to be defined once and referenced multiple times. The XML specification requires entity references to be expanded recursively, creating the opportunity for exponential expansion:
<!ENTITY a "x">
<!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;"> <!-- 10x -->
<!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;"> <!-- 100x -->
<!-- ... -->
<!ENTITY j "&i;&i;&i;&i;&i;&i;&i;&i;&i;&i;"> <!-- 10^9 x -->
A document referencing &j; forces the parser to expand through the chain of entity definitions, each level multiplying the previous one tenfold, producing 10^9 copies of the original string (the eponymous billion). The memory required is catastrophic.
Defenses:
- Setting a maximum entity expansion depth (most XML parsers now support this)
- Setting a maximum total expansion size
- Disabling entity expansion entirely for untrusted input (the recommended approach)
In Python's xml.etree.ElementTree, entity expansion is limited by default since 2.7.17/3.8.6. The defusedxml library provides hardened parsing that disables entity expansion and several other vectors.
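A short sketch of the recommended approach using `defusedxml`, which by default refuses documents that define entities at all; the truncated bomb below is illustrative.

```python
import defusedxml.ElementTree as DET
from defusedxml import EntitiesForbidden

bomb = """<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY a "x">
  <!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;">
]>
<lolz>&b;</lolz>"""

try:
    DET.fromstring(bomb)
except EntitiesForbidden:
    # defusedxml rejects entity definitions before any expansion starts.
    print("rejected: entity definitions are not allowed in untrusted input")
```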
YAML and Python Pickle
YAML's anchors (&) and aliases (*), along with the merge keys (<<) built on them, provide a mechanism similar to XML entity references. A YAML document with deeply nested alias references can produce exponential expansion during loading:
a: &a ["lol","lol","lol","lol","lol","lol","lol","lol","lol","lol"]
b: &b [*a,*a,*a,*a,*a,*a,*a,*a,*a,*a]
c: &c [*b,*b,*b,*b,*b,*b,*b,*b,*b,*b]
Python's pickle format is a separately serious case. Pickle allows arbitrary Python code execution during deserialization via the __reduce__ method. This is not strictly a complexity attack but is related: any deserialization of untrusted data in a format supporting code execution is catastrophically insecure.
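When a pickle-compatible channel truly cannot be avoided, the standard library documents a restricted-unpickler pattern that whitelists which globals the stream may load. The sketch below follows that recipe; the allowed set is illustrative, and this narrows rather than eliminates the risk, so untrusted pickle data should still be avoided outright.

```python
import builtins
import io
import pickle

# Only these builtins may be referenced by the pickle stream; anything else
# (os.system, subprocess.*, ...) is rejected before it can execute.
_ALLOWED_BUILTINS = {"list", "dict", "set", "tuple", "frozenset"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in _ALLOWED_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers round-trip; a payload that names os.system is rejected.
print(restricted_loads(pickle.dumps({"ok": [1, 2, 3]})))
```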
Sorting and Worst-Case Input
Quicksort's average case is O(n log n) but its worst case for naive pivot selection is O(n^2). If an application sorts user-supplied data with a naive quicksort implementation, an attacker who knows the pivot selection strategy can construct an input sequence that degrades sorting to quadratic time.
Classic quicksort with a fixed pivot (always selecting the first or last element) degrades on sorted or reverse-sorted input. For an array of n elements, this produces n recursive calls each processing a subarray of n-1, n-2, ... elements, summing to O(n^2) total work.
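The degradation is straightforward to measure. The sketch below implements a first-element-pivot quicksort with an explicit stack (to sidestep Python's recursion limit) and counts comparisons on shuffled versus already-sorted input; the array size and function name are illustrative.

```python
import random

def naive_quicksort_comparisons(values):
    """First-element-pivot quicksort; returns the number of comparisons made."""
    data = list(values)
    comparisons = 0
    stack = [(0, len(data) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = data[lo]                 # fixed pivot: degenerate on sorted input
        i = lo
        for j in range(lo + 1, hi + 1):  # Lomuto partition
            comparisons += 1
            if data[j] < pivot:
                i += 1
                data[i], data[j] = data[j], data[i]
        data[lo], data[i] = data[i], data[lo]
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return comparisons

n = 5_000
shuffled = random.sample(range(n), n)
print("shuffled:", naive_quicksort_comparisons(shuffled))   # roughly n log n
print("sorted:  ", naive_quicksort_comparisons(range(n)))   # roughly n^2 / 2
```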
Most modern standard library sort implementations use either randomized pivot selection (making worst-case inputs impossible to precompute) or an introsort variant (quicksort that switches to heapsort when recursion depth exceeds a threshold, guaranteeing O(n log n) worst case). The vulnerability is primarily in custom sorting code, language runtimes with naive implementations in older versions, or sorting code where library implementations are not used.
Path Traversal in Graph Algorithms
Graph algorithms operating on user-controlled graphs have complexity dependent on graph structure. An algorithm whose cost is O(V + E) is linear on the sparse graphs typical in practice but quadratic on a deliberately dense graph, since E can approach V^2; algorithms that enumerate paths rather than vertices or edges can be pushed to exponential work.
Applications that allow users to define workflow graphs, dependency graphs, or any other structure where the application traverses the graph (cycle detection, topological sort, reachability analysis) are vulnerable to constructed adversarial graphs. A graph with pathological structure for the specific algorithm in use can exhaust CPU.
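A small illustration of how graph shape alone drives cost: the "diamond chain" below has only a few dozen vertices, yet any code that enumerates source-to-sink paths (rather than visiting each vertex once) performs work exponential in the number of stages. The graph builder and path counter are illustrative.

```python
def diamond_chain(n_stages):
    """Each stage forks into two branches and rejoins: 2**n_stages distinct paths."""
    graph, node = {}, 0
    start = node
    for _ in range(n_stages):
        a, b, join = node + 1, node + 2, node + 3
        graph[node] = [a, b]
        graph[a] = [join]
        graph[b] = [join]
        node = join
    graph[node] = []
    return graph, start, node

def count_paths(graph, u, sink):
    # Naive enumeration: shared suffixes are revisited once per path prefix,
    # so running time tracks the number of paths, not the size of the graph.
    if u == sink:
        return 1
    return sum(count_paths(graph, v, sink) for v in graph[u])

g, start, sink = diamond_chain(20)        # 61 vertices, 2**20 source-to-sink paths
print(count_paths(g, start, sink))        # already slow; each extra stage doubles it
# A memoized count, or any per-vertex traversal, finishes the same graph instantly.
```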
The Unifying Property
The common thread across all these cases: an algorithm applied to user-controlled input with a worst case far worse than its average case, and no bounds checking to prevent the worst case from triggering.
Worst-case complexity is rarely documented. Developers do not annotate functions with worst-case complexity assumptions about their inputs. Security reviews that examine algorithmic complexity require both knowledge of algorithm theory and the ability to trace user-controlled data through application processing to identify where it is fed into algorithms with unfavorable worst-case characteristics.