Varol Cagdas Tok

Personal notes and articles.

Concolic Execution

Symbolic Execution generates tests that cover all program paths. But it has limitations. It struggles when code involves complex math, external library calls, or dynamic features that a constraint solver cannot handle.

Concolic Execution addresses these problems. The name is a combination of CONCrete and symbOLIC.

Concolic execution combines concrete execution with symbolic execution. The concrete execution steers the symbolic execution, and the symbolic execution finds new paths for the next concrete execution to explore.

The Concolic Execution Process

The process is a loop that uses information from the previous run to guide the next one.

Step 1: Execute with an Initial Input

The process starts with a single, initial data state, which can be a random value or a user-provided seed. Assume the program has one input, x, and the initial input is x = 10.

The program is executed twice at the same time:

  1. Concrete Execution: The code runs as normal. x is 10.
  2. Symbolic Execution: The code runs with a symbolic variable, \(X_0\).
  3. As the program runs, the concrete state determines which path to take, and the symbolic executor just follows along, building the path condition.

    Consider tracing a simple program: if (x > 5) { ... } else { ... }

    • Start: (l0). Concrete: x=10. Symbolic: \(x \to X_0\), PC: true.
    • Branch <code>(x > 5)</code>: The concrete value 10 makes \(10 > 5\) true. The program follows the "true" path.
    • Symbolic Update: The symbolic executor follows and adds this to its path condition.
    • End of Path 1: The program terminates.
    • Final PC: \(X_0 > 5\).

    Step 2: Generate New Inputs by Flipping Conditions

    At the end of the run, we have a complete path condition. To find new paths to explore, the tool negates one of the conditions in the PC and asks a constraint solver for a new input.

    The PC was \(X_0 > 5\).

    • Flip #1 (the only branch):

    * New Constraint: \(\neg(X_0 > 5)\) which simplifies to \(X_0 \le 5\).

    * Solver: "Find an input that satisfies \(X_0 \le 5\)."

    * New Input Generated: x = 3.

    Step 3: Repeat with New Inputs

    The concolic tester adds x = 3 to its queue. It re-runs the process with this new input, which takes a different path (the else branch). It traces this new path, collects a new path condition (\(X_0 \le 5\)), and the process terminates, as there are no more paths to find.

    Why is this better than Symbolic Execution?

    Concolic execution is a compromise. If the program hits a line of code that the symbolic engine cannot understand (like a call to an external library), it uses the concrete value from the concrete execution to continue, then resumes symbolic analysis on the next line. This provides scalability and robustness for complex programs.