-
Notifications
You must be signed in to change notification settings - Fork 12
Methodology
This document describes the architecture and techniques behind CoBRA's Mixed Boolean-Arithmetic (MBA) expression simplifier. For the individual methods, see the Technique Index. For the mathematical details behind each technique, see the referenced papers.
MBA expressions interleave arithmetic operators (+, -, *) with bitwise operators (&, |, ^, ~) and shifts (<<, >>). This combination defeats both algebraic simplification (which expects pure arithmetic) and Boolean minimization (which expects pure logic). MBA obfuscation exploits this gap — a simple expression like x + y can be rewritten as (x & y) + (x | y), and layered rewrites make it arbitrarily complex.
CoBRA reverses this process. Given an obfuscated MBA expression, it recovers a simplified equivalent by combining signature-based techniques, semilinear techniques, decomposition, and lifting under a worklist-based orchestrator. Many of the signature-side techniques are rooted in the observation that linear MBA subproblems over n variables are fully determined by their values on the 2^n Boolean inputs {0, 1}^n. The rest of the pipeline adds full-width verification and residual handling for cases where Boolean data alone is not enough.
Foundational reference: Zhou et al., Information Hiding in Software with Mixed Boolean-Arithmetic Transforms (WISA 2007) — the original formalization of MBA obfuscation.
CoBRA uses a worklist-based orchestrator that treats simplification as a graph exploration problem. An input expression enters the worklist as a work item carrying a state kind that describes its current form. A scheduler selects passes to transform items through successive state kinds until a verified simplified expression is found.
Input Expression
|
[Seed: lower / classify]
|
kFoldedAst
| | | |
| | | +--> kLiftedSkeleton --> kFoldedAst
| | +-----> kCoreCandidate --> kRemainderState --> kCandidateExpr
| +--------> kSemilinearNormalizedIr --> kSemilinearCheckedIr
| |
| v
| kSemilinearRewrittenIr --> kCandidateExpr
|
+-----------> kSignatureState --> kSignatureCoeffState --> kCandidateExpr
\--> kCompetitionResolved --> kCandidateExpr
kCandidateExpr --> Verification or Immediate Acceptance --> Simplified Expression
The decomposition family can enter kRemainderState either directly via kPrepareDirectRemainder (boolean-null path) or by first producing a kCoreCandidate; the diagram shows the core-first branch.
Key architectural properties:
- Pass registry: 36 discrete passes, each consuming a specific state kind and producing 0+ child items with new state kinds.
- DAG-aware scheduling: Passes declare prerequisite dependencies. The scheduler respects these and consults an attempt cache to avoid redundant work.
- Competition groups: Specific passes that fork alternatives or child solves use local cost-based winner selection and continuations to recombine the winner. Outside those groups, the worklist returns the first fully verified top-level candidate.
- Deduplication: A fingerprint-based cache prevents re-running the same pass on equivalent states, bounding the search space.
- Policy bounds: Configurable limits on total work item expansions and structural rewrite generations prevent runaway exploration.
See orchestrator.md for the full architectural details.
The ClassifyAst pass performs structural analysis of the expression AST to determine which technique families are applicable. It sets structural flags:
| Flag | Meaning | Example |
|---|---|---|
kSfHasMultilinearProduct |
Variable-variable multiplication with distinct variable sets | x * y |
kSfHasSingletonPower |
Single variable raised to a power | x * x |
kSfHasMixedProduct |
Product of bitwise subexpressions | (x & y) * (x | y) |
kSfHasBitwiseOverArith |
Bitwise op wrapping arithmetic | (x + y) & z |
kSfHasArithOverBitwise |
Arithmetic op combining non-leaf bitwise children | (x & y) + (x | y) |
These flags determine which passes the scheduler will consider for the item. Semilinear eligibility is tracked separately as SemanticClass::kSemilinear when constants appear inside bitwise operators; there is no standalone HasConstantMask scheduler flag.
Before entering technique-specific passes, CoBRA checks whether any variables cancel out of the expression. If the signature vector is invariant under toggling a particular variable's input bit, that variable contributes nothing and is eliminated. This reduces the problem dimension.
After simplification, CoBRA validates correctness:
- Spot-check (default): Evaluates original and simplified expressions on random inputs and confirms they match at full bitwidth.
-
Z3 equivalence proof (optional,
--verify): Constructs a formal equivalence query and proves the result is correct for all inputs.
- Orchestrator Architecture — Worklist model, pass registry, scheduling, competition groups
- AST Processing — Classification, structural rewrites, operand simplification
- Signature Techniques — Signature Vector, Pattern Matching, Change of Basis (CoB) Butterfly Transform, Algebraic Normal Form (ANF), Shannon Decomposition, Coefficient Splitting, Singleton Power Recovery
- Semilinear Techniques — Linear Shortcut Detection, Constant Lowering (Semilinear), Structure Recovery (XOR Recovery), Bit Partitioning
- Decomposition — Decomposition Engine (Extract-Solve), Boolean-Null Residual Classification, Ghost Residual Solving, WeightedPolyFit
- Lifting — Arithmetic Atom Lifting, Repeated Subexpression Lifting
See the Technique Index for an alphabetical reference of all techniques with cross-links.
The full list of academic references that influenced CoBRA's design:
| Paper | Authors | Venue | Link |
|---|---|---|---|
| Information Hiding in Software with Mixed Boolean-Arithmetic Transforms | Zhou, Main, Gu, Johnson | WISA 2007 | ResearchGate |
| MBA-Blast: Unveiling and Simplifying Mixed Boolean-Arithmetic Obfuscation | Liu, Shen, Ming, Zheng, Li, Xu | USENIX Security 2021 | Paper |
| Paper | Authors | Venue | Link |
|---|---|---|---|
| Efficient Deobfuscation of Linear Mixed Boolean-Arithmetic Expressions (SiMBA) | Reichenwallner, Meerwald-Stadler | 2022 | arXiv:2209.06335 |
| Deobfuscation of Semi-Linear Mixed Boolean-Arithmetic Expressions (MSiMBA) | Skees | 2024 | arXiv:2406.10016 |
| Paper | Authors | Venue | Link |
|---|---|---|---|
| Simplification of General Mixed Boolean-Arithmetic Expressions (GAMBA) | Reichenwallner, Meerwald-Stadler | IEEE EuroS&PW 2023 | arXiv:2305.06763 |
| Efficient Normalized Reduction and Generation of Equivalent Multivariate Binary Polynomials | Gamez-Montolio, Florit, Brain, Howe | BAR / NDSS 2024 | NDSS · Open Access PDF |
| Simplifying Mixed Boolean-Arithmetic Obfuscation by Program Synthesis and Term Rewriting (ProMBA) | Lee, Lee | ACM CCS 2023 | ACM DL |
| Technique | Where Used | Reference |
|---|---|---|
| Mobius transform / ANF | Mobius Transform · Algebraic Normal Form (ANF) · Packed Mobius Transform | Barbier, Cheballah, Le Bars — On the computation of the Mobius transform (TCS 2019) |
| Shannon decomposition | Shannon Decomposition · Signature Techniques | Boole, Laws of Thought (1854); Shannon (1949) |
| Hensel lifting (modular inverse) | Hensel Lifting · Coefficient Extraction | Dumas — On Newton-Raphson iteration for multiplicative inverses modulo prime powers (2012) |
| Finite differences / falling factorial | Finite Differences (Forward) · Falling-Factorial Interpolation · Algorithm · Coefficient Extraction | Newton, Principia Mathematica (1687) |
| Tool | Authors | Venue | Link |
|---|---|---|---|
| NeuReduce | Feng, Liu, Xu, Zheng, Xu | EMNLP 2020 | ACL Anthology |
| Syntia | Blazytko, Contag, Aschermann, Holz | USENIX Security 2017 | USENIX |
| QSynth | David, Coniglio, Ceccato | BAR 2020 | |
| LOKI | Schloegel, Blazytko, Contag, Aschermann, Basler, Holz, Abbasi | USENIX Security 2022 | USENIX |
| OSES | Matteo | 2024 | GitHub |