Add example notebook: Unit-level counterfactuals via abduction, action, and prediction (Pearl's Primer §4.2)
Summary
PyMC can express and fit structural causal models, and pm.do supports interventional (rung 2) queries. But the example gallery currently has no notebook that demonstrates unit-level counterfactuals (rung 3 of Pearl's causal ladder) — questions like "what would have happened to this specific person under different circumstances?"
This proposal is for a new notebook that teaches the three-step counterfactual procedure (abduction, action, prediction) using a concrete worked example from Pearl, Glymour, and Jewell (2016), Causal Inference in Statistics: A Primer, Section 4.2.3–4.2.4.
Motivation
Pearl's causal ladder
- L1: Association — "What do we observe?" (P(Y | X))
- L2: Intervention — "What happens if we force a variable?" (P(Y | do(X=x)))
- L3: Counterfactual — "What would have happened to this specific unit under a different action, given what we already observed?" (Y_x | X=x', Y=y')
pm.do directly supports L2. True unit-level counterfactuals are L3 and require an additional abduction step that infers unit-specific exogenous characteristics before predicting in the intervened world.
Why this matters
Without a clear L3 example, users may assume that pm.observe + pm.do is sufficient for individual-level counterfactual reasoning. It is not — that combination produces intervention-conditioned predictions (L2), not unit-level counterfactuals (L3). The gap is not a technicality; it produces numerically different answers.
Proposed example: the encouragement design
From Pearl's Primer §4.2.3. Three variables, all standardized:
- X — encouragement (time in an after-school program, randomized)
- H — homework (hours spent studying)
- Y — exam score
Causal structure: X → H → Y and X → Y (direct effect).
Structural equations:
X = U_X
H = a·X + U_H
Y = b·X + c·H + U_Y
with a = 0.5, b = 0.7, c = 0.4, and all U terms mutually independent standard normals.
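As a minimal sketch (not part of the proposal itself), the data-generating process above can be simulated with numpy and the structural coefficients recovered by least squares; the sample size and seed here are arbitrary choices for illustration:

```python
# Sketch: simulate the Primer 4.2.3 DGP and verify the structural
# coefficients are recoverable by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 0.5, 0.7, 0.4
n = 100_000

# Mutually independent standard-normal exogenous terms.
U_X, U_H, U_Y = rng.standard_normal((3, n))
X = U_X
H = a * X + U_H
Y = b * X + c * H + U_Y

# Recover a from regressing H on X, and (b, c) from regressing Y on (X, H).
a_hat = np.linalg.lstsq(X[:, None], H, rcond=None)[0][0]
bc_hat = np.linalg.lstsq(np.column_stack([X, H]), Y, rcond=None)[0]
```

With 100,000 draws the estimates land within a couple of hundredths of the true values, which is the "verify coefficient recovery" step the notebook outline calls for.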
The focal question
A student named Joe has observed values (X=0.5, H=1.0, Y=1.5). His teacher asks: "What would Joe's score have been had he doubled his homework to H = 2?"
This is not a population question — it is about one specific individual whose characteristics we have already observed.
Why do(H=2) gives the wrong answer for Joe
The population-level intervention E[Y | do(H=2)] uses population-mean exogenous values (U ≈ 0), yielding approximately 0.8. But Joe is not average — abduction reveals he has above-average inherent ability (U_Y ≈ 0.75). His personal counterfactual score is approximately 1.90.
The gap (0.8 vs 1.9) is entirely due to individual characteristics that the population intervention averages away.
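The arithmetic behind the gap can be written out directly. This is a point-estimate sketch using the true coefficients and Joe's observed values from above, with no posterior uncertainty:

```python
# Sketch of the arithmetic behind the 0.8 vs 1.9 gap (point values only).
a, b, c = 0.5, 0.7, 0.4
X_obs, H_obs, Y_obs = 0.5, 1.0, 1.5

# Population intervention: exogenous terms at their means (U = 0, X averages to 0).
y_do = b * 0.0 + c * 2.0              # E[Y | do(H=2)] = 0.8

# Abduction: solve the structural equations for Joe's exogenous values.
U_H = H_obs - a * X_obs               # 0.75 (above-average diligence)
U_Y = Y_obs - b * X_obs - c * H_obs   # 0.75 (above-average ability)

# Action + prediction: set H = 2 while keeping Joe's X and U_Y.
y_cf = b * X_obs + c * 2.0 + U_Y      # 1.90
```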
Proposed notebook structure
The notebook should follow examples-first teaching: open with Joe's concrete question, show that the naive approach (do) gives the wrong individual answer, then reveal the three-step procedure.
1. Opening hook
Joe's question: "What would my score have been if I had done more homework?" Show that this is a different question from "what happens on average if we force homework to 2?"
2. Simulate data and fit the structural model
Generate data from the known DGP. Build and fit a PyMC model with the structural equations. Verify coefficient recovery.
3. The population intervention: do(H=2)
Use pm.do to compute E[Y | do(H=2)]. Show this is a valid causal quantity but answers the wrong question for Joe.
4. The conceptual shift: residuals are not noise
The key pedagogical section. In standard regression, residuals are exchangeable estimation error. In a structural causal model, U_Y encodes everything about a specific individual that causally affects Y but is not measured — ability, motivation, sleep quality. Across the population these look like zero-mean noise; for a specific person they are fixed causal properties.
This reinterpretation — from "discardable error" to "signal about the individual" — is what enables counterfactual reasoning.
Callout (important): The dual nature of U. Across the population, U_Y behaves like noise. For a specific individual, it is a fixed property encoding unmeasured causal factors.
5. Pearl's three-step procedure
Step 1: Abduction
Infer Joe's exogenous values from his observed data, per posterior draw:
U_H = H_obs - intercept_H - a·X_obs
U_Y = Y_obs - intercept_Y - b·X_obs - c·H_obs
Step 2: Action
Replace the homework equation with H = 2.
Step 3: Prediction
Compute Y_{H=2} using Joe's abducted U_Y:
Y_{H=2} = intercept_Y + b·X_obs + c·2 + U_Y
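The three steps can be sketched vectorized over posterior draws. The coefficient-draw arrays below are fabricated stand-ins for a fitted PyMC posterior (in practice they would come from the inference data), and the intercepts are dropped because all variables are standardized:

```python
# Sketch of abduction -> action -> prediction over posterior draws.
# a_d, b_d, c_d fake a posterior (narrow draws around the true values);
# in practice they would be taken from the fitted PyMC trace.
import numpy as np

rng = np.random.default_rng(2)
n_draws = 4_000
X_obs, H_obs, Y_obs = 0.5, 1.0, 1.5

a_d = 0.5 + 0.01 * rng.standard_normal(n_draws)
b_d = 0.7 + 0.01 * rng.standard_normal(n_draws)
c_d = 0.4 + 0.01 * rng.standard_normal(n_draws)

# Step 1 -- abduction: Joe's exogenous values, one per posterior draw.
U_H = H_obs - a_d * X_obs
U_Y = Y_obs - b_d * X_obs - c_d * H_obs

# Step 2 -- action: replace the homework equation with H = 2.
H_new = 2.0

# Step 3 -- prediction: push Joe's abducted noise through the modified model.
Y_cf = b_d * X_obs + c_d * H_new + U_Y   # posterior over Joe's counterfactual
```

Because the abducted U_Y is carried through per draw, Y_cf is a full posterior distribution over Joe's counterfactual score, centered near 1.9.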
6. Compare intervention vs counterfactual (figure)
Side-by-side visualization:
- Population: E[Y | do(H=2)] with HDI (one color)
- Joe's counterfactual: Y_{H=2} with HDI (contrasting color)
- Analytical reference values as black dashed lines
Figure design notes (per figure-excellence skill):
- Width: FIG_WIDTH constant, no multipliers
- Semantic colors: one color for population, one for Joe, consistent throughout
- True/analytical values: black dashed reference lines
- Technical caption describing what is shown, not narrative
7. The counterfactual posterior (figure)
Histogram or KDE of Joe's counterfactual score distribution. Analytical answer (1.90) as black dashed reference. Joe's observed score (1.50) as gray dashed reference.
8. Summary table
| | Intervention: E[Y \| do(H=2)] | Counterfactual: Y_{H=2} for Joe |
|---|---|---|
| Question | What happens on average if we set everyone's homework to 2? | What would Joe's score have been if he had done H=2? |
| Uses individual data? | No | Yes — conditions on Joe's observed (X, H, Y) |
| Exogenous values | Population mean (U ≈ 0) | Joe's inferred values (U_Y ≈ 0.75) |
| Causal ladder | Rung 2 (intervention) | Rung 3 (counterfactual) |
| Result | ≈ 0.80 | ≈ 1.90 |
9. Summary and takeaways
Bulleted takeaways with bolded key terms:
- The U terms are not noise — they encode everything about a specific individual that causally affects the outcome but is not measured.
- Counterfactuals answer questions about specific individuals under hypothetical conditions by conditioning on observed evidence before intervening.
- Pearl's three-step procedure — abduction, action, prediction — turns a structural causal model into a counterfactual engine.
- do() answers a different question: the population-level average effect. For Joe, do(H=2) predicts ≈ 0.8; the counterfactual predicts ≈ 1.9.
- In a Bayesian framework, the three-step procedure yields a full posterior over the counterfactual, naturally propagating coefficient uncertainty.
- In linear SEMs, the counterfactual simplifies: Y_{H=h'} = Y_obs + c·(h' - H_obs).
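The shortcut holds because the b·X_obs and U_Y terms cancel between observation and counterfactual, leaving only the change along the c-path. A quick sketch checks it against the full three-step result:

```python
# Sketch: the linear-SEM shortcut agrees with the full three-step result.
b, c = 0.7, 0.4
X_obs, H_obs, Y_obs = 0.5, 1.0, 1.5
h_new = 2.0

# Full procedure: abduct U_Y, then predict under H = h_new.
U_Y = Y_obs - b * X_obs - c * H_obs
y_three_step = b * X_obs + c * h_new + U_Y

# Shortcut: b*X_obs and U_Y cancel; only the c-path term changes.
y_shortcut = Y_obs + c * (h_new - H_obs)
```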
10. Reflection prompt
"When have you wanted to answer a question about a specific case rather than a population average?"
- Medicine: A patient took drug A and recovered slowly. Would drug B have worked better for this patient?
- Marketing: A customer saw campaign A and didn't convert. Would they have converted under campaign B?
- Education: A student attended tutoring but still struggled. Would a different method have helped this student?
Callout strategy
| Type | Content |
|---|---|
| Important | The dual nature of U (population noise vs individual property) |
| Warning | pm.observe + pm.do does not produce unit-level counterfactuals |
| Tip | In linear SEMs, counterfactual change = direct effect × change in intervened variable |
| Note | Link back to the existing do-operator notebook for L2 interventional analysis |
Acceptance criteria
- The notebook demonstrates that do(H=2) gives a numerically different (and wrong-for-Joe) answer.
- Links back to the existing do-operator notebook for L2 analysis.
- Figures use the FIG_WIDTH constant and technical captions.
References
- Pearl, Glymour, Jewell (2016), Causal Inference in Statistics: A Primer, Section 4.2.3–4.2.4