QUESTION:
Which patients with condition D1 would you recommend be given treatment W to reduce their risk of outcome Y1?
ANSWER:
Running 'tx_effect.py' in Python 3.13 (with the dependencies in requirements.txt) generates the file 'docs/results/tx_recommendations.csv' with a treatment recommendation for every patient with diagnosis D1. Patients are further grouped by similar treatment-effect scores for easier outreach/follow-up assignment and prioritization.
For a light review of the methodology used, please see the example whitepaper: "/docs/results/PrecisionMedicine_miniWhitepaper.pdf"
Project Structure
```
precision-medicine/
├── data/
│   └── data.csv                     ← source patient data flat file
├── docs/
│   ├── assets/                      ← figures, images, tables
│   └── results/                     ← deliverables
│       ├── tx_recommendations.csv   ← treatment recommendations flat file artifact (deliverable)
│       └── uplift_distribution.png  ← plot of the treatment-effect (CATE/uplift) distribution
├── utils/
│   ├── meta_author.json             ← corresponding-author metadata
│   ├── git_init.py                  ← Python script to initialize the git repo
│   └── logging_config.py            ← custom logging config to track runtime kickoffs
├── utils/
│   ├── 1_descriptive_stats.py       ← descriptive statistics of the source dataframe and its characteristics
│   └── 2_treatment_effect.py        ← derive ATE/CATE and cluster results for patient-engagement prioritization
├── README.md                        ← this file / describes the data-analytic procedure
├── SECURITY.md                      ← example infosec policy to report CVEs or other exposures
└── requirements.txt                 ← Python 3.13 library dependencies
```
Here is my procedure:
- Download the patient data file
- Initialize this git repository (the repository itself is the response)
- Generate descriptive stats to learn attribute dtypes (viewed manually in the console):
```python
import pandas as pd

def describe_csv(csv_file_path: str, max_cats: int = 10) -> dict:
    """
    Purpose: manually review dummy-variable data types
      - specifically 1/0 integers/floats versus True/False booleans/strings
    Method: ingest a CSV and compute descriptive statistics and frequency counts.
    Args:
        csv_file_path (str): Path to the CSV file.
        max_cats (int): Max number of value counts to display per column.
    Returns: dictionary summary
    """
    df = pd.read_csv(csv_file_path)
    # Basic descriptive stats for numeric columns
    numeric_summary = df.describe(include=[float, int]).to_dict()
    # Column data types
    dtypes_summary = df.dtypes.apply(lambda x: str(x)).to_dict()
    # Frequency counts for each column
    freq_summary = {}
    for col in df.columns:
        freq_summary[col] = (
            df[col]
            .value_counts(dropna=False)
            .head(max_cats)
            .to_dict()
        )
    summary = {
        "shape": df.shape,
        "columns": df.columns.tolist(),
        "dtypes": dtypes_summary,
        "numeric_summary": numeric_summary,
        "frequency_summary": freq_summary,
    }
    return summary

# call it and scroll through the console output, manually perusing dtypes
if __name__ == "__main__":
    result = describe_csv("data/data.csv")
    print("Dataset shape:", result["shape"])
    print("\nColumn data types:")
    for col, dtype in result["dtypes"].items():
        print(f"  {col}: {dtype}")
    print("\n--- Numeric Summary ---")
    print(pd.DataFrame(result["numeric_summary"]))
    print("\n--- Frequency Counts ---")
    for col, freqs in result["frequency_summary"].items():
        print(f"\n{col}:")
        for val, count in freqs.items():
            print(f"  {val}: {count}")
```

| Category | Details |
|---|---|
| Target Cohort | Patients with disease of interest where D1 == 1 |
| Treatment (Tx) | W (1 / TRUE = received treatment) |
| Outcome to Reduce | Y1 (1 = bad outcome) |
| Goal | Part A: For each patient in the D1 cohort, estimate the individual treatment effect (uplift) on outcome Y1. Part B: Recommend treatment W for those with predicted risk reduction (> 0). |
| Assumptions | 1) Consistency: each observation’s outcome corresponds to the treatment actually received. 2) Ignorability: given X1–X80 (+ DDx, site), treatment assignment is as-if random (no unmeasured confounders). 3) Positivity: every profile had a non-zero chance of being both treated and untreated. |
| Deliverables | 1) Baseline for the D1 cohort (treated/untreated rates, outcome rates) 2) Average Treatment Effect (ATE/CATE) estimates (propensity weighting + doubly-robust checks) 3) Uplift (individual Tx effect) score per patient 4) Recommended-treatment list of patient IDs with positive estimated benefit |
Compute basic rates for D1 == 1 to get a naive risk comparison from the data.
```python
import pandas as pd
import numpy as np
from prettyprinter import pprint
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# adjust path/file
precision_df = pd.read_csv("data/data.csv")

# normalize booleans to 0/1 if needed
# if we do this, we can just use sum() and mean() for descriptive stats when the boolean flag was text 'TRUE/FALSE'
bool_map = {"TRUE": 1, "True": 1, True: 1, "FALSE": 0, "False": 0, False: 0}
if precision_df['W'].dtype == object:
    precision_df['W'] = precision_df['W'].map(bool_map).astype(int)
if precision_df['Y1'].dtype == object:
    precision_df['Y1'] = precision_df['Y1'].map(bool_map).astype(int)

# cohort: D1 == 1
cohort = precision_df[precision_df['D1'] == 1].copy()

baseline_summary = {
    "n_total_D1": len(cohort),
    "treated_n": int(cohort['W'].sum()),
    "untreated_n": int((1 - cohort['W']).sum()),
    "treated_rate": float(cohort['W'].mean()),
    "Y1_rate_overall": float(cohort['Y1'].mean()),
    "Y1_rate_treated": float(cohort.loc[cohort['W'] == 1, 'Y1'].mean()),
    "Y1_rate_untreated": float(cohort.loc[cohort['W'] == 0, 'Y1'].mean()),
}
baseline_summary["naive_risk_diff_treated_minus_untreated"] = (
    baseline_summary["Y1_rate_treated"] - baseline_summary["Y1_rate_untreated"]
)

pd.DataFrame([baseline_summary])
pprint(baseline_summary)
```

- We can now calculate the propensity score and a doubly-robust estimator so that the result is less biased by baseline differences.
The propensity model estimates the probability that a patient receives treatment (W == 1) given their features (X):

$p(X) = P(W = 1 \mid X)$

Using logistic regression, this is:

$p(X) = \frac{1}{1 + e^{-(\beta_0 + \beta^\top X)}}$

Where:
- $X = (X_1, X_2, \ldots, X_{80}, \text{site\_id})$
- $\beta$ is the coefficient vector and $\beta_0$ is the intercept

- The propensity model predicts the likelihood that each patient would receive the treatment (W) given all baseline features (X1–X80 and site).
- The outcome model predicts the likelihood of a bad outcome for Y1 given treatment assignment and features.
- Combining both, the doubly-robust estimator corrects some bias even if one of the models is a bit wonky.
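For intuition, here is a minimal sketch of the logistic propensity formula with made-up coefficients (the `beta_0`, `beta`, and `x` values below are illustrative, not fitted values from the project):

```python
import numpy as np

# Hypothetical coefficients for illustration only (not fitted values)
beta_0 = -1.2                       # intercept
beta = np.array([0.8, -0.5, 0.3])   # weights for three example features
x = np.array([1.0, 0.0, 2.0])       # one patient's feature vector

# logistic (sigmoid) propensity: p(X) = P(W = 1 | X)
p_hat = 1.0 / (1.0 + np.exp(-(beta_0 + beta @ x)))
print(round(p_hat, 4))  # -> 0.5498
```

The fitted `LogisticRegression` in the pipeline below does exactly this computation, just with 81 learned coefficients instead of three hand-picked ones.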
- Here is an explanation of Average Treatment Effect:

| Term | Meaning | Formula |
|---|---|---|
| ATE | Average Treatment Effect: overall average effect of treatment in a population | $E[Y(1) - Y(0)]$ |
| CATE | Conditional Average Treatment Effect: average effect conditional on some covariates (e.g., age, comorbidities, site, severity) | $E[Y(1) - Y(0) \mid X = x]$ |
- ATE tells you “on average, what’s the treatment’s effect overall.”
- CATE tells you “on average, what’s the treatment’s effect for patients like this one (given X ).”
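A toy numeric example of the distinction, using an illustrative six-row frame (not the project dataset) with one made-up covariate `age_group`:

```python
import pandas as pd

# Illustrative data only: outcome Y1 (1 = bad) under treatment W
toy = pd.DataFrame({
    "W":  [1, 1, 0, 0, 1, 0],
    "Y1": [0, 1, 1, 1, 0, 1],
    "age_group": ["old", "old", "old", "young", "young", "young"],
})

# Naive ATE: difference in mean bad-outcome rate, treated minus untreated
ate = toy.loc[toy.W == 1, "Y1"].mean() - toy.loc[toy.W == 0, "Y1"].mean()

# Naive CATE for one subgroup: the same contrast, conditional on age_group == "old"
old = toy[toy.age_group == "old"]
cate_old = old.loc[old.W == 1, "Y1"].mean() - old.loc[old.W == 0, "Y1"].mean()

print(round(ate, 3))       # -> -0.667 (overall)
print(round(cate_old, 3))  # -> -0.5   (for "old" patients specifically)
```

These are raw contrasts with no confounding adjustment; the pipeline below replaces them with model-based estimates.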
```python
# filter cohort for treated pats
precision_df = pd.read_csv("data/data.csv")
cohort = precision_df[precision_df['D1'] == 1].copy()

# 1) prep feature matrix
feature_cols = [f"X{i}" for i in range(1, 81)] + ["site_id"]
X = cohort[feature_cols]
W = cohort["W"].astype(int)
Y = cohort["Y1"].astype(int)

# 2a) propensity model: P(W=1|X)
ps_model = Pipeline([
    ("scaler", StandardScaler()),
    ("logit", LogisticRegression(max_iter=500))
])
ps_model.fit(X, W)
cohort["p_hat"] = ps_model.predict_proba(X)[:, 1]

# 2b) clip extreme probabilities to avoid huge weights
cohort["p_hat"] = cohort["p_hat"].clip(0.01, 0.99)

# 3a) outcome model: E[Y|W,X] - fit separately for treated and untreated (t: with Tx W; c: without)
outcome_model_t = Pipeline([
    ("scaler", StandardScaler()),
    ("logit", LogisticRegression(max_iter=500))
])
outcome_model_c = Pipeline([
    ("scaler", StandardScaler()),
    ("logit", LogisticRegression(max_iter=500))
])
outcome_model_t.fit(X[W == 1], Y[W == 1])
outcome_model_c.fit(X[W == 0], Y[W == 0])

# 3b) predicted outcomes under each condition
mu1 = outcome_model_t.predict_proba(X)[:, 1]  # if treated
mu0 = outcome_model_c.predict_proba(X)[:, 1]  # if untreated

# 4a) create the doubly robust ATE composite estimate (aka ate_dr)
cohort["dr_term"] = (
    (W * (Y - mu1) / cohort["p_hat"])
    - ((1 - W) * (Y - mu0) / (1 - cohort["p_hat"]))
    + (mu1 - mu0)
)
ate_dr = cohort["dr_term"].mean()  # just take the mean

print(f"Estimated ATE (Y1 bad outcome difference, treated - untreated): {ate_dr:.4f}")
print(f"Estimated risk reduction by treatment: {-ate_dr:.4f}")
```

Tx Risk MGMT
| ATE Sign | Interpretation |
|---|---|
| Negative | Treatment reduces bad outcomes → good news |
| Positive | Treatment worsens outcomes → bad news |
- We can convert it to a “% reduction in bad-outcome risk.”
- `ate_dr` is the doubly-robust average treatment effect on Y1.
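A quick sketch of that conversion, using placeholder numbers rather than actual outputs of the analysis above:

```python
# Placeholder values for illustration, not results from the pipeline
y1_rate_untreated = 0.30   # baseline bad-outcome rate without treatment
ate_dr = -0.06             # doubly-robust ATE (negative = treatment helps)

# absolute risk reduction: 6 percentage points
abs_risk_reduction = -ate_dr

# relative ("% reduction in bad-outcome risk"): 0.06 / 0.30 = 20%
rel_risk_reduction = abs_risk_reduction / y1_rate_untreated
print(f"{rel_risk_reduction:.0%}")  # -> 20%
```

The relative figure is usually easier to communicate to clinicians than the raw risk difference.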
- The goal was to generate individual-level treatment recommendations: the CATE/uplift for each patient with D1.
- So far, we have already fit these models:
  - Propensity model: $\hat p(X) = P(W=1 \mid X)$
  - Outcome models: $\mu_1(X) = E[Y \mid W=1, X]$ for treated and $\mu_0(X) = E[Y \mid W=0, X]$ for untreated patients
- I want to keep it simple, stable, and interpretable.

Now CATE is:

$\widehat{CATE}(X) = \hat\mu_1(X) - \hat\mu_0(X)$

Since Y1 == 1 means “bad outcome,” we want:
- Negative CATE → treatment reduces risk → recommend treatment
- Positive CATE → treatment increases/worsens risk → do NOT treat
- Near-zero CATE → no strong effect → probably do NOT treat
```python
# 4b) calculate individual-level CATE (uplift)
cohort["cate"] = mu1 - mu0  # predicted bad-outcome difference

# risk reduction is the negative of that (because Y1=1 means bad)
cohort["risk_reduction"] = -cohort["cate"]

# generate the Tx recommendation: treat if predicted risk reduction > 0
cohort["recommend_treatment"] = cohort["risk_reduction"] > 0

# recommendation table
recommend_df = cohort[
    ["patient_id", "cate", "risk_reduction", "recommend_treatment"]
].sort_values("risk_reduction", ascending=False)

# export to csv
recommend_df.to_csv("docs/results/tx_recommendations.csv", index=False)

print()
print()
pprint(recommend_df)  # .head(10)
```

| Column | Meaning |
|---|---|
| cate | Predicted change in risk if treated vs untreated |
| risk_reduction | Negative of CATE (lower Y1 = better) |
| recommend_treatment | Boolean: True = predicted benefit |
- `risk_reduction = 0.12` → treatment lowers bad-outcome risk by 12 percentage points.
- `risk_reduction = -0.05` → treatment increases risk by 5 points (do not treat).
- Sorting by risk reduction gives the highest-priority patients to treat.
- We can organize them into treatment-priority clusters in step 5.
- Direct clinicians to contact and follow up with prioritized treatment-recommended individuals or groups first when working on a clinical schedule and timeline.
| Scenario | Interpretation | Recommendation |
|---|---|---|
| CATE < 0 | Treatment lowers risk of Y1 (good) | Treat |
| CATE > 0 | Treatment increases risk of Y1 (bad) | Do not treat |
| CATE ≈ 0 | No meaningful difference | Neutral / optional |
| Risk_Reduction = -CATE > 0 | Predicts fewer bad outcomes with treatment | Treat |
| Risk_Reduction = -CATE < 0 | Predicts more bad outcomes with treatment | Avoid |
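The decision table above can be sketched as a small helper function; `tx_decision` and its `eps` "near zero" band are illustrative, not part of the pipeline or a tuned threshold:

```python
def tx_decision(cate: float, eps: float = 0.01) -> str:
    """Map a patient's CATE to the recommendation table above.

    eps is an illustrative 'near zero' band, not a tuned threshold.
    """
    if cate < -eps:
        return "Treat"           # treatment lowers risk of Y1 (good)
    if cate > eps:
        return "Do not treat"    # treatment increases risk of Y1 (bad)
    return "Neutral / optional"  # no meaningful difference

print(tx_decision(-0.12))  # -> Treat
print(tx_decision(0.05))   # -> Do not treat
print(tx_decision(0.0))    # -> Neutral / optional
```

In practice a clinical deployment would pick `eps` (and possibly a stricter treat threshold) from the uncertainty of the CATE estimates rather than a fixed constant.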
We can see that the distribution of good/bad treatment effects is roughly normal:
```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

print()
print('--------------------------------------------------------------------------------------------')
print('PLOT UPLIFT DISTRIBUTION:')
print()
print('please see docs/results/tx_recommendations.csv')

# Histogram of risk reduction
plt.figure(figsize=(10, 6))
plt.hist(cohort["risk_reduction"], bins=50)
plt.title("Distribution of Predicted Risk Reduction (-CATE)")
plt.xlabel("Risk Reduction (negative = harmful treatment; positive = beneficial)")
plt.ylabel("Number of Patients")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('docs/results/uplift_distribution.png')
plt.show()

# reshape the risk-reduction column for clustering
risk_vals = cohort["risk_reduction"].values.reshape(-1, 1)

# k-means: three uplift groups
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
cohort["uplift_cluster"] = kmeans.fit_predict(risk_vals)

# sort the clusters by mean risk reduction for consistent labels
cluster_means = cohort.groupby("uplift_cluster")["risk_reduction"].mean().sort_values()

# map the clusters -> ordered labels (0 = worst, 2 = best)
cluster_order_map = {old: new for new, old in enumerate(cluster_means.index)}
cohort["uplift_cluster"] = cohort["uplift_cluster"].map(cluster_order_map)

pprint(cohort[["patient_id", "risk_reduction", "uplift_cluster"]].head(10))

uplift_clusters_summary = cohort.groupby("uplift_cluster")["risk_reduction"].agg(["count", "mean", "min", "max"])
print(uplift_clusters_summary)
```

Cluster Number Meaning Map:
| uplift_cluster | Interpretation | Meaning for Treatment |
|---|---|---|
| 0 | treatment harmful | negative risk_reduction → avoid |
| 1 | neutral or weak effect | little/no benefit |
| 2 | strong positive effect | major benefit → treat |
Why use uplift / doubly-robust causal-inference estimation instead of a T-learner with XGBoost? (The average treatment effect (ATE/CATE) is also known as 'uplift.')
- A T-learner with XGBoost is not a causal model.
- It is a predictive model trained separately on treated and untreated outcomes.
Because of that, it suffers from three fundamental weaknesses and would be misapplied in our 'Tx Recommendations' use case:
A T-learner trains:
- $f_1(X)$ on treated patients
- $f_0(X)$ on untreated patients

But those two groups almost always have different $X$ distributions (treated patients != untreated patients), so XGBoost ends up learning:
- differences in who gets treated
- NOT differences in treatment effect

This produces biased uplift.
ATE/CATE/doubly-robust estimation handles this using:
- propensity scores: $\hat{p}(X)$ (the estimated treatment probability as a function of X)
- outcome models: $\hat{\mu}_1(X)$, $\hat{\mu}_0(X)$
- inverse probability weighting (IPW: a causal method that reweighs each observation by the inverse of its probability of receiving the treatment it actually received, making the treated and untreated groups comparable as if randomized)
- the doubly-robust (DR) correction, written as `ate_dr` in my Python script, which combines propensity weighting and outcome modeling so the treatment-effect estimate remains unbiased if either model is correctly specified

This makes treatment comparisons as-if randomized; T-learning cannot do that.
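A minimal sketch of the IPW step described above; the `p_hat` values are made up for illustration, not outputs of the propensity model:

```python
import numpy as np

# Illustrative only: four patients with made-up propensity scores
W = np.array([1, 1, 0, 0])              # treatment actually received
p_hat = np.array([0.8, 0.5, 0.5, 0.2])  # estimated P(W=1 | X)

# IPW weight: 1/p_hat for treated, 1/(1 - p_hat) for untreated,
# so under-represented profiles count more in their group
weights = np.where(W == 1, 1.0 / p_hat, 1.0 / (1.0 - p_hat))
print(weights)  # values 1.25, 2.0, 2.0, 1.25
```

Note how the treated patient who was unlikely to be treated (p_hat = 0.5 vs 0.8) gets the larger weight; this is what pushes the weighted groups toward comparability, and why the pipeline clips p_hat to [0.01, 0.99] to keep weights bounded.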
The DR estimator is called "doubly robust" because $\widehat{ATE_{DR}}$ is unbiased if either $\hat{p}(X)$ or $\hat{\mu}(X)$ is correct. T-learning has zero robustness:
- if either XGBoost model is even slightly mis-specified
- we get biased estimates

For example, if treatment W=1 is rare (common in medicine), then:
- the $f_1(X)$ treated model is weak
- the $f_0(X)$ untreated model is strong
- the uplift is dominated by noise from the treated model
ATE/DR uses:
- weights
- correction terms
- variance stabilization

So it handles imbalance safely. This is the most important reason not to use a T-learner in this use case.
A T-learner models:

$E[Y \mid X, W]$

But we want:

$E[Y(1) - Y(0) \mid X]$

The T-learner predicts based on observed outcomes only, not counterfactuals. ATE/DR explicitly reconstructs counterfactuals using:
- propensity adjustment
- residual correction
- cross-world estimates (i.e., a hypothetical comparison of Tx and no Tx)

This is known as causal inference (CI), not standard prediction.
To summarize the above:
| Method | Purpose | Problem | Causal? |
|---|---|---|---|
| XGBoost T-Learner | Predicts Y for treated & untreated separately | Learns treatment bias; no confounding correction | no |
| ATE/Doubly-Robust | Estimates effect of treatment holding confounding constant | Requires assumptions but statistically robust | yes |
The ATE/doubly-robust method is better because it removes confounding and identifies true causal effects, whereas an XGBoost T-learner only builds two prediction models and cannot produce unbiased Tx effects.
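To see the failure mode concretely, here is a toy simulation (synthetic data, not the project dataset) where a single confounder, `severe`, drives both treatment assignment and the bad outcome while the true treatment effect is exactly zero. The naive treated-vs-untreated contrast, which is what an unadjusted T-learner style comparison reflects, is badly biased; even a crude severity-stratified adjustment recovers roughly zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounded toy world: severity drives both treatment and the bad outcome,
# and the true treatment effect is exactly zero by construction.
severe = rng.random(n) < 0.5
p_treat = np.where(severe, 0.8, 0.2)   # sicker patients get treated more often
W = rng.random(n) < p_treat
p_bad = np.where(severe, 0.6, 0.1)     # outcome depends on severity only, not W
Y = rng.random(n) < p_bad

# Naive contrast: looks like treatment *causes* bad outcomes (~ +0.30)
naive = Y[W].mean() - Y[~W].mean()

# Severity-stratified contrast (crude confounding adjustment): ~ 0
adjusted = np.mean([
    Y[W & severe].mean() - Y[~W & severe].mean(),
    Y[W & ~severe].mean() - Y[~W & ~severe].mean(),
])
print(round(naive, 3), round(adjusted, 3))
```

The propensity/DR machinery above is a model-based generalization of this stratification idea to 80+ covariates, where explicit stratification is infeasible.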
| Variable | Meaning | Model Role |
|---|---|---|
| `patient_id` | Unique patient identifier | 🟦 ID only (not used in models) |
| `site_id` | Hospital/site ID | 🟩 Propensity + Outcome |
| `D1–D4` | Diagnoses | 🟩 Propensity + Outcome |
| `X1–X80` | Patient covariates | 🟩 Propensity + Outcome |
| `W` | Treatment indicator | 🟥 Target (Propensity), Feature (Outcome) |
| `Y1` | Main clinical outcome | 🟥 Target (Outcome model) |
| `Y2–Y5` | Other outcomes | 🟧 Optional Outcome targets |
| `p̂(X)` | Propensity score | 🟪 Propensity model output |
| `μ̂₁(X)` | Predicted outcome if treated | 🟪 Outcome model output |
| `μ̂₀(X)` | Predicted outcome if untreated | 🟪 Outcome model output |
| `CATE_i` | Individual treatment effect | 🟨 Uplift/Causal Model |
| `risk_reduction_i` | −CATEᵢ | 🟨 Decision Rule |
| `uplift_cluster` | K-means uplift group | 🟫 Clustering Model |
| `ATE_DR` | Doubly-robust ATE | 🟫 Causal Estimator |

Legend: 🟦 ID 🟩 Features 🟥 Targets 🟪 Model Outputs 🟨 Causal/Uplift 🟫 Meta-analysis
```
┌───────────────────────────────────────────┐
│ DATA                                      │
│ patient_id, site_id, D1–D4, X1–X80, W, Y1 │
└───────────────────────────────────────────┘
                    |
                    v
┌─────────────────────────────────────────────────────────┐
│ PROPENSITY MODEL p̂(X)                                   │
│ Inputs: site_id, D1–D4, X1–X80                          │
│ Output: p̂(X) = P(W=1 | X)                               │
└─────────────────────────────────────────────────────────┘
                    |
                    v
┌─────────────────────────────────────────────────────────┐
│ OUTCOME MODELS μ̂₁(X), μ̂₀(X)                             │
│ Inputs: W, Y1, site_id, D1–D4, X1–X80                   │
│ Outputs: μ̂₁(X), μ̂₀(X)                                   │
└─────────────────────────────────────────────────────────┘
                    |
                    v
┌──────────────────────────────────────────────────┐
│ DOUBLY-ROBUST ATE_DR                             │
│ Uses: Y1, W, p̂(X), μ̂₁(X), μ̂₀(X)                  │
└──────────────────────────────────────────────────┘
                    |
                    v
┌──────────────────────────────────────────────────┐
│ CATE / UPLIFT MODEL                              │
│ CATE = μ̂₁(X) − μ̂₀(X)                             │
│ risk_reduction = −CATE                           │
│ uplift_cluster (K-means on risk_reduction)       │
└──────────────────────────────────────────────────┘
```