Add Rank-weighted Average Treatment Effect (RATE) metric#887

Open
aman-coder03 wants to merge 2 commits into uber:master from aman-coder03:feature/rate-metric

Conversation

@aman-coder03
Contributor

Proposed changes

This PR implements the RATE metric proposed by Yadlowsky et al. (2021), as requested in #540.
RATE evaluates how well a treatment prioritization rule (e.g. a CATE estimator) identifies units with above-average treatment benefit. It does this by computing the weighted area under the Targeting Operator Characteristic (TOC) curve, which compares the ATE among the top-q fraction of prioritized units to the overall ATE.
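A minimal self-contained sketch of this computation (illustrative only; toc_curve, rate, and the oracle-mode tau_sorted input are hypothetical names, not the PR's actual API):

```python
import numpy as np

def toc_curve(tau_sorted):
    """TOC(q): ATE among the top-q fraction minus the overall ATE.

    tau_sorted holds treatment effects ordered by the prioritization
    score, highest priority first (oracle mode with known tau).
    """
    n = len(tau_sorted)
    cumsum = np.cumsum(tau_sorted)
    subset_ate = cumsum / np.arange(1, n + 1)  # ATE of the top-k units
    return subset_ate - cumsum[-1] / n         # subtract the overall ATE

def rate(tau_sorted, weighting="autoc"):
    """RATE: weighted area under the TOC curve (AUTOC or Qini weighting)."""
    n = len(tau_sorted)
    q = np.arange(1, n + 1) / n
    w = 1.0 / q if weighting == "autoc" else q  # AUTOC: 1/q, Qini: q
    return float((w / w.sum() * toc_curve(tau_sorted)).sum())
```

A rule that ranks high-benefit units first yields a positive RATE, while a random ranking gives a value near zero in expectation.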

Three functions are added to causalml/metrics/rate.py, following the same API conventions as the existing qini_score / get_qini / plot_qini:

  • get_toc() computes the TOC curve
  • rate_score() computes the RATE scalar with either AUTOC (1/q) or Qini (q) weighting
  • plot_toc() visualizes the TOC curve

Both oracle mode (simulated tau) and observed RCT mode (y + w) are supported. Sixteen tests are included in tests/test_rate.py.
Closes #540

Types of changes

What types of changes does your code introduce to CausalML?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc. This PR template is adopted from appium.

@aman-coder03
Contributor Author

aman-coder03 commented Mar 13, 2026

Hey @jeongyoonlee, could you please have a look at this PR?

Collaborator

@jeongyoonlee jeongyoonlee left a comment


Thanks for adding the RATE metric! The implementation follows the existing get_qini/qini_score/plot_qini API pattern well. A few items to address:

Blocking

  1. Division by zero when normalize=True — At q=1, TOC = 0 by definition (the subset ATE equals the overall ATE when the subset is the entire population), so toc.div(np.abs(toc.iloc[-1, :]), axis=1) will divide by zero and produce inf/NaN. This needs a guard or a different normalization reference point (e.g., the maximum absolute value of the curve).
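One possible guard, sketched on a toy DataFrame (column names are hypothetical; the actual fix is up to the author):

```python
import numpy as np
import pandas as pd

# Toy TOC curves for two models. TOC is 0 at q=1 by definition, so the
# current normalization toc.div(np.abs(toc.iloc[-1, :]), axis=1) divides by 0.
toc = pd.DataFrame({"model_a": [1.0, 0.5, 0.0], "model_b": [0.4, 0.2, 0.0]})

# Alternative: normalize each curve by its maximum absolute value instead
# of its (always-zero) endpoint; also guard the degenerate all-zero curve.
denom = toc.abs().max(axis=0).replace(0, np.nan)
toc_norm = toc.div(denom, axis=1).fillna(0.0)  # finite, peaks at +/-1
```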

  2. Unused random_seed parameter — All three functions accept random_seed=42 but never use it. The docstring says "deprecated" but this is brand-new code with no backward-compatibility obligation. Please remove it, or if kept for API consistency with get_qini, document why.

  3. Missing test for normalize=True — Given the division-by-zero issue above, this path needs coverage.

  4. Hardcoded seeds — Per project conventions, please use RANDOM_SEED from tests/const.py instead of hardcoded 42/0. Same for CONTROL_NAME and TREATMENT_NAMES if applicable.

  5. Test that TOC ends at zero — At q=1, TOC should be 0 by definition. There's a test for TOC starting at zero but not ending at zero.
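For example, such a test could look like this (a sketch that computes oracle-mode TOC inline rather than calling the PR's get_toc, whose signature isn't shown here):

```python
import numpy as np

def test_toc_ends_at_zero():
    # At q=1 the "top" subset is the entire population, so the subset
    # ATE equals the overall ATE and TOC(1) must be exactly 0.
    tau = np.array([2.0, 1.0, 0.5, -0.5])  # sorted by priority, illustrative
    toc = np.cumsum(tau) / np.arange(1, len(tau) + 1) - tau.mean()
    assert abs(toc[-1]) < 1e-12
```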

Non-blocking suggestions

  • O(n²) complexity in get_toc — The loop over every data point computes sorted_df.iloc[:top_k].mean() for each k. For large datasets this will be slow. Consider using cumulative sums (like get_qini does) for O(n) performance:

    # O(n): running sums replace the repeated sorted_df.iloc[:top_k].mean() slices
    cumsum_tau = np.cumsum(sorted_tau)
    subset_ate = cumsum_tau / np.arange(1, n_total + 1)
  • Integration formula — The weight normalization (weights / weights.sum()) computes a weighted mean rather than a true integral. This preserves model rankings (which is the primary use case, similar to Qini/AUUC), but the absolute values won't exactly match the paper's definition. Worth a brief note in the docstring.
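The difference can be seen on a toy grid (all values hypothetical):

```python
import numpy as np

# TOC evaluated on q = 1/n, ..., 1 with AUTOC weights w(q) = 1/q.
n = 4
q = np.arange(1, n + 1) / n
toc = np.array([1.0, 0.6, 0.3, 0.0])
weights = 1.0 / q

# What the PR computes: a weighted mean (weights normalized to sum to 1).
weighted_mean = (weights / weights.sum() * toc).sum()

# Riemann-sum approximation of the paper's integral: sum of w(q) * TOC(q) * dq.
integral = (weights * toc).sum() / n
```

The two differ only by the positive factor weights.sum() / n, which is constant for all models on the same grid, so model rankings are preserved even though the absolute values diverge.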

  • Module-level plt.style.use("fivethirtyeight") — This is a side effect at import time that affects global matplotlib state. Consistent with visualize.py but worth noting.

  • pytest.raises(Exception) in test_get_toc_errors_on_nan — Use a more specific exception type (the code raises AssertionError, so use pytest.raises(AssertionError)).

  • Observed-outcome fallback — When t_mask.sum() == 0 or c_mask.sum() == 0 at a quantile, the code silently falls back to overall_ate making TOC(q) = 0. This is reasonable but worth documenting.
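A sketch of that fallback in observed-outcome mode (names like t_mask mirror the review comment; the actual code may differ):

```python
import numpy as np

def subset_ate_observed(y, w, overall_ate):
    """ATE on a prioritized subset from observed RCT data (illustrative).

    When the subset contains no treated or no control units, fall back
    to the overall ATE, which makes TOC(q) = subset_ate - overall_ate = 0
    at that quantile.
    """
    t_mask = w == 1
    c_mask = w == 0
    if t_mask.sum() == 0 or c_mask.sum() == 0:
        return overall_ate  # the silent fallback flagged in the review
    return y[t_mask].mean() - y[c_mask].mean()
```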

@jeongyoonlee jeongyoonlee added the enhancement New feature or request label Mar 13, 2026
@aman-coder03
Contributor Author

Hi @jeongyoonlee, thanks for the thorough review! I've already addressed all of these in the latest commit...
