Add Rank-weighted Average Treatment Effect (RATE) metric#887

Open
aman-coder03 wants to merge 2 commits into uber:master from aman-coder03:feature/rate-metric

Conversation

@aman-coder03
Contributor

Proposed changes

This PR implements the RATE metric proposed by Yadlowsky et al. (2021), as requested in #540.
RATE evaluates how well a treatment prioritization rule (e.g. a CATE estimator) identifies units with above-average treatment benefit. It does this by computing the weighted area under the Targeting Operator Characteristic (TOC) curve, which compares the ATE among the top-q fraction of prioritized units to the overall ATE.
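A minimal self-contained sketch of this computation (illustrative only; toc_curve, rate, and the oracle-mode tau_sorted input are hypothetical names, not the PR's actual API):

```python
import numpy as np

def toc_curve(tau_sorted):
    """TOC(q): ATE among the top-q fraction minus the overall ATE.

    tau_sorted holds treatment effects ordered by the prioritization
    score, highest priority first (oracle mode with known tau).
    """
    n = len(tau_sorted)
    cumsum = np.cumsum(tau_sorted)
    subset_ate = cumsum / np.arange(1, n + 1)  # ATE of the top-k units
    return subset_ate - cumsum[-1] / n         # subtract the overall ATE

def rate(tau_sorted, weighting="autoc"):
    """RATE: weighted area under the TOC curve (AUTOC or Qini weighting)."""
    n = len(tau_sorted)
    q = np.arange(1, n + 1) / n
    w = 1.0 / q if weighting == "autoc" else q  # AUTOC: 1/q, Qini: q
    return float((w / w.sum() * toc_curve(tau_sorted)).sum())
```

A rule that ranks high-benefit units first yields a positive RATE, while a random ranking gives a value near zero in expectation.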

Three functions are added to causalml/metrics/rate.py, following the same API conventions as the existing qini_score / get_qini / plot_qini:

  • get_toc() computes the TOC curve
  • rate_score() computes the RATE scalar with either AUTOC (1/q) or Qini (q) weighting
  • plot_toc() visualizes the TOC curve

Both oracle mode (simulated tau) and observed RCT mode (y + w) are supported. Sixteen tests are included in tests/test_rate.py.
Closes #540

Types of changes

What types of changes does your code introduce to CausalML?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc. This PR template is adopted from appium.

@aman-coder03
Contributor Author

aman-coder03 commented Mar 13, 2026

Hey @jeongyoonlee, could you please have a look at this PR?

Collaborator

@jeongyoonlee jeongyoonlee left a comment


Thanks for adding the RATE metric! The implementation follows the existing get_qini/qini_score/plot_qini API pattern well. A few items to address:

Blocking

  1. Division by zero when normalize=True — At q=1, TOC = 0 by definition (the subset ATE equals the overall ATE when the subset is the entire population), so toc.div(np.abs(toc.iloc[-1, :]), axis=1) will divide by zero and produce inf/NaN. This needs a guard or a different normalization reference point (e.g., the maximum absolute value of the curve).
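One possible guard, sketched on a toy DataFrame (column names are hypothetical; the actual fix is up to the author):

```python
import numpy as np
import pandas as pd

# Toy TOC curves for two models. TOC is 0 at q=1 by definition, so the
# current normalization toc.div(np.abs(toc.iloc[-1, :]), axis=1) divides by 0.
toc = pd.DataFrame({"model_a": [1.0, 0.5, 0.0], "model_b": [0.4, 0.2, 0.0]})

# Alternative: normalize each curve by its maximum absolute value instead
# of its (always-zero) endpoint; also guard the degenerate all-zero curve.
denom = toc.abs().max(axis=0).replace(0, np.nan)
toc_norm = toc.div(denom, axis=1).fillna(0.0)  # finite, peaks at +/-1
```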

  2. Unused random_seed parameter — All three functions accept random_seed=42 but never use it. The docstring says "deprecated" but this is brand-new code with no backward-compatibility obligation. Please remove it, or if kept for API consistency with get_qini, document why.

  3. Missing test for normalize=True — Given the division-by-zero issue above, this path needs coverage.

  4. Hardcoded seeds — Per project conventions, please use RANDOM_SEED from tests/const.py instead of hardcoded 42/0. Same for CONTROL_NAME and TREATMENT_NAMES if applicable.

  5. Test that TOC ends at zero — At q=1, TOC should be 0 by definition. There's a test for TOC starting at zero but not ending at zero.
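For example, such a test could look like this (a sketch that computes oracle-mode TOC inline rather than calling the PR's get_toc, whose signature isn't shown here):

```python
import numpy as np

def test_toc_ends_at_zero():
    # At q=1 the "top" subset is the entire population, so the subset
    # ATE equals the overall ATE and TOC(1) must be exactly 0.
    tau = np.array([2.0, 1.0, 0.5, -0.5])  # sorted by priority, illustrative
    toc = np.cumsum(tau) / np.arange(1, len(tau) + 1) - tau.mean()
    assert abs(toc[-1]) < 1e-12
```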

Non-blocking suggestions

  • O(n²) complexity in get_toc — The loop over every data point computes sorted_df.iloc[:top_k].mean() for each k. For large datasets this will be slow. Consider using cumulative sums (like get_qini does) for O(n) performance:

    # O(n): running sums replace the repeated sorted_df.iloc[:top_k].mean() slices
    cumsum_tau = np.cumsum(sorted_tau)
    subset_ate = cumsum_tau / np.arange(1, n_total + 1)
  • Integration formula — The weight normalization (weights / weights.sum()) computes a weighted mean rather than a true integral. This preserves model rankings (which is the primary use case, similar to Qini/AUUC), but the absolute values won't exactly match the paper's definition. Worth a brief note in the docstring.
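The difference can be seen on a toy grid (all values hypothetical):

```python
import numpy as np

# TOC evaluated on q = 1/n, ..., 1 with AUTOC weights w(q) = 1/q.
n = 4
q = np.arange(1, n + 1) / n
toc = np.array([1.0, 0.6, 0.3, 0.0])
weights = 1.0 / q

# What the PR computes: a weighted mean (weights normalized to sum to 1).
weighted_mean = (weights / weights.sum() * toc).sum()

# Riemann-sum approximation of the paper's integral: sum of w(q) * TOC(q) * dq.
integral = (weights * toc).sum() / n
```

The two differ only by the positive factor weights.sum() / n, which is constant for all models on the same grid, so model rankings are preserved even though the absolute values diverge.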

  • Module-level plt.style.use("fivethirtyeight") — This is a side effect at import time that affects global matplotlib state. Consistent with visualize.py but worth noting.

  • pytest.raises(Exception) in test_get_toc_errors_on_nan — Use a more specific exception type (the code raises AssertionError, so use pytest.raises(AssertionError)).

  • Observed-outcome fallback — When t_mask.sum() == 0 or c_mask.sum() == 0 at a quantile, the code silently falls back to overall_ate making TOC(q) = 0. This is reasonable but worth documenting.
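A sketch of that fallback in observed-outcome mode (names like t_mask mirror the review comment; the actual code may differ):

```python
import numpy as np

def subset_ate_observed(y, w, overall_ate):
    """ATE on a prioritized subset from observed RCT data (illustrative).

    When the subset contains no treated or no control units, fall back
    to the overall ATE, which makes TOC(q) = subset_ate - overall_ate = 0
    at that quantile.
    """
    t_mask = w == 1
    c_mask = w == 0
    if t_mask.sum() == 0 or c_mask.sum() == 0:
        return overall_ate  # the silent fallback flagged in the review
    return y[t_mask].mean() - y[c_mask].mean()
```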

@jeongyoonlee jeongyoonlee added the enhancement New feature or request label Mar 13, 2026
@aman-coder03
Contributor Author

Hi @jeongyoonlee, thanks for the thorough review! I've already addressed all of these in the latest commit...
