Feature/trimming count by n-gilbertson · Pull Request #35 · ihmeuw-msca/bopforge

n-gilbertson · 2026-04-03T20:17:50Z

Add a trimming count so that we always trim outlier_pct = (1 - inlier_pct) * 100% of datapoints in the continuous model. If the submodels do not agree and fewer than outlier_pct of points fall below the w_soln < 0.1 threshold, this guarantees that we still trim outlier_pct of datapoints. The count uses a slightly more aggressive ceiling strategy (rounding up to the nearest integer number of points to trim) to be consistent with the number of inliers being set with floor at the limetr level.
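To make the rounding argument concrete, here is a minimal sketch comparing the ceiling strategy with rounding to the nearest integer. The function name `trimming_counts` and the example numbers are hypothetical, and the "floor complement" entry assumes, as the description states, that the inlier count is floored at the limetr level.

```python
import math


def trimming_counts(num_obs: int, inlier_pct: float) -> dict:
    """Compare rounding strategies for the number of points to trim.

    Hypothetical helper for illustration; not part of bopforge or limetr.
    """
    raw = (1.0 - inlier_pct) * num_obs
    return {
        # Strategy used in this PR: round the outlier count up.
        "ceil": math.ceil(raw),
        # Alternative: round to the nearest integer.
        "round": round(raw),
        # Outliers implied by a floored inlier count (assumed limetr behaviour).
        "floor_complement": num_obs - math.floor(inlier_pct * num_obs),
    }


counts = trimming_counts(num_obs=33, inlier_pct=0.9)
print(counts)  # {'ceil': 4, 'round': 3, 'floor_complement': 4}
```

With 33 observations and inlier_pct = 0.9, the ceiling gives 4 points to trim, matching the count implied by flooring the inliers, while rounding to the nearest integer gives 3. Because the product is computed in floating point, round-off can in principle shift the ceiling by one for some inputs, which is one reason reading the count from the fitted model can be attractive.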

Note: needs eventual version bump.

If there are any suggestions for handling the extra outlier identification better or more safely, without using indexing, I'm happy to incorporate them!

zhengp0 · 2026-04-06T17:58:39Z

src/bopforge/continuous_pipeline/functions.py

-    is_outlier = (signal_model.w_soln < 0.1).astype(int)
+
+    inlier_pct = signal_model.inlier_pct
+    target_outlier_count = int(np.ceil((1.0 - inlier_pct) * data.num_obs))


If we want, I think we can use this directly:
https://github.com/ihmeuw-msca/limetr/blob/0.0.8/src/limetr/__init__.py#L205
It would be signal_model.lt.num_outliers.
I realize that MRBeRT might not have this, though. Please see below for the alternative.

zhengp0 · 2026-04-06T18:19:45Z

src/bopforge/continuous_pipeline/functions.py

+        inlier_indices = np.where(is_outlier_arr == 0)[0]
+        additional_outlier_indices = np.argsort(signal_model.w_soln[inlier_indices])[:num_to_add]
+        is_outlier_arr[inlier_indices[additional_outlier_indices]] = 1
+    is_outlier = is_outlier_arr


Here the logic is more complicated than expected. I think we can simplify it as

trimming_weights = signal_model.w_soln
sub_lt_model = signal_model.sub_models[0].lt
num_outliers = sub_lt_model.num_outliers
outlier_indices = np.argsort(trimming_weights)[:num_outliers]
is_outlier = np.zeros(sub_lt_model.N, dtype=int)
is_outlier[outlier_indices] = 1

Let me know what you think!
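The simplification suggested above can be tried outside the pipeline with plain NumPy. This is a sketch under assumptions: `flag_lowest_weights` is a hypothetical name, the weights are made up, and `num_outliers` is passed in directly rather than read from a fitted model.

```python
import numpy as np


def flag_lowest_weights(trimming_weights: np.ndarray, num_outliers: int) -> np.ndarray:
    """Flag exactly `num_outliers` points with the smallest trimming weights.

    Hypothetical stand-alone version of the suggested logic.
    """
    # Indices of the points with the lowest weights come first in the argsort.
    outlier_indices = np.argsort(trimming_weights)[:num_outliers]
    is_outlier = np.zeros(trimming_weights.size, dtype=int)
    is_outlier[outlier_indices] = 1
    return is_outlier


# Made-up weights: only two points fall below the 0.1 threshold.
w_soln = np.array([1.0, 0.05, 0.9, 0.4, 1.0, 0.0])
flags = flag_lowest_weights(w_soln, num_outliers=3)
print(flags)  # [0 1 0 1 0 1]
```

In this example a plain `w_soln < 0.1` threshold would flag only two points, whereas ranking by weight always flags the requested three.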

n-gilbertson · 2026-04-06

Hi Peng – this makes sense and is much cleaner! One question: I think it's technically possible (if unlikely) for more than num_outliers points to be trimmed under the old approach of
is_outlier = (signal_model.w_soln < 0.1).astype(int)
If that happens, do we want to allow trimming more than num_outliers points? We could build in something like

trimming_weights = ...
sub_lt_model = ...
num_outliers = int(sub_lt_model.num_outliers)
consensus_outliers = np.where(signal_model.w_soln < 0.1)[0]
if len(consensus_outliers) < num_outliers:
    outlier_indices = np.argsort(trimming_weights)[:num_outliers]
else:
    outlier_indices = consensus_outliers
is_outlier = ...
is_outlier[outlier_indices] = 1

That is if we do want to allow it; otherwise we can keep the strict cap of trimming exactly num_outliers points using your approach above.
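For comparison, the variant discussed in this reply, which treats num_outliers as a minimum rather than a strict cap, could look like the sketch below. The function name `flag_outliers_with_minimum`, the `threshold` parameter, and the example weights are hypothetical; the thread does not state that this variant was adopted.

```python
import numpy as np


def flag_outliers_with_minimum(trimming_weights, num_outliers, threshold=0.1):
    """Flag every point below `threshold`, but never fewer than `num_outliers`.

    Hypothetical stand-alone version of the variant discussed in the thread.
    """
    weights = np.asarray(trimming_weights, dtype=float)
    consensus_outliers = np.where(weights < threshold)[0]
    if len(consensus_outliers) < num_outliers:
        # Too few points below the threshold: fall back to the lowest weights.
        outlier_indices = np.argsort(weights)[:num_outliers]
    else:
        # Enough (possibly more than num_outliers) points below the threshold.
        outlier_indices = consensus_outliers
    is_outlier = np.zeros(weights.size, dtype=int)
    is_outlier[outlier_indices] = 1
    return is_outlier


few = flag_outliers_with_minimum([1.0, 0.05, 0.9, 0.4, 1.0, 0.0], num_outliers=3)
many = flag_outliers_with_minimum([0.0, 0.05, 0.02, 0.4, 1.0, 0.0], num_outliers=3)
print(few)   # [0 1 0 1 0 1]
print(many)  # [1 1 1 0 0 1]
```

The first call tops up to three flagged points; the second keeps all four points below the threshold, which is the "more than num_outliers are trimmed" scenario raised in the question.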

n-gilbertson · 2026-04-07T19:03:37Z

pyproject.toml

 [project]
 name = "bopforge"
-version = "0.2.2"
+version = "0.2.3"


Let me know if this version bump should be something else!

zhengp0

This looks good, thanks @n-gilbertson!

Nora Gilbertson and others added 3 commits March 19, 2026 14:22

add functionality to ensure 1-inlier_pct of datapoints are trimmed in…

a910164

… continuous pipeline

Merge branch 'main' of https://github.com/ihmeuw-msca/bopforge into f…

b12e645

…eature/trimming_count Update with main branch

use ceiling instead of rounding to nearest integer for trimming count

709baef

n-gilbertson marked this pull request as ready for review April 3, 2026 21:49

n-gilbertson requested a review from zhengp0 April 3, 2026 21:50

zhengp0 reviewed Apr 6, 2026

Nora Gilbertson added 2 commits April 7, 2026 11:11

simplify outlier count logic

36ab6bd

bump version number

92bd128

n-gilbertson commented Apr 7, 2026

zhengp0 approved these changes Apr 9, 2026

n-gilbertson wants to merge 5 commits into main from feature/trimming_count
