Skip to content

Feature/trimming count#35

Open
n-gilbertson wants to merge 5 commits intomainfrom
feature/trimming_count
Open

Feature/trimming count#35
n-gilbertson wants to merge 5 commits intomainfrom
feature/trimming_count

Conversation

@n-gilbertson
Copy link
Copy Markdown
Contributor

@n-gilbertson n-gilbertson commented Apr 3, 2026

Add trimming count to ensure we always trim outlier_pct = (1 - inlier_pct)*100% of datapoints in the continuous model. If submodels do not agree and fewer than outlier_pct of points fall below the w_soln < 0.1 threshold, this guarantees we still trim outlier_pct of datapoints. Using slightly more aggressive ceiling strategy to round up to nearest integer of points to trim to be consistent with the number of inliers being set with floor at the limetr level.

Note: needs eventual version bump.

If there are any suggestions for how to handle the extra outlier identification better/more safely that doesn't use indexing, happy to incorporate those!

@n-gilbertson n-gilbertson marked this pull request as ready for review April 3, 2026 21:49
@n-gilbertson n-gilbertson requested a review from zhengp0 April 3, 2026 21:50
is_outlier = (signal_model.w_soln < 0.1).astype(int)

inlier_pct = signal_model.inlier_pct
target_outlier_count = int(np.ceil((1.0 - inlier_pct) * data.num_obs))
Copy link
Copy Markdown
Member

@zhengp0 zhengp0 Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want, I think we can directly use this
https://github.com/ihmeuw-msca/limetr/blob/0.0.8/src/limetr/__init__.py#L205
It will be signal_model.lt.num_outliers

Realize that MRBeRT might not have this. Please see below for the alternative.

inlier_indices = np.where(is_outlier_arr == 0)[0]
additional_outlier_indices = np.argsort(signal_model.w_soln[inlier_indices])[:num_to_add]
is_outlier_arr[inlier_indices[additional_outlier_indices]] = 1
is_outlier = is_outlier_arr
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the logic is more complicated than expected. I think we can simplify it as

trimming_weights = signal_model.w_soln
sub_lt_model = signal_model.sub_models[0].lt
num_outliers = sub_lt_model.num_outliers
outlier_indices = np.argsort(trimming_weights)[:num_outliers]
is_outlier = np.zeros(sub_lt_model.N, dtype=int)
is_outlier[outlier_indices] = 1

Let me know what you think!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Peng – this makes sense and is much cleaner! One question: I think it's technically possible (if unlikely) for us to have more than num_outliers trimmed under the old approach of
is_outlier = (signal_model.w_soln < 0.1).astype(int)
If that were to be the case, do we want to allow this 'more than num_outliers are trimmed' scenario? We could build in something like

trimming_weights = ...
sub_lt_model = ...
num_outliers = int(sub_lt_model.num_outliers)
consensus_outliers = np.where(signal_model.w_soln < 0.1)[0]
if len(consensus_outliers) < num_outliers:
    outlier_indices = np.argsort(trimming_weights)[:num_outliers]
else:
    outlier_indices = consensus_outliers
is_outlier = ...
is_outlier[outlier_indices] = 1

if we do want to do this, or leave it at the strict cap of trimming num_outliers using your approach above.

[project]
name = "bopforge"
version = "0.2.2"
version = "0.2.3"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this version bump should be something else!

Copy link
Copy Markdown
Member

@zhengp0 zhengp0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks @n-gilbertson!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants