Tests/double counting #81

Open
AdaraPutri wants to merge 44 commits into dev from tests/double-counting

@AdaraPutri AdaraPutri commented Mar 9, 2026

Summary

This PR adds tests for a specific double counting scenario in the process analysis pipeline: when the same logical work shows up across multiple related PRs.

Right now, the Markov process model treats each pr_id as its own session. So if work is first represented in one PR and then later appears again in another related PR, both PRs can contribute to the final transition counts. These tests are meant to make that behavior explicit.
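To illustrate the behavior described above, here is a minimal sketch of per-session transition counting. It is not the pipeline's actual code: `count_transitions`, the event names, and the PR numbers are hypothetical, and it assumes rows arrive in chronological order.

```python
from collections import Counter


def count_transitions(rows):
    """Count (from_event, to_event) pairs, treating each pr_id as one session.

    Hypothetical helper for illustration; rows are (pr_id, event) tuples
    assumed to be in chronological order.
    """
    sessions = {}
    for pr_id, event in rows:
        sessions.setdefault(pr_id, []).append(event)

    counts = Counter()
    for events in sessions.values():
        # Pair each event with its successor inside the same session only.
        counts.update(zip(events, events[1:]))
    return counts


# Two related PRs that carry the same logical work (made-up data).
rows = [
    (101, "branch_created"), (101, "review_requested"), (101, "merged"),
    (102, "branch_created"), (102, "review_requested"), (102, "merged"),
]
counts = count_transitions(rows)
```

Because the grouping key is `pr_id`, every edge in this example ends up with a count of 2 even though the underlying work happened once.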

Closes: #70

What was added

test_cross_pr_double_counting.py

This is the main integration test.

It creates a small synthetic labels CSV where two different PRs have the same event sequence. The idea is to simulate a case where the same work is being represented in two related PRs.

What it checks:

  • both PRs are treated as separate sessions
  • the same transition is counted once for each PR
  • pooled transition counts increase because the model is keyed by pr_id, not by unique underlying work
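For reviewers who want the shape of this test without opening the file, the checks above could look roughly like the sketch below. The column names (`pr_id`, `timestamp`, `label`), the helper functions, and the data are assumptions made for illustration; the real test calls the project's own process model code.

```python
import csv
import io
from collections import Counter, defaultdict

# Synthetic labels CSV: two PRs with an identical event sequence (made-up data).
LABELS_CSV = """pr_id,timestamp,label
10,2026-01-01T10:00:00,opened
10,2026-01-01T11:00:00,reviewed
10,2026-01-01T12:00:00,merged
11,2026-01-02T10:00:00,opened
11,2026-01-02T11:00:00,reviewed
11,2026-01-02T12:00:00,merged
"""


def load_sessions(csv_text):
    """Group labels by pr_id and order them by timestamp (hypothetical helper)."""
    sessions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sessions[row["pr_id"]].append((row["timestamp"], row["label"]))
    # ISO 8601 timestamps sort correctly as plain strings.
    return {pr: [label for _, label in sorted(events)]
            for pr, events in sessions.items()}


def pooled_edge_counts(sessions):
    """Sum transition counts over all sessions (hypothetical helper)."""
    counts = Counter()
    for labels in sessions.values():
        counts.update(zip(labels, labels[1:]))
    return counts


def test_identical_sequences_in_two_prs_are_counted_twice():
    sessions = load_sessions(LABELS_CSV)
    assert set(sessions) == {"10", "11"}  # two separate sessions
    counts = pooled_edge_counts(sessions)
    assert counts[("opened", "reviewed")] == 2  # once per PR
    assert counts[("reviewed", "merged")] == 2
```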

test_branch_reused_across_related_prs.py

This is another integration test, but it uses the actual labeling logic plus the transition edge computation.

It creates two PRs that share the same branch context, runs the branch-related labeling, then passes the output into the process model.

What it checks:

  • branch-derived labels are emitted for both PRs
  • both PRs contribute their own transition sequence
  • this can increase pooled edge counts when related PRs represent overlapping work
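The flow this test exercises (branch labeling first, then edge computation) can be sketched as follows. Both functions are hypothetical stand-ins for the real labeler and process model, and the branch name and events are invented.

```python
from collections import Counter


def label_branch_events(prs, branch_events):
    """Hypothetical labeler: copy each branch-level event onto every PR on that branch."""
    rows = []
    for pr in prs:
        for event in branch_events.get(pr["branch"], []):
            rows.append({"pr_id": pr["pr_id"], "label": event})
    return rows


def transition_edges(rows):
    """Hypothetical edge computation: count label-to-label transitions per pr_id."""
    by_pr = {}
    for row in rows:
        by_pr.setdefault(row["pr_id"], []).append(row["label"])
    edges = Counter()
    for labels in by_pr.values():
        edges.update(zip(labels, labels[1:]))
    return edges


# Two PRs that share the same branch context (made-up data).
prs = [
    {"pr_id": 1, "branch": "feature/x"},
    {"pr_id": 2, "branch": "feature/x"},
]
branch_events = {"feature/x": ["branch_created", "commit_pushed"]}

rows = label_branch_events(prs, branch_events)
edges = transition_edges(rows)
```

Here the single branch produces four label rows (two per PR), and the one `branch_created` to `commit_pushed` transition is pooled with a count of 2.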

test_label_features_per_branch.py

This is the supporting unit test.

It focuses only on the upstream branch labeling behavior.

What it checks:

  • when two PRs share the same branch, the labeler emits one row per PR
  • both PRs receive the same branch-level event
  • this confirms one of the upstream mechanisms that can later contribute to semantic double counting
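A stripped-down version of the unit-level check might look like this. `label_branch_feature`, its output fields, and the `branch_created` event name are assumptions for illustration and do not reflect the labeler's real signature.

```python
def label_branch_feature(prs):
    """Hypothetical labeler: emit one branch-level row for each PR it is given."""
    return [
        {"pr_id": pr["pr_id"], "branch": pr["branch"], "event": "branch_created"}
        for pr in prs
    ]


def test_shared_branch_emits_one_row_per_pr():
    prs = [
        {"pr_id": 21, "branch": "feature/shared"},
        {"pr_id": 22, "branch": "feature/shared"},
    ]
    rows = label_branch_feature(prs)

    assert [r["pr_id"] for r in rows] == [21, 22]             # one row per PR
    assert {r["event"] for r in rows} == {"branch_created"}   # same branch-level event
    assert {r["branch"] for r in rows} == {"feature/shared"}  # same shared branch
```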

Why these tests

We already have tests for the individual scripts. This PR targets a narrower question: can double counting happen when a PR is merged into another PR, or when the same work flows through several related PRs?

So this is less about duplicate rows within one PR, and more about cross-PR double counting.

Testing

Please run each test script individually:

python -m pytest tests/double_counting/test_cross_pr_double_counting.py -v
python -m pytest tests/double_counting/test_branch_reused_across_related_prs.py -v
python -m pytest tests/double_counting/test_label_features_per_branch.py -v

Mahatav and others added 30 commits January 19, 2026 11:48
…runs everything from extraction to analysis to graphs. I also created a file that runs only the analysis and graphing scripts, since sometimes we don't need to re-gather data.
…treamline pipeline execution, and improve error handling for team analysis
…raphing modules; update documentation and tests for clarity and accuracy
Add Unit Tests for process_model
@AdaraPutri AdaraPutri requested review from Mahatav and d2r3v March 9, 2026 18:35
@AdaraPutri AdaraPutri self-assigned this Mar 9, 2026
@AdaraPutri AdaraPutri marked this pull request as draft March 9, 2026 18:39
@d2r3v d2r3v force-pushed the dev branch 2 times, most recently from a8e6d00 to d14e5b5 Compare March 16, 2026 08:26
@AdaraPutri AdaraPutri marked this pull request as ready for review March 16, 2026 16:18
@Mahatav Mahatav left a comment

Great job, much-needed changes.

@d2r3v d2r3v left a comment

LGTM! Works well. Thanks for these changes.


3 participants