…runs everything from extraction to analysis to graphs. I also created a file that runs only the analysis and graphing scripts, since sometimes we don't need to re-gather data.
… error handling in main execution flow
Refactor/documentation
…treamline pipeline execution, and improve error handling for team analysis
…raphing modules; update documentation and tests for clarity and accuracy
Add Unit Tests for process_model
Feature/adding main run it all file
# Conflicts: # event_labelling/CodeStructure_Branching/main.py # event_labelling/PR/get_clean_pr_label.py
Feature/adding elbow score
Refactor: Clean Scripts into Utility
Feature/comm label
a8e6d00 to d14e5b5
Mahatav
reviewed
Mar 23, 2026
Collaborator
Mahatav
left a comment
Great job, much-needed changes.
Mahatav
approved these changes
Mar 23, 2026
d2r3v
approved these changes
Apr 7, 2026
Collaborator
d2r3v
left a comment
LGTM! Works well. Thanks for these changes.
Summary
This PR adds tests for a specific double-counting scenario in the process analysis pipeline: the same logical work showing up across multiple related PRs.
Right now, the Markov process model treats each pr_id as its own session. So if work is first represented in one PR and then later appears again in another related PR, both PRs can contribute to the final transition counts. These tests are meant to make that behavior explicit.
Closes: #70
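To make the behavior concrete, here is a minimal sketch of per-PR transition counting. It is not the project's process model: the function name count_transitions, the tuple layout, and the event labels are all hypothetical. The only thing it takes from the description above is that each pr_id is treated as an independent session.

```python
from collections import Counter
from itertools import groupby


def count_transitions(rows):
    """Count event-to-event transitions, treating each pr_id as one session.

    rows: iterable of (pr_id, event) tuples, already ordered within each PR.
    (Hypothetical layout; the real pipeline's schema may differ.)
    """
    counts = Counter()
    # Sort by pr_id only; sorted() is stable, so in-PR event order is kept.
    ordered = sorted(rows, key=lambda r: r[0])
    for _, session in groupby(ordered, key=lambda r: r[0]):
        events = [event for _, event in session]
        # Pair each event with its successor inside the same PR.
        counts.update(zip(events, events[1:]))
    return counts


# Hypothetical labels: PR 2 repeats the same logical work as PR 1.
rows = [
    (1, "branch_created"), (1, "commit"), (1, "review"),
    (2, "branch_created"), (2, "commit"), (2, "review"),
]
counts = count_transitions(rows)
```

Under these assumptions every transition is counted once per PR, so the repeated work contributes twice, and no transition is created across the PR boundary.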
What was added
test_cross_pr_double_counting.py
This is the main integration test.
It creates a small synthetic labels CSV where two different PRs have the same event sequence. The idea is to simulate a case where the same work is being represented in two related PRs.
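A rough sketch of what such a fixture and check could look like is shown here. The column names, event labels, and the helper transitions_by_pr are assumptions made for illustration; the real test calls the project's own process model rather than this stand-in.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical column names and event labels; the real labels CSV may differ.
SYNTHETIC_CSV = """pr_id,event
101,open
101,commit
101,merge
202,open
202,commit
202,merge
"""


def transitions_by_pr(csv_text):
    """Stand-in for the process model: one session per pr_id."""
    sessions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sessions[row["pr_id"]].append(row["event"])
    counts = Counter()
    for events in sessions.values():
        counts.update(zip(events, events[1:]))
    return counts


def test_same_sequence_in_two_prs_is_counted_twice():
    counts = transitions_by_pr(SYNTHETIC_CSV)
    # Two PRs with the same sequence: each transition appears twice.
    assert counts[("open", "commit")] == 2
    assert counts[("commit", "merge")] == 2
```

Keeping the fixture inline as a string keeps the test independent of any file on disk, at the cost of not exercising the real CSV loading path.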
What it checks:
…pr_id, not by unique underlying work
test_branch_reused_across_related_prs.py
This is another integration test, but it uses the actual labeling logic plus the transition edge computation.
It creates two PRs that share the same branch context, runs the branch-related labeling, then passes the output into the process model.
What it checks:
test_label_features_per_branch.py
This is the supporting unit test.
It focuses only on the upstream branch labeling behavior.
What it checks:
Why these tests
We already have tests for the individual scripts, but this PR specifically asks whether double counting can happen when PRs are merged into other PRs or when the same work flows through multiple related PRs.
So this is less about duplicate rows within one PR, and more about cross-PR double counting.
Testing
Please run each test script individually: