Tests/double counting by AdaraPutri · Pull Request #81 · bohuie/processAnalysis

AdaraPutri · 2026-03-09T18:35:27Z

Summary

This PR adds tests for a specific double counting scenario in the process analysis pipeline: when the same logical work shows up across multiple related PRs.

Right now, the Markov process model treats each pr_id as its own session. So if work is first represented in one PR and then later appears again in another related PR, both PRs can contribute to the final transition counts. These tests are meant to make that behavior explicit.
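As a rough illustration of the behavior described above, here is a minimal sketch of per-`pr_id` transition counting. It is not the project's process model: `pooled_transition_counts`, the `(pr_id, event)` row shape, and the event names are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def pooled_transition_counts(rows):
    """Count event-to-event transitions, treating each pr_id as one session.

    Hypothetical helper for illustration; rows are (pr_id, event) tuples
    that are assumed to already be in time order.
    """
    sessions = defaultdict(list)
    for pr_id, event in rows:
        sessions[pr_id].append(event)

    counts = Counter()
    for events in sessions.values():
        # Consecutive pairs inside one session; no edges across sessions.
        counts.update(zip(events, events[1:]))
    return counts


# Hypothetical event names: the same work appears in PR 101 and again in PR 102.
rows = [
    (101, "branch_created"), (101, "pr_opened"), (101, "pr_merged"),
    (102, "branch_created"), (102, "pr_opened"), (102, "pr_merged"),
]
counts = pooled_transition_counts(rows)
print(counts[("branch_created", "pr_opened")])  # 2: one per PR
```

Because the sessions are keyed by `pr_id`, nothing in this sketch can tell that PR 102 repeats the work of PR 101, so every transition is pooled twice.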

Closes: #70

What was added

`test_cross_pr_double_counting.py`

This is the main integration test.

It creates a small synthetic labels CSV where two different PRs have the same event sequence. The idea is to simulate a case where the same work is being represented in two related PRs.

What it checks:

- both PRs are treated as separate sessions
- the same transition is counted once for each PR
- pooled transition counts increase because the model is keyed by pr_id, not by unique underlying work
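A minimal sketch of how these three checks could look, assuming a labels CSV with `pr_id`, `timestamp`, and `event` columns. The column names, the event names, and the `load_sessions` / `count_transitions` helpers are hypothetical stand-ins, not the actual code in `test_cross_pr_double_counting.py`.

```python
import csv
import io
from collections import Counter
from itertools import groupby

# Hypothetical synthetic labels CSV: two PRs with the same event sequence.
LABELS_CSV = """pr_id,timestamp,event
11,2026-01-01T10:00:00Z,commit_pushed
11,2026-01-01T11:00:00Z,review_requested
12,2026-01-02T10:00:00Z,commit_pushed
12,2026-01-02T11:00:00Z,review_requested
"""


def load_sessions(csv_text):
    """Group events by pr_id, ordered by timestamp within each PR."""
    rows = sorted(
        csv.DictReader(io.StringIO(csv_text)),
        key=lambda r: (r["pr_id"], r["timestamp"]),
    )
    return {
        pr_id: [r["event"] for r in group]
        for pr_id, group in groupby(rows, key=lambda r: r["pr_id"])
    }


def count_transitions(sessions):
    """Pool consecutive event pairs over all sessions."""
    counts = Counter()
    for events in sessions.values():
        counts.update(zip(events, events[1:]))
    return counts


def test_same_sequence_in_two_prs_is_counted_twice():
    sessions = load_sessions(LABELS_CSV)
    # Both PRs are treated as separate sessions.
    assert set(sessions) == {"11", "12"}
    assert sessions["11"] == sessions["12"]
    # The same transition is counted once for each PR.
    counts = count_transitions(sessions)
    assert counts[("commit_pushed", "review_requested")] == 2
```

With a single PR in the CSV the pooled count would be 1, so the value of 2 comes purely from the second `pr_id`.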

`test_branch_reused_across_related_prs.py`

This is another integration test, but it uses the actual labeling logic plus the transition edge computation.

It creates two PRs that share the same branch context, runs the branch-related labeling, then passes the output into the process model.

What it checks:

- branch-derived labels are emitted for both PRs
- both PRs contribute their own transition sequence
- this can increase pooled edge counts when related PRs represent overlapping work

`test_label_features_per_branch.py`

This is the supporting unit test.

It focuses only on the upstream branch labeling behavior.

What it checks:

- when two PRs share the same branch, the labeler emits one row per PR
- both PRs receive the same branch-level event
- this confirms one of the upstream mechanisms that can later contribute to semantic double counting
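The upstream mechanism can be sketched roughly as follows. `label_branch_events` is a hypothetical stand-in for the branch labeler, and the dictionary shapes, branch name, and event name are assumptions; the real labeling code in the repository may look quite different.

```python
def label_branch_events(prs, branch_events):
    """Emit one labelled row per PR for every event on that PR's branch.

    Hypothetical stand-in for the branch labeler, for illustration only.
    """
    rows = []
    for pr in prs:
        for event in branch_events.get(pr["branch"], []):
            rows.append(
                {"pr_id": pr["pr_id"], "branch": pr["branch"], "event": event}
            )
    return rows


def test_shared_branch_emits_one_row_per_pr():
    # Two PRs that share the same branch context.
    prs = [
        {"pr_id": 31, "branch": "feature/login"},
        {"pr_id": 32, "branch": "feature/login"},
    ]
    branch_events = {"feature/login": ["branch_created"]}

    rows = label_branch_events(prs, branch_events)

    # One row per PR, and both PRs receive the same branch-level event.
    assert [r["pr_id"] for r in rows] == [31, 32]
    assert {r["event"] for r in rows} == {"branch_created"}
```

A single branch-level event becomes two labelled rows here, which is the kind of input that a `pr_id`-keyed model would later count twice.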

Why these tests

We already have tests for the individual scripts. This PR is specifically meant to answer whether double counting can happen when PRs are merged into other PRs, or when the same work flows through multiple related PRs.

So this is less about duplicate rows within one PR, and more about cross-PR double counting.
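To make the distinction concrete, here is a small sketch, assuming rows are `(pr_id, timestamp, event)` tuples. `drop_within_pr_duplicates` and the sample values are hypothetical and only illustrate why row-level deduplication does not address the cross-PR case.

```python
def drop_within_pr_duplicates(rows):
    """Remove exact repeats of (pr_id, timestamp, event), keeping the first.

    Hypothetical helper for illustration only.
    """
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


rows = [
    (21, "t1", "commit_pushed"),
    (21, "t1", "commit_pushed"),  # duplicate row inside PR 21: removed
    (21, "t2", "pr_merged"),
    (22, "t1", "commit_pushed"),  # same work under another pr_id: kept
    (22, "t2", "pr_merged"),
]
deduped = drop_within_pr_duplicates(rows)
```

The duplicate inside PR 21 disappears, but PR 22 still carries the full sequence because its rows differ in `pr_id`.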

Testing

Please run each test script individually:

python -m pytest tests/double_counting/test_cross_pr_double_counting.py -v

python -m pytest tests/double_counting/test_branch_reused_across_related_prs.py -v

python -m pytest tests/double_counting/test_label_features_per_branch.py -v


Mahatav

Great job, much-needed changes.

d2r3v

LGTM! Works well. Thanks for these changes.

Mahatav and others added 30 commits January 19, 2026 11:48

edited the readme so the explanation for the toggle system is better

239fb8b

updated the readme again to better reflect how the filtering works now

4decd28

fixed the clean label

0279b12

removed the toggle feature in its totality and created a main file that …

6327283

…runs everything from extraction to analysis to graphs. I also created a file that just runs the analysis and graphing scripts, since sometimes we don't need to re-gather data.

added init file for scripts

cb5d6c5

Added Tests for Process Model

32b9146

Refactored Clean Scripts

f764b63

Refactor label processing functions to use Optional types and improve…

2354db0

… error handling in main execution flow

removing the pdf that i added by mistake

a6be9de

Add compute_elbow_scores function and save elbow scores to CSV

2f58fff

fix typos in clean_labels.py references in documentation

d24d0ee

added helpers_comm.py

e996952

added prep_data.py

c91ac42

added llm_prompts.py

25c6fc2

added get_clean_comm_label.py

b6927f7

modified comm_label.py to use helper functions

36951c7

restored comm_label.py

89cde60

Merge remote-tracking branch 'origin/dev' into refactor/documentation

50bdaff

Refactor/Automatic-Clean

01c5077

Merge pull request #41 from bohuie/refactor/documentation

a847455

Refactor/documentation

Add checks for elbow score computation and CSV saving in main function

8be5381

Merge remote-tracking branch 'origin/dev' into feature/adding_elbow_s…

e10d4b1

…core

Enhance README and main execution flow: add LLM setup instructions, s…

7e2d521

…treamline pipeline execution, and improve error handling for team analysis

Refactor configuration and improve edge filtering in clustering and g…

e052d65

…raphing modules; update documentation and tests for clarity and accuracy

pulled dev and resolved conflicts so it is ready for a merge

358bdda

added temporary config for communication process model

bfa4bca

Add Unit Tests for process_model (#50)

6fa8a7c

Add Unit Tests for process_model

Refactor: Integrate cleaning logic into main labelling scripts and re…

85f1646

…move wrappers

Refactor: Moved Clean Script to Util

8f1b256

Refactor/ File Cleanup

112d503

Mahatav and others added 14 commits February 9, 2026 22:32

Merge pull request #55 from bohuie/feature/adding-main-run-it-all-file

ed42e07

Feature/adding main run it all file

Add elbow score plotting functionality and save plots in main function

6a720f1

merged with dev to ensure everything is up to date

7ab182d

Merge remote-tracking branch 'origin/dev' into Refactor/Clean-Util

87a6eb0

# Conflicts:
# event_labelling/CodeStructure_Branching/main.py
# event_labelling/PR/get_clean_pr_label.py

Merge pull request #65 from bohuie/feature/adding_elbow_score

62eee9c

Feature/adding elbow score

Merge pull request #54 from bohuie/Refactor/Clean-Util

1d6e841

Refactor: Clean Scripts into Utility

added .venv to gitignore

6d80c00

updated toggle removal for graphing and transition edges

d695edb

Merge remote-tracking branch 'origin/dev' into feature/comm-label

50a90ca

updated clustering and zscore_calculation configs to include communic…

db95e92

…ation

Merge pull request #67 from bohuie/feature/comm-label

fac69ff

Feature/comm label

added test script for cross pr double counting

783a08c

added test script for branch reused across prs

e35735d

added test script for label features per branch

e878c77

AdaraPutri requested review from Mahatav and d2r3v March 9, 2026 18:35

AdaraPutri self-assigned this Mar 9, 2026

AdaraPutri marked this pull request as draft March 9, 2026 18:39

d2r3v force-pushed the dev branch 2 times, most recently from a8e6d00 to d14e5b5 on March 16, 2026 08:26

AdaraPutri marked this pull request as ready for review March 16, 2026 16:18

Mahatav reviewed Mar 23, 2026


Mahatav approved these changes Mar 23, 2026


d2r3v approved these changes Apr 7, 2026


AdaraPutri wants to merge 44 commits into dev from tests/double-counting
