Skip to content

fix(flags): fix cohort version update#51994

Open
matheus-vb wants to merge 4 commits intomasterfrom
matheus-vb/fix-version-update
Open

fix(flags): fix cohort version update#51994
matheus-vb wants to merge 4 commits intomasterfrom
matheus-vb/fix-version-update

Conversation

@matheus-vb
Copy link
Member

@matheus-vb matheus-vb commented Mar 23, 2026

Problem

A customer reported their cohort showed stale data ("no new people since a day ago") despite last_calculation being recent. Removing and re-adding a filter fixed it by forcing a version bump.

The probable root cause: calculate_people_ch updates the Postgres Cohort.version after the try/finally block. If self.save() or _safe_reset_calculating_state() in the finally block raises, the version update is silently skipped. ClickHouse has the new version's data, but Postgres still points to the old version, so all queries return stale results.

Changes

Moved the Cohort.objects.filter(...).update(version=..., count=...) call from after the try/finally block into the try block, right after recalculate_cohortpeople succeeds. This ensures the version is persisted before the finally block runs, so exceptions there can no longer silently skip it.

  • Renamed update_fields to version_update_fields to avoid confusion with the self.save(update_fields=...) call in finally
  • Added a test that mocks _safe_reset_calculating_state to raise and verifies version is still updated

The concurrency guard (version__lt=pending_version | version__isnull=True) is preserved, lower versions still can't overwrite higher ones.

Tradeoff: If the version .update() DB call itself raises, errors_calculating is incremented even though ClickHouse data was written. This is visible and self-healing (next retry succeeds), far better than the current silent split-brain.

How did you test this code?

  • New unit test test_calculate_people_ch_updates_version_even_when_finally_raises
  • Existing cohort tests verified for regressions
  • ruff check and ruff format pass

@matheus-vb matheus-vb changed the title call update inside try block fix(flags): fix cohort version update Mar 23, 2026
@matheus-vb matheus-vb marked this pull request as ready for review March 23, 2026 20:28
@matheus-vb matheus-vb requested a review from a team March 23, 2026 20:28
@posthog-project-board-bot posthog-project-board-bot bot moved this to In Review in Feature Flags Mar 23, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 23, 2026

Prompt To Fix All With AI
This is a comment left during a code review.
Path: posthog/test/test_cohort_model.py
Line: 513-533

Comment:
**Test only covers one of two `finally` failure paths**

The new test mocks `_safe_reset_calculating_state` to raise, but `self.save()` is called *before* `_safe_reset_calculating_state` in the `finally` block. If `self.save()` raises, `_safe_reset_calculating_state` is never reached yet the fix still protects the version update. Adding a second case (or parameterising the test) would give complete coverage of the original problem statement.

For example, using `@pytest.mark.parametrize` over both `("_safe_reset_calculating_state", "DB connection lost")` and `("save", "DB error")` would demonstrate the fix handles both `finally`-block failure modes, which aligns with the PR description's claim that the fix addresses `self.save()` failures too.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "apply fmt" | Re-trigger Greptile

@matheus-vb matheus-vb requested a review from a team March 24, 2026 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

1 participant