feat(ci): smart upstream-monitor with release intelligence #126

@polaz

Summary

The current upstream-monitor.yml is a dumb sync: it counts commits behind upstream and tries to merge. It provides zero intelligence about what changed upstream. We need feature-level visibility to make informed merge decisions.

Current behavior

| Feature | Status |
| --- | --- |
| Detect new upstream commits | yes |
| Auto-merge if clean | yes |
| Create conflict issue | yes |
| Detect merged feature branches | yes |
| Parse upstream releases/tags | no |
| Show what changed (features/fixes/breaking) | no |
| Link to upstream issues/PRs | no |
| Detect breaking changes | no |
| Categorize by conventional commits | no |

Proposed behavior

1. Release detection

Check upstream tags/releases, not just commits. When a new release appears:

🏷️ New upstream release: lsm-tree v3.2.0 (was v3.1.2)
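A minimal sketch of the release check, assuming upstream tags follow the `vMAJOR.MINOR.PATCH` pattern and that `sort -V` (GNU coreutils) is available on the runner. The function names `latest_semver_tag` and `release_notice` are hypothetical helpers, not part of the existing workflow.

```bash
# Hypothetical helpers; names are illustrative only.

# Read `git ls-remote --tags <remote>` output on stdin and print the
# highest vMAJOR.MINOR.PATCH tag. Peeled entries (ending in ^{}) and
# pre-release tags are ignored because the pattern is anchored with `$`.
latest_semver_tag() {
  grep -oE 'refs/tags/v[0-9]+\.[0-9]+\.[0-9]+$' \
    | sed 's|refs/tags/||' \
    | sort -V \
    | tail -n 1
}

# release_notice NAME OURS THEIRS
# Print a notice only when THEIRS is strictly newer than OURS.
release_notice() {
  local name="$1" ours="$2" theirs="$3" newest
  [ -n "$theirs" ] || return 0
  if [ "$ours" = "$theirs" ]; then
    return 0
  fi
  # Version sort places the newer of the two tags last.
  newest=$(printf '%s\n%s\n' "$ours" "$theirs" | sort -V | tail -n 1)
  if [ "$newest" = "$theirs" ]; then
    echo "🏷️ New upstream release: $name $theirs (was $ours)"
  fi
}
```

In the workflow this could be fed with `git ls-remote --tags upstream | latest_semver_tag`, with the result compared against the tag recorded at the last sync.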

2. Commit categorization

Parse git log between our last sync point and upstream HEAD using conventional commit format:

## Upstream changes since last sync

### ⚠️ Breaking (1)
- `feat!: remove deprecated flush_sync()` - fjall-rs/lsm-tree#290

### ✨ Features (3)
- `feat: add custom merge operator support` - fjall-rs/lsm-tree#280
- `feat(bloom): partitioned bloom filters` - fjall-rs/lsm-tree#275
- `feat: io_uring backend` - fjall-rs/lsm-tree#270

### 🐛 Fixes (2)
- `fix: race condition in concurrent compaction` - fjall-rs/lsm-tree#285
- `fix(vlog): corrupted blob header on crash` - fjall-rs/lsm-tree#282

### ⚡ Performance (1)
- `perf: vectorized block decoding` - fjall-rs/lsm-tree#278

### 📝 Other (4)
- `docs: update MSRV to 1.91` - fjall-rs/lsm-tree#288
- `test: add property tests for range scan` - fjall-rs/lsm-tree#286
- `chore(deps): bump lz4_flex to 0.14` - fjall-rs/lsm-tree#284
- `ci: add aarch64 cross-compilation` - fjall-rs/lsm-tree#283
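One way to render that layout, sketched under the assumption that commits have already been written to a file as `category: subject` lines (the keys `breaking`, `feature`, `fix`, `perf`, `other` match the analysis step proposed under Implementation). `render_section` and `render_changes` are hypothetical names, and this sketch omits the issue links shown in the sample.

```bash
# Hypothetical helpers; input is a file of "category: subject" lines.

# render_section FILE KEY HEADING
# Print one "### HEADING (N)" section, or nothing if the category is empty.
render_section() {
  local file="$1" key="$2" heading="$3" count
  # grep -c prints 0 and exits non-zero when nothing matches.
  count=$(grep -c "^${key}: " "$file" || true)
  [ "$count" -gt 0 ] || return 0
  printf '### %s (%s)\n' "$heading" "$count"
  # Strip the category prefix and wrap each subject in backticks.
  grep "^${key}: " "$file" | sed -e "s/^${key}: //" -e 's/.*/- `&`/'
  echo
}

# render_changes FILE
# Print all sections in priority order, breaking changes first.
render_changes() {
  local file="$1"
  echo "## Upstream changes since last sync"
  echo
  render_section "$file" breaking "⚠️ Breaking"
  render_section "$file" feature  "✨ Features"
  render_section "$file" fix      "🐛 Fixes"
  render_section "$file" perf     "⚡ Performance"
  render_section "$file" other    "📝 Other"
}
```

Empty categories are skipped so a quiet sync does not produce a wall of "(0)" headings.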

3. Issue/PR linking

Extract #NNN references from commit messages and link them:

git log origin/main..upstream/main --format="%s" | grep -oE '#[0-9]+' | sort -u

For each reference, fetch title from upstream repo API.
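A possible shape for the lookup, assuming the `gh` CLI is available and authenticated on the runner (for example via `GH_TOKEN`). It relies on the GitHub REST issues endpoint, which also answers for pull request numbers. `UPSTREAM_REPO`, `resolve_ref`, and `resolve_refs` are hypothetical names.

```bash
# Hypothetical helpers; UPSTREAM_REPO is an assumed variable name.
UPSTREAM_REPO="${UPSTREAM_REPO:-fjall-rs/lsm-tree}"

# resolve_ref '#NNN'
# Print a markdown bullet with a link and the title of the referenced item.
resolve_ref() {
  local num="${1#\#}" title
  # The issues endpoint serves both issues and pull requests,
  # so one lookup covers either kind of reference.
  title=$(gh api "repos/${UPSTREAM_REPO}/issues/${num}" --jq '.title' 2>/dev/null) \
    || title="(title unavailable)"
  printf -- '- [%s#%s](https://github.com/%s/issues/%s): %s\n' \
    "$UPSTREAM_REPO" "$num" "$UPSTREAM_REPO" "$num" "$title"
}

# Read one "#NNN" reference per line on stdin and resolve each.
resolve_refs() {
  local ref
  while IFS= read -r ref; do
    [ -z "$ref" ] || resolve_ref "$ref"
  done
}
```

A failed lookup (rate limit, deleted issue) degrades to a placeholder title instead of failing the whole run, which seems preferable for a scheduled job.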

4. Fork overlap detection

Check if any upstream changes touch files that our fork patches have modified:

# Files upstream has changed since the merge base
UPSTREAM_FILES=$(git diff --name-only origin/main...upstream/main)
# Files our fork has changed since the merge base
FORK_FILES=$(git diff --name-only upstream/main...origin/main)
OVERLAP=$(comm -12 <(echo "$UPSTREAM_FILES" | sort) <(echo "$FORK_FILES" | sort))

If overlap exists, flag the sync as "needs manual review" even if the merge is clean (semantic conflicts are possible).
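The flagging rule could look roughly like this, assuming the two file lists have been written to disk one path per line. `overlap_status` is a hypothetical helper and the "no overlap" label is an assumption, since the issue only names the "needs manual review" case.

```bash
# overlap_status UPSTREAM_LIST FORK_LIST   (hypothetical helper)
# Print the review flag followed by the overlapping paths, if any.
overlap_status() {
  local a b overlap
  a=$(mktemp)
  b=$(mktemp)
  # comm requires sorted input; -u also drops duplicate paths.
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  # -12 suppresses lines unique to either file, leaving the intersection.
  overlap=$(comm -12 "$a" "$b")
  rm -f "$a" "$b"
  if [ -n "$overlap" ]; then
    echo "needs manual review"
    printf '%s\n' "$overlap" | sed 's/.*/- `&`/'
  else
    echo "no overlap"
  fi
}
```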

5. Smart PR/issue body

Instead of generic "N new commits", generate a structured body:

## Upstream Sync: v3.1.2 → v3.2.0

**Release:** [v3.2.0](https://github.com/fjall-rs/lsm-tree/releases/tag/v3.2.0)
**Commits:** 47 new commits since last sync
**Breaking changes:** 1 ⚠️

### Changes by category
[categorized list from step 2]

### Fork overlap
These upstream changes touch files our fork has modified:
- `src/compaction/worker.rs` - our merge operator patches may need adaptation
- `src/tree/mod.rs` - our prefix bloom integration

### Review checklist
- [ ] Breaking changes evaluated for fork impact
- [ ] Overlapping files reviewed for semantic conflicts
- [ ] Fork-specific tests pass with upstream changes
- [ ] lsm-tree version in fjall Cargo.toml updated
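A sketch of how that body might be assembled from the earlier pieces, assuming the categorized markdown and the overlap list already exist as files. `build_sync_body` and the `UPSTREAM_REPO` variable are hypothetical. The per-file notes after each overlapping path in the sample are hand-written commentary and are not generated here.

```bash
# build_sync_body OLD NEW COMMITS BREAKING CHANGES_FILE OVERLAP_FILE
# Hypothetical helper that prints the structured PR/issue body on stdout.
build_sync_body() {
  local old="$1" new="$2" commits="$3" breaking="$4"
  local changes_file="$5" overlap_file="$6"
  local repo="${UPSTREAM_REPO:-fjall-rs/lsm-tree}"
  local warn=""
  # Highlight breaking changes only when there are any.
  if [ "$breaking" -gt 0 ]; then warn=" ⚠️"; fi

  echo "## Upstream Sync: ${old} → ${new}"
  echo
  echo "**Release:** [${new}](https://github.com/${repo}/releases/tag/${new})"
  echo "**Commits:** ${commits} new commits since last sync"
  echo "**Breaking changes:** ${breaking}${warn}"
  echo
  echo "### Changes by category"
  cat "$changes_file"
  echo
  # The overlap section is emitted only when the list is non-empty.
  if [ -s "$overlap_file" ]; then
    echo "### Fork overlap"
    echo "These upstream changes touch files our fork has modified:"
    sed 's/.*/- `&`/' "$overlap_file"
    echo
  fi
  cat <<'EOF'
### Review checklist
- [ ] Breaking changes evaluated for fork impact
- [ ] Overlapping files reviewed for semantic conflicts
- [ ] Fork-specific tests pass with upstream changes
- [ ] lsm-tree version in fjall Cargo.toml updated
EOF
}
```

The same function can serve both the clean-merge PR and the conflict issue, since only the surrounding title differs.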

Implementation

New workflow steps

- name: Analyze upstream changes
  id: analyze
  run: |
    # 1. Detect release
    LATEST_TAG=$(git tag -l --sort=-v:refname 'v*' | head -1)
    UPSTREAM_TAG=$(git ls-remote --tags upstream | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -1)

    # 2. Categorize commits
    git log origin/main..upstream/main --format="%s" | while IFS= read -r msg; do
      case "$msg" in
        # "type!:" or "type(scope)!:" marks a breaking change
        *\!:*|*BREAKING*) echo "breaking: $msg" ;;
        feat:*|feat\(*) echo "feature: $msg" ;;
        fix:*|fix\(*) echo "fix: $msg" ;;
        perf:*|perf\(*) echo "perf: $msg" ;;
        *) echo "other: $msg" ;;
      esac
    done > /tmp/categorized.txt

    # 3. Extract issue references
    # Plain -u: with -n every "#NNN" compares as 0 and all but one line is dropped
    git log origin/main..upstream/main --format="%s %b" | grep -oE '#[0-9]+' | sort -u > /tmp/refs.txt

    # 4. Check fork overlap
    # Three-dot diffs compare against the merge base, so each list holds one side's changes only
    git diff --name-only origin/main...upstream/main > /tmp/upstream_files.txt
    # Compare with our fork-specific changes (commits not in upstream)
    git diff --name-only upstream/main...origin/main > /tmp/fork_files.txt
    comm -12 <(sort /tmp/upstream_files.txt) <(sort /tmp/fork_files.txt) > /tmp/overlap.txt
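Since the step declares `id: analyze`, later steps will probably want its results as step outputs. A minimal sketch of exporting per-category counts through the standard `GITHUB_OUTPUT` file follows. The output names (`breaking_count` and so on) and the `emit_counts` helper are hypothetical.

```bash
# emit_counts CATEGORIZED_FILE   (hypothetical helper)
# Append "<category>_count=N" lines to the step output file.
# Falls back to stdout when run outside GitHub Actions.
emit_counts() {
  local file="$1" out="${GITHUB_OUTPUT:-/dev/stdout}" key count
  for key in breaking feature fix perf other; do
    # grep -c prints 0 and exits non-zero when nothing matches.
    count=$(grep -c "^${key}: " "$file" || true)
    echo "${key}_count=${count}" >> "$out"
  done
  # Total number of categorized commits (one per line).
  echo "total_count=$(wc -l < "$file" | tr -d ' ')" >> "$out"
}
```

A later step could then branch on `steps.analyze.outputs.breaking_count` to decide how prominently to flag the sync.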

Backward compatible

  • Keep existing auto-merge/conflict behavior
  • Add intelligence as additional context in PR body / issue body
  • No change to schedule (Mon/Thu 8am UTC)

Acceptance criteria

  • Detects new upstream release tags (not just commits)
  • Categorizes commits by conventional commit type
  • Extracts and resolves upstream issue/PR references (title + URL)
  • Detects fork file overlap (potential semantic conflicts)
  • Generates structured PR body with all of the above
  • Generates structured issue body (conflict case) with all of the above
  • Breaking changes highlighted prominently
  • Backward compatible with current merge/conflict behavior

Related

  • fjall will get the same upgrade (separate issue)
  • gitlab-mcp needs upstream-monitor from scratch (separate issue)
  • strongswan already has sync-upstream.yml; evaluate if it needs the same intelligence

Time estimate

1d: shared script + workflow updates for lsm-tree (fjall is copy+adapt)


Additional: Add .coderabbit.yaml configuration

Currently no .coderabbit.yaml exists, so CodeRabbit uses defaults. Add a proper config to:

  • Disable pauses (reviews should never be paused/skipped)
  • Set assertive profile (systems code needs thorough review)
  • Add path-specific instructions (Rust safety, tests, benchmarks)
  • Enable knowledge base with our coding guidelines
  • Disable poem (noise in review walkthrough)
  • Enable related issues/PRs detection

Proposed .coderabbit.yaml

# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
language: "en-US"
early_access: true
reviews:
  profile: "assertive"
  request_changes_workflow: false
  high_level_summary: true
  high_level_summary_placeholder: "@coderabbitai summary"
  poem: false
  review_status: true
  collapse_walkthrough: true
  commit_status: true
  assess_linked_issues: true
  related_issues: true
  related_prs: true
  suggested_labels: true
  auto_review:
    enabled: true
    auto_incremental_review: true
    drafts: false
    base_branches: ["main"]
    ignore_title_keywords: ["WIP", "DO NOT MERGE"]
  path_instructions:
    - path: "src/**/*.rs"
      instructions: |
        Review for memory safety, ownership correctness, and panic-free error handling.
        No unwrap() on I/O or user-input paths. Prefer Result<T, E> everywhere.
        No Box<dyn Any> as type bypass. No global mutable state (lazy_static Mutex<HashMap>).
        Check multi-instance safety: no in-memory-only mutable state that breaks with N replicas.
    - path: "tests/**"
      instructions: |
        Verify test coverage of edge cases and error paths.
        No unwrap() on paths that test error handling.
        Tests must have descriptive names and comments explaining WHAT is being tested.
    - path: "benches/**"
      instructions: |
        Check benchmark methodology. Use criterion properly.
        Measure P99/P999 latency, not just throughput.
    - path: "src/compaction/**"
      instructions: |
        Compaction is crash-safety critical. Every state mutation must be atomic.
        Verify no partial writes can corrupt on-disk state.
    - path: "src/encryption.rs"
      instructions: |
        Security-critical code. Review for timing attacks, nonce reuse, key handling.
        RNG must be CSPRNG. No hardcoded keys or IVs.
  tools:
    shellcheck:
      enabled: true
    markdownlint:
      enabled: true
    github-checks:
      enabled: true
      timeout_ms: 120000
chat:
  auto_reply: true
knowledge_base:
  learnings:
    scope: auto
  issues:
    scope: auto
  pull_requests:
    scope: auto
  code_guidelines:
    enabled: true
    filePatterns:
      - ".github/copilot-instructions.md"
      - ".github/instructions/*.instructions.md"
issue_enrichment:
  labeling:
    auto_apply_labels: true
    labeling_instructions:
      - label: "bug"
        instructions: "Code defect, incorrect behavior, crash, data corruption, wrong results"
      - label: "enhancement"
        instructions: "New feature, new API, new capability not previously available"
      - label: "performance"
        instructions: "Optimization, reduced allocations, faster path, benchmark improvement"
      - label: "refactor"
        instructions: "Code restructuring without behavior change: renames, extractions, trait threading"
      - label: "test"
        instructions: "New tests, test infrastructure, test helpers, flaky test fixes"
      - label: "ci"
        instructions: "CI/CD workflows, GitHub Actions, benchmarks pipeline, release automation"
      - label: "comparator"
        instructions: "UserComparator threading, custom key ordering, lexicographic vs comparator-aware"
      - label: "compaction"
        instructions: "LSM compaction logic, leveled/tiered strategy, L0/L1/L2 overlap, compaction picker"
      - label: "crash-safety"
        instructions: "Crash recovery, fsync ordering, atomic writes, WAL correctness, data durability"
      - label: "encryption"
        instructions: "Block encryption, AES-GCM, key management, nonce handling"
      - label: "fs-trait"
        instructions: "Filesystem abstraction, Fs trait, io_uring, per-level routing, StdFs"
      - label: "upstream-candidate"
        instructions: "Fix or feature that could be contributed back to fjall-rs upstream"
      - label: "fork-only"
        instructions: "Feature specific to CoordiNode fork: range tombstones, merge operators, prefix bloom, V4 format"
      - label: "upstream-sync"
        instructions: "Automated upstream synchronization: merge conflicts, release tracking"

Key decisions

| Setting | Value | Why |
| --- | --- | --- |
| profile | assertive | Systems/storage code needs thorough review; "chill" misses too much |
| poem | false | Noise in walkthrough, wastes review tokens |
| auto_incremental_review | true | Review each push, not just first |
| drafts | false | Don't review drafts; they're WIP |
| path_filters | none needed | lsm-tree repo has no dirs to exclude (donor codebases live in coordinode workspace, not here) |
| path_instructions | Rust safety rules | Encode our engineering principles directly into review guidelines |
| early_access | true | Get new CodeRabbit features as they ship |
| knowledge_base.code_guidelines | copilot-instructions.md | CodeRabbit auto-reads this for review context |
| commit_status | true | Block merge until review completes |
| assess_linked_issues | true | Check if PR actually addresses linked issue |

Checklist addition

  • Create .coderabbit.yaml with config above
  • Verify CodeRabbit picks it up on next PR (comment @coderabbitai configuration to confirm)

Labels

ci, enhancement