feat(ci): smart upstream-monitor with release intelligence #126

## Summary

Current `upstream-monitor.yml` is a dumb sync — it counts commits behind upstream and tries to merge. It provides zero intelligence about **what** changed upstream. We need feature-level visibility to make informed merge decisions.

## Current behavior

| Feature | Status |
|---------|--------|
| Detect new upstream commits | yes |
| Auto-merge if clean | yes |
| Create conflict issue | yes |
| Detect merged feature branches | yes |
| **Parse upstream releases/tags** | **no** |
| **Show what changed (features/fixes/breaking)** | **no** |
| **Link to upstream issues/PRs** | **no** |
| **Detect breaking changes** | **no** |
| **Categorize by conventional commits** | **no** |

## Proposed behavior

### 1. Release detection

Check upstream tags/releases, not just commits. When a new release appears:

```
🏷️ New upstream release: lsm-tree v3.2.0 (was v3.1.2)
```
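A minimal sketch of the comparison behind that notice. The helper names `latest_semver` and `release_notice` are hypothetical, and `sort -V` assumes GNU coreutils (the default on GitHub-hosted Ubuntu runners):

```bash
# Print the highest vMAJOR.MINOR.PATCH tag from a list of tag names on stdin.
# Anything that is not a plain semver tag (e.g. "nightly") is ignored.
latest_semver() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

# Print a notice only when the upstream tag is strictly newer than ours.
# $1 = project name, $2 = last synced tag, $3 = newest upstream tag
release_notice() {
  local name="$1" ours="$2" theirs="$3" newest
  newest=$(printf '%s\n%s\n' "$ours" "$theirs" | sort -V | tail -n 1)
  if [ "$ours" != "$theirs" ] && [ "$newest" = "$theirs" ]; then
    printf '🏷️ New upstream release: %s %s (was %s)\n' "$name" "$theirs" "$ours"
  fi
}
```

In the workflow this could be fed with `git ls-remote --tags upstream | sed 's#.*refs/tags/##' | latest_semver`; the peeled `^{}` entries for annotated tags are dropped by the anchored pattern.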

### 2. Commit categorization

Parse `git log` between our last sync point and upstream HEAD using conventional commit format:

```markdown
## Upstream changes since last sync

### ⚠️ Breaking (1)
- `feat!: remove deprecated flush_sync()` — fjall-rs/lsm-tree#290

### ✨ Features (3)
- `feat: add custom merge operator support` — fjall-rs/lsm-tree#280
- `feat(bloom): partitioned bloom filters` — fjall-rs/lsm-tree#275
- `feat: io_uring backend` — fjall-rs/lsm-tree#270

### 🐛 Fixes (2)
- `fix: race condition in concurrent compaction` — fjall-rs/lsm-tree#285
- `fix(vlog): corrupted blob header on crash` — fjall-rs/lsm-tree#282

### ⚡ Performance (1)
- `perf: vectorized block decoding` — fjall-rs/lsm-tree#278

### 📝 Other (4)
- `docs: update MSRV to 1.91` — fjall-rs/lsm-tree#288
- `test: add property tests for range scan` — fjall-rs/lsm-tree#286
- `chore(deps): bump lz4_flex to 0.14` — fjall-rs/lsm-tree#284
- `ci: add aarch64 cross-compilation` — fjall-rs/lsm-tree#283
```
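One way to classify a single subject line, as a sketch. `categorize` is a hypothetical helper; it only inspects the text before the first colon, so a `BREAKING CHANGE` footer in the commit body would need `%b` passed in as well:

```bash
# Map one conventional-commit subject to: breaking | feature | fix | perf | other
categorize() {
  local msg="$1" prefix
  # An explicit marker anywhere in the text wins
  case "$msg" in *"BREAKING CHANGE"*) echo breaking; return ;; esac
  # No colon at all means this is not a conventional commit
  case "$msg" in *:*) ;; *) echo other; return ;; esac
  # Text before the first colon, e.g. "feat", "feat(bloom)", "feat!"
  prefix=${msg%%:*}
  case "$prefix" in
    *'!')              echo breaking ;;  # "type!" or "type(scope)!"
    feat|'feat('*')')  echo feature ;;
    fix|'fix('*')')    echo fix ;;
    perf|'perf('*')')  echo perf ;;
    *)                 echo other ;;
  esac
}
```

Checking the prefix rather than the whole subject avoids misclassifying a message that merely contains `!:` somewhere in its description.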

### 3. Issue/PR linking

Extract `#NNN` references from commit messages and link them:

```bash
git log origin/main..upstream/main --format="%s" | grep -oE '#[0-9]+' | sort -u
```

For each reference, fetch title from upstream repo API.
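A sketch of the lookup, assuming the `gh` CLI is available and authenticated on the runner. The function names are hypothetical; the REST `issues` endpoint returns pull requests too, and GitHub redirects `/issues/N` to `/pull/N` when `N` is a PR:

```bash
UPSTREAM_REPO="fjall-rs/lsm-tree"

# Turn "#280" into a browsable URL on the upstream repo.
ref_url() {
  printf 'https://github.com/%s/issues/%s\n' "$UPSTREAM_REPO" "${1#\#}"
}

# Fetch the issue/PR title through the GitHub REST API.
ref_title() {
  gh api "repos/$UPSTREAM_REPO/issues/${1#\#}" --jq '.title'
}

# Read "#NNN" references from stdin and print one Markdown bullet per reference.
# A failed lookup degrades to a placeholder instead of aborting the run.
render_refs() {
  local ref title
  while IFS= read -r ref; do
    title=$(ref_title "$ref" 2>/dev/null) || title="(title unavailable)"
    printf '%s\n' "- [$ref]($(ref_url "$ref")) $title"
  done
}
```

Usage would be along the lines of `render_refs < /tmp/refs.txt`. A bare `#NNN` in an upstream commit is assumed to refer to the upstream repo, not to our fork.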

### 4. Fork overlap detection

Check if any upstream changes touch files that our fork patches have modified:

```bash
# Files upstream changed since the merge base
UPSTREAM_FILES=$(git diff --name-only origin/main...upstream/main)
# Files our fork changed since the merge base
FORK_FILES=$(git diff --name-only upstream/main...origin/main)
OVERLAP=$(comm -12 <(echo "$UPSTREAM_FILES" | sort) <(echo "$FORK_FILES" | sort))
```

If overlap exists → flag as "needs manual review" even if merge is clean (semantic conflicts possible).
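The flag could be derived roughly like this. `overlap_files` and `review_flag` are hypothetical helpers, and the `needs_manual_review` output name is an assumption, not something the current workflow defines:

```bash
# Print paths that appear in both path lists (one path per line in each file).
# $1, $2 = files containing path lists, in any order
overlap_files() {
  local a b
  a=$(mktemp)
  b=$(mktemp)
  # comm needs both inputs sorted with the same collation
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  LC_ALL=C comm -12 "$a" "$b"
  rm -f "$a" "$b"
}

# Emit a key=value line suitable for appending to "$GITHUB_OUTPUT".
# $1 = file holding the overlap list; non-empty means review is needed
review_flag() {
  if [ -s "$1" ]; then
    echo "needs_manual_review=true"
  else
    echo "needs_manual_review=false"
  fi
}
```

In a workflow step, `review_flag /tmp/overlap.txt >> "$GITHUB_OUTPUT"` would let later steps add a label or a warning without changing the merge itself.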

### 5. Smart PR/issue body

Instead of generic "N new commits", generate a structured body:

```markdown
## Upstream Sync: v3.1.2 → v3.2.0

**Release:** [v3.2.0](https://github.com/fjall-rs/lsm-tree/releases/tag/v3.2.0)
**Commits:** 47 new commits since last sync
**Breaking changes:** 1 ⚠️

### Changes by category
[categorized list from step 2]

### Fork overlap
These upstream changes touch files our fork has modified:
- `src/compaction/worker.rs` — our merge operator patches may need adaptation
- `src/tree/mod.rs` — our prefix bloom integration

### Review checklist
- [ ] Breaking changes evaluated for fork impact
- [ ] Overlapping files reviewed for semantic conflicts
- [ ] Fork-specific tests pass with upstream changes
- [ ] lsm-tree version in fjall Cargo.toml updated
```
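A sketch of how one category section of that body might be rendered from the `category: subject` lines the analysis step writes. `render_category` is a hypothetical helper and the file layout is an assumption based on the workflow draft in this issue:

```bash
# Print a "### Heading (N)" section with one bullet per matching commit.
# Prints nothing when the category is empty, so empty sections are skipped.
# $1 = categorized file, $2 = category key, $3 = heading text
render_category() {
  local file="$1" key="$2: " heading="$3" count
  count=$(awk -v k="$key" 'index($0, k) == 1 { n++ } END { print n + 0 }' "$file")
  [ "$count" -gt 0 ] || return 0
  printf '### %s (%s)\n' "$heading" "$count"
  # Strip the category key and wrap the original subject in backticks
  awk -v k="$key" 'index($0, k) == 1 { printf "- `%s`\n", substr($0, length(k) + 1) }' "$file"
  echo
}
```

Calling it once per category, for example `render_category /tmp/categorized.txt breaking "⚠️ Breaking"`, and concatenating the output gives the "Changes by category" block.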

## Implementation

### New workflow steps

```yaml
- name: Analyze upstream changes
  id: analyze
  run: |
    # 1. Detect release
    LATEST_TAG=$(git tag -l --sort=-v:refname 'v*' | head -1)
    UPSTREAM_TAG=$(git ls-remote --tags upstream | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -1)

    # 2. Categorize commits
    git log origin/main..upstream/main --format="%s" | while IFS= read -r msg; do
      case "$msg" in
        # "!" before the colon marks a breaking change, with or without a scope
        feat\!:*|fix\!:*|perf\!:*|*\)\!:*|*BREAKING*) echo "breaking: $msg" ;;
        feat:*|feat\(*) echo "feature: $msg" ;;
        fix:*|fix\(*) echo "fix: $msg" ;;
        perf:*|perf\(*) echo "perf: $msg" ;;
        *) echo "other: $msg" ;;
      esac
    done > /tmp/categorized.txt

    # 3. Extract issue references
    # Plain -u here: with -n every "#NNN" compares as 0 and the list collapses to one line
    git log origin/main..upstream/main --format="%s %b" | grep -oE '#[0-9]+' | sort -u > /tmp/refs.txt

    # 4. Check fork overlap (three-dot diffs compare each side against the merge base)
    git diff --name-only origin/main...upstream/main > /tmp/upstream_files.txt
    # Our fork-specific changes since the merge base
    git diff --name-only upstream/main...origin/main > /tmp/fork_files.txt
    comm -12 <(sort /tmp/upstream_files.txt) <(sort /tmp/fork_files.txt) > /tmp/overlap.txt
```

### Backward compatible

- Keep existing auto-merge/conflict behavior
- Add intelligence as **additional context** in PR body / issue body
- No change to schedule (Mon/Thu 8am UTC)

## Acceptance criteria

- [ ] Detects new upstream release tags (not just commits)
- [ ] Categorizes commits by conventional commit type
- [ ] Extracts and resolves upstream issue/PR references (title + URL)
- [ ] Detects fork file overlap (potential semantic conflicts)
- [ ] Generates structured PR body with all of the above
- [ ] Generates structured issue body (conflict case) with all of the above
- [ ] Breaking changes highlighted prominently
- [ ] Backward compatible with current merge/conflict behavior

## Related

- fjall will get the same upgrade (separate issue)
- gitlab-mcp needs upstream-monitor from scratch (separate issue)
- strongswan already has sync-upstream.yml — evaluate if it needs the same intelligence

## Time estimate

1d — shared script + workflow updates for lsm-tree (fjall is copy+adapt)

---

## Additional: Add `.coderabbit.yaml` configuration

Currently no `.coderabbit.yaml` exists — CodeRabbit uses defaults. Add a proper config to:
- Disable pauses (reviews should never be paused/skipped)
- Set assertive profile (systems code needs thorough review)
- Add path-specific instructions (Rust safety, tests, benchmarks)
- Enable knowledge base with our coding guidelines
- Disable poem (noise in review walkthrough)
- Enable related issues/PRs detection

### Proposed `.coderabbit.yaml`

```yaml
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
language: "en-US"
early_access: true
reviews:
  profile: "assertive"
  request_changes_workflow: false
  high_level_summary: true
  high_level_summary_placeholder: "@coderabbitai summary"
  poem: false
  review_status: true
  collapse_walkthrough: true
  commit_status: true
  assess_linked_issues: true
  related_issues: true
  related_prs: true
  suggested_labels: true
  auto_review:
    enabled: true
    auto_incremental_review: true
    drafts: false
    base_branches: ["main"]
    ignore_title_keywords: ["WIP", "DO NOT MERGE"]
  path_instructions:
    - path: "src/**/*.rs"
      instructions: |
        Review for memory safety, ownership correctness, and panic-free error handling.
        No unwrap() on I/O or user-input paths. Prefer Result<T, E> everywhere.
        No Box<dyn Any> as type bypass. No global mutable state (lazy_static Mutex<HashMap>).
        Check multi-instance safety: no in-memory-only mutable state that breaks with N replicas.
    - path: "tests/**"
      instructions: |
        Verify test coverage of edge cases and error paths.
        No unwrap() on paths that test error handling.
        Tests must have descriptive names and comments explaining WHAT is being tested.
    - path: "benches/**"
      instructions: |
        Check benchmark methodology. Use criterion properly.
        Measure P99/P999 latency, not just throughput.
    - path: "src/compaction/**"
      instructions: |
        Compaction is crash-safety critical. Every state mutation must be atomic.
        Verify no partial writes can corrupt on-disk state.
    - path: "src/encryption.rs"
      instructions: |
        Security-critical code. Review for timing attacks, nonce reuse, key handling.
        RNG must be CSPRNG. No hardcoded keys or IVs.
  tools:
    shellcheck:
      enabled: true
    markdownlint:
      enabled: true
    github-checks:
      enabled: true
      timeout_ms: 120000
chat:
  auto_reply: true
knowledge_base:
  learnings:
    scope: auto
  issues:
    scope: auto
  pull_requests:
    scope: auto
  code_guidelines:
    enabled: true
    filePatterns:
      - ".github/copilot-instructions.md"
      - ".github/instructions/*.instructions.md"
issue_enrichment:
  labeling:
    auto_apply_labels: true
    labeling_instructions:
      - label: "bug"
        instructions: "Code defect, incorrect behavior, crash, data corruption, wrong results"
      - label: "enhancement"
        instructions: "New feature, new API, new capability not previously available"
      - label: "performance"
        instructions: "Optimization, reduced allocations, faster path, benchmark improvement"
      - label: "refactor"
        instructions: "Code restructuring without behavior change — renames, extractions, trait threading"
      - label: "test"
        instructions: "New tests, test infrastructure, test helpers, flaky test fixes"
      - label: "ci"
        instructions: "CI/CD workflows, GitHub Actions, benchmarks pipeline, release automation"
      - label: "comparator"
        instructions: "UserComparator threading, custom key ordering, lexicographic vs comparator-aware"
      - label: "compaction"
        instructions: "LSM compaction logic, leveled/tiered strategy, L0/L1/L2 overlap, compaction picker"
      - label: "crash-safety"
        instructions: "Crash recovery, fsync ordering, atomic writes, WAL correctness, data durability"
      - label: "encryption"
        instructions: "Block encryption, AES-GCM, key management, nonce handling"
      - label: "fs-trait"
        instructions: "Filesystem abstraction, Fs trait, io_uring, per-level routing, StdFs"
      - label: "upstream-candidate"
        instructions: "Fix or feature that could be contributed back to fjall-rs upstream"
      - label: "fork-only"
        instructions: "Feature specific to CoordiNode fork — range tombstones, merge operators, prefix bloom, V4 format"
      - label: "upstream-sync"
        instructions: "Automated upstream synchronization — merge conflicts, release tracking"
```

### Key decisions

| Setting | Value | Why |
|---------|-------|-----|
| `profile` | assertive | Systems/storage code needs thorough review — "chill" misses too much |
| `poem` | false | Noise in walkthrough, wastes review tokens |
| `auto_incremental_review` | true | Review each push, not just first |
| `drafts` | false | Don't review drafts — they're WIP |
| `path_filters` | none needed | lsm-tree repo has no dirs to exclude (donor codebases live in coordinode workspace, not here) |
| `path_instructions` | Rust safety rules | Encode our engineering principles directly into review guidelines |
| `early_access` | true | Get new CodeRabbit features as they ship |
| `knowledge_base.code_guidelines` | copilot-instructions.md | CodeRabbit auto-reads this for review context |
| `commit_status` | true | Report review state as a commit status; it blocks merge only if branch protection requires that status |
| `assess_linked_issues` | true | Check if PR actually addresses linked issue |

### Checklist addition

- [ ] Create `.coderabbit.yaml` with config above
- [ ] Verify CodeRabbit picks it up on next PR (comment `@coderabbitai configuration` to confirm)



