Description
Summary
The current upstream-monitor.yml is a naive sync: it counts how many commits we are behind upstream and tries to merge. It provides no information about what changed upstream. We need feature-level visibility to make informed merge decisions.
Current behavior
| Feature | Status |
|---|---|
| Detect new upstream commits | yes |
| Auto-merge if clean | yes |
| Create conflict issue | yes |
| Detect merged feature branches | yes |
| Parse upstream releases/tags | no |
| Show what changed (features/fixes/breaking) | no |
| Link to upstream issues/PRs | no |
| Detect breaking changes | no |
| Categorize by conventional commits | no |
Proposed behavior
1. Release detection
Check upstream tags/releases, not just commits. When a new release appears:
    🏷️ New upstream release: lsm-tree v3.2.0 (was v3.1.2)
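A minimal sketch of the comparison behind that notice, assuming tags follow strict `vMAJOR.MINOR.PATCH` naming and that `sort -V` (GNU coreutils) is available. The helper names `latest_semver` and `release_notice` are hypothetical and not part of the existing workflow.

```shell
# latest_semver: read tag names on stdin, keep strict vX.Y.Z tags,
# and print the highest one by version order.
latest_semver() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

# release_notice OURS UPSTREAM: print a notice only when UPSTREAM is a
# strictly newer version than OURS; return non-zero otherwise.
release_notice() {
  local ours="$1" upstream="$2" newest
  [ -n "$upstream" ] || return 1
  [ "$ours" != "$upstream" ] || return 1
  newest=$(printf '%s\n%s\n' "$ours" "$upstream" | sort -V | tail -n 1)
  [ "$newest" = "$upstream" ] || return 1
  echo "New upstream release: lsm-tree $upstream (was $ours)"
}
```

In the workflow, the input could come from something like `git ls-remote --tags --refs upstream | sed 's#.*refs/tags/##' | latest_semver`. Pre-release tags such as `v3.2.0-rc1` are deliberately ignored by this sketch.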
2. Commit categorization
Parse git log between our last sync point and upstream HEAD using conventional commit format:
    ## Upstream changes since last sync
    ### ⚠️ Breaking (1)
    - `feat!: remove deprecated flush_sync()` → fjall-rs/lsm-tree#290
    ### ✨ Features (3)
    - `feat: add custom merge operator support` → fjall-rs/lsm-tree#280
    - `feat(bloom): partitioned bloom filters` → fjall-rs/lsm-tree#275
    - `feat: io_uring backend` → fjall-rs/lsm-tree#270
    ### Fixes (2)
    - `fix: race condition in concurrent compaction` → fjall-rs/lsm-tree#285
    - `fix(vlog): corrupted blob header on crash` → fjall-rs/lsm-tree#282
    ### ⚡ Performance (1)
    - `perf: vectorized block decoding` → fjall-rs/lsm-tree#278
    ### Other (4)
    - `docs: update MSRV to 1.91` → fjall-rs/lsm-tree#288
    - `test: add property tests for range scan` → fjall-rs/lsm-tree#286
    - `chore(deps): bump lz4_flex to 0.14` → fjall-rs/lsm-tree#284
    - `ci: add aarch64 cross-compilation` → fjall-rs/lsm-tree#283
3. Issue/PR linking
Extract #NNN references from commit messages and link them:
    git log origin/main..upstream/main --format="%s" | grep -oE '#[0-9]+' | sort -u
For each reference, fetch the title from the upstream repo API.
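One way the extraction and lookup could be split into small helpers. This is a sketch: the function names are hypothetical, and `ref_title` assumes the GitHub CLI (`gh`) is installed and authenticated on the runner. It relies on the GitHub REST issues endpoint also returning pull requests, and on the web UI redirecting `/issues/N` to `/pull/N` when N is a pull request.

```shell
# extract_refs: read commit messages on stdin and print the unique
# issue/PR numbers they mention, in ascending numeric order.
extract_refs() {
  grep -oE '#[0-9]+' | tr -d '#' | sort -un
}

# ref_url REPO NUMBER: build the web URL for a reference.
ref_url() {
  local repo="$1" number="$2"
  echo "https://github.com/${repo}/issues/${number}"
}

# ref_title REPO NUMBER: fetch the title through the GitHub CLI.
# Assumption: `gh` is available and has API access to REPO.
ref_title() {
  gh api "repos/$1/issues/$2" --jq '.title'
}
```

Stripping the `#` before sorting matters here: `sort -un` applied to the raw `#NNN` strings would treat every line as 0 and keep only one of them.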
4. Fork overlap detection
Check if any upstream changes touch files that our fork patches have modified:
    # Files changed upstream since the merge base
    UPSTREAM_FILES=$(git diff --name-only origin/main...upstream/main)
    # Files our fork has changed since the merge base
    FORK_FILES=$(git diff --name-only upstream/main...origin/main)
    OVERLAP=$(comm -12 <(echo "$UPSTREAM_FILES" | sort) <(echo "$FORK_FILES" | sort))
If overlap exists, flag the sync as "needs manual review" even if the merge is clean (semantic conflicts are possible).
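The overlap check can be sketched as two helpers that avoid process substitution, so they also run under a plain POSIX shell. The names `changed_files`, `overlap`, and `review_flag` are hypothetical. The three-dot diff form compares against the merge base, so each side lists only its own changes.

```shell
# changed_files BASE TIP: files changed on TIP since its merge base with BASE.
changed_files() {
  git diff --name-only "$1...$2"
}

# overlap FILE_A FILE_B: print paths listed in both files (one path per line).
# Each list is de-duplicated first, so a path printed by `uniq -d`
# must appear in both inputs. Equivalent to `comm -12` on sorted lists.
overlap() {
  { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# review_flag FILE_A FILE_B: print the flag text when any path overlaps.
review_flag() {
  if [ -n "$(overlap "$1" "$2")" ]; then
    echo "needs manual review"
  fi
}
```

A possible use: write `changed_files origin/main upstream/main` and `changed_files upstream/main origin/main` to two temporary files, then pass both to `review_flag`.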
5. Smart PR/issue body
Instead of generic "N new commits", generate a structured body:
    ## Upstream Sync: v3.1.2 → v3.2.0
    **Release:** [v3.2.0](https://github.com/fjall-rs/lsm-tree/releases/tag/v3.2.0)
    **Commits:** 47 new commits since last sync
    **Breaking changes:** 1 ⚠️
    ### Changes by category
    [categorized list from step 2]
    ### Fork overlap
    These upstream changes touch files our fork has modified:
    - `src/compaction/worker.rs`: our merge operator patches may need adaptation
    - `src/tree/mod.rs`: our prefix bloom integration
    ### Review checklist
    - [ ] Breaking changes evaluated for fork impact
    - [ ] Overlapping files reviewed for semantic conflicts
    - [ ] Fork-specific tests pass with upstream changes
    - [ ] lsm-tree version in fjall Cargo.toml updated
Implementation
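A rough sketch of how the structured body from section 5 might be assembled from intermediate files. The function name `render_body` and its argument order are assumptions for illustration. The categorized file is expected to hold `category: subject` lines and the overlap file one path per line. The review checklist is omitted for brevity.

```shell
# render_body OLD NEW COMMITS CATEGORIZED_FILE OVERLAP_FILE
# Prints a markdown body; the "Fork overlap" section is emitted only
# when OVERLAP_FILE is non-empty.
render_body() {
  local old="$1" new="$2" commits="$3" categorized="$4" overlap="$5" breaking
  # grep -c prints 0 and exits non-zero when nothing matches, hence `|| true`.
  breaking=$(grep -c '^breaking: ' "$categorized" || true)
  printf '## Upstream Sync: %s → %s\n' "$old" "$new"
  printf '**Release:** [%s](https://github.com/fjall-rs/lsm-tree/releases/tag/%s)\n' "$new" "$new"
  printf '**Commits:** %s new commits since last sync\n' "$commits"
  printf '**Breaking changes:** %s\n' "$breaking"
  printf '### Changes by category\n'
  sed 's/^/- /' "$categorized"
  if [ -s "$overlap" ]; then
    printf '### Fork overlap\n'
    printf 'These upstream changes touch files our fork has modified:\n'
    sed 's/.*/- `&`/' "$overlap"
  fi
}
```

The output can be passed to the PR or issue creation step unchanged, which keeps the existing merge/conflict branches of the workflow untouched.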
New workflow steps
    - name: Analyze upstream changes
      id: analyze
      run: |
        # 1. Detect release
        LATEST_TAG=$(git tag -l --sort=-v:refname 'v*' | head -1)
        UPSTREAM_TAG=$(git ls-remote --tags upstream | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -1)
        # 2. Categorize commits ("!" before the colon marks a breaking change, with or without a scope)
        git log origin/main..upstream/main --format="%s" | while IFS= read -r msg; do
          case "$msg" in
            feat\!:*|feat\(*\)\!:*|*BREAKING*) echo "breaking: $msg" ;;
            feat:*|feat\(*) echo "feature: $msg" ;;
            fix:*|fix\(*) echo "fix: $msg" ;;
            perf:*|perf\(*) echo "perf: $msg" ;;
            *) echo "other: $msg" ;;
          esac
        done > /tmp/categorized.txt
        # 3. Extract issue references (plain -u: with -n every "#NNN" compares as 0 and only one line survives)
        git log origin/main..upstream/main --format="%s %b" | grep -oE '#[0-9]+' | sort -u > /tmp/refs.txt
        # 4. Check fork overlap (three-dot diffs are relative to the merge base;
        #    two-dot diffs in either direction list the same files)
        git diff --name-only origin/main...upstream/main > /tmp/upstream_files.txt
        # Compare with our fork-specific changes (commits not in upstream)
        git diff --name-only upstream/main...origin/main > /tmp/fork_files.txt
        comm -12 <(sort /tmp/upstream_files.txt) <(sort /tmp/fork_files.txt) > /tmp/overlap.txt
Backward compatible
- Keep existing auto-merge/conflict behavior
- Add intelligence as additional context in PR body / issue body
- No change to schedule (Mon/Thu 8am UTC)
Acceptance criteria
- Detects new upstream release tags (not just commits)
- Categorizes commits by conventional commit type
- Extracts and resolves upstream issue/PR references (title + URL)
- Detects fork file overlap (potential semantic conflicts)
- Generates structured PR body with all of the above
- Generates structured issue body (conflict case) with all of the above
- Breaking changes highlighted prominently
- Backward compatible with current merge/conflict behavior
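The categorization and breaking-change criteria can be checked in isolation if the classification logic lives in a function. The sketch below is one possible shape, with hypothetical names `categorize` and `categorize_log`. It only sees commit subjects, so a `BREAKING CHANGE:` footer in the commit body would need separate handling.

```shell
# categorize SUBJECT: print the category for one commit subject.
# In Conventional Commits a "!" right before the colon marks a breaking
# change, with or without a scope: "feat!:" or "feat(api)!:".
categorize() {
  local subject="$1" header
  case "$subject" in
    *BREAKING*) echo "breaking"; return ;;
    *:*) header=${subject%%:*} ;;   # text before the first colon
    *) echo "other"; return ;;      # no type prefix at all
  esac
  case "$header" in
    *'!') echo "breaking" ;;
    feat|"feat("*")") echo "feature" ;;
    fix|"fix("*")") echo "fix" ;;
    perf|"perf("*")") echo "perf" ;;
    *) echo "other" ;;
  esac
}

# categorize_log: read subjects on stdin, print "category: subject" lines.
categorize_log() {
  while IFS= read -r subject; do
    printf '%s: %s\n' "$(categorize "$subject")" "$subject"
  done
}
```

Typical use would be `git log origin/main..upstream/main --format="%s" | categorize_log > /tmp/categorized.txt`, after which `grep -c '^breaking: '` gives the count to highlight.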
Related
- fjall will get the same upgrade (separate issue)
- gitlab-mcp needs upstream-monitor from scratch (separate issue)
- strongswan already has sync-upstream.yml; evaluate whether it needs the same intelligence
Time estimate
1d: shared script + workflow updates for lsm-tree (fjall is copy+adapt)
Additional: Add .coderabbit.yaml configuration
Currently no .coderabbit.yaml exists, so CodeRabbit uses defaults. Add a proper config to:
- Disable pauses (reviews should never be paused/skipped)
- Set assertive profile (systems code needs thorough review)
- Add path-specific instructions (Rust safety, tests, benchmarks)
- Enable knowledge base with our coding guidelines
- Disable poem (noise in review walkthrough)
- Enable related issues/PRs detection
Proposed .coderabbit.yaml
    # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
    language: "en-US"
    early_access: true
    reviews:
      profile: "assertive"
      request_changes_workflow: false
      high_level_summary: true
      high_level_summary_placeholder: "@coderabbitai summary"
      poem: false
      review_status: true
      collapse_walkthrough: true
      commit_status: true
      assess_linked_issues: true
      related_issues: true
      related_prs: true
      suggested_labels: true
      auto_review:
        enabled: true
        auto_incremental_review: true
        drafts: false
        base_branches: ["main"]
        ignore_title_keywords: ["WIP", "DO NOT MERGE"]
      path_instructions:
        - path: "src/**/*.rs"
          instructions: |
            Review for memory safety, ownership correctness, and panic-free error handling.
            No unwrap() on I/O or user-input paths. Prefer Result<T, E> everywhere.
            No Box<dyn Any> as type bypass. No global mutable state (lazy_static Mutex<HashMap>).
            Check multi-instance safety: no in-memory-only mutable state that breaks with N replicas.
        - path: "tests/**"
          instructions: |
            Verify test coverage of edge cases and error paths.
            No unwrap() on paths that test error handling.
            Tests must have descriptive names and comments explaining WHAT is being tested.
        - path: "benches/**"
          instructions: |
            Check benchmark methodology. Use criterion properly.
            Measure P99/P999 latency, not just throughput.
        - path: "src/compaction/**"
          instructions: |
            Compaction is crash-safety critical. Every state mutation must be atomic.
            Verify no partial writes can corrupt on-disk state.
        - path: "src/encryption.rs"
          instructions: |
            Security-critical code. Review for timing attacks, nonce reuse, key handling.
            RNG must be CSPRNG. No hardcoded keys or IVs.
      tools:
        shellcheck:
          enabled: true
        markdownlint:
          enabled: true
        github-checks:
          enabled: true
          timeout_ms: 120000
    chat:
      auto_reply: true
    knowledge_base:
      learnings:
        scope: auto
      issues:
        scope: auto
      pull_requests:
        scope: auto
      code_guidelines:
        enabled: true
        filePatterns:
          - ".github/copilot-instructions.md"
          - ".github/instructions/*.instructions.md"
    issue_enrichment:
      labeling:
        auto_apply_labels: true
        labeling_instructions:
          - label: "bug"
            instructions: "Code defect, incorrect behavior, crash, data corruption, wrong results"
          - label: "enhancement"
            instructions: "New feature, new API, new capability not previously available"
          - label: "performance"
            instructions: "Optimization, reduced allocations, faster path, benchmark improvement"
          - label: "refactor"
            instructions: "Code restructuring without behavior change: renames, extractions, trait threading"
          - label: "test"
            instructions: "New tests, test infrastructure, test helpers, flaky test fixes"
          - label: "ci"
            instructions: "CI/CD workflows, GitHub Actions, benchmarks pipeline, release automation"
          - label: "comparator"
            instructions: "UserComparator threading, custom key ordering, lexicographic vs comparator-aware"
          - label: "compaction"
            instructions: "LSM compaction logic, leveled/tiered strategy, L0/L1/L2 overlap, compaction picker"
          - label: "crash-safety"
            instructions: "Crash recovery, fsync ordering, atomic writes, WAL correctness, data durability"
          - label: "encryption"
            instructions: "Block encryption, AES-GCM, key management, nonce handling"
          - label: "fs-trait"
            instructions: "Filesystem abstraction, Fs trait, io_uring, per-level routing, StdFs"
          - label: "upstream-candidate"
            instructions: "Fix or feature that could be contributed back to fjall-rs upstream"
          - label: "fork-only"
            instructions: "Feature specific to CoordiNode fork: range tombstones, merge operators, prefix bloom, V4 format"
          - label: "upstream-sync"
            instructions: "Automated upstream synchronization: merge conflicts, release tracking"
Key decisions
| Setting | Value | Why |
|---|---|---|
| profile | assertive | Systems/storage code needs thorough review; "chill" misses too much |
| poem | false | Noise in walkthrough, wastes review tokens |
| auto_incremental_review | true | Review each push, not just first |
| drafts | false | Don't review drafts; they're WIP |
| path_filters | none needed | lsm-tree repo has no dirs to exclude (donor codebases live in coordinode workspace, not here) |
| path_instructions | Rust safety rules | Encode our engineering principles directly into review guidelines |
| early_access | true | Get new CodeRabbit features as they ship |
| knowledge_base.code_guidelines | copilot-instructions.md | CodeRabbit auto-reads this for review context |
| commit_status | true | Block merge until review completes |
| assess_linked_issues | true | Check if PR actually addresses linked issue |
Checklist addition
- Create .coderabbit.yaml with config above
- Verify CodeRabbit picks it up on next PR (comment @coderabbitai configuration to confirm)