feat: vibes branch consolidation (48 commits) #26
Conversation
- Add configuration limits and validation system with automatic enforcement
- Implement database schema versioning and migration system (v0 -> v2)
- Add improved action tracking with display IDs (P1a2b3c format) and metadata
- Update wiki table format with new ID column for better content tracking
- Add error handling improvements for continuous mode with exponential backoff
- Add new CLI options: --show-config-limits, --force-migrate
- Update configuration template with new defaults and additional options
- Update README with comprehensive documentation for all new features
- Replace OOP architecture with simpler functional approach
- Add comprehensive database queries for content lifecycle tracking

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Auto-updates config file when new defaults are available
- Creates backup before making changes (.backup extension)
- Preserves existing user settings
- Add --no-auto-update-config flag to disable if needed
- Update documentation with auto-update information
- Handles errors gracefully with fallback to in-memory defaults
- Fix timestamp handling for both datetime objects and Unix timestamps
- Fix moderator name handling for both string and object types
- Fix target attribute handling to prevent 'str' object has no attribute 'name' errors
- Add moderator name censoring: AutoMod, Reddit, HumanModerator
- Add configurable wiki_actions filter (default: removals and removal reasons only)
- Preserve original wiki content formatting
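A minimal sketch of the type-flexible handling this commit describes; the helper names are illustrative, not the actual functions in `modlog_wiki_publisher.py`:

```python
from datetime import datetime, timezone

def normalize_timestamp(value):
    """Accept either a datetime object or a Unix timestamp (int/float)."""
    if isinstance(value, datetime):
        return value.timestamp()
    return float(value)

def normalize_moderator(mod):
    """Accept either a plain string or a PRAW Redditor-like object with a .name attribute."""
    if isinstance(mod, str):
        return mod
    # Falling back to str() avoids the 'str' object has no attribute 'name' class of error
    return getattr(mod, 'name', str(mod))
```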
…anonymization
- Convert inquiry URLs to markdown format ([Inquire](url) instead of plain URLs)
- Add configurable moderator anonymization (anonymize_moderators setting)
- Store removal reasons from Reddit API in database (new removal_reason column)
- Update database schema to version 3 with proper migration
- Update CLAUDE.md and README.md with new features and configuration options
- Process removal reasons properly by stripping whitespace before storage
- Update database query in get_recent_actions_from_db to include removal_reason column
- Use actual removal reason content in MockAction instead of hardcoded text
- Add test script to verify removal reason processing without Reddit API calls
- Fix database schema query to use created_at instead of timestamp for consistency
- Cache SHA-256 hashes of wiki content per subreddit/page in database
- Skip Reddit API calls when content hasn't changed
- Add --force option to bypass hash check when needed
- Add v4 database migration for wiki_hash_cache table
- Update all wiki update calls to use hash caching
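The hash-cache flow might look like the sketch below. The `wiki_hash_cache` table name comes from this commit, but the column names and function signatures here are assumptions:

```python
import hashlib
import sqlite3

def content_hash(text):
    """SHA-256 of the rendered wiki markdown."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def should_update_wiki(conn, subreddit, page, new_content, force=False):
    """Return True when the wiki edit API call is needed; skip it when the cached hash matches."""
    new_hash = content_hash(new_content)
    row = conn.execute(
        "SELECT content_hash FROM wiki_hash_cache WHERE subreddit = ? AND page = ?",
        (subreddit, page),
    ).fetchone()
    if row and row[0] == new_hash and not force:
        return False  # content unchanged -> no Reddit API call
    conn.execute(
        "INSERT OR REPLACE INTO wiki_hash_cache (subreddit, page, content_hash) VALUES (?, ?, ?)",
        (subreddit, page, new_hash),
    )
    return True
```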
- Add subreddit column to database to prevent mixing modlogs from different subreddits
- Fix ID generation to never use user IDs, always use post/comment/action IDs
- Add email address censoring in removal reasons
- Update database schema to version 5
- Fix get_recent_actions_from_db to filter by subreddit
- Update store_processed_action to include subreddit context
…safety
- Fixed removal reasons showing as numbers by checking mod_note field
- Fixed display IDs to use permalinks instead of user IDs
- Added strict subreddit validation to prevent mixed subreddit wiki corruption
- Enhanced target ID extraction to get actual post/comment IDs
- Added bot attribution footer to wiki pages
- Improved permalink generation for better content linking
- Fix case-insensitive subreddit matching in database queries
- Add subreddit extraction from permalinks
- Simplify content link formatting to match main branch approach
- Use actual Reddit permalinks for removed content
- Add database update function for missing subreddit entries
- Improve safety checks to prevent mixed subreddit data
- Force-refresh now fetches all modlog actions from Reddit to populate database
- Store ALL action types in database but only show removal actions in wiki
- Separate database population from wiki display logic
- Ensure comprehensive modlog data collection while maintaining wiki focus

Testing with Usenet subreddit
- Remove user profile fallback links - NEVER link to user profiles
- Prioritize actual content permalinks (posts/comments) over user profiles
- Implement proper content ID extraction for markdown ID field
- Align modmail inquiry format with main branch (detailed prefilled message)
- Add content link guidelines to CLAUDE.md
- Fix case sensitivity in subreddit validation
…profile links
- ID field now shows actual Reddit content IDs (t3_abc123, t1_def456) instead of user-based IDs
- Content links only point to actual removed posts/comments, never user profiles
- Show '-' for ID field when no content ID available (unlinkable actions)
- Remove action ID fallback - only use actual content IDs for linking removals
- Enable proper linking of removal actions with their removal reasons via content ID
- Use main branch logic for comment/post detection (permalink slash count)
- Show actual titles when available
- Show 'Comment by u/username' for comments without titles
- Show 'Post by u/username' for posts without titles
- Only create links for actual content URLs, not user profiles
- Maintain main branch content field behavior while adding ID column
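The slash-count heuristic mentioned above can be sketched as follows; the exact threshold and helper name are assumptions for illustration:

```python
def describe_target(permalink, title, author):
    """Classify a modlog target as post or comment by permalink depth (slash-count heuristic)."""
    if title:
        return title
    # /r/<sub>/comments/<post_id>/<slug>/              -> post (5 slashes after rstrip)
    # /r/<sub>/comments/<post_id>/<slug>/<comment_id>/ -> comment (6 slashes after rstrip)
    depth = permalink.rstrip("/").count("/")
    if depth >= 6:
        return f"Comment by u/{author}"
    return f"Post by u/{author}"
```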
- Always prioritize mod_note (actual removal reason text) over numeric details
- Show 'Removal reason applied' instead of 'Removal reason #7' for numeric details
- Improve ID extraction to handle more cases (though many will still be '-' due to Reddit API limitations)
- Apply consistent removal reason handling in both storage and display functions
- Add ID column back for tracking actions across the table
- Fix removal reason prioritization to show text first, not numbers
- For addremovalreason actions, use mod_note instead of details
- Extract short content IDs from permalinks for table tracking
- Ensure AutoModerator shows correctly (not HumanModerator)
- Content links point to actual posts/comments, not user profiles
- Update README.md with current wiki output format examples
- Add section on recent improvements including content linking fixes
- Update database query examples to include subreddit column
- Document intelligent removal reason handling (text over numbers)
- Add multi-subreddit support documentation
- Update CLAUDE.md with new database operations and schema info
- Document content ID extraction and tracking features
- Include content ID in modmail subject line: [ID: 1mkz4jm]
- Add Content ID field to modmail body for moderator reference
- Enables moderators to quickly cross-reference inquiries with modlog entries
- Improves modmail workflow and response efficiency
- Replace --force with more intuitive options:
  - --force-modlog: Fetch ALL from Reddit API and rebuild
  - --force-wiki: Force wiki update (bypass cache)
  - --force-all: Do both (replaces old --force)
- Update documentation with clear explanations and examples
- Add usage guidance for when to use each force option
- Maintain backwards compatibility during transition
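A rough sketch of how the three flags could be wired up with `argparse`. The flag names come from the commit; the implication logic (`--force-all` turning on both narrower modes) is an assumption based on the description:

```python
import argparse

def build_parser():
    """Minimal parser exposing the three force modes described above."""
    p = argparse.ArgumentParser(prog="modlog_wiki_publisher")
    p.add_argument("--force-modlog", action="store_true",
                   help="Fetch ALL modlog actions from the Reddit API and rebuild the database")
    p.add_argument("--force-wiki", action="store_true",
                   help="Force a wiki update, bypassing the content-hash cache")
    p.add_argument("--force-all", action="store_true",
                   help="Equivalent to --force-modlog plus --force-wiki (replaces the old --force)")
    return p

def resolve_force_flags(args):
    """--force-all implies both narrower modes."""
    force_modlog = args.force_modlog or args.force_all
    force_wiki = args.force_wiki or args.force_all
    return force_modlog, force_wiki
```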
Removes the erroneous error condition that prevented the application from working with databases containing multiple subreddits. The application now properly supports multi-subreddit databases by filtering results by the requested subreddit while providing informational logging about available subreddits.
Updates the database query to include target_author column and properly populates MockAction objects with actual usernames from the database, preventing the fallback to [deleted] placeholder text.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…sage

Changes action.description to action.details (Reddit API field) and updates logic to only show generic message for removal reason template numbers 1-15, allowing actual removal reason text to display properly.
Removes logic that hides removal reason template numbers (1-15) behind generic "Removal reason applied" message. Now shows all available removal reason data including template numbers for full transparency.
- Reorder regex patterns to extract comment IDs before post IDs, fixing duplicate ID issue
- Fix display logic to use action.details consistently with storage logic
- Ensures removal reasons show properly in wiki output
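The ordering issue can be illustrated with simplified patterns (these are not necessarily the project's exact regexes). A comment permalink contains the post ID too, so if the post pattern runs first, every comment inherits its post's ID and you get duplicates:

```python
import re

def extract_content_id(permalink):
    """Extract a t1_/t3_ style ID from a permalink; the comment pattern must run first."""
    if not permalink:
        return None
    # Comment: /comments/<post_id>/<slug>/<comment_id>
    m = re.search(r'/comments/[a-z0-9]+/[^/]+/([a-z0-9]+)', permalink)
    if m:
        return f"t1_{m.group(1)}"
    # Post: /comments/<post_id>
    m = re.search(r'/comments/([a-z0-9]+)', permalink)
    if m:
        return f"t3_{m.group(1)}"
    return None
```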
- Update README.md with latest improvements and multi-subreddit support
- Update CLAUDE.md with v2.1 improvements summary
- Fix --force-wiki to rebuild from database without API calls
- Ensure --force-wiki always recreates wiki using existing data
- Reorganized sections for clarity and logical flow
- Added prominent venv path requirement
- Grouped configuration options by category
- Enhanced development guidelines section
- Improved formatting and structure
…tead of template numbers
- For addremovalreason actions, use description field which contains actual text like 'Invites - No asking'
- Fixes template numbers (6, 9, etc) showing instead of meaningful removal reasons
- Maintains existing logic for other action types using mod_note and details fields
- Improves removal reason transparency for manual moderator actions
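The field-priority logic accumulated across these commits reads roughly like the sketch below; the helper name and any fallback order beyond what the commits state are assumptions:

```python
def resolve_removal_reason(action):
    """Pick the most meaningful removal reason text from a modlog action.

    `action` is any object with optional action / description / mod_note / details attributes.
    """
    if getattr(action, "action", "") == "addremovalreason":
        # description holds the actual reason text, e.g. 'Invites - No asking'
        desc = (getattr(action, "description", "") or "").strip()
        if desc:
            return desc
    # Other action types: prefer mod_note text over (possibly numeric) details
    for field in ("mod_note", "details"):
        value = (getattr(action, field, "") or "").strip()
        if value:
            return value
    return None
```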
- Updated sample wiki output to show actual removal reasons
- Enhanced database schema documentation with transparency details
- Updated features to highlight complete removal reason transparency
- Shows AutoModerator rule text and addremovalreason descriptions
- Documents unique content ID tracking improvements
- Add missing BASE_BACKOFF_WAIT and MAX_BACKOFF_WAIT constants for exponential backoff
- Initialize first_run_force variable in continuous mode function
- Add sanitize_for_markdown() helper function to eliminate code duplication
- Replace all instances of pipe character replacement with consistent helper function
- Improves code maintainability and prevents NameError exceptions
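Illustrative versions of the helpers named in this commit; the constant values and the particular pipe-escape sequence are assumptions, not the project's actual choices:

```python
BASE_BACKOFF_WAIT = 30    # seconds; illustrative value
MAX_BACKOFF_WAIT = 1800   # cap the wait at 30 minutes (assumption)

def backoff_wait(failure_count):
    """Exponential backoff: doubles per consecutive failure, capped at the maximum."""
    return min(BASE_BACKOFF_WAIT * (2 ** failure_count), MAX_BACKOFF_WAIT)

def sanitize_for_markdown(text):
    """Escape pipe characters so user-supplied text cannot break wiki table rows."""
    return text.replace("|", "&#124;") if text else text
```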
- Add get_config_with_default() helper to eliminate code duplication
- Replace 7 instances of config.get(key, CONFIG_LIMITS[key]['default']) pattern
- Improve code maintainability and consistency
- Add validation for unknown config keys
- Reduces repetitive nested dictionary access throughout codebase
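A sketch of the helper pattern; the two limit entries shown are examples using the defaults discussed later in this review (retention_days=90, batch_size=50), and the table shape is an assumption:

```python
# Hypothetical limits table mirroring the CONFIG_LIMITS pattern described above
CONFIG_LIMITS = {
    "retention_days": {"default": 90, "min": 1, "max": 365},
    "batch_size": {"default": 50, "min": 1, "max": 500},
}

def get_config_with_default(config, key):
    """Single access point replacing repeated config.get(key, CONFIG_LIMITS[key]['default'])."""
    if key not in CONFIG_LIMITS:
        raise KeyError(f"Unknown config key: {key}")
    return config.get(key, CONFIG_LIMITS[key]["default"])
```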
Fixed incorrect variable name 'timestamp' to 'created_at' to match the database schema. This was causing force-wiki operations to fail with "table processed_actions has no column named timestamp" error.
Added TRACE level (level 5) logging for prawcore and urllib3 when debug mode is enabled, providing detailed Reddit API request/response information for troubleshooting authentication and API issues.
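Python's `logging` module supports registering custom levels below DEBUG, so the change likely looks something like this sketch (function name is illustrative):

```python
import logging

TRACE = 5  # numerically below DEBUG (10)

def enable_trace_logging():
    """Register a TRACE level and apply it to the HTTP-layer loggers named above."""
    logging.addLevelName(TRACE, "TRACE")
    for name in ("prawcore", "urllib3"):
        logging.getLogger(name).setLevel(TRACE)
```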
…ew ones

Modified regular operation logic to rebuild wiki from ALL relevant actions in database (within retention period) rather than only new actions, matching the behavior of force operations for consistent wiki display.
Walkthrough

This PR refactors the modlog publisher from object-oriented classes to standalone functions, introducing database schema versioning (v5), wiki content hash caching to skip redundant edits, removal reason transparency with email censoring, multi-subreddit support, enhanced CLI force modes, continuous daemon operation, and expanded documentation with practical examples.

Changes: v2.1 Modlog Publisher Refactor
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
37-40: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Wrong template filename — `config.template.json` does not exist; it's `config_template.json`.

The repo ships `config_template.json` (underscore), as referenced in `CLAUDE.md` line 25. The dotted form here will give first-time users a `cp: cannot stat 'config.template.json'` error.

📝 Proposed fix

```diff
-cp config.template.json config.json
+cp config_template.json config.json
```

As per coding guidelines: "Always update commands and flags in documentation".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 37 - 40, The README has the wrong template filename; replace the cp command that references "config.template.json" with the correct "config_template.json" (underscore) so the copy command succeeds and remains consistent with CLAUDE.md; search for any other occurrences of "config.template.json" and update them to "config_template.json" to keep documentation consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CLAUDE.md`:
- Around line 65-71: Update the documented default flag values in the Core
Options section to match the runtime defaults defined by CONFIG_LIMITS in
modlog_wiki_publisher.py: change `--retention-days` default from 30 to 90,
`--batch-size` from 100 to 50, and `--interval` from 300 to 600 (leave
`--wiki-page` default "modlog" and other flags as-is); ensure the CLI docs in
this section now align with the values used by CONFIG_LIMITS and the README
table.
In `@modlog_wiki_publisher.py`:
- Around line 935-948: When cached_hash equals content_hash in update_wiki_page,
change the force=True path to use a warning-level log and, if running
interactively (check sys.stdin.isatty()), prompt the user to confirm the forced
update before proceeding; if the prompt is declined, return False. Keep the
non-force path unchanged (skip update). Refer to update_wiki_page,
get_content_hash, get_cached_wiki_hash and logger when making the change and
gate interactive confirmation on stdin being a TTY.
- Around line 1240-1250: CLI overrides currently write args
(args.retention_days, args.batch_size, args.interval) directly into the config
dict, bypassing CONFIG_LIMITS validation; after applying the CLI overrides to
config, call apply_config_defaults_and_limits(config) (or individually run
validate_config_value for each overridden key) so the values are
clamped/validated per CONFIG_LIMITS before the config is used by
load_config/SQL/PRAW.
- Around line 306-312: In censor_email_addresses, remove the redundant local
"import re" and correct the regex so the TLD character class doesn't include a
literal pipe; replace the pattern
r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b' with a proper
case-insensitive / letter-only TLD class (e.g. use [A-Za-z]{2,} or apply
re.IGNORECASE) in the re.sub call so the function only matches valid alphabetic
TLDs.
- Around line 414-448: The function get_target_permalink currently uses a bare
except: pass which swallows KeyboardInterrupt/SystemExit and programming errors;
replace the bare except with a specific Exception handler (e.g. except Exception
as e) and log the error (use your module logger or logging.exception) including
context (e.g. action id/attributes) before returning None; ensure you do not
catch SystemExit/KeyboardInterrupt (either re-raise them or only catch
Exception) and remove the silent pass so issues in building permalinks from
action.target_submission, action.target_comment or action.target_permalink are
visible in logs.
- Around line 478-543: The code lazily adds the target_author column on every
write (store_processed_action) which causes schema races and breaks fresh DB
reads; instead, bump CURRENT_DB_VERSION to 6 and add a migration in
migrate_database() that runs "ALTER TABLE processed_actions ADD COLUMN
target_author TEXT" for v5→v6 upgrades so the schema is deterministic. Remove
the runtime PRAGMA/ALTER logic in store_processed_action (the
cursor.execute("PRAGMA table_info(processed_actions)") and the conditional block
that adds target_author) and assume the column exists when inserting; keep the
extraction logic that sets the target_author value and write it as before.
- Around line 577-600: In cleanup_old_entries replace the convoluted timestamp
calculation int((datetime.now() - datetime.fromtimestamp(0)).total_seconds())
with int(time.time()) and remove the dead branch that checks if retention_days
<= 0 (since validate_config_value already guarantees retention_days >= 1); also
make the identical change in get_recent_actions_from_db so both use
int(time.time()) when computing cutoff_timestamp, and keep references to
DB_PATH, processed_actions, and created_at unchanged.
In `@README.md`:
- Around line 200-202: The README's example SQL uses LIKE '%[0-9]%' which treats
'[0-9]' as a literal string in SQLite; change the filter to use GLOB (or a
regexp if available) so it truly checks for digits. Replace the condition
"removal_reason NOT LIKE '%[0-9]%'" with "removal_reason NOT GLOB '*[0-9]*'"
(keeping the other clause removal_reason != 'remove' and the same SELECT/WHERE/
LIMIT structure on processed_actions.removal_reason) so the query excludes any
removal_reason containing digits as intended.
In `@test_removal_reasons.py`:
- Around line 14-57: The test currently imports DB_PATH by value via "from
modlog_wiki_publisher import *" so reassigning local global DB_PATH doesn't
affect the publisher functions; change the import to import the module (e.g.,
import modlog_wiki_publisher) and set modlog_wiki_publisher.DB_PATH = test_db
before calling setup_database(), store_processed_action(),
get_recent_actions_from_db(), etc., and update all calls to use
modlog_wiki_publisher.setup_database,
modlog_wiki_publisher.store_processed_action,
modlog_wiki_publisher.get_recent_actions_from_db (or ensure those functions
still reference the module-level DB_PATH) so the test operates against the test
database rather than the production DB.
---
Outside diff comments:
In `@README.md`:
- Around line 37-40: The README has the wrong template filename; replace the cp
command that references "config.template.json" with the correct
"config_template.json" (underscore) so the copy command succeeds and remains
consistent with CLAUDE.md; search for any other occurrences of
"config.template.json" and update them to "config_template.json" to keep
documentation consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2f48b77a-32cf-4bfd-b626-894f6921bd03
📒 Files selected for processing (5)
CLAUDE.md, README.md, config_template.json, modlog_wiki_publisher.py, test_removal_reasons.py
```markdown
### Core Options
- `--source-subreddit`: Target subreddit for reading/writing logs
- `--wiki-page`: Wiki page name (default: "modlog")
- `--retention-days`: Database cleanup period (default: 30)
- `--batch-size`: Entries fetched per run (default: 100)
- `--interval`: Seconds between updates in daemon mode (default: 300)
- `--debug`: Enable verbose logging
```
Documented defaults are stale and disagree with code/README.
CONFIG_LIMITS in modlog_wiki_publisher.py sets retention_days=90, batch_size=50, update_interval=600, and the table in README.md (lines 86–88) reflects those. The values listed here (30, 100, 300) match the pre-refactor defaults and will mislead operators.
📝 Proposed fix
```diff
 - `--source-subreddit`: Target subreddit for reading/writing logs
 - `--wiki-page`: Wiki page name (default: "modlog")
-- `--retention-days`: Database cleanup period (default: 30)
-- `--batch-size`: Entries fetched per run (default: 100)
-- `--interval`: Seconds between updates in daemon mode (default: 300)
+- `--retention-days`: Database cleanup period (default: 90)
+- `--batch-size`: Entries fetched per run (default: 50)
+- `--interval`: Seconds between updates in daemon mode (default: 600)
 - `--debug`: Enable verbose logging
```

As per coding guidelines: "Always update commands and flags in documentation".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
### Core Options
- `--source-subreddit`: Target subreddit for reading/writing logs
- `--wiki-page`: Wiki page name (default: "modlog")
- `--retention-days`: Database cleanup period (default: 90)
- `--batch-size`: Entries fetched per run (default: 50)
- `--interval`: Seconds between updates in daemon mode (default: 600)
- `--debug`: Enable verbose logging
```
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 65-65: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@CLAUDE.md` around lines 65 - 71, Update the documented default flag values in
the Core Options section to match the runtime defaults defined by CONFIG_LIMITS
in modlog_wiki_publisher.py: change `--retention-days` default from 30 to 90,
`--batch-size` from 100 to 50, and `--interval` from 300 to 600 (leave
`--wiki-page` default "modlog" and other flags as-is); ensure the CLI docs in
this section now align with the values used by CONFIG_LIMITS and the README
table.
```python
def censor_email_addresses(text):
    """Censor email addresses in removal reasons"""
    if not text:
        return text
    import re
    # Replace email addresses with [EMAIL]
    return re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b', '[EMAIL]', text)
```
Email regex includes a literal `|` in the TLD class.

`[A-Z|a-z]{2,}` is a character class that allows A-Z, the literal `|`, and a-z. It happens to still match real emails (because real TLDs are letters), but the intent is `[A-Za-z]`, and the current form will also accept noise like `foo@bar.||a` as a TLD tail. Also, `re` is already imported at module level, so the `import re` inside the function is redundant.
🛡️ Proposed fix
```diff
-def censor_email_addresses(text):
-    """Censor email addresses in removal reasons"""
-    if not text:
-        return text
-    import re
-    # Replace email addresses with [EMAIL]
-    return re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b', '[EMAIL]', text)
+def censor_email_addresses(text):
+    """Censor email addresses in removal reasons"""
+    if not text:
+        return text
+    return re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL]', text)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modlog_wiki_publisher.py` around lines 306 - 312, In censor_email_addresses,
remove the redundant local "import re" and correct the regex so the TLD
character class doesn't include a literal pipe; replace the pattern
r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b' with a proper
case-insensitive / letter-only TLD class (e.g. use [A-Za-z]{2,} or apply
re.IGNORECASE) in the re.sub call so the function only matches valid alphabetic
TLDs.
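The flaw described in this comment can be demonstrated directly. The two patterns below are copied from the diff above; `FLAWED`, `FIXED`, and `censor` are names used here only for illustration:

```python
import re

FLAWED = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'  # literal | inside the TLD class
FIXED = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'    # letters-only TLD class

def censor(text, pattern):
    """Replace anything matching the given email pattern with [EMAIL]."""
    return re.sub(pattern, '[EMAIL]', text)
```

Both patterns censor a normal address, but only the flawed one also swallows pipe-laden noise after the final dot.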
```python
def get_target_permalink(action):
    """Get permalink for the target content - prioritize actual content over user profiles"""
    # Check if we have a cached permalink from database
    if hasattr(action, 'target_permalink_cached') and action.target_permalink_cached:
        return action.target_permalink_cached

    try:
        # Priority 1: get actual post/comment permalinks from Reddit API
        if hasattr(action, 'target_submission') and action.target_submission:
            if hasattr(action.target_submission, 'permalink'):
                return f"https://reddit.com{action.target_submission.permalink}"
            elif hasattr(action.target_submission, 'id'):
                # Construct permalink from submission ID
                return f"https://reddit.com/comments/{action.target_submission.id}/"
        elif hasattr(action, 'target_comment') and action.target_comment:
            if hasattr(action.target_comment, 'permalink'):
                return f"https://reddit.com{action.target_comment.permalink}"
            elif hasattr(action.target_comment, 'id') and hasattr(action.target_comment, 'submission'):
                # For comments, construct proper permalink with submission ID
                return f"https://reddit.com/comments/{action.target_comment.submission.id}/_/{action.target_comment.id}/"
            elif hasattr(action.target_comment, 'id'):
                # Fallback for comment without submission info
                return f"https://reddit.com/comments/{action.target_comment.id}/"

        return result

    def fetch_modlog_entries(self, limit: int = 100) -> List[Dict]:
        """Fetch and process modlog entries with rate limit handling"""
        subreddit = self.reddit.subreddit(self.config['source_subreddit'])
        sub_logger = self.get_subreddit_logger(self.config['source_subreddit'])
        entries = []

        sub_logger.info("Starting to fetch modlog entries, limit: %s", limit)
        try:
            for entry in subreddit.mod.log(limit=limit):
                try:
                    processed = self._process_modlog_entry(entry)
                    if processed:
                        processed['subreddit'] = subreddit.display_name
                        entries.append(processed)
                        sub_logger.debug("Processed entry: %s [%s] by %s",
                                         processed['id'], processed['action_type'], processed['moderator'])
                        # Mark as processed
                        self.db.mark_processed(
                            processed['id'],
                            processed['action_type'],
                            processed['timestamp']
                        )
                        self.db.store_entry(processed)
                except praw.exceptions.APIException as e:
                    if e.error_type == "RATELIMIT":
                        # Extract wait time from message
                        import re
                        match = re.search(r'(\d+) minute', str(e))
                        wait_time = int(match.group(1)) * 60 if match else 60
                        sub_logger.warning("Rate limited, waiting %s seconds", wait_time)
                        time.sleep(wait_time)
                    else:
                        raise

        # Sort by timestamp (newest first)
        entries.sort(key=lambda x: x['timestamp'], reverse=True)
        sub_logger.info("Successfully fetched %s modlog entries", len(entries))
        # Priority 2: Try to get content permalink from action.target_permalink if it's not a user profile
        if hasattr(action, 'target_permalink') and action.target_permalink:
            permalink = action.target_permalink
            # Only use if it's actual content (contains /comments/) not user profile (/u/)
            if '/comments/' in permalink and '/u/' not in permalink:
                return f"https://reddit.com{permalink}" if not permalink.startswith('http') else permalink

        # NEVER fall back to user profiles - only link to actual content
    except:
        pass
    return None
```
Bare `except:` plus try/except/pass silently swallows everything (including KeyboardInterrupt).

`get_target_permalink` wraps the whole permalink-building block in a bare `except:` with a `pass`, which:

- Catches `KeyboardInterrupt`/`SystemExit` and hides them.
- Eats programming errors (e.g., a typo in an attribute name) so you'll get silent `None` permalinks instead of a fixable stack trace.

Narrow the catch and at least log it, especially given the coding guideline to "Check for proper error handling and logging".
🛡️ Proposed fix
```diff
-        # NEVER fall back to user profiles - only link to actual content
-    except:
-        pass
-    return None
+        # NEVER fall back to user profiles - only link to actual content
+    except AttributeError as e:
+        logger.debug("Could not resolve permalink for action %s: %s", getattr(action, 'id', '?'), e)
+    return None
```

As per coding guidelines: "Check for proper error handling and logging."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def get_target_permalink(action):
    """Get permalink for the target content - prioritize actual content over user profiles"""
    # Check if we have a cached permalink from database
    if hasattr(action, 'target_permalink_cached') and action.target_permalink_cached:
        return action.target_permalink_cached
    try:
        # Priority 1: get actual post/comment permalinks from Reddit API
        if hasattr(action, 'target_submission') and action.target_submission:
            if hasattr(action.target_submission, 'permalink'):
                return f"https://reddit.com{action.target_submission.permalink}"
            elif hasattr(action.target_submission, 'id'):
                # Construct permalink from submission ID
                return f"https://reddit.com/comments/{action.target_submission.id}/"
        elif hasattr(action, 'target_comment') and action.target_comment:
            if hasattr(action.target_comment, 'permalink'):
                return f"https://reddit.com{action.target_comment.permalink}"
            elif hasattr(action.target_comment, 'id') and hasattr(action.target_comment, 'submission'):
                # For comments, construct proper permalink with submission ID
                return f"https://reddit.com/comments/{action.target_comment.submission.id}/_/{action.target_comment.id}/"
            elif hasattr(action.target_comment, 'id'):
                # Fallback for comment without submission info
                return f"https://reddit.com/comments/{action.target_comment.id}/"

        return result

def fetch_modlog_entries(self, limit: int = 100) -> List[Dict]:
    """Fetch and process modlog entries with rate limit handling"""
    subreddit = self.reddit.subreddit(self.config['source_subreddit'])
    sub_logger = self.get_subreddit_logger(self.config['source_subreddit'])
    entries = []
    sub_logger.info("Starting to fetch modlog entries, limit: %s", limit)
    try:
        for entry in subreddit.mod.log(limit=limit):
            try:
                processed = self._process_modlog_entry(entry)
                if processed:
                    processed['subreddit'] = subreddit.display_name
                    entries.append(processed)
                    sub_logger.debug("Processed entry: %s [%s] by %s",
                                     processed['id'], processed['action_type'], processed['moderator'])
                    # Mark as processed
                    self.db.mark_processed(
                        processed['id'],
                        processed['action_type'],
                        processed['timestamp']
                    )
                    self.db.store_entry(processed)
            except praw.exceptions.APIException as e:
                if e.error_type == "RATELIMIT":
                    # Extract wait time from message
                    import re
                    match = re.search(r'(\d+) minute', str(e))
                    wait_time = int(match.group(1)) * 60 if match else 60
                    sub_logger.warning("Rate limited, waiting %s seconds", wait_time)
                    time.sleep(wait_time)
                else:
                    raise
        # Sort by timestamp (newest first)
        entries.sort(key=lambda x: x['timestamp'], reverse=True)
        sub_logger.info("Successfully fetched %s modlog entries", len(entries))

        # Priority 2: Try to get content permalink from action.target_permalink if it's not a user profile
        if hasattr(action, 'target_permalink') and action.target_permalink:
            permalink = action.target_permalink
            # Only use if it's actual content (contains /comments/) not user profile (/u/)
            if '/comments/' in permalink and '/u/' not in permalink:
                return f"https://reddit.com{permalink}" if not permalink.startswith('http') else permalink
        # NEVER fall back to user profiles - only link to actual content
    except:
        pass
    return None

def get_target_permalink(action):
    """Get permalink for the target content - prioritize actual content over user profiles"""
    # Check if we have a cached permalink from database
    if hasattr(action, 'target_permalink_cached') and action.target_permalink_cached:
        return action.target_permalink_cached
    try:
        # Priority 1: get actual post/comment permalinks from Reddit API
        if hasattr(action, 'target_submission') and action.target_submission:
            if hasattr(action.target_submission, 'permalink'):
                return f"https://reddit.com{action.target_submission.permalink}"
            elif hasattr(action.target_submission, 'id'):
                # Construct permalink from submission ID
                return f"https://reddit.com/comments/{action.target_submission.id}/"
        elif hasattr(action, 'target_comment') and action.target_comment:
            if hasattr(action.target_comment, 'permalink'):
                return f"https://reddit.com{action.target_comment.permalink}"
            elif hasattr(action.target_comment, 'id') and hasattr(action.target_comment, 'submission'):
                # For comments, construct proper permalink with submission ID
                return f"https://reddit.com/comments/{action.target_comment.submission.id}/_/{action.target_comment.id}/"
            elif hasattr(action.target_comment, 'id'):
                # Fallback for comment without submission info
                return f"https://reddit.com/comments/{action.target_comment.id}/"
        # Priority 2: Try to get content permalink from action.target_permalink if it's not a user profile
        if hasattr(action, 'target_permalink') and action.target_permalink:
            permalink = action.target_permalink
            # Only use if it's actual content (contains /comments/) not user profile (/u/)
            if '/comments/' in permalink and '/u/' not in permalink:
                return f"https://reddit.com{permalink}" if not permalink.startswith('http') else permalink
        # NEVER fall back to user profiles - only link to actual content
    except AttributeError as e:
        logger.debug("Could not resolve permalink for action %s: %s", getattr(action, 'id', '?'), e)
    return None
```
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 414-414: Too many return statements (8 > 6)
(PLR0911)
[warning] 425-425: Unnecessary elif after return statement
Remove unnecessary elif
(RET505)
[warning] 431-431: Unnecessary elif after return statement
Remove unnecessary elif
(RET505)
[error] 446-446: Do not use bare except
(E722)
[error] 446-447: try-except-pass detected, consider logging the exception
(S110)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modlog_wiki_publisher.py` around lines 414 - 448, The function
get_target_permalink currently uses a bare except: pass which swallows
KeyboardInterrupt/SystemExit and programming errors; replace the bare except
with a specific Exception handler (e.g. except Exception as e) and log the error
(use your module logger or logging.exception) including context (e.g. action
id/attributes) before returning None; ensure you do not catch
SystemExit/KeyboardInterrupt (either re-raise them or only catch Exception) and
remove the silent pass so issues in building permalinks from
action.target_submission, action.target_comment or action.target_permalink are
visible in logs.
```python
def store_processed_action(action, subreddit_name=None):
    """Store processed action to prevent duplicates"""
    try:
        conn = sqlite3.connect(DB_PATH)
        cursor = conn.cursor()

        # Format title with URL
        if entry['url']:
            title = f"[{entry['title']}]({entry['url']})"
        else:
            title = f"{entry['title']}"

        # Format removal reason
        reason = entry['removal_reason'] or entry['note'] or '-'
        # Process removal reason properly - ALWAYS prefer actual text over numeric details
        removal_reason = None

        # Format inquire link
        if entry['modmail_url']:
            inquire = f"[Contact Mods]({entry['modmail_url']})"
        else:
            inquire = '-'

        # Format time
        time_str = self._format_timestamp(entry['timestamp'])
        return f"| {time_str} | {action} | {moderator} | {title} | {reason} | {inquire} |"

        # For addremovalreason actions, use description field (contains actual text)
        if action.action == 'addremovalreason' and hasattr(action, 'description') and action.description:
            removal_reason = censor_email_addresses(str(action.description).strip())
        # First priority: mod_note (actual removal reason text)
        elif hasattr(action, 'mod_note') and action.mod_note:
            removal_reason = censor_email_addresses(str(action.mod_note).strip())
        # Second priority: details (accept ALL details text, including numbers)
        elif hasattr(action, 'details') and action.details:
            details_str = str(action.details).strip()
            removal_reason = censor_email_addresses(details_str)

        # Extract subreddit from URL if not provided
        target_permalink = get_target_permalink(action)
        if not subreddit_name and target_permalink:
            subreddit_name = extract_subreddit_from_permalink(target_permalink)

        # Add subreddit column if it doesn't exist
        cursor.execute("PRAGMA table_info(processed_actions)")
        columns = [row[1] for row in cursor.fetchall()]
        if 'subreddit' not in columns:
            cursor.execute("ALTER TABLE processed_actions ADD COLUMN subreddit TEXT")

        # Add target_author column if it doesn't exist
        if 'target_author' not in columns:
            cursor.execute("ALTER TABLE processed_actions ADD COLUMN target_author TEXT")

        # Extract target author
        target_author = None
        if hasattr(action, 'target_author') and action.target_author:
            if hasattr(action.target_author, 'name'):
                target_author = action.target_author.name
            else:
                target_author = str(action.target_author)

        cursor.execute("""
            INSERT OR REPLACE INTO processed_actions
            (action_id, action_type, moderator, target_id, target_type,
             display_id, target_permalink, removal_reason, target_author, created_at, subreddit) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            action.id,
            action.action,
            get_moderator_name(action, False),  # Store actual name in database
            extract_target_id(action),
            get_target_type(action),
            generate_display_id(action),
            target_permalink,
            sanitize_for_markdown(removal_reason),  # Store properly processed removal reason
            target_author,
            int(action.created_utc) if isinstance(action.created_utc, (int, float)) else int(action.created_utc.timestamp()),
            subreddit_name or 'unknown'
        ))

        conn.commit()
        conn.close()
    except Exception as e:
        logger.error(f"Error storing processed action: {e}")
        raise
```
target_author is added via lazy ALTER TABLE on every write — move it into migrate_database.
store_processed_action does a PRAGMA table_info + conditional ALTER TABLE ... ADD COLUMN on every call (lines 504–511). Two concrete problems:
- `target_author` is not present in any of the v0→v5 migrations in `migrate_database()`. New deployments on a fresh DB only get this column because the first call to `store_processed_action` adds it. Anything that reads the schema before the first write (e.g., a query tool, or `get_recent_actions_from_db` if no actions have been stored yet) will see a column-less schema and fail. This also means `update_missing_subreddits()` (called from `setup_database`) is the only deterministic path establishing the schema, and it doesn't add `target_author`.
- Doing PRAGMA + ALTER per row is needless overhead on the hot path and creates a schema-mutation race when multiple processes write concurrently.
Bump CURRENT_DB_VERSION to 6 and add a real migration that adds target_author (and removes the runtime ALTER from store_processed_action).
♻️ Proposed fix (sketch)

```diff
-CURRENT_DB_VERSION = 5
+CURRENT_DB_VERSION = 6
@@
+    # Migration from version 5 to 6: Add target_author column
+    if current_version < 6:
+        logger.info("Applying migration: Add target_author column (v5 -> v6)")
+        cursor.execute("PRAGMA table_info(processed_actions)")
+        existing_columns = [row[1] for row in cursor.fetchall()]
+        if 'target_author' not in existing_columns:
+            cursor.execute("ALTER TABLE processed_actions ADD COLUMN target_author TEXT")
+            logger.info("Added column: target_author")
+        set_db_version(6)
@@
-        # Add subreddit column if it doesn't exist
-        cursor.execute("PRAGMA table_info(processed_actions)")
-        columns = [row[1] for row in cursor.fetchall()]
-        if 'subreddit' not in columns:
-            cursor.execute("ALTER TABLE processed_actions ADD COLUMN subreddit TEXT")
-
-        # Add target_author column if it doesn't exist
-        if 'target_author' not in columns:
-            cursor.execute("ALTER TABLE processed_actions ADD COLUMN target_author TEXT")
-
         # Extract target author
```

🧰 Tools
🪛 Ruff (0.15.12)
[warning] 528-528: Boolean positional value in function call
(FBT003)
[warning] 542-542: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
[warning] 542-542: Logging statement uses f-string
(G004)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modlog_wiki_publisher.py` around lines 478 - 543, The code lazily adds the
target_author column on every write (store_processed_action) which causes schema
races and breaks fresh DB reads; instead, bump CURRENT_DB_VERSION to 6 and add a
migration in migrate_database() that runs "ALTER TABLE processed_actions ADD
COLUMN target_author TEXT" for v5→v6 upgrades so the schema is deterministic.
Remove the runtime PRAGMA/ALTER logic in store_processed_action (the
cursor.execute("PRAGMA table_info(processed_actions)") and the conditional block
that adds target_author) and assume the column exists when inserting; keep the
extraction logic that sets the target_author value and write it as before.
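The deterministic-migration approach can be sketched generically. The publisher's real `migrate_database`/`set_db_version` internals aren't shown in this diff, so this is only an assumed shape that uses SQLite's built-in `PRAGMA user_version` as the version store; the table and column names mirror the review, everything else is illustrative.

```python
import sqlite3

# Each entry migrates the schema from (version) to (version + 1).
MIGRATIONS = {
    5: ["ALTER TABLE processed_actions ADD COLUMN target_author TEXT"],
}
CURRENT_DB_VERSION = 6

def migrate(conn: sqlite3.Connection) -> None:
    """Run pending migrations once, at startup, instead of per-row ALTERs."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    while version < CURRENT_DB_VERSION:
        for stmt in MIGRATIONS.get(version, []):
            conn.execute(stmt)
        version += 1
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_actions (action_id TEXT PRIMARY KEY)")
conn.execute("PRAGMA user_version = 5")  # simulate an existing v5 database
migrate(conn)
cols = [row[1] for row in conn.execute("PRAGMA table_info(processed_actions)")]
print(cols)  # → ['action_id', 'target_author']
```

Because the version check happens once before any writes, readers never see a half-established schema and concurrent writers never race on `ALTER TABLE`.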
```python
def cleanup_old_entries(retention_days: int):
    """Remove entries older than retention_days"""
    if retention_days <= 0:
        retention_days = CONFIG_LIMITS['retention_days']['default']  # No config object available here

    try:
        conn = sqlite3.connect(DB_PATH)
        cursor = conn.cursor()

        try:
            subreddit = self.reddit.subreddit(target_sub)
            wiki_page = self.config.get('wiki_page', 'modlog')

            sub_logger.info("Updating wiki page: /r/%s/wiki/%s", target_sub, wiki_page)

        cutoff_timestamp = int((datetime.now() - datetime.fromtimestamp(0)).total_seconds()) - (retention_days * 86400)

        cursor.execute(
            "DELETE FROM processed_actions WHERE created_at < ?",
            (cutoff_timestamp,)
        )

        deleted_count = cursor.rowcount
        conn.commit()
        conn.close()

        if deleted_count > 0:
            logger.info(f"Cleaned up {deleted_count} old entries")
    except Exception as e:
        logger.error(f"Error during cleanup: {e}")
```
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Convoluted "now as Unix timestamp" expression and dead branch.
int((datetime.now() - datetime.fromtimestamp(0)).total_seconds()) is just int(time.time()) written the long way around (it relies on the local-timezone offset cancelling between both naive datetimes, which is fragile and trips DTZ005/DTZ006 in static analysis). Also, validate_config_value already enforces retention_days >= 1, so the if retention_days <= 0 branch is dead code in normal flow. Same expression is duplicated at line 626 in get_recent_actions_from_db.
♻️ Proposed simplification

```diff
 def cleanup_old_entries(retention_days: int):
     """Remove entries older than retention_days"""
-    if retention_days <= 0:
-        retention_days = CONFIG_LIMITS['retention_days']['default']  # No config object available here
+    retention_days = validate_config_value('retention_days', retention_days, CONFIG_LIMITS)

     try:
         conn = sqlite3.connect(DB_PATH)
         cursor = conn.cursor()

-        cutoff_timestamp = int((datetime.now() - datetime.fromtimestamp(0)).total_seconds()) - (retention_days * 86400)
+        cutoff_timestamp = int(time.time()) - (retention_days * 86400)
```

(Apply the same `int(time.time())` change at line 626.)
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 586-586: datetime.datetime.now() called without a tz argument
(DTZ005)
[warning] 586-586: datetime.datetime.fromtimestamp() called without a tz argument
(DTZ006)
[warning] 598-598: Logging statement uses f-string
(G004)
[warning] 599-599: Do not catch blind exception: Exception
(BLE001)
[warning] 600-600: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
[warning] 600-600: Logging statement uses f-string
(G004)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modlog_wiki_publisher.py` around lines 577 - 600, In cleanup_old_entries
replace the convoluted timestamp calculation int((datetime.now() -
datetime.fromtimestamp(0)).total_seconds()) with int(time.time()) and remove the
dead branch that checks if retention_days <= 0 (since validate_config_value
already guarantees retention_days >= 1); also make the identical change in
get_recent_actions_from_db so both use int(time.time()) when computing
cutoff_timestamp, and keep references to DB_PATH, processed_actions, and
created_at unchanged.
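The claimed equivalence is easy to check directly (all names below are local to the demo). The naive-datetime subtraction tracks `time.time()` only because the local UTC offset appears on both sides; if the zone's offset at the epoch differs from the offset now (DST, or a historical base-offset change), the two drift apart, which is exactly why the aware or `time.time()` forms are preferable.

```python
import time
from datetime import datetime, timezone

# The roundabout expression from the review comment:
roundabout = int((datetime.now() - datetime.fromtimestamp(0)).total_seconds())

# The direct equivalent:
direct = int(time.time())

# A timezone-aware variant that is exact by construction:
aware = int((datetime.now(timezone.utc)
             - datetime.fromtimestamp(0, timezone.utc)).total_seconds())

# In most zones the naive version matches to within DST skew;
# the aware version matches time.time() to the second.
print(roundabout - direct, aware - direct)
```

Note there is no expected value for `roundabout - direct`: it depends on how the local zone's offset has changed since 1970, which is the fragility the comment points out.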
```python
def update_wiki_page(reddit, subreddit_name: str, wiki_page: str, content: str, force: bool = False):
    """Update wiki page with content, using hash caching to avoid unnecessary updates"""
    try:
        # Calculate content hash
        content_hash = get_content_hash(content)

        # Check if content has changed (unless forced)
        cached_hash = get_cached_wiki_hash(subreddit_name, wiki_page)
        if cached_hash == content_hash:
            if force:
                logger.info(f"Wiki content unchanged, but you selected force for /r/{subreddit_name}/wiki/{wiki_page}, forcing update")
            else:
                logger.info(f"Wiki content unchanged for /r/{subreddit_name}/wiki/{wiki_page}, skipping update")
                return False
```
force=True flow doesn't match the coding guideline ("warn if same, interactively ask to force refresh").
The repository guideline for this file is: "Confirm cache file of wiki page and warn if same, interactively ask to force refresh." Today, when force=True and the cached hash equals the new hash, the code only logs an info line (line 945) and proceeds to push the same content to Reddit unconditionally. There is no interactive confirmation prompt and no warning level log; CLI flags like --force-wiki go straight to a (rate-limit-costing) wiki edit even when the content is byte-identical.
If non-interactive use is required (daemon mode), gate the prompt on stdin being a TTY and at minimum upgrade the message to logger.warning(...).
As per coding guidelines: "Confirm cache file of wiki page and warn if same, interactively ask to force refresh."
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 935-935: Boolean-typed positional argument in function definition
(FBT001)
[warning] 935-935: Boolean default positional argument in function definition
(FBT002)
[warning] 945-945: Logging statement uses f-string
(G004)
[warning] 947-947: Logging statement uses f-string
(G004)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modlog_wiki_publisher.py` around lines 935 - 948, When cached_hash equals
content_hash in update_wiki_page, change the force=True path to use a
warning-level log and, if running interactively (check sys.stdin.isatty()),
prompt the user to confirm the forced update before proceeding; if the prompt is
declined, return False. Keep the non-force path unchanged (skip update). Refer
to update_wiki_page, get_content_hash, get_cached_wiki_hash and logger when
making the change and gate interactive confirmation on stdin being a TTY.
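One possible shape for the TTY-gated confirmation (a sketch, not the project's actual implementation; `confirm_forced_update` is a hypothetical helper name): warn at warning level, prompt only when stdin is a terminal, and fall through without blocking in daemon/cron mode.

```python
import io
import sys
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def confirm_forced_update(subreddit_name: str, wiki_page: str) -> bool:
    """Warn that content is unchanged; in a terminal, ask before pushing anyway.

    In non-interactive contexts (daemon/cron/CI) there is no one to answer,
    so honor --force and rely on the warning already emitted.
    """
    logger.warning("Wiki content unchanged for /r/%s/wiki/%s but --force was given",
                   subreddit_name, wiki_page)
    if sys.stdin.isatty():
        answer = input("Push identical content anyway? [y/N] ").strip().lower()
        return answer in ("y", "yes")
    return True  # non-interactive: proceed, warning already logged

# Simulate a non-interactive run: StringIO.isatty() is False, so no prompt.
sys.stdin = io.StringIO("")
print(confirm_forced_update("test", "modlog"))  # → True
```

The TTY gate keeps continuous mode from hanging on `input()` while still giving interactive `--force-wiki` users a chance to back out of a rate-limit-costing no-op edit.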
```python
    # Override config with CLI args
    if args.source_subreddit:
        config['source_subreddit'] = args.source_subreddit
    if args.wiki_page:
        config['wiki_page'] = args.wiki_page
    if args.retention_days is not None:
        config['retention_days'] = args.retention_days
    if args.batch_size is not None:
        config['batch_size'] = args.batch_size
    if args.interval is not None:
        config['update_interval'] = args.interval
```
CLI overrides bypass CONFIG_LIMITS validation.
load_config runs values through apply_config_defaults_and_limits / validate_config_value, but the CLI overrides directly assign user-supplied values (--retention-days, --batch-size, --interval) into config afterwards. A user passing --retention-days 999999 or --batch-size 0 will silently bypass the published min/max and reach SQL/PRAW with out-of-range inputs.
Run the overrides through validate_config_value (or call apply_config_defaults_and_limits(config) once after the overrides):
🛡️ Proposed fix

```diff
 # Override config with CLI args
 if args.source_subreddit:
     config['source_subreddit'] = args.source_subreddit
 if args.wiki_page:
     config['wiki_page'] = args.wiki_page
 if args.retention_days is not None:
-    config['retention_days'] = args.retention_days
+    config['retention_days'] = validate_config_value('retention_days', args.retention_days, CONFIG_LIMITS)
 if args.batch_size is not None:
-    config['batch_size'] = args.batch_size
+    config['batch_size'] = validate_config_value('batch_size', args.batch_size, CONFIG_LIMITS)
 if args.interval is not None:
-    config['update_interval'] = args.interval
+    config['update_interval'] = validate_config_value('update_interval', args.interval, CONFIG_LIMITS)
```

As per coding guidelines: "Review for security issues — validate all user inputs."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modlog_wiki_publisher.py` around lines 1240 - 1250, CLI overrides currently
write args (args.retention_days, args.batch_size, args.interval) directly into
the config dict, bypassing CONFIG_LIMITS validation; after applying the CLI
overrides to config, call apply_config_defaults_and_limits(config) (or
individually run validate_config_value for each overridden key) so the values
are clamped/validated per CONFIG_LIMITS before the config is used by
load_config/SQL/PRAW.
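The clamping path can be sketched under assumptions: the real `CONFIG_LIMITS` bounds and `validate_config_value` signature live in `modlog_wiki_publisher.py` and may differ; every value below is invented for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Hypothetical limits table mirroring the shape the review implies.
CONFIG_LIMITS = {
    'retention_days': {'min': 1, 'max': 365, 'default': 30},
    'batch_size': {'min': 1, 'max': 500, 'default': 100},
    'update_interval': {'min': 60, 'max': 86400, 'default': 3600},
}

def validate_config_value(key, value, limits):
    """Clamp a user-supplied value into the published [min, max] range."""
    bounds = limits[key]
    if value is None:
        return bounds['default']
    clamped = max(bounds['min'], min(bounds['max'], value))
    if clamped != value:
        logger.warning("%s=%s out of range, clamped to %s", key, value, clamped)
    return clamped

print(validate_config_value('retention_days', 999999, CONFIG_LIMITS))  # → 365
print(validate_config_value('batch_size', 0, CONFIG_LIMITS))           # → 1
```

Routing every CLI override through the same clamp means `--retention-days 999999` and `--batch-size 0` can never reach the SQL/PRAW layers out of range.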
```shell
# View removal reasons that are text (not numbers)
sqlite3 modlog.db "SELECT action_type, removal_reason FROM processed_actions WHERE removal_reason NOT LIKE '%[0-9]%' AND removal_reason != 'remove' LIMIT 5;"
```
LIKE '%[0-9]%' doesn't do what the comment implies in SQLite.
SQLite's LIKE only supports % and _ wildcards — [0-9] is treated as the literal four-character string [0-9], not "any digit". So the documented "View removal reasons that are text (not numbers)" query won't filter out numeric template IDs the way the section heading suggests; it will only exclude rows whose removal_reason literally contains [0-9].
If the intent is "no digits", use a GLOB pattern (which supports character classes) or a regexp:
📝 Proposed fix

```diff
-# View removal reasons that are text (not numbers)
-sqlite3 modlog.db "SELECT action_type, removal_reason FROM processed_actions WHERE removal_reason NOT LIKE '%[0-9]%' AND removal_reason != 'remove' LIMIT 5;"
+# View removal reasons that contain no digits (text-only reasons)
+sqlite3 modlog.db "SELECT action_type, removal_reason FROM processed_actions WHERE removal_reason NOT GLOB '*[0-9]*' AND removal_reason != 'remove' LIMIT 5;"
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```shell
# View removal reasons that contain no digits (text-only reasons)
sqlite3 modlog.db "SELECT action_type, removal_reason FROM processed_actions WHERE removal_reason NOT GLOB '*[0-9]*' AND removal_reason != 'remove' LIMIT 5;"
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@README.md` around lines 200 - 202, The README's example SQL uses LIKE
'%[0-9]%' which treats '[0-9]' as a literal string in SQLite; change the filter
to use GLOB (or a regexp if available) so it truly checks for digits. Replace
the condition "removal_reason NOT LIKE '%[0-9]%'" with "removal_reason NOT GLOB
'*[0-9]*'" (keeping the other clause removal_reason != 'remove' and the same
SELECT/WHERE/ LIMIT structure on processed_actions.removal_reason) so the query
excludes any removal_reason containing digits as intended.
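The behavioural difference is easy to verify against an in-memory SQLite database (table and sample values below are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_actions (removal_reason TEXT)")
conn.executemany(
    "INSERT INTO processed_actions VALUES (?)",
    [("Rule 3 violation",), ("spam",), ("see rule [0-9]",)],
)

# LIKE treats [0-9] as the literal 5-character substring — only the row
# that literally contains "[0-9]" matches.
like_rows = conn.execute(
    "SELECT removal_reason FROM processed_actions WHERE removal_reason LIKE '%[0-9]%'"
).fetchall()

# GLOB supports character classes — any row containing a digit matches.
glob_rows = conn.execute(
    "SELECT removal_reason FROM processed_actions WHERE removal_reason GLOB '*[0-9]*'"
).fetchall()

print(like_rows)  # [('see rule [0-9]',)]
print(glob_rows)  # [('Rule 3 violation',), ('see rule [0-9]',)]
```

Note that GLOB, unlike LIKE, is case-sensitive, which does not matter for a digit class but would for letter ranges.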
```python
from modlog_wiki_publisher import *

# Mock Reddit action objects for testing
class MockRedditAction:
    def __init__(self, action_id, action_type, details, mod_name, target_type='post', target_id='abc123'):
        self.id = action_id
        self.action = action_type
        self.details = details
        self.created_utc = int(datetime.now().timestamp())

        # Mock moderator
        class MockMod:
            def __init__(self, name):
                self.name = name
        self.mod = MockMod(mod_name)

        # Mock targets based on type
        if target_type == 'post':
            self.target_submission = target_id
            self.target_comment = None
            self.target_author = 'testuser'
            self.target_title = 'Test Post Title'
            self.target_permalink = f'/r/test/comments/{target_id}/test_post/'
        elif target_type == 'comment':
            self.target_submission = None
            self.target_comment = target_id
            self.target_author = 'testuser'
            self.target_title = None
            self.target_permalink = f'/r/test/comments/parent123/test_post/{target_id}/'

def test_removal_reasons():
    """Test removal reason processing and storage"""
    print("Testing Removal Reason Processing")
    print("=" * 50)

    # Clean up any existing test database
    test_db = "test_modlog.db"
    if os.path.exists(test_db):
        os.remove(test_db)

    # Override the global DB_PATH for testing
    global DB_PATH
    original_db_path = DB_PATH
    DB_PATH = test_db
```
Critical: global DB_PATH does not actually point the publisher's functions at the test DB — this test will write to production modlog.db.
from modlog_wiki_publisher import * imports the value of DB_PATH into this module's namespace. The global DB_PATH on line 55 rebinds only the test module's DB_PATH. The imported functions (setup_database, store_processed_action, get_recent_actions_from_db, …) were defined inside modlog_wiki_publisher.py and resolve DB_PATH in that module — so every sqlite3.connect(DB_PATH) inside those functions still hits ./modlog.db.
Concrete consequences in this test:

- `setup_database()` (line 62) runs migrations against your real `modlog.db`.
- `store_processed_action(action)` (line 101) inserts the mock test rows into the real DB.
- The verification `SELECT` on line 106 connects via the test module's local `DB_PATH = "test_modlog.db"`, so it (correctly) reads an empty/different DB and the test silently fails to verify what it claims.
- `os.remove(test_db)` in the `finally` removes only the empty `test_modlog.db`; the polluting rows persist in production.
Mutate the attribute on the publisher module instead of using import * + global:
🛡️ Proposed fix

```diff
-from modlog_wiki_publisher import *
+import modlog_wiki_publisher
+from modlog_wiki_publisher import (
+    setup_database,
+    store_processed_action,
+    get_recent_actions_from_db,
+    build_wiki_content,
+)
@@
-    # Override the global DB_PATH for testing
-    global DB_PATH
-    original_db_path = DB_PATH
-    DB_PATH = test_db
+    # Override the publisher module's DB_PATH for testing
+    original_db_path = modlog_wiki_publisher.DB_PATH
+    modlog_wiki_publisher.DB_PATH = test_db
@@
-    conn = sqlite3.connect(DB_PATH)
+    conn = sqlite3.connect(modlog_wiki_publisher.DB_PATH)
@@
     finally:
-        # Restore original DB path
-        DB_PATH = original_db_path
+        # Restore original DB path
+        modlog_wiki_publisher.DB_PATH = original_db_path
```

(Apply the same `modlog_wiki_publisher.DB_PATH` reference everywhere `DB_PATH` is used in this file.)
🧰 Tools
🪛 Ruff (0.15.12)
[error] 14-14: from modlog_wiki_publisher import * used; unable to detect undefined names
(F403)
[warning] 18-18: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 22-22: datetime.datetime.now() called without a tz argument
(DTZ005)
[warning] 26-26: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 44-44: Too many branches (16 > 12)
(PLR0912)
[warning] 44-44: Too many statements (70 > 50)
(PLR0915)
[warning] 55-55: Using the global statement to update DB_PATH is discouraged
(PLW0603)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test_removal_reasons.py` around lines 14 - 57, The test currently imports
DB_PATH by value via "from modlog_wiki_publisher import *" so reassigning local
global DB_PATH doesn't affect the publisher functions; change the import to
import the module (e.g., import modlog_wiki_publisher) and set
modlog_wiki_publisher.DB_PATH = test_db before calling setup_database(),
store_processed_action(), get_recent_actions_from_db(), etc., and update all
calls to use modlog_wiki_publisher.setup_database,
modlog_wiki_publisher.store_processed_action,
modlog_wiki_publisher.get_recent_actions_from_db (or ensure those functions
still reference the module-level DB_PATH) so the test operates against the test
database rather than the production DB.
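The binding rule behind this bug can be demonstrated without the real project module. `fake_publisher` below is a throwaway stand-in built at runtime: `from m import *` copies the current *value* of `DB_PATH` into the importer's namespace, while functions defined in `m` keep resolving `DB_PATH` through `m`'s own globals.

```python
import sys
import types

# Build a throwaway stand-in for modlog_wiki_publisher.
mod = types.ModuleType("fake_publisher")
exec(
    "DB_PATH = 'modlog.db'\n"
    "def where_do_i_write():\n"
    "    return DB_PATH\n",
    mod.__dict__,
)
sys.modules["fake_publisher"] = mod

# `from fake_publisher import *` would copy the value, like this:
DB_PATH = mod.DB_PATH
# Rebinding the copy affects only this namespace, not the function's globals.
DB_PATH = "test_modlog.db"
print(mod.where_do_i_write())  # → modlog.db (still the production path!)

# Mutating the attribute on the module itself is what actually redirects it.
mod.DB_PATH = "test_modlog.db"
print(mod.where_do_i_write())  # → test_modlog.db
```

This is why the test must set `modlog_wiki_publisher.DB_PATH` (or use `monkeypatch.setattr`) rather than rebinding its own imported copy.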
Summary
Consolidation PR for the long-running `vibes` branch (48 commits ahead of `main`). The branch was created as an experimental playground; the latest commits look like real fixes that should land.

Why this PR exists
The branch never had a PR opened. Reviewer should:

Test plan
`git log --oneline main..vibes`

🤖 Auto-opened during cross-repo branch consolidation sweep.
Summary by CodeRabbit
New Features
Documentation
Tests