Add mcpg --demo: a curated demo dataset with a captured walkthrough by devopam · Pull Request #220 · devopam/MCPg

devopam · 2026-07-02T15:40:19Z

Summary

New-user onboarding (roadmap 17.1): previously, a new user's first five minutes with MCPg ran against whatever data they happened to have — often an empty scratch database that shows off none of the 250+ tool surface. This PR gives every user (and every future screenshot/recording) the same rich first experience:

MCPG_DATABASE_URL=postgresql://... mcpg --demo        # seed the mcpg_demo schema
MCPG_DATABASE_URL=postgresql://... mcpg --demo-drop   # remove it again

The dataset is curated, not random

A small, fully deterministic e-commerce schema (400 customers / 120 products / 3,000 orders / ~7,400 order items / 900 reviews) engineered so the pivotal tools all have something real to find:

orders.customer_id is an FK with no index — analyze_query_plan shows the sequential scan, and recommend_indexes genuinely flags the table (the sibling tables do carry FK indexes, so it reads as a finding, not a theme).
customers.email / customers.phone bait find_sensitive_columns; a camelCase reviews."reviewSource" column trips lint_naming_conventions.
Review prose is per-product-type plausible — feature phrases are keyed to the product type, so a robot vacuum gets complaints about battery life and a yoga mat about grip, never the reverse. "battery life" recurs across battery-having products on purpose: it's the walkthrough's canonical full_text_search query.
products.embedding (pgvector, 8-dim, deterministic) is added only when the vector extension is already installed — the seeder never creates extensions; everything else works without it.
Order dates skew recent (growth curve) and customer activity is heavy-tailed, so time-window and top-N questions return dashboard-shaped answers.
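The generation properties listed above (one fixed seed, type-keyed review phrases, heavy-tailed customer activity, recency-skewed order dates) can be sketched with a single seeded `random.Random`. This is a minimal illustration under assumptions, not the PR's seeder: `END_DATE`, `FEATURES`, `TEMPLATES`, `generate_orders` and `write_review` are hypothetical names, and Pareto weights plus a triangular date distribution are just one plausible way to get those shapes.

```python
import random
from datetime import date, timedelta

# Hypothetical fixed "as of" date, so generated dates never depend on today.
END_DATE = date(2026, 6, 30)

# Hypothetical feature phrases keyed by product type. A review only mentions
# features of its own type; "battery life" appears for battery-powered types.
FEATURES: dict[str, list[str]] = {
    "robot_vacuum": ["battery life", "suction", "app pairing"],
    "headphones": ["battery life", "noise cancelling", "fit"],
    "yoga_mat": ["grip", "thickness", "smell"],
}
TEMPLATES = [
    "Really impressed by the {feature}.",
    "The {feature} could be better.",
    "Solid {feature} for the price.",
]


def generate_orders(
    seed: int, n_customers: int, n_orders: int, span_days: int = 730
) -> list[tuple[int, int, date]]:
    """Return (order_id, customer_id, order_date) rows; same seed, same rows."""
    rng = random.Random(seed)  # private RNG: independent of global random state
    # Heavy-tailed activity: Pareto-distributed weights mean a handful of
    # customers account for a large share of all orders.
    weights = [rng.paretovariate(1.5) for _ in range(n_customers)]
    customer_ids = rng.choices(range(1, n_customers + 1), weights=weights, k=n_orders)
    rows = []
    for order_id, customer_id in enumerate(customer_ids, start=1):
        # Triangular distribution with its mode at 0 days ago: density falls
        # linearly with age, so recent dates are the most common.
        days_ago = int(rng.triangular(0, span_days, 0))
        rows.append((order_id, customer_id, END_DATE - timedelta(days=days_ago)))
    return rows


def write_review(rng: random.Random, product_type: str) -> str:
    """Compose one review sentence from phrases valid for this product type."""
    feature = rng.choice(FEATURES[product_type])
    return rng.choice(TEMPLATES).format(feature=feature)
```

Threading every draw through one seeded instance, in a fixed order, is what makes "the numbers in the doc are the numbers users get" achievable; any use of the global `random` module or of `date.today()` would break that.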

Safety

The whole seed is one transaction — a mid-seed failure leaves nothing behind.
Re-seeding refuses rather than clobbers ("run mcpg --demo-drop first").
--demo-drop checks the ownership marker (schema comment) and refuses to drop a schema MCPg didn't create; dropping a non-existent schema is a no-op, not an error.
CLI-only surface — no new MCP tools; the tool-surface snapshot and outputSchema contract manifests are untouched.
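The refusal rules above reduce to a small decision function that can be tested without a database. A minimal sketch, assuming a hypothetical marker string and names (`OWNERSHIP_MARKER`, `MARKER_SQL`, `DropAction`, `plan_drop`, `check_seed_allowed`); `DemoError` here is only a stand-in for the PR's class of the same name.

```python
from enum import Enum
from typing import Optional

# Hypothetical marker text; the PR only says the marker is a schema comment.
OWNERSHIP_MARKER = "created by mcpg --demo"

# One way to read the marker back: obj_description() is a standard
# PostgreSQL function; the exact query the PR uses is an assumption.
MARKER_SQL = (
    "SELECT obj_description(n.oid, 'pg_namespace') "
    "FROM pg_namespace n WHERE n.nspname = %s"
)


class DemoError(Exception):
    """Stand-in for the PR's DemoError (the CLI maps it to exit code 1)."""


class DropAction(Enum):
    NOOP = "noop"  # schema absent: nothing to do, and not an error
    DROP = "drop"  # schema present and carries the ownership marker


def check_seed_allowed(schema_exists: bool) -> None:
    # Re-seeding refuses rather than clobbers an existing schema.
    if schema_exists:
        raise DemoError("mcpg_demo already exists; run mcpg --demo-drop first")


def plan_drop(schema_exists: bool, schema_comment: Optional[str]) -> DropAction:
    if not schema_exists:
        return DropAction.NOOP
    if schema_comment is None or OWNERSHIP_MARKER not in schema_comment:
        raise DemoError("refusing to drop a schema MCPg did not create")
    return DropAction.DROP


# The seed itself (DDL, inserts, COMMENT ON SCHEMA) would run inside a single
# transaction, e.g. psycopg 3's `with conn.transaction():`, so an error
# part-way through rolls everything back.
```

Keeping the policy in pure functions lets unit tests pin each branch, while the integration test only has to confirm the database calls feed them the right inputs.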

The captured walkthrough (`docs/demo.md`)

Captured, not written: every output block is a real tool run against the seeded dataset, rendered by tools/generate_demo_walkthrough.py (7 sections: table summary → SQL analytics → slow-query diagnosis → index advisor → FTS → PII/naming audit → graph projection). Because the dataset is deterministic, the numbers in the doc are the numbers users get. tests/integration/test_demo_integration.py pins every planted finding, so the walkthrough can't silently rot when a helper changes.
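A "captured, not written" document boils down to rendering recorded tool calls and their live outputs into markdown in a stable way. The sketch below is a hypothetical shape for that, not tools/generate_demo_walkthrough.py: `Section`, `render_section`, `render_walkthrough` and `is_stale` are illustrative names. The `is_stale` diff check is a complementary option; the PR itself guards against rot by pinning the planted findings in an integration test.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example


@dataclass(frozen=True)
class Section:
    title: str
    tool: str  # MCP tool name
    arguments: dict[str, object]
    output: object  # whatever the live tool call returned (JSON-serializable)


def render_section(section: Section) -> str:
    # sort_keys keeps the rendering stable, so a deterministic dataset gives
    # a byte-identical document on every regeneration.
    args = json.dumps(section.arguments, sort_keys=True)
    body = json.dumps(section.output, indent=2, sort_keys=True)
    return (
        f"## {section.title}\n\n"
        f"`{section.tool}` with `{args}`\n\n"
        f"{FENCE}json\n{body}\n{FENCE}\n"
    )


def render_walkthrough(sections: list[Section]) -> str:
    return "\n".join(render_section(s) for s in sections)


def is_stale(committed_doc: str, sections: list[Section]) -> bool:
    # Drift check: regenerate from fresh tool runs and compare with the file.
    return committed_doc != render_walkthrough(sections)
```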

One non-obvious bit worth flagging for review: the index-advisor section resets the pg_stat counters before replaying the workload. Seeding itself generates ~7,400 FK-check index scans on the orders primary key, which drown out the advisor's seq_scan > idx_scan signal. And since a backend's pending stats only flush at transaction end, the wait is a poll (which itself drives flushes), not a sleep. Both the generator and the integration test do this identically.
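The "poll, not sleep" wait can be sketched as a generic helper plus the scan-count signal. This is an illustration under assumptions: `poll_until`, `seq_scan_dominates`, `RESET_SQL` and `SIGNAL_SQL` are hypothetical names; `pg_stat_reset()` and the `pg_stat_user_tables` view are standard PostgreSQL, but the exact queries the PR runs are not shown.

```python
import time
from typing import Callable

# Standard PostgreSQL statistics function and view; illustrative queries only.
RESET_SQL = "SELECT pg_stat_reset()"
SIGNAL_SQL = (
    "SELECT seq_scan, COALESCE(idx_scan, 0) FROM pg_stat_user_tables "
    "WHERE schemaname = 'mcpg_demo' AND relname = 'orders'"
)


def seq_scan_dominates(seq_scan: int, idx_scan: int) -> bool:
    # Advisor-style signal: more sequential than index scans on the table.
    return seq_scan > idx_scan


def poll_until(
    predicate: Callable[[], bool], timeout: float = 10.0, interval: float = 0.1
) -> bool:
    """Re-check predicate until it returns True or the timeout elapses.

    Unlike one fixed sleep, each check runs a fresh query; on an autocommit
    connection that query's transaction end gives the backend a chance to
    flush pending stats, so the poll itself helps the counters appear.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical wiring with psycopg 3 and an autocommit connection `conn`:
#   conn.execute(RESET_SQL)
#   ... replay the customer_id-filtered workload ...
#   ready = lambda: seq_scan_dominates(*conn.execute(SIGNAL_SQL).fetchone())
#   assert poll_until(ready)
```

A poll returns as soon as the counters show the signal and fails loudly after the timeout, whereas a fixed sleep is either wastefully long or flaky.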

Docs

README quick-start section, docs/index.md link, CHANGELOG [Unreleased], and roadmap section 17 (shipped).

Test plan

8 new unit tests (tests/unit/test_demo.py): determinism, row counts, referential integrity, order totals = sum of items, unique emails, planted-flaw pinning (the DDL is asserted to still lack the orders.customer_id index and to still contain the camelCase column, so nobody "fixes" them), per-type feature plausibility.
3 new CLI tests (tests/unit/test_main.py): --demo seeds and prints the summary + suggested prompts, --demo-drop reports, DemoError → exit 1.
2 integration tests: full lifecycle (seed → verify counts/marker/vector-column parity with pg_extension → re-seed refusal → all planted findings via the real tools → drop → double-drop no-op) and foreign-schema drop refusal. Skipped on the WarehousePG lane (demo targets stock PostgreSQL).
Verified end-to-end against a real PostgreSQL 16 in this environment: seeded via the actual mcpg --demo CLI, generated docs/demo.md from live runs, and ran the full suite: 2735 passed, coverage 90.08% (gate: 90%). ruff format --check / ruff check / mypy src/mcpg (strict) / bandit all clean.
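The invariant and planted-flaw checks listed above can be sketched as plain functions over in-memory rows. The row shapes (dicts with `id`, `email`, `customer_id`, `total`, `order_id`, `quantity`, `unit_price`), the regex, and the names `check_invariants` and `planted_flaws_intact` are assumptions for illustration, not the contents of tests/unit/test_demo.py.

```python
import re
from collections import Counter, defaultdict
from decimal import Decimal

# Matches a CREATE INDEX on mcpg_demo.orders(customer_id); the demo DDL must NOT contain one.
_FK_INDEX = re.compile(
    r"CREATE\s+INDEX\s+\w+\s+ON\s+mcpg_demo\.orders\s*\(\s*customer_id\s*\)",
    re.IGNORECASE,
)


def check_invariants(customers: list[dict], orders: list[dict], items: list[dict]) -> list[str]:
    """Return a list of human-readable problems; empty means all invariants hold."""
    problems: list[str] = []
    # Unique emails.
    counts = Counter(c["email"] for c in customers)
    problems += [f"duplicate email {e}" for e, n in counts.items() if n > 1]
    # Referential integrity: orders -> customers, items -> orders.
    customer_ids = {c["id"] for c in customers}
    order_ids = {o["id"] for o in orders}
    problems += [f"order {o['id']} has unknown customer" for o in orders
                 if o["customer_id"] not in customer_ids]
    problems += [f"item for unknown order {i['order_id']}" for i in items
                 if i["order_id"] not in order_ids]
    # Order totals equal the sum of their line items (Decimal avoids float drift).
    sums: defaultdict[int, Decimal] = defaultdict(Decimal)
    for i in items:
        sums[i["order_id"]] += i["quantity"] * i["unit_price"]
    problems += [f"order {o['id']} total mismatch" for o in orders
                 if o["total"] != sums[o["id"]]]
    return problems


def planted_flaws_intact(ddl: str) -> bool:
    # Both flaws are deliberate: a "fix" to either should fail the test.
    return _FK_INDEX.search(ddl) is None and '"reviewSource"' in ddl
```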

Generated by Claude Code

New-user onboarding: `mcpg --demo` seeds a small, deterministic, deliberately curated e-commerce dataset (400 customers, 120 products, 3,000 orders, 900 reviews) into an mcpg_demo schema in the configured database, so the first five minutes with MCPg run against data the tools can actually show off. `mcpg --demo-drop` removes it.

The dataset plants specific teaching moments:

- orders.customer_id is an FK with no index: analyze_query_plan shows the seq scan, and recommend_indexes catches it
- customers.email/phone bait find_sensitive_columns; a camelCase reviews."reviewSource" column trips lint_naming_conventions
- review prose is per-product-type plausible (a yoga mat is never praised for its battery life) and FTS-searchable
- products.embedding (pgvector, 8-dim) is added only when the vector extension is already installed; the extension is never created by the seeder

Safety: single-transaction seed, refuses to touch an existing schema, and --demo-drop only drops a schema carrying the MCPg ownership marker comment. CLI-only surface: no new MCP tools, snapshot unchanged.

docs/demo.md is captured, not written: every output block is a real tool run against the seeded dataset (tools/generate_demo_walkthrough.py regenerates it), and tests/integration/test_demo_integration.py pins every planted finding so the walkthrough can't silently rot. Roadmap row 17.1.

sourcery-ai

Sorry @devopam, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

gemini-code-assist-2

Code Review

This pull request introduces a new onboarding and demo feature for MCPg, allowing users to seed and drop a curated, deterministic e-commerce dataset using the mcpg --demo and mcpg --demo-drop CLI commands. The dataset is specifically engineered with planted flaws (such as an un-indexed foreign key, PII-shaped columns, and naming violations) to showcase the capabilities of MCPg's analysis, indexing, search, and auditing tools. The PR includes comprehensive unit and integration tests, updated documentation, and a script to automatically generate a walkthrough of the demo. No review comments were provided, so there is no additional feedback to address.

sourcery-ai Bot reviewed Jul 2, 2026

gemini-code-assist-2 Bot reviewed Jul 2, 2026

devopam merged commit 5ff68fe into main Jul 3, 2026
19 checks passed

devopam deleted the claude/demo-dataset branch July 3, 2026 03:05

devopam mentioned this pull request Jul 3, 2026

Publish MCP ToolAnnotations on every tool + cut v0.6.7 #221

Merged

devopam merged 1 commit into main from claude/demo-dataset
