feat(py): DataSourceReader bridge for ggsql database pushdown by cpsievert · Pull Request #221 · posit-dev/querychat

cpsievert · 2026-04-18T03:31:46Z

Motivation

querychat's current ggsql integration splits execution into two phases: run the SQL on the real database, then replay the VISUALISE portion locally in an in-memory DuckDB. This has two problems:

Scaling — the full SQL result must be pulled into Python memory, even when ggsql's stat transforms (histogram, density, boxplot) would reduce it to a small summary. A histogram of 10M rows pulls all 10M rows into memory only to bin them into ~30 buckets.
Multi-source layers — ggsql supports per-layer data sources (e.g., a CTE fed to a different DRAW clause). The two-phase approach loses intermediate tables at the DataSource boundary, so querychat rejects these queries entirely.

Both problems stem from the same root cause: querychat splits the query at the SQL/VISUALISE boundary rather than letting ggsql run the full pipeline against the real database.

Approach

For SQLAlchemySource data sources, this PR implements a DataSourceReader — a Python object that satisfies ggsql's reader protocol (execute_sql(), register(), unregister()) by routing SQL to the real database via SQLAlchemy. ggsql runs its entire pipeline (parsing, CTEs, stat transforms, layer queries) against the real DB.

sqlglot transpiles ggsql's ANSI-generated SQL to the target database dialect. The dialect mapping covers 22 SQLAlchemy backends, verified by installing the actual driver packages and checking engine.dialect.name.

Falls back to the current two-phase approach when the bridge fails (e.g., temp table permission denied, unsupported dialect, transpilation error) or for non-SQLAlchemy data sources.

Why a separate PR

This is split off from feat/ggsql-integration because the DataSourceReader bridge is Python-specific — it depends on SQLAlchemy and sqlglot, neither of which has an R equivalent. The parent branch (feat/ggsql-integration) contains ggsql prompt/syntax updates and other changes that apply to both R and Python. Keeping this separate makes it clear that the R package doesn't need a corresponding change.

Changes

New: _datasource_reader.py — DataSourceReader class, SQLGLOT_DIALECTS mapping (22 dialects), transpile_sql(), register_sqlglot_dialect() for custom additions
Modified: _viz_ggsql.py — execute_ggsql() tries bridge path first, falls back to execute_two_phase() (renamed current logic); logs a warning for unknown dialects
Modified: _viz_tools.py, _shiny_module.py — updated callers to pass original query string
New dep: sqlglot>=26.0 added to the viz extra (pure Python, zero transitive deps, 6.6 MB)
Tests: 19 new tests covering dialect mapping, transpilation, DataSourceReader lifecycle, and end-to-end ggsql integration against real SQLite

Test plan

All existing ggsql tests pass with updated 3-arg execute_ggsql() signature
New DataSourceReader unit tests pass (SQLite in-memory)
End-to-end: ggsql.execute(query, reader) works for scatter, filter, Form B, aggregation
Manual test with a real Snowflake/Postgres connection (TODO)

🤖 Generated with Claude Code

Add visualization support to querychat via ggsql, a SQL extension for declarative data visualization. Includes Altair/Vega-Lite rendering, Shiny widget integration, ggsql syntax reference prompt, Playwright tests, and documentation. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Fixes and improvements accumulated after the initial ggsql integration: - Bookmark restore, Shiny tool wiring, and pre-commit fixes - Remove legacy shiny client path - Send PNG thumbnail to LLM in viz tool results - Port visualization best practices from ggsqlbot (axis readability, labels with units, data-ink ratio, overplotting, chart type guide) - Move behavioral guidance from tool description to system prompt - Add interpretation skepticism to guidelines - Add collapsed parameter to query tool for preparatory queries - Add uv required-environments for CI Co-Authored-By: Claude Opus 4.6 <[email protected]>

…inction, trim redundant notes Co-Authored-By: Claude Sonnet 4.6 <[email protected]> fix(prompts): polish tool descriptions — collapsed default, error-mode reinforcement, tighter ggsql param Co-Authored-By: Claude Sonnet 4.6 <[email protected]> fix(prompts): tool name consistency, conditional fallback, standalone routing section Co-Authored-By: Claude Sonnet 4.6 <[email protected]> style: fix docstring formatting in new test class Co-Authored-By: Claude Opus 4.6 <[email protected]> fix(ggsql): guard unsupported layer sources

…deps lazy Move pathlib, copy, importlib.util, and chevron to top-level imports. Move internal module imports (_viz_altair_widget, _viz_ggsql) to top level in _viz_tools.py. Keep ggsql, altair, and shinywidgets as runtime imports since they are optional (viz extra). Co-Authored-By: Claude Opus 4.6 <[email protected]>

…pabilities - Move visualization into "Your Capabilities" as a peer of filtering/querying - Add "Choosing Between Query and Visualization" decision preamble - Clean up conditional branching (drop viz-only fallback, simplify nesting) - Integrate viz suggestions into existing suggestion examples - Replace "(if available)" hedging with Mustache conditionals - Trim tool-query.md and tool-update-dashboard.md: remove routing guidance and cross-tool references now covered by the main prompt - Add structural verification tests for prompt conditionals Co-Authored-By: Claude Opus 4.6 <[email protected]>

…ueries Guide the LLM to prefer visualization for comparisons/distributions/trends and always collapse query results when a chart covers the same data. Also remove stale _extract_column_names monkeypatch that was failing CI. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Include vl-convert-python in viz dependency error message - Use querychat_tool_starts_open() for viz tool open state - Fix docs: viz tool is opt-in, not default - Fix docs: each viz call produces a new message, not a replacement Co-Authored-By: Claude Opus 4.6 <[email protected]>

Integrate main's deferred client initialization (#207) and narwhals version constraint (#218) with the ggsql visualization feature branch. Extended _create_session_client with visualize_query parameter and updated stub session test to match eager client construction. Co-Authored-By: Claude Opus 4.6 <[email protected]>

chatlas 0.16.0 added stream_content as a new abstract method on Provider, causing DummyProvider instantiation to fail in CI. Co-Authored-By: Claude Opus 4.6 <[email protected]>

shallow copy + nested attribute mutation was modifying the original chart's sub-specs through shared references, violating the function's documented contract. Co-Authored-By: Claude Opus 4.6 <[email protected]>

narwhals.stable.v1.from_native wraps ibis Tables as DataFrame (not LazyFrame), so the isinstance(nw_df, nw.LazyFrame) check was False and collect() was never called, causing 'IbisLazyFrame has no attribute to_polars'. Delegate to as_narwhals() which already handles ibis correctly. Co-Authored-By: Claude Opus 4.6 <[email protected]>

…identifiers Snowflake uppercases unquoted identifiers, causing case mismatches when ggsql validates VISUALISE column references against DuckDB results. Replace the generic casing note with explicit wrong/correct examples instructing the LLM to alias uppercase columns to lowercase in SELECT. Borrowed from ggsqlbot's proven prompt pattern. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Snowflake uppercases all unquoted identifiers, so columns come back as AVG_INTEREST_RATE even when the LLM writes AS avg_interest_rate. Since ggsql validates VISUALISE column references case-sensitively and DuckDB is case-insensitive, lowercasing the DataFrame columns before registering ensures the LLM's lowercase references always match. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Adds truncate_error() in _utils to cap error strings sent to the LLM, stripping schema dumps and applying a hard character limit. Wired into all tool error catch blocks in tools.py and _viz_tools.py. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Updates syntax guide for ggsql v0.2.4 breaking changes: rect→tile, linear merged into rule, array syntax to parentheses, updated text aesthetics. Fixes pre-existing errors (invalid position fill, nonexistent density stacking). Bumps ggsql dependency to >=0.2.4. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Implements a DataSourceReader that routes ggsql's full pipeline through the real database via SQLAlchemy, using sqlglot for dialect transpilation. For SQLAlchemySource with a known dialect, ggsql runs CTEs, stat transforms, and layer queries directly on the real DB — avoiding the need to pull large result sets into Python memory. Falls back to the existing two-phase approach (now `execute_two_phase`) for other DataSource types or on bridge failure. Includes verified dialect mappings for 22 SQLAlchemy backends and register_sqlglot_dialect() for custom additions. Co-Authored-By: Claude Opus 4.6 <[email protected]>

cpsievert and others added 22 commits April 8, 2026 17:36

refactor: remove developer-facing viz API from public surface

d7a3223

Co-Authored-By: Claude Opus 4.6 <[email protected]>

refactor: simplify viz preload and trim brittle tests

afebc15

refactor(viz): simplify chart feedback path

29e731b

docs: add viz tool and collapsed query param to changelog

8c4a781

Co-Authored-By: Claude Opus 4.6 <[email protected]>

chore: update github remotes

014f279

fix: add stream_content stub to DummyProvider for chatlas 0.16.0 compat

a8aeae4

chatlas 0.16.0 added stream_content as a new abstract method on Provider, causing DummyProvider instantiation to fail in CI. Co-Authored-By: Claude Opus 4.6 <[email protected]>

fix(viz): use deepcopy in fit_chart_to_container to avoid mutating input

b253363

shallow copy + nested attribute mutation was modifying the original chart's sub-specs through shared references, violating the function's documented contract. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Merge branch 'main' into feat/ggsql-integration

db17b55

Base automatically changed from feat/ggsql-integration to main April 28, 2026 21:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(py): DataSourceReader bridge for ggsql database pushdown#221

feat(py): DataSourceReader bridge for ggsql database pushdown#221
cpsievert wants to merge 22 commits intomainfrom
feat/datasource-reader-bridge

cpsievert commented Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cpsievert commented Apr 18, 2026

Motivation

Approach

Why a separate PR

Changes

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant