Add font embedding and introspection for HTML export by jonmmease · Pull Request #256 · vega/vl-convert

jonmmease · 2026-03-13T16:17:51Z

Summary

Two additions to HTML export:

Font embedding — Google Fonts are referenced via CDN <link> tags by default. In bundle mode, all fonts are subsetted to only the glyphs used in the chart, compressed to WOFF2, and inlined as base64 @font-face CSS for fully offline viewing. Local system fonts can also be embedded via an opt-in flag.
Font introspection API — New vegalite_fonts/vega_fonts functions return structured FontInfo metadata for each font used in a rendered spec, including per-variant weight/style data and optional @font-face CSS blocks (gated by include_font_face to avoid the expensive subsetting pipeline when not needed).

Motivation

HTML export (vl2html/vg2html) currently produces pages with no font information. When a chart uses non-system fonts (e.g. Google Fonts), the browser falls back to a default font or requires internet access. This makes HTML export unreliable for offline viewing, cross-machine consistency, and archival use cases.

Font embedding behavior

| `bundle` | `embed_local_fonts` | Google Fonts | Local fonts |
| --- | --- | --- | --- |
| false | false | CDN `<link>` tags | not included |
| false | true | CDN `<link>` tags | inline `@font-face` |
| true | false | inline `@font-face` | not included |
| true | true | inline `@font-face` | inline `@font-face` |
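
As a rough illustration of the table, the two flags act independently: `bundle` decides how Google Fonts are delivered, and `embed_local_fonts` decides whether local fonts are included at all. The sketch below is in Python for brevity; `FontPlan` and `plan_font_embedding` are hypothetical names for illustration, not part of the vl-convert API.

```python
from typing import NamedTuple


class FontPlan(NamedTuple):
    """Hypothetical summary of how each font class reaches the HTML page."""
    google_fonts: str
    local_fonts: str


def plan_font_embedding(bundle: bool, embed_local_fonts: bool) -> FontPlan:
    # Google Fonts: inlined only in bundle mode, otherwise referenced via CDN.
    google = "inline @font-face" if bundle else "CDN <link> tags"
    # Local fonts: embedded only when the opt-in flag is set.
    local = "inline @font-face" if embed_local_fonts else "not included"
    return FontPlan(google, local)
```

For example, `plan_font_embedding(False, True)` corresponds to the second row of the table.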

Both CDN and bundle modes are variant-aware: they use only the weight/style combinations actually present in the chart, via the Google Fonts CSS2 API ital,wght@ parameter syntax.
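
To make the `ital,wght@` syntax concrete, here is a minimal Python sketch of how a variant-aware CSS2 URL could be assembled. The function names echo the `format_variant_tuples()` and `font_cdn_url()` helpers listed in this PR, but the signatures and the `(weight, italic)` input shape are assumptions made for illustration; the real helpers are Rust and may differ.

```python
from urllib.parse import quote_plus


def format_variant_tuples(variants):
    """variants: iterable of (weight, italic) pairs (assumed input shape)."""
    # The CSS2 API expects tuples in sorted order: ital first, then wght.
    tuples = sorted({(1 if italic else 0, weight) for weight, italic in variants})
    return ";".join(f"{ital},{wght}" for ital, wght in tuples)


def font_cdn_url(family, variants):
    # Spaces in the family name are encoded as '+'.
    return (
        "https://fonts.googleapis.com/css2?family="
        f"{quote_plus(family)}:ital,wght@{format_variant_tuples(variants)}"
    )
```

A chart using Open Sans in regular, bold, and regular italic would then request only those three variants.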

How it works (bundle mode)

1. Render the Vega scenegraph in the JS runtime
2. Walk all text marks to collect unique characters per (font, weight, style)
3. For each font variant: download TTF from Google Fonts or read from fontdb
4. Subset each TTF to only the required glyphs using the font-subset crate
5. Compress to WOFF2 and base64-encode
6. Generate @font-face CSS blocks and inject as <style> in the HTML <head>
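
The last two steps can be sketched as follows, assuming the subsetting and WOFF2 compression have already produced a byte string. This is an illustrative Python sketch, not the Rust code in `font_embed.rs`; the function name and the exact CSS layout are assumptions.

```python
import base64


def font_face_css(family: str, weight: int, style: str, woff2_bytes: bytes) -> str:
    """Wrap already-compressed WOFF2 bytes in an inline @font-face rule."""
    b64 = base64.b64encode(woff2_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  font-style: {style};\n"
        f"  font-weight: {weight};\n"
        # A data: URI keeps the font inside the HTML file for offline viewing.
        f"  src: url(data:font/woff2;base64,{b64}) format('woff2');\n"
        "}"
    )
```

Because the font travels as a data URI, the resulting page needs no network request to render text in the intended typeface.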

Limitations

The font-subset crate only supports TrueType (TTF) outline fonts. Local CFF/OTF and TTC fonts are skipped with a warning (or error, depending on missing_fonts policy). Google Fonts always serves TTF, so this limitation does not affect them.
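
For readers unfamiliar with the distinction, the container type of a font file can be told apart from its first four bytes. The Python sketch below uses the standard sfnt tags; it only illustrates the TTF vs CFF/OTF vs TTC split and is not how the font-subset crate or this PR performs the check.

```python
def sniff_font_format(data: bytes) -> str:
    """Classify a font file by its 4-byte sfnt version tag."""
    tag = data[:4]
    if tag in (b"\x00\x01\x00\x00", b"true"):
        return "ttf"  # TrueType outlines
    if tag == b"OTTO":
        return "cff"  # OpenType with CFF outlines
    if tag == b"ttcf":
        return "ttc"  # TrueType Collection
    return "unknown"


def can_subset(data: bytes) -> bool:
    # Per the limitation above, only TrueType outline fonts are subsettable.
    return sniff_font_format(data) == "ttf"
```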

Changes

Core library (`vl-convert-rs`)

New font_embed.rs module — Font subsetting pipeline: TTF → WOFF2 → base64 → @font-face CSS
FontInfo/FontVariant structs in extract.rs — Structured API types returned by vega_fonts/vegalite_fonts, with Serialize for Python interop
FontSource/FontForHtml/FontKey types in extract.rs — Classify fonts as GoogleFonts or Local; extract_text_by_font() Rust scenegraph walker collecting unique chars per (font, weight, style)
classify_scenegraph_fonts() — Classifies fonts as Google or Local with CDN-first policy for HTML
vegaToTextByFont JS function — JS-side scenegraph walker for character extraction
vega_fonts()/vegalite_fonts() API — Return Vec<FontInfo> with structured per-family metadata (name, source, variants, url, link_tag, import_rule) and optional @font-face CSS per variant
build_font_head_html() consumes vega_fonts() internally — Exercises the public API on every HTML export
index_font_face_blocks() in font_embed.rs — Parses generated CSS blocks back into a (family, weight, style) → CSS index for attaching to the correct FontVariant
HTML generation — vegalite_to_html/vega_to_html now inject font <link> and/or <style> blocks
VlConverterConfig.html_embed_local_fonts — Opt-in for local font embedding
Variant-aware CDN helpers in html.rs — font_cdn_url(), font_link_tag(), font_import_rule() using actual chart variants
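
The idea behind `index_font_face_blocks()` is to map each generated CSS block back to the variant it belongs to. A minimal Python sketch of that idea, using regular expressions, could look like the following; the real function is Rust, and its parsing strategy, defaults, and key types are not described here, so treat every detail below as an assumption.

```python
import re

# Base64 payloads never contain '}', so a simple non-greedy-free match is enough here.
_BLOCK = re.compile(r"@font-face\s*\{[^}]*\}")
_DECL = re.compile(r"(font-family|font-weight|font-style)\s*:\s*([^;]+);")


def index_font_face_blocks(css: str) -> dict:
    """Map (family, weight, style) to the @font-face block that declares it."""
    index = {}
    for block in _BLOCK.findall(css):
        props = {name: value.strip().strip("'\"") for name, value in _DECL.findall(block)}
        key = (
            props.get("font-family"),
            props.get("font-weight", "400"),    # assumed default
            props.get("font-style", "normal"),  # assumed default
        )
        index[key] = block
    return index
```

Each `FontVariant` can then look up its own block by key instead of re-running the subsetting pipeline.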

CLI

--embed-local-fonts on vl2html/vg2html subcommands only

Python

html_embed_local_fonts config option
vegalite_fonts()/vega_fonts() sync + async functions returning list[FontInfo] via pythonize
FontInfo and FontVariant TypedDicts in type stubs (.pyi)
include_font_face parameter to gate subsetting pipeline
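
As a sketch of what a consumer of the Python API might do with the returned data, the snippet below declares TypedDicts with the field names listed in this description and gathers any populated `font_face` blocks. The field types, and the `weight`/`style` key names on `FontVariant`, are assumptions; the `.pyi` stubs in this PR are the authoritative definition. The sample data is hand-built rather than produced by `vegalite_fonts()`.

```python
from typing import List, Optional, TypedDict


class FontVariant(TypedDict):
    weight: str                # assumed key name and type
    style: str                 # assumed key name and type
    font_face: Optional[str]   # populated only when include_font_face is set


class FontInfo(TypedDict):
    name: str
    source: dict
    variants: List[FontVariant]
    url: Optional[str]
    link_tag: Optional[str]
    import_rule: Optional[str]


def collect_font_face_css(fonts: List[FontInfo]) -> str:
    """Join every populated @font-face block into one stylesheet string."""
    blocks = [v["font_face"] for f in fonts for v in f["variants"] if v.get("font_face")]
    return "\n".join(blocks)
```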

Dependencies

font-subset v0.1 (with woff2 feature)

Review tour

Data flow: vegalite_to_html → build_font_head_html → vega_fonts → analyze_html_fonts → classify_scenegraph_fonts (font discovery) + extract_text_by_font (character extraction) → build_font_info → generate_font_face_css + index_font_face_blocks (subsetting/encoding)

Start here:

vl-convert-rs/src/extract.rs — FontInfo/FontVariant public API types, FontSource enum, FontForHtml struct, FontKey, and extract_text_by_font() Rust scenegraph walker

Core pipeline:

vl-convert-rs/src/font_embed.rs — TextByFontEntry/FontKey types → aggregate_chars_by_font_key() → generate_google_fonts_css() / generate_local_font_css() → index_font_face_blocks()
vl-convert-rs/src/html.rs — Variant-aware CDN URL formatting helpers

Integration (converter.rs is large; focus on these sections):

vegaToTextByFont JS function (~line 1310)
classify_scenegraph_fonts() (~line 3085)
analyze_html_fonts() (~line 4266)
vega_fonts() (~line 4345) / build_font_info() (~line 4375) / vegalite_fonts() (~line 4483)
build_font_head_html() (~line 4512) — consumes vega_fonts() internally

Surface area:

vl-convert/src/main.rs — CLI (small change)
vl-convert-python/src/lib.rs — Python bindings (follows existing patterns)
vl-convert-python/vl_convert.pyi — Type stubs

Test plan

All existing tests pass (106 Rust, 107 CLI)
cargo fmt and cargo clippy clean
Manual: open HTML output with Google Fonts chart in browser, confirm fonts render without network
Manual: test --embed-local-fonts with system font, verify @font-face block in HTML source
Follow-up: integration tests for extract_text_by_font and the full FontFace pipeline

🤖 Generated with Claude Code

Add FontSource, FontForHtml, FontKey types and extract_text_by_font() function that walks Vega scenegraph JSON to collect unique characters per (font, weight, style) combination. Handles multiline text arrays matching Vega's String([...]) comma-join behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
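
The walk described in this commit can be sketched in Python as below. The scenegraph shape (marks carrying `marktype` and `items`, text items carrying `text`, `font`, `fontWeight`, `fontStyle`) follows Vega's JSON scenegraph as I understand it, and the fallback values are assumptions; the actual `extract_text_by_font()` is Rust and may treat defaults differently.

```python
from collections import defaultdict


def extract_text_by_font(scenegraph):
    """Collect the unique characters used per (font, weight, style)."""
    chars = defaultdict(set)

    def walk(node, in_text_mark):
        if isinstance(node, list):
            for child in node:
                walk(child, in_text_mark)
            return
        if not isinstance(node, dict):
            return
        if "marktype" in node:
            # A mark node decides whether its items are text items.
            in_text_mark = node["marktype"] == "text"
        elif in_text_mark and "text" in node:
            text = node["text"]
            if isinstance(text, list):
                # Mirror JavaScript's String([...]), which joins with commas.
                text = ",".join(str(part) for part in text)
            key = (
                node.get("font", "sans-serif"),         # assumed fallback
                str(node.get("fontWeight", "normal")),  # assumed fallback
                node.get("fontStyle", "normal"),        # assumed fallback
            )
            chars[key].update(str(text))
        walk(node.get("items", []), in_text_mark)

    walk(scenegraph, False)
    return dict(chars)
```

Note that a multiline label such as `["a", "b"]` contributes a comma to the character set, which is the comma-join behavior the commit message refers to.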

New font_embed module with TTF → WOFF2 → base64 → @font-face CSS pipeline using the font-subset crate. Supports both Google Fonts (downloaded TTF) and local fonts (from fontdb). Includes unit tests with Caveat font for subsetting, encoding, and CSS generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add format_variant_tuples(), font_cdn_url(), font_link_tag(), and font_import_rule() that generate Google Fonts CSS2 API URLs using actual chart variants (ital,wght@ tuples) instead of hardcoded weight/style combinations. Includes comprehensive unit tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add FontFormat enum, HtmlFontAnalysis, and the full HTML font pipeline:
- classify_and_request_fonts() with prefer_cdn parameter (CDN-first for HTML, local-first for SVG/PNG/PDF)
- classify_scenegraph_fonts() with explicit_google_families support to prevent false missing-font warnings for per-call overrides
- render_scenegraph_for_html() private helper respecting per-call auto_google_fonts override
- analyze_html_fonts() orchestrating font discovery and character extraction from the rendered scenegraph
- vega_fonts()/vegalite_fonts() introspection API returning font metadata in 5 formats (Name, Url, LinkTag, ImportRule, FontFace)
- build_font_head_html() injecting <link>/<style> tags into HTML output
- VlConverterConfig.html_embed_local_fonts opt-in field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add --embed-local-fonts option to vl2html and vg2html subcommands, wiring through to VlConverterConfig.html_embed_local_fonts for inline @font-face embedding of local system fonts in HTML output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add vegalite_fonts()/vega_fonts() sync and async Python bindings returning font metadata in 5 formats (name, url, link_tag, import_rule, font_face). Add html_embed_local_fonts config option. Update .pyi type stubs with docstrings for all new functions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…t API vega_fonts/vegalite_fonts now return Vec<FontInfo> with per-family metadata (name, source, variants, url, link_tag, import_rule) instead of Vec<String> selected by a FontFormat enum. Each FontVariant carries an optional font_face field populated only when include_font_face=true, gating the expensive subsetting pipeline. build_font_head_html now consumes vega_fonts() internally, exercising the public API on every HTML export. Python bindings use pythonize to return list[FontInfo] as TypedDicts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Pan/zoom interactions can reveal axis labels with digits that weren't in the initial view, so always include all numeric digits in subsetted fonts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
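
In other words, the set of characters handed to the subsetter is the union of the characters seen in the chart and the ten digits. A one-function Python sketch (the function name is hypothetical):

```python
import string


def chars_to_subset(used_chars) -> set:
    """Characters seen in the chart, plus 0-9 for labels revealed by pan/zoom."""
    return set(used_chars) | set(string.digits)
```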

Fonts registered via register_google_fonts_font() were classified as local in the HTML pipeline because fontdb has no source tracking. Track registered Google Font families in a separate set so classify_scenegraph_fonts can emit FontSource::GoogleFonts with proper CDN URLs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace the stringly-typed `source: String` field with `source: FontSource`, a serde-tagged enum that serializes as `{"type": "google", "font_id": "..."}` or `{"type": "local"}`. Exposes the Google Fonts font ID to API consumers. Rename `FontSource::GoogleFonts` to `FontSource::Google` for cleaner serialization. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
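
From the consumer side, the tagged shape means callers can branch on `type` rather than compare strings. The Python sketch below builds the two JSON shapes quoted in this commit and dispatches on them; the helper names are hypothetical, and `"caveat"` is just a placeholder font ID.

```python
import json


def google_source(font_id: str) -> dict:
    return {"type": "google", "font_id": font_id}


def local_source() -> dict:
    return {"type": "local"}


def describe_source(source: dict) -> str:
    """Dispatch on the serde-style 'type' tag."""
    if source["type"] == "google":
        return f"Google Fonts ({source['font_id']})"
    if source["type"] == "local":
        return "local system font"
    raise ValueError(f"unknown font source: {source!r}")


# Round-trips cleanly through JSON, as a pythonize-returned dict would.
encoded = json.dumps(google_source("caveat"))
```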

When explicit google_fonts are provided, bundle mode is off, and no local font embedding is needed, build <link> tags directly from the font requests without rendering the scenegraph or invoking V8. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use CSS2 API range syntax (0,100..900;1,100..900) instead of enumerating specific weights — browsers only download variants they need so there is no bandwidth cost. Extend the CDN fast path to cover auto-google-fonts mode using static spec analysis, avoiding V8/scenegraph rendering for all non-embedding HTML exports. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
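
With the range syntax, the URL no longer depends on which variants the chart uses. A minimal Python sketch of the resulting URL shape, taking the range string verbatim from this commit message (the function name is hypothetical, and whether every family accepts the full range is not stated here):

```python
from urllib.parse import quote_plus

# Range tuples quoted in the commit message: upright and italic, weights 100 to 900.
ALL_VARIANTS = "0,100..900;1,100..900"


def font_cdn_url_all_weights(family: str) -> str:
    return (
        "https://fonts.googleapis.com/css2?family="
        f"{quote_plus(family)}:ital,wght@{ALL_VARIANTS}"
    )
```

Since the URL can be built from the family name alone, no scenegraph rendering is needed to produce it, which is what enables the CDN fast path.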

jonmmease and others added 14 commits March 13, 2026 11:51

chore: regenerate thirdparty_rust.yaml for font-subset dependency

631d6e0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: always include digits 0-9 in font subsets

79ca8c0

style: remove block separator comments from new modules

f0bc8f6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jonmmease wants to merge 14 commits into main from jonmmease/html-font-embedding
