Add font embedding and introspection for HTML export#256

Draft
jonmmease wants to merge 14 commits into main from jonmmease/html-font-embedding

jonmmease commented Mar 13, 2026

Summary

Two additions to HTML export:

  1. Font embedding — Google Fonts are referenced via CDN <link> tags by default. In bundle mode, all fonts are subsetted to only the glyphs used in the chart, compressed to WOFF2, and inlined as base64 @font-face CSS for fully offline viewing. Local system fonts can also be embedded via an opt-in flag.

  2. Font introspection API — New vegalite_fonts/vega_fonts functions return structured FontInfo metadata for each font used in a rendered spec, including per-variant weight/style data and optional @font-face CSS blocks (gated by include_font_face to avoid the expensive subsetting pipeline when not needed).

Motivation

HTML export (vl2html/vg2html) currently produces pages with no font information. When a chart uses non-system fonts (e.g. Google Fonts), the browser falls back to a default font or requires internet access. This makes HTML export unreliable for offline viewing, cross-machine consistency, and archival use cases.

Font embedding behavior

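| bundle | embed_local_fonts | Google Fonts       | Local fonts        |
|--------|-------------------|--------------------|--------------------|
| false  | false             | CDN <link> tags    | not included       |
| false  | true              | CDN <link> tags    | inline @font-face  |
| true   | false             | inline @font-face  | not included       |
| true   | true              | inline @font-face  | inline @font-face  |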
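
The table reduces to two independent decisions: bundle controls Google Fonts, embed_local_fonts controls local fonts. A minimal sketch of that mapping follows; the `Delivery` enum and `font_delivery` function are hypothetical names used for illustration only and are not part of this PR.

```rust
/// How a font's data reaches the exported HTML page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Delivery {
    /// Referenced through a Google Fonts CDN `<link>` tag.
    CdnLink,
    /// Subsetted and inlined as a base64 `@font-face` block.
    InlineFontFace,
    /// Left out of the page entirely.
    NotIncluded,
}

/// Returns (Google Fonts delivery, local font delivery) for a flag pair.
/// Hypothetical helper mirroring the behavior table; not the PR's code.
fn font_delivery(bundle: bool, embed_local_fonts: bool) -> (Delivery, Delivery) {
    let google = if bundle {
        Delivery::InlineFontFace
    } else {
        Delivery::CdnLink
    };
    let local = if embed_local_fonts {
        Delivery::InlineFontFace
    } else {
        Delivery::NotIncluded
    };
    (google, local)
}
```

For example, `font_delivery(true, false)` yields inline @font-face for Google Fonts and no local fonts, matching the third row.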

Both CDN and bundle modes are variant-aware: they use only the weight/style combinations actually present in the chart, via the Google Fonts CSS2 API ital,wght@ parameter syntax.
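
To illustrate the ital,wght@ syntax, here is a rough sketch of how a variant-specific CSS2 URL can be assembled. The `Variant` struct and `css2_url` function are hypothetical stand-ins, not the PR's `font_cdn_url()`, whose exact signature is not shown here. The CSS2 API expects the tuples in sorted order.

```rust
/// One font variant: italic flag plus numeric weight (e.g. 400, 700).
/// Field order matters: the derived ordering sorts by italic, then weight.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Variant {
    italic: bool,
    weight: u16,
}

/// Builds a Google Fonts CSS2 URL requesting only the given variants.
/// Hypothetical helper for illustration only.
fn css2_url(family: &str, variants: &[Variant]) -> String {
    let base = format!(
        "https://fonts.googleapis.com/css2?family={}",
        family.replace(' ', "+")
    );
    if variants.is_empty() {
        // No axis list: the API serves the family's default variant.
        return base;
    }

    // Sort and deduplicate so the tuple list is in the order the API expects.
    let mut sorted: Vec<Variant> = variants.to_vec();
    sorted.sort();
    sorted.dedup();

    let tuples: Vec<String> = sorted
        .iter()
        .map(|v| format!("{},{}", u8::from(v.italic), v.weight))
        .collect();

    format!("{}:ital,wght@{}", base, tuples.join(";"))
}
```

A chart using Open Sans in regular, bold, and italic would then request `ital,wght@0,400;0,700;1,400` rather than every weight the family offers.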

How it works (bundle mode)

  1. Render the Vega scenegraph in the JS runtime
  2. Walk all text marks to collect unique characters per (font, weight, style)
  3. For each font variant: download TTF from Google Fonts or read from fontdb
  4. Subset each TTF to only the required glyphs using the font-subset crate
  5. Compress to WOFF2 and base64-encode
  6. Generate @font-face CSS blocks and inject as <style> in the HTML <head>
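
Step 2 is the part that determines subset size, so a simplified sketch may help reviewers. It assumes text marks have already been flattened into a list; the real walker operates on Vega scenegraph JSON, and `TextMark`, `VariantKey`, and `chars_by_variant` are hypothetical names. The digits 0-9 are always added, following the commit in this PR that includes them so pan/zoom can reveal new axis labels.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Key identifying one font variant: (font, weight, style).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct VariantKey {
    family: String,
    weight: u16,
    italic: bool,
}

/// Simplified stand-in for a text mark from the rendered scenegraph.
struct TextMark {
    family: String,
    weight: u16,
    italic: bool,
    text: String,
}

/// Collects the set of characters each font variant has to render.
fn chars_by_variant(marks: &[TextMark]) -> BTreeMap<VariantKey, BTreeSet<char>> {
    let mut out: BTreeMap<VariantKey, BTreeSet<char>> = BTreeMap::new();
    for mark in marks {
        let key = VariantKey {
            family: mark.family.clone(),
            weight: mark.weight,
            italic: mark.italic,
        };
        let set = out.entry(key).or_default();
        // Every character of the mark's text must survive subsetting.
        set.extend(mark.text.chars());
        // Always keep all digits, even if not present in the initial view.
        set.extend('0'..='9');
    }
    out
}
```

Each resulting set is what a subsetter would receive for that variant in step 4.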

Limitations

  • The font-subset crate only supports TrueType (TTF) outline fonts. Local CFF/OTF and TTC fonts are skipped with a warning (or error, depending on missing_fonts policy). Google Fonts always serves TTF, so this limitation does not affect them.
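
For context on how these formats differ, the three cases can be told apart by the first four bytes of the file. The sketch below shows that check under the assumption that sniffing the header is sufficient; it is not taken from this PR or from the font-subset crate, and `FontKind` and `sniff_font_kind` are hypothetical names.

```rust
/// Outline/container kind inferred from the first four bytes of a font file.
#[derive(Debug, PartialEq, Eq)]
enum FontKind {
    /// TrueType outlines (sfnt version 0x00010000 or the legacy tag `true`).
    TrueType,
    /// OpenType with CFF outlines (tag `OTTO`).
    Cff,
    /// TrueType/OpenType collection (tag `ttcf`).
    Collection,
    /// Anything else, including truncated input.
    Unknown,
}

/// Classifies raw font bytes by their leading tag.
/// Hypothetical helper for illustration only.
fn sniff_font_kind(data: &[u8]) -> FontKind {
    if data.starts_with(&[0x00, 0x01, 0x00, 0x00]) || data.starts_with(b"true") {
        FontKind::TrueType
    } else if data.starts_with(b"OTTO") {
        FontKind::Cff
    } else if data.starts_with(b"ttcf") {
        FontKind::Collection
    } else {
        FontKind::Unknown
    }
}
```

Under this scheme only `FontKind::TrueType` inputs would proceed to subsetting; the others correspond to the skipped cases described above.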

Changes

Core library (vl-convert-rs)

  • New font_embed.rs module — Font subsetting pipeline: TTF → WOFF2 → base64 → @font-face CSS
  • FontInfo/FontVariant structs in extract.rs — Structured API types returned by vega_fonts/vegalite_fonts, with Serialize for Python interop
  • FontSource/FontForHtml/FontKey types in extract.rs — Classify fonts as GoogleFonts or Local; extract_text_by_font() Rust scenegraph walker collecting unique chars per (font, weight, style)
  • classify_scenegraph_fonts() — Classifies fonts as Google or Local with CDN-first policy for HTML
  • vegaToTextByFont JS function — JS-side scenegraph walker for character extraction
  • vega_fonts()/vegalite_fonts() API — Return Vec<FontInfo> with structured per-family metadata (name, source, variants, url, link_tag, import_rule) and optional @font-face CSS per variant
  • build_font_head_html() consumes vega_fonts() internally — Exercises the public API on every HTML export
  • index_font_face_blocks() in font_embed.rs — Parses generated CSS blocks back into a (family, weight, style) → CSS index for attaching to the correct FontVariant
  • HTML generation: vegalite_to_html/vega_to_html now inject font <link> and/or <style> blocks
  • VlConverterConfig.html_embed_local_fonts — Opt-in for local font embedding
  • Variant-aware CDN helpers in html.rs: font_cdn_url(), font_link_tag(), font_import_rule() using actual chart variants
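
As a rough illustration of the indexing idea behind index_font_face_blocks(), the sketch below splits a CSS string into @font-face blocks and keys each by (family, weight, style). It is an assumption-laden approximation: the exact CSS this PR generates and the real function's signature are not shown here, and `css_value`, `index_blocks`, and the defaults of weight 400 and style normal are illustrative choices.

```rust
use std::collections::HashMap;

/// (family, weight, style) key for one generated block.
type VariantKey = (String, String, String);

/// Extracts the value of `prop` from the body of one `@font-face` block.
/// Fragments of a base64 `src` declaration never match a property name,
/// so splitting on ';' is safe for the properties looked up here.
fn css_value(body: &str, prop: &str) -> Option<String> {
    body.split(';').find_map(|decl| {
        let (name, value) = decl.split_once(':')?;
        if name.trim() == prop {
            let unquoted = value.trim().trim_matches(|c: char| c == '"' || c == '\'');
            Some(unquoted.to_string())
        } else {
            None
        }
    })
}

/// Maps each (family, weight, style) to the full text of its block.
/// Hypothetical sketch; blocks without a font-family are skipped.
fn index_blocks(css: &str) -> HashMap<VariantKey, String> {
    let mut index = HashMap::new();
    for chunk in css.split("@font-face").skip(1) {
        let (open, close) = match (chunk.find('{'), chunk.find('}')) {
            (Some(o), Some(c)) if o < c => (o, c),
            _ => continue,
        };
        let body = &chunk[open + 1..close];
        let family = match css_value(body, "font-family") {
            Some(f) => f,
            None => continue,
        };
        let weight = css_value(body, "font-weight").unwrap_or_else(|| "400".to_string());
        let style = css_value(body, "font-style").unwrap_or_else(|| "normal".to_string());
        // Re-attach the at-rule keyword removed by the split.
        let block = format!("@font-face{}", &chunk[..=close]);
        index.insert((family, weight, style), block);
    }
    index
}
```

An index of this shape lets each variant look up its own CSS block by key instead of relying on the order in which blocks were generated.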

CLI

  • --embed-local-fonts on vl2html/vg2html subcommands only

Python

  • html_embed_local_fonts config option
  • vegalite_fonts()/vega_fonts() sync + async functions returning list[FontInfo] via pythonize
  • FontInfo and FontVariant TypedDicts in type stubs (.pyi)
  • include_font_face parameter to gate subsetting pipeline

Dependencies

  • font-subset v0.1 (with woff2 feature)

Review tour

Data flow: vegalite_to_html → build_font_head_html → vega_fonts → analyze_html_fonts → classify_scenegraph_fonts (font discovery) + extract_text_by_font (character extraction) → build_font_info → generate_font_face_css + index_font_face_blocks (subsetting/encoding)

Start here:

  • vl-convert-rs/src/extract.rs: FontInfo/FontVariant public API types, FontSource enum, FontForHtml struct, FontKey, and extract_text_by_font() Rust scenegraph walker

Core pipeline:

Integration (converter.rs is large; focus on these sections):

  • vegaToTextByFont JS function (~line 1310)
  • classify_scenegraph_fonts() (~line 3085)
  • analyze_html_fonts() (~line 4266)
  • vega_fonts() (~line 4345) / build_font_info() (~line 4375) / vegalite_fonts() (~line 4483)
  • build_font_head_html() (~line 4512) — consumes vega_fonts() internally

Surface area:

Test plan

  • All existing tests pass (106 Rust, 107 CLI)
  • cargo fmt and cargo clippy clean
  • Manual: open HTML output with Google Fonts chart in browser, confirm fonts render without network
  • Manual: test --embed-local-fonts with system font, verify @font-face block in HTML source
  • Follow-up: integration tests for extract_text_by_font and the full FontFace pipeline

🤖 Generated with Claude Code

jonmmease and others added 14 commits March 13, 2026 11:51
Add FontSource, FontForHtml, FontKey types and extract_text_by_font()
function that walks Vega scenegraph JSON to collect unique characters
per (font, weight, style) combination. Handles multiline text arrays
matching Vega's String([...]) comma-join behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New font_embed module with TTF → WOFF2 → base64 → @font-face CSS
pipeline using the font-subset crate. Supports both Google Fonts
(downloaded TTF) and local fonts (from fontdb). Includes unit tests
with Caveat font for subsetting, encoding, and CSS generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add format_variant_tuples(), font_cdn_url(), font_link_tag(), and
font_import_rule() that generate Google Fonts CSS2 API URLs using
actual chart variants (ital,wght@ tuples) instead of hardcoded
weight/style combinations. Includes comprehensive unit tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add FontFormat enum, HtmlFontAnalysis, and the full HTML font pipeline:
- classify_and_request_fonts() with prefer_cdn parameter (CDN-first for
  HTML, local-first for SVG/PNG/PDF)
- classify_scenegraph_fonts() with explicit_google_families support to
  prevent false missing-font warnings for per-call overrides
- render_scenegraph_for_html() private helper respecting per-call
  auto_google_fonts override
- analyze_html_fonts() orchestrating font discovery and character
  extraction from the rendered scenegraph
- vega_fonts()/vegalite_fonts() introspection API returning font
  metadata in 5 formats (Name, Url, LinkTag, ImportRule, FontFace)
- build_font_head_html() injecting <link>/<style> tags into HTML output
- VlConverterConfig.html_embed_local_fonts opt-in field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add --embed-local-fonts option to vl2html and vg2html subcommands,
wiring through to VlConverterConfig.html_embed_local_fonts for
inline @font-face embedding of local system fonts in HTML output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add vegalite_fonts()/vega_fonts() sync and async Python bindings
returning font metadata in 5 formats (name, url, link_tag,
import_rule, font_face). Add html_embed_local_fonts config option.
Update .pyi type stubs with docstrings for all new functions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t API

vega_fonts/vegalite_fonts now return Vec<FontInfo> with per-family
metadata (name, source, variants, url, link_tag, import_rule) instead
of Vec<String> selected by a FontFormat enum. Each FontVariant carries
an optional font_face field populated only when include_font_face=true,
gating the expensive subsetting pipeline.

build_font_head_html now consumes vega_fonts() internally, exercising
the public API on every HTML export.

Python bindings use pythonize to return list[FontInfo] as TypedDicts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pan/zoom interactions can reveal axis labels with digits that weren't
in the initial view, so always include all numeric digits in subsetted
fonts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fonts registered via register_google_fonts_font() were classified as
local in the HTML pipeline because fontdb has no source tracking.
Track registered Google Font families in a separate set so
classify_scenegraph_fonts can emit FontSource::GoogleFonts with proper
CDN URLs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the stringly-typed `source: String` field with `source:
FontSource`, a serde-tagged enum that serializes as
`{"type": "google", "font_id": "..."}` or `{"type": "local"}`.
Exposes the Google Fonts font ID to API consumers.

Rename `FontSource::GoogleFonts` to `FontSource::Google` for cleaner
serialization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When explicit google_fonts are provided, bundle mode is off, and no
local font embedding is needed, build <link> tags directly from the
font requests without rendering the scenegraph or invoking V8.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use CSS2 API range syntax (0,100..900;1,100..900) instead of enumerating
specific weights — browsers only download variants they need so there is
no bandwidth cost.  Extend the CDN fast path to cover auto-google-fonts
mode using static spec analysis, avoiding V8/scenegraph rendering for all
non-embedding HTML exports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>