Skip to content

Swap remarkable for markdown-it#4884

Merged
fiskus merged 15 commits into
masterfrom
worktree-remarkable-to-markdown-it
May 12, 2026
Merged

Swap remarkable for markdown-it#4884
fiskus merged 15 commits into
masterfrom
worktree-remarkable-to-markdown-it

Conversation

@fiskus
Copy link
Copy Markdown
Member

@fiskus fiskus commented May 8, 2026

Summary

  • Swap unmaintained remarkable (last release ~2020) for markdown-it, the de-facto successor. Same linkify-it engine, so URL detection is unchanged.
  • Mechanical near-drop-in: linkify plugin → linkify: true option, state.push({obj})state.push(type, tag, nesting) then mutate.
  • parseTasklist ported to markdown-it StateInline API; also cleaned up internally — string matching via startsWith, two separate token types (tasklist_checked / tasklist_unchecked) instead of one mutable .checked property, word-boundary check so a[x]b is left as text, and silent-mode guard so parseLinkLabel's lookahead doesn't push spurious tokens.
  • ALLOWED_TAGS gains s (markdown-it strikethrough → <s>). Existing tags retained, including those that only reach the sanitizer via raw HTML pass-through (input, abbr, dd, dl, dt, ins, mark, sub, sup, del). A NOTE comment now documents the two categories so the rationale survives future edits.
  • Stale remarkable@1 async comment in Preview/loaders/Markdown.jsx removed; underlying TODO preserved.
  • Markdown.spec.ts extended with feature-coverage cases (code highlight, linkify, nofollow, tasklist glyphs incl. escaped-bracket, mid-word rejection, and nested-image-label regression, typographer, DOMPurify script-strip). All existing strict-string assertions preserved as inline snapshots — markdown-it produces byte-identical HTML for those cases.

Note on alt text

Inside markdown image syntax ![alt](src), escape sequences (\!), HTML entities (&amp;), and inline `code` aren't reflected in the rendered alt attribute. This is markdown-it's default behavior (markdown-it/markdown-it#1142) — its image renderer concatenates a fixed subset of inline token types when building alt. Raw HTML <img alt="..."> is left untouched; attribute values are inert text per HTML5 parsing rules. No workaround locally — the patterns are uncommon in real markdown, and the rendered output is well-formed in all cases.

This PR also drops the unescapeMd(alt) step previously run inside the DOMPurify handleImage hook. It was a remarkable-era leftover: remarkable stored the raw source slice in alt without decoding escapes, so the hook decoded them post-hoc. Both current paths now handle decoding themselves — markdown-it's inline tokenizer stores the decoded character in text_special.content (subject to #1142 above), and the browser's HTML5 parser decodes entities in raw <img alt="..."> at parse time. The removed call additionally applied markdown backslash-escape rules to raw-HTML attribute values, which was a category error (markdown semantics on HTML-authored attributes).

Test plan

  • npm run lint (Prettier + ESLint) passes
  • npm run test:only -- parseTasklist — 11 tests pass (incl. silent-mode contract)
  • npm run test:only -- Markdown — 28 tests pass (Markdown 13 + parseTasklist 11 + Preview/quick/Markdown 4)
  • npm run build clean (only preexisting bundle-size warnings)
  • Visual smoke (light pass — integration tests cover most cases):
    • <Bucket/Summarize> — render a quilt_summarize.json README block
    • <Preview/loaders/Markdown> — open a .md file preview
    • <Preview/renderers/Markdown> — render an inline preview
    • <Chat/Message> — send a chat message with markdown
    • <Assistant/UI/Chat/Chat> — assistant response with code fences

Out of scope

  • Replacing parseTasklist with markdown-it-task-lists (would change rendered output: <input> checkboxes vs ☐/☑ glyphs; the maintained plugins are also stale, last updated 8+ years ago).
  • Re-enabling abbr / mark / sub / sup / ins / deflist / footnote parser features via the matching markdown-it-* plugins (the corresponding tags are already in ALLOWED_TAGS).
  • unified / remark + rehype migration (drops dangerouslySetInnerHTML but is a much larger refactor across 5 consumers).
  • Lazy-loading <Markdown> for bundle-size reduction (~250 KB gzipped) — independent of this swap.

Rollback

Single PR, single revert if anything regresses. No data migrations, no persisted state.

Greptile Summary

This PR replaces the unmaintained remarkable library with markdown-it across the catalog's markdown rendering pipeline. The migration is near-drop-in, preserving all existing output for supported syntax while gaining proper CommonMark/GFM conformance.

  • parseTasklist is ported to markdown-it's StateInline API with a correct silent-mode guard (no token pushes during lookahead), a word-boundary check to prevent mid-word [x] matches, and two distinct token type constants (tasklist_checked / tasklist_unchecked) replacing the old single mutable token.
  • Markdown.tsx drops the remarkable-era unescapeMd post-processing of alt attributes; linkify and typographer are now options rather than plugins; 's' is added to ALLOWED_TAGS for markdown-it's ~~strikethrough~~ output.
  • Tests are expanded with 9 new inline-snapshot cases covering code highlight, linkify, nofollow, all tasklist variants, silent-mode contract, nested-image-label regression, typographer, and DOMPurify script-stripping.

Confidence Score: 5/5

Safe to merge — the migration is a clean swap with no functional regressions identified; all previously flagged concerns have been addressed in this branch.

The silent-mode contract in parseTasklist is correctly implemented: tokens are not pushed during lookahead, state.pos is advanced as markdown-it's skipToken requires, and the spec explicitly verifies both branches. The word-boundary guard closes the mid-word false-positive. The unescapeMd removal is justified by markdown-it decoding at the token level for parser-generated images and the browser's HTML5 parser handling entities in raw img alt attributes. The ALLOWED_TAGS addition of 's' is narrowly correct for strikethrough. Test coverage is meaningfully extended for every changed behaviour.

No files require special attention.

Important Files Changed

Filename Overview
catalog/app/components/Markdown/parseTasklist.ts Ported from Remarkable's StateInline to markdown-it's StateInline; adds silent-mode guard, word-boundary check, and two distinct token type constants
catalog/app/components/Markdown/Markdown.tsx Swaps Remarkable constructor + linkify plugin for MarkdownIt constructor with linkify/typographer options; removes remarkable-era unescapeMd call from handleImage; adds 's' to ALLOWED_TAGS for strikethrough
catalog/app/components/Markdown/Markdown.spec.ts Converts existing toBe assertions to toMatchInlineSnapshot; adds 9 new feature-coverage cases
catalog/app/components/Markdown/parseTasklist.spec.ts Updated to markdown-it's StateInline API; adds silent-mode contract tests, uppercase-X test, and mid-word/boundary tests
catalog/app/components/Preview/loaders/Markdown.jsx Removes stale comment referencing remarkable@1's async limitation; no functional change
catalog/package.json Replaces remarkable + @types/remarkable with markdown-it + @types/markdown-it

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["markdown source string"] --> B["MarkdownIt.render()"]
    B --> B1["inline rules chain"]
    B1 --> B2["link rule"]
    B1 --> B3["escape rule"]
    B1 --> B4["tasklist.parse (pushed last)"]
    B4 --> B4a{"isAtBoundary?"}
    B4a -- "no" --> B4b["return false (skip)"]
    B4a -- "yes" --> B4c{"silent mode?"}
    B4c -- "yes" --> B4d["advance pos, return true (no token pushed)"]
    B4c -- "no" --> B4e["push tasklist_checked or tasklist_unchecked token"]
    B --> C["renderer.rules: tasklist_checked→☑, tasklist_unchecked→☐"]
    C --> D["HTML string"]
    D --> E["DOMPurify.sanitize(html, SANITIZE_OPTS)"]
    E --> E1["uponSanitizeElement hook → handleLink / handleImage"]
    E --> F["sanitized HTML string"]
    F --> G["dangerouslySetInnerHTML in Container"]
Loading

Reviews (5): Last reviewed commit: "Merge branch 'master' into worktree-rema..." | Re-trigger Greptile

fiskus added 2 commits May 8, 2026 18:10
remarkable is unmaintained (last release ~2020); markdown-it is the
de-facto successor and shares the same linkify-it engine, so URL detection
behavior is preserved. Mechanical near-drop-in: linkify plugin → linkify
option, unescapeMd → unescapeAll, state.push({obj}) → state.push(type,
tag, nesting). Custom image renderer added so backslash-escaped
punctuation in alt text is preserved (markdown-it's default
renderInlineAsText skips text_special tokens). Allowlist gains `s` for
~~strike~~; documented in a NOTE so future plugin additions and raw-HTML
pass-through are explicit. Markdown.spec.ts gains feature-coverage
regression tests (linkify, typographer, code highlight, tasklist,
DOMPurify, alt escapes).
@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.55%. Comparing base (aadae4b) to head (a96c614).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4884      +/-   ##
==========================================
- Coverage   46.55%   46.55%   -0.01%     
==========================================
  Files         832      832              
  Lines       34131    34120      -11     
  Branches     5833     5833              
==========================================
- Hits        15890    15884       -6     
+ Misses      16239    16237       -2     
+ Partials     2002     1999       -3     
Flag Coverage Δ
api-python 93.14% <ø> (ø)
catalog 21.55% <100.00%> (-0.02%) ⬇️
lambda 96.63% <ø> (ø)
py-shared 98.18% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@fiskus fiskus marked this pull request as ready for review May 8, 2026 16:14
Comment thread catalog/app/components/Markdown/Markdown.tsx Outdated
Comment thread catalog/app/components/Markdown/Markdown.tsx Outdated
…nescapeAll

Address Greptile inline review on PR #4884.

- `imageAltHandler` now mirrors markdown-it's own `renderInlineAsText`
  semantics for multi-line alt text by mapping `softbreak`/`hardbreak`
  children to `'\n'` instead of dropping them.
- Add a comment to `handleImage` explaining that `unescapeAll` is a
  no-op for markdown images (already decoded by `imageAltHandler`) but
  still does work for raw HTML `<img alt="...">` written directly in
  markdown source — preserving pre-migration behavior.
- New regression test covering the softbreak case.
@fiskus
Copy link
Copy Markdown
Member Author

fiskus commented May 8, 2026

@greptileai please re-review and update the confidence score

fiskus added 2 commits May 11, 2026 13:44
* parseTasklist: switch to `startsWith`, separate token types
  (`tasklist_checked`/`tasklist_unchecked`), add word-boundary check so
  `a[x]b` is left as text. Renamed exports to `parse`/`CHECKED`/`UNCHECKED`.
* Drop `imageAltHandler` and the `unescapeAll(alt)` call in `handleImage`;
  use markdown-it defaults. Loses backslash-escape decoding inside alt
  text (rare edge case) but keeps DOMPurify XSS protection unchanged.
* Update SANITIZE_OPTS comment: strikethrough is part of the default
  preset, and add `input` to the raw-HTML examples list.
Comments should describe current state, not contrast with prior libraries.
@fiskus
Copy link
Copy Markdown
Member Author

fiskus commented May 11, 2026

@greptileai please re-review. Notable changes since last review:

  • Dropped imageAltHandler (alt-text escapes/entities/code-spans now hit the markdown-it#1142 limitation; accepted as too rare to warrant the workaround).
  • Dropped unescapeMd(alt) post-processing in handleImage — was applying markdown semantics to raw HTML attribute values.
  • Reworked parseTasklist: startsWith matching, two distinct token types (tasklist_checked/tasklist_unchecked), word-boundary check (mid-word a[x]b no longer renders a checkbox).
  • Added boundary/escape/mid-word tests in parseTasklist.spec.ts and Markdown.spec.ts.

PR description updated to reflect the current state.

fiskus added 3 commits May 11, 2026 14:59
Convert the eight feature-coverage tests added in this PR to the
input/output template-literal + exact toBe() form used by the
pre-existing tests. Names are capitalized verb phrases. Exact HTML
output replaces partial toContain/toMatch checks, so future markdown-it
upgrades will surface unintended rendering changes.
Test names read as the subject of an implicit "it" — they should agree
in person/number. Convert "Process / Detect / Strip / ..." to
"Processes / Detects / Strips / ..." across Markdown.spec.ts and
parseTasklist.spec.ts. Negative-form names already use "Does not …",
which is the correct third-person inflection of "do".
Pin exact HTML output via toMatchInlineSnapshot instead of hand-written
output strings. Maintenance under markdown-it / highlight.js upgrades
is now `vitest --update` plus a PR diff review, instead of editing the
expectations by hand and risking copy-paste drift.
@fiskus
Copy link
Copy Markdown
Member Author

fiskus commented May 11, 2026

@greptileai please re-review. Changes since last review:

  • Renderer test assertions in Markdown.spec.ts converted to toMatchInlineSnapshot — same byte-exact coverage, less brittle to maintain on future markdown-it / highlight.js upgrades.
  • Test names normalized to third-person singular (Renders, Strips, Detects, ...).
  • Stale remarkable reference removed from the SANITIZE_OPTS comment block.
  • PR description softened: "Known limitation (accepted)" → "Note on alt text"; markdown-it#1142 framed as default behavior rather than a defect.

Drop the second `renderPlain` renderer. The test-fixture processors now
return scheme-valid URLs (`https://link/...`, `https://image/...`) so a
single `render` instance works for tests that exercise processor
behavior and tests that don't — the URLs survive DOMPurify's URL
validator either way. Snapshots regenerated.
@fiskus fiskus changed the title refactor(catalog): swap remarkable for markdown-it Swap remarkable for markdown-it May 11, 2026
fiskus added 3 commits May 11, 2026 16:31
markdown-it calls inline rules with `silent=true` during lookahead via
`ParserInline.skipToken` — used internally by `parseLinkLabel` when
scanning inner tokens of image/link labels. Without the guard, the
rule was pushing tokens during this scan, corrupting state.tokens:

- `![ [x] image](url)` rendered `☑<img alt="image">` (spurious glyph
  before the image, alt content stripped).
- `[ [x] inner](url)` rendered `☑[ ☑ inner](url)` (no link, doubled
  glyph).

Fix follows the pattern used by markdown-it's own `link.mjs`: still
advance `state.pos` and return true/false, but only `state.push(...)`
when `!silent`. Adds two unit tests for the silent-mode contract and
one integration snapshot for the nested-image-label case.

Reported by greptile-apps on PR #4884.
Reframe the remarkable→markdown-it entry around CommonMark + GFM
conformance and enumerate the non-standard Pandoc / PHP-Markdown-Extra
shortcut syntaxes that no longer render as markdown (raw inline HTML for
the corresponding tags still works).
@fiskus
Copy link
Copy Markdown
Member Author

fiskus commented May 11, 2026

@greptileai please re-review

drernie
drernie previously approved these changes May 11, 2026
nl0
nl0 previously approved these changes May 12, 2026
Comment thread catalog/CHANGELOG.md Outdated
Co-authored-by: Alexei Mochalov <[email protected]>
@fiskus fiskus dismissed stale reviews from nl0 and drernie via 5cb90d1 May 12, 2026 11:15
@fiskus fiskus enabled auto-merge May 12, 2026 11:31
@fiskus fiskus requested review from drernie and nl0 May 12, 2026 11:31
@fiskus fiskus added this pull request to the merge queue May 12, 2026
Merged via the queue into master with commit 3b1ba5b May 12, 2026
46 checks passed
@fiskus fiskus deleted the worktree-remarkable-to-markdown-it branch May 12, 2026 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants