
Twitter extractor: hardcoded English aria-label breaks reply extraction for non-English interfaces #272

@nanma

Bug

TwitterExtractor uses a hardcoded English aria-label to locate the conversation timeline:

// src/extractors/twitter.ts line 16
const timeline = document.querySelector('[aria-label="Timeline: Conversation"]');

When the X/Twitter interface is set to a non-English language, the aria-label is localized (e.g. `"时间线:对话"` in Simplified Chinese), so `timeline` is always `null`. The extractor then falls back to finding a single `article[data-testid="tweet"]`, which locates the main tweet but leaves `replyTweets` empty — so replies are silently dropped even though the DOM is fully loaded.

To reproduce

  1. Set X/Twitter display language to any non-English language (e.g. Simplified Chinese)
  2. Open a tweet with replies loaded in the DOM
  3. Clip with Obsidian Web Clipper — only the main tweet is captured, replies are missing

Suggested fix

Match the container by a stable, language-independent attribute instead. Two options:

Option A: match on DOM structure instead of label text (the timeline container always contains `cellInnerDiv` children):

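// Take the first labelled element (in document order) that contains
// at least one timeline cell; no label text is inspected.
const timeline = Array.from(
    document.querySelectorAll('[aria-label]')
).find(el => el.querySelector('[data-testid="cellInnerDiv"]') !== null) ?? null;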
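
One caveat with the snippet above: `querySelectorAll` returns elements in document order, so if several labelled elements wrap the timeline, the outermost one is returned. A possible variant, sketched here under the assumption that neither the cells nor any wrapper between them and the timeline carries its own `aria-label`, starts from a cell and walks up to the nearest labelled ancestor. The names `QueryRoot`, `ElementLike`, and `findConversationTimeline` are hypothetical and not part of defuddle.

```typescript
// Minimal structural types so the helper can be exercised with test doubles;
// only the two DOM methods it uses are required. These types are illustrative.
interface ElementLike {
  closest(selector: string): ElementLike | null;
}

interface QueryRoot {
  querySelector(selector: string): ElementLike | null;
}

// Hypothetical helper: find the first timeline cell, then return its nearest
// ancestor (or itself) that has an aria-label, regardless of the label's text.
function findConversationTimeline(root: QueryRoot): ElementLike | null {
  const firstCell = root.querySelector('[data-testid="cellInnerDiv"]');
  return firstCell?.closest('[aria-label]') ?? null;
}
```

In the extractor this would presumably be called with `document` as the root. Whether the nearest labelled ancestor is always the conversation timeline would need to be checked against the live page.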

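Option B: aria-label match using a known locale map (more explicit, easier to extend):

A mapping of known localized values could be maintained, or a looser selector like `[aria-label*="Conversation"]` could be combined with a structural check. Note that the substring `Conversation` on its own would not match localized labels such as the Simplified Chinese one above.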
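
A rough sketch of what such a locale map might look like follows. It only contains the two labels mentioned in this issue; values for other locales are not known here and would have to be collected. The names `CONVERSATION_TIMELINE_LABELS`, `escapeAttributeValue`, and `buildTimelineSelector` are hypothetical.

```typescript
// Known localized aria-labels for the conversation timeline.
// Only the values quoted in this issue are listed; this is not a complete map.
const CONVERSATION_TIMELINE_LABELS: Record<string, string> = {
  en: 'Timeline: Conversation',
  zh: '时间线:对话',
};

// Escape backslashes and double quotes so a label can be embedded safely
// inside a double-quoted CSS attribute value.
function escapeAttributeValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
}

// Build one selector list that matches any of the known labels exactly.
function buildTimelineSelector(labels: Record<string, string>): string {
  return Object.values(labels)
    .map((label) => `[aria-label="${escapeAttributeValue(label)}"]`)
    .join(', ');
}
```

The result could then be passed to `document.querySelector(...)`. An empty map produces an empty string, which is not a valid selector, so a caller would need to guard against that case. The drawback compared with Option A is that every unsupported locale still fails until its label is added.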

Option A is simpler and doesn't rely on any localized text at all.

Environment

  • defuddle version: 0.18.1
  • Observed with: Simplified Chinese (zh) interface on x.com
