
Fix SearchTimeline 404 + defensive parsing fixes#419

Open
sakenuGOD wants to merge 9 commits into d60:main from
sakenuGOD:fix/searchtimeline-404-plus-defensive-parsing

Conversation


@sakenuGOD sakenuGOD commented Apr 18, 2026

Summary

Four independent fixes that together restore client.search_tweet() and harden get_tweet_by_id() against schema drift on live x.com. Each commit is self-contained and can be reviewed / reverted independently.

  1. Refresh SearchTimeline queryId/features/variables — X rotated the SearchTimeline GraphQL endpoint. Old doc_id (flaR-PUMshxFWZWPNpq4zA) now 404s, and X also expanded the accepted feature set and added a new withGrokTranslatedBio variable. Without all three in sync the endpoint returns 404 wholesale (not a partial shape), so search_tweet() has been completely non-functional on the current x.com. Adds a dedicated SEARCH_TIMELINE_FEATURES constant instead of extending the global FEATURES — avoids side effects on every other endpoint.

  2. Defensively parse trailing cursor in get_tweet_by_id — X serves two shapes for the trailing cursor entry in TweetDetail: legacy content.itemContent.value and the newer flatter content.value. The current code reads the legacy path unconditionally and raises KeyError: 'itemContent' for any tweet served with the new shape — which aborts the whole get_tweet_by_id() call before tweet.replies is populated, even though the reply entries themselves were parsed fine. Read both shapes; fall back to _fetch_more_replies=None when neither is present (pagination of further replies is the only thing actually affected).

  3. Guard against recursion in _get_user_state on rate-limit — request() calls _get_user_state() on 429 to distinguish TooManyRequests from AccountSuspended. But that call routes back through request(), and X rate-limits the account (not per-endpoint), so the nested call also 429s and we recurse until Python raises RecursionError. The real 429 is thus masked by an unrelated crash. Trap exceptions from the nested v11.user_state() call and fall back to 'normal'; the outer request() still raises TooManyRequests correctly.

  4. Update ondemand.s hash extraction for current webpack bundle — ClientTransaction.init() fails with Couldn't get KEY_BYTE indices on every run, causing the generator to fall back to a dummy X-Client-Transaction-Id. The dummy is accepted by HomeTimeline/UserTweets but rejected with 404 by SearchTimeline (selective endpoint enforcement, nearly invisible from the traceback). Root cause is a webpack bundle layout change: "ondemand.s":"HASH" is no longer a contiguous pair; the chunk id and the hash are keyed separately. Two-step lookup mirrors the strategy from the x-client-transaction-id PyPI package. After this fix, combined with (1), real search queries come back with real results.

Test plan

  • Manually verified (2)-fix on a live tweet that previously raised KeyError('itemContent') from get_tweet_by_id(); it now returns successfully with all replies parsed.
  • Manually verified (4)-fix by calling ClientTransaction.init() against live x.com — indices are extracted correctly and generate_transaction_id() returns a valid 94-char token.
  • Manually verified (1)+(4) together by running client.search_tweet("streetwear", "Top"), client.search_tweet("japanese fashion", "Top"), and four more queries — each returns populated results where previously every query returned HTTP 404.
  • (3) reproduced pre-fix by deliberately throttling an account and observing the RecursionError; post-fix the expected TooManyRequests is raised instead.

No new dependencies. No breaking API changes.

Scope note

These are all defensive / data-plane fixes. No changes to public method signatures, no new features, no migrations.

Summary by Sourcery

Restore search timeline functionality against current x.com schema and harden tweet retrieval and client transaction handling against recent platform changes.

Bug Fixes:

  • Handle both legacy and new cursor shapes in get_tweet_by_id to avoid KeyError and preserve replies parsing when pagination metadata changes.
  • Prevent recursive 429 handling in _get_user_state by treating failures of the nested user_state call as a normal state so TooManyRequests is raised correctly.
  • Update ondemand.s hash detection to match the current webpack bundle structure so real X-Client-Transaction-Id values can be generated instead of failing with missing KEY_BYTE indices.
  • Update SearchTimeline GraphQL endpoint, features set, and variables (including withGrokTranslatedBio) so search_timeline/search_tweet no longer return 404 responses.

Enhancements:

  • Introduce a dedicated SEARCH_TIMELINE_FEATURES constant to scope search-specific feature flags without affecting other GraphQL endpoints.

Summary by CodeRabbit

  • Bug Fixes

    • Improved reply pagination to handle multiple cursor formats and avoid broken fetches when entries are missing.
    • Enhanced rate-limit and user-state handling to return a normal state on certain errors and prevent recursive failures.
    • Fixed asset/index resolution for on-demand bundles to avoid missed resources.
  • New Features

    • Added a dedicated feature configuration for the search timeline and updated its GraphQL request parameters (including bio translation).

X rotated the SearchTimeline GraphQL endpoint; the previous
`flaR-PUMshxFWZWPNpq4zA` doc_id now returns 404 for every request.
Replacing it with the current doc_id `R0u1RWRf748KzyGBXvOYRA` is not
enough on its own — X also expanded the accepted feature set and added
a new field to the variables. Without all three changes in sync the
endpoint responds 404 (not a partial shape), which made
`client.search_tweet()` completely unusable.

Changes:
- `constants.py`: add `SEARCH_TIMELINE_FEATURES` (37 flags), the
  current set captured from a live x.com DevTools request.
- `gql.py::Endpoint.SEARCH_TIMELINE`: point at the new doc_id.
- `gql.py::GQLClient.search_timeline`: add `withGrokTranslatedBio:
  True` to variables and use `SEARCH_TIMELINE_FEATURES` instead of the
  generic `FEATURES` constant.

Introducing a dedicated feature constant (instead of extending
`FEATURES` in place) avoids side effects on every other endpoint that
still consumes the original set.

X serves two shapes for the trailing cursor entry in TweetDetail:

- legacy: `entries[-1].content.itemContent.value`
- current: `entries[-1].content.value` (TimelineTimelineCursor without an
  itemContent wrapper)

The current code reads the legacy path unconditionally and raises
`KeyError: 'itemContent'` for any tweet returned with the new shape.
That aborts the whole `get_tweet_by_id` call before `tweet.replies` is
populated, so callers see an exception even though the reply entries
were already parsed successfully above — pagination of *further*
replies is the only thing that actually needs the cursor.

Read both shapes, and fall back to `_fetch_more_replies = None` when
neither is present. The function now returns successfully in all three
cases (legacy cursor, new cursor, no cursor), and only pagination is
disabled when the shape is unknown.

When `request()` sees a 429 it calls `_get_user_state()` to distinguish
`TooManyRequests` from `AccountSuspended`. But `_get_user_state()`
itself makes an HTTP call routed back through `request()`, and X rate
limits the *account* (not per-endpoint), so the nested call also
returns 429 — we re-enter the same branch and recurse until Python
raises `RecursionError`.

The end result is that an ordinary rate-limit gets masked by a
`RecursionError` traceback from deep inside the library, which is
misleading and hard to diagnose in user code.

Trap any exception from the nested `v11.user_state()` call and fall
back to 'normal'. The outer `request()` still raises
`TooManyRequests` for the original 429 — callers see the correct
exception. The only cost is that we may miss a suspension signal
during a rate-limit window, but the next non-throttled call will
detect it correctly.

ClientTransaction.init() fails with "Couldn't get KEY_BYTE indices" on
every run against the live x.com homepage, which in turn causes the
generator to fall back to a dummy X-Client-Transaction-Id header. The
dummy value is accepted by some endpoints (HomeTimeline, UserTweets)
but rejected with HTTP 404 by others (notably SearchTimeline) — a
difference which is invisible from the traceback and very hard to
debug in user code.

Root cause: webpack bundle layout change. The old layout shipped
`"ondemand.s":"HASH"` as a contiguous key/value pair, which the old
regex matched directly. The current layout splits that into two
entries keyed by a chunk id:

    ,123:"ondemand.s"...
    ...
    ,123:"7a3c9e1b"

Two-step lookup matches the chunk id from the `ondemand.s` label, then
finds the hash that was emitted against the same id. Mirrors the
strategy used by the `x-client-transaction-id` PyPI package against
the same bundle.

With this change, combined with the earlier SearchTimeline refresh,
`client.search_tweet()` returns results again on live X.

sourcery-ai Bot commented Apr 18, 2026

Reviewer's Guide

Restores the SearchTimeline GraphQL endpoint, hardens tweet detail parsing, prevents recursive rate-limit handling in user state checks, and updates X client transaction hash extraction to match the current webpack bundle, collectively making search_tweet() and get_tweet_by_id() robust against recent x.com changes.

Sequence diagram for updated search_timeline GraphQL request

sequenceDiagram
    actor Developer
    participant Client
    participant ClientTransaction
    participant XAPI as X_GraphQL_API

    Developer->>Client: search_timeline(query, product, count, cursor)
    Client->>Client: Build variables
    Client->>Client: Set withGrokTranslatedBio = True
    Client->>Client: Select SEARCH_TIMELINE_FEATURES

    Client->>ClientTransaction: init() / generate_transaction_id()
    ClientTransaction->>ClientTransaction: get_indices(home_page_response)
    ClientTransaction->>XAPI: GET ondemand.s.<hash>a.js
    XAPI-->>ClientTransaction: ondemand script with key byte indices
    ClientTransaction-->>Client: X-Client-Transaction-Id

    Client->>XAPI: gql_get(Endpoint.SEARCH_TIMELINE, variables, SEARCH_TIMELINE_FEATURES)
    XAPI-->>Client: SearchTimeline response
    Client-->>Developer: Parsed search results

Updated class diagram for Client and ClientTransaction methods

classDiagram
    class Client {
        +async get_tweet_by_id(tweet_id)
        +async search_timeline(query, product, count, cursor)
        +async _get_user_state() Literal_normal_bounced_suspended
        +async _get_more_replies(tweet_id, cursor)
    }

    class ClientTransaction {
        +home_page_response
        +async get_indices(home_page_response, session, headers)
        +async init()
        +generate_transaction_id()
    }

    class V11API {
        +async user_state()
    }

    Client --> V11API : uses
    Client --> ClientTransaction : uses for X_Client_Transaction_Id

File-Level Changes

Change Details Files
Defensive parsing of trailing cursor in get_tweet_by_id to support both legacy and new TweetDetail cursor shapes and avoid hard failures when only pagination is affected.
  • Initialize reply_next_cursor and _fetch_more_replies to None before cursor handling.
  • Check that entries exists and last entryId starts with 'cursor' before accessing content.
  • Safely read cursor value from either content.itemContent.value or content.value, depending on which shape is present.
  • Only construct _get_more_replies partial when a cursor value was successfully extracted.
twikit/client/client.py
Prevent recursion and spurious RecursionError when resolving user state on HTTP 429 rate-limits.
  • Wrap v11.user_state() call in _get_user_state in a try/except block.
  • Return response['userState'] on success, otherwise fall back to 'normal' when any exception is raised.
  • Rely on the outer request() logic to still raise TooManyRequests while avoiding recursive calls on repeated 429s.
twikit/client/client.py
Update SearchTimeline GraphQL endpoint configuration and features to match current x.com requirements.
  • Introduce a dedicated SEARCH_TIMELINE_FEATURES dict containing the feature flags expected by the SearchTimeline endpoint.
  • Switch search_timeline() to send SEARCH_TIMELINE_FEATURES instead of the global FEATURES to avoid side effects on other endpoints.
  • Add withGrokTranslatedBio=True to the GraphQL variables for SearchTimeline requests.
  • Update Endpoint.SEARCH_TIMELINE to use the new query id path that no longer returns 404.
twikit/constants.py
twikit/client/gql.py
Adapt ondemand.s hash extraction in ClientTransaction to current webpack bundle layout so that real X-Client-Transaction-Id values can be generated.
  • Change ON_DEMAND_FILE_REGEX to capture the webpack chunk id associated with 'ondemand.s' instead of assuming the hash is adjacent.
  • Introduce ON_DEMAND_HASH_PATTERN and search the home page HTML for the hash keyed by the captured chunk id.
  • Construct the ondemand.s URL using the resolved hash and request the corresponding JS bundle.
  • Parse KEY_BYTE indices from the updated ondemand bundle response text and raise an exception if no indices are found.
twikit/x_client_transaction/transaction.py



coderabbitai Bot commented Apr 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 11126127-f99f-4a9a-a7ce-cc8dd295c476

📥 Commits

Reviewing files that changed from the base of the PR and between 5aa7493 and e9b5acf.

📒 Files selected for processing (3)
  • twikit/client/client.py
  • twikit/client/v11.py
  • twikit/x_client_transaction/transaction.py

📝 Walkthrough

Walkthrough

Added a check_user_state flag to Client.request to avoid recursive user-state checks; reply pagination and get_tweet_by_id now only derive cursors when entries exist and support nested or flat content shapes; _get_user_state catches rate/transport errors and returns 'normal'; added SEARCH_TIMELINE_FEATURES and adjusted search GraphQL; webpack ondemand resolution now does an id→hash lookup.

Changes

Cohort / File(s) Summary
Client request, reply pagination & user state
twikit/client/client.py
Added check_user_state: bool = True to Client.request and gated 429→AccountSuspended behind it to avoid recursion; reply pagination and get_tweet_by_id now only compute cursors when entries exist and extract cursor from content.itemContent.value or content.value; _get_user_state wraps calls in try/except, invokes self.v11.user_state(check_user_state=False), and returns 'normal' on TooManyRequests or transport HTTPError.
Search Timeline GraphQL
twikit/client/gql.py
Changed Endpoint.SEARCH_TIMELINE identifier, added withGrokTranslatedBio: True to GraphQL variables, and switched the features set to SEARCH_TIMELINE_FEATURES in search_timeline(...).
Search Timeline Features
twikit/constants.py
Added new exported constant SEARCH_TIMELINE_FEATURES containing feature-flag mappings used by the search timeline GraphQL call.
Webpack ondemand resolution
twikit/x_client_transaction/transaction.py
Refactored ClientTransaction.get_indices to a two-step resolution: extract numeric chunk id from page, lookup corresponding ondemand hash via a second regex, fail explicitly if hash missing, then fetch ondemand.s.{HASH}a.js and parse INDICES_REGEX.
V11 user_state signature
twikit/client/v11.py
Changed V11Client.user_state() signature to user_state(**kwargs) and forward **kwargs to self.base.get(...) to permit callers to pass request flags (e.g., check_user_state=False).

Sequence Diagram(s)

sequenceDiagram
    participant Client as ClientTransaction
    participant Page as Page HTML
    participant HashSearch as Hash Resolver
    participant CDN as CDN / ondemand.s.{HASH}a.js
    Client->>Page: fetch page response text
    Page-->>Client: response_text
    Client->>HashSearch: extract numeric chunk id (ON_DEMAND_FILE_REGEX)
    HashSearch-->>Client: chunk id
    Client->>HashSearch: search response_text for hash (ON_DEMAND_HASH_PATTERN with id)
    HashSearch-->>Client: hash (if found)
    alt hash found
        Client->>CDN: fetch ondemand.s.{HASH}a.js
        CDN-->>Client: JS bundle
        Client->>Client: parse indices from bundle (INDICES_REGEX)
    else no hash
        Client-->>Client: abort indices fetch / return no indices
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through cursors, nested and flat,
Skipped recursive checks with a cautious pat,
Found chunk ids, then chased their hash,
Fetched ondemand, parsed with a dash—
A rabbit's fix, a tidy patch. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main objectives: fixing a SearchTimeline 404 error and adding defensive parsing fixes for cursor handling and recursion prevention.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • The broad except Exception in _get_user_state risks silently treating non-rate-limit failures (e.g. JSON shape changes, network errors, or auth issues) as 'normal'; consider narrowing the exception to the specific 429 path (or HTTP/transport errors) and at least logging unexpected exceptions for visibility.
  • In get_tweet_by_id, you can further harden the cursor parsing by using entries[-1].get('entryId', '') instead of indexing ['entryId'] directly, which will avoid a KeyError if X ever emits a cursor-like entry without that key.
  • The new SEARCH_TIMELINE_FEATURES largely overlaps with the global FEATURES; consider building it from a shared base (e.g. {**FEATURES, **overrides}) to reduce the chance of configuration drift when feature flags change in other endpoints.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The broad `except Exception` in `_get_user_state` risks silently treating non-rate-limit failures (e.g. JSON shape changes, network errors, or auth issues) as `'normal'`; consider narrowing the exception to the specific 429 path (or HTTP/transport errors) and at least logging unexpected exceptions for visibility.
- In `get_tweet_by_id`, you can further harden the cursor parsing by using `entries[-1].get('entryId', '')` instead of indexing `['entryId']` directly, which will avoid a `KeyError` if X ever emits a cursor-like entry without that key.
- The new `SEARCH_TIMELINE_FEATURES` largely overlaps with the global `FEATURES`; consider building it from a shared base (e.g. `{**FEATURES, **overrides}`) to reduce the chance of configuration drift when feature flags change in other endpoints.

## Individual Comments

### Comment 1
<location path="twikit/client/client.py" line_range="4344-4347" />
<code_context>
+        # Trap any exception from the nested call and fall back to
+        # 'normal'. The original 429 is still raised by the outer
+        # `request()`; we just avoid turning it into a recursive crash.
+        try:
+            response, _ = await self.v11.user_state()
+            return response['userState']
+        except Exception:
+            return 'normal'
</code_context>
<issue_to_address>
**issue (bug_risk):** Catching `Exception` broadly in `_get_user_state` can hide unrelated bugs and runtime issues.

The recursion fix is good, but this `except Exception` will also hide programming and runtime errors (e.g. unexpected response structure, JSON parsing issues) that should fail loudly. Consider catching only the expected failure modes (e.g. rate-limit/network exceptions, `RecursionError`) or re-raising non-network exceptions so that genuine bugs aren’t masked:

```python
try:
    response, _ = await self.v11.user_state()
    return response['userState']
except ExpectedRateLimitErrors:
    return 'normal'
```
</issue_to_address>



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
twikit/client/client.py (1)

4332-4348: Recursion guard looks correct; consider narrowing the except.

The fallback-to-'normal' cleanly breaks the recursion when v11.user_state() itself trips the 429 path inside request(), and the outer request() still re-raises the original TooManyRequests. Behavior LGTM.

Two optional notes (not blockers):

  • The blanket except Exception (Ruff BLE001) will also swallow programmer errors such as KeyError: 'userState' if the response schema drifts, silently classifying a suspended account as 'normal'. Consider narrowing to the exceptions you actually expect (TwitterException, httpx.HTTPError, KeyError, RecursionError) so genuine bugs surface.
  • A simple in-flight reentrancy guard (e.g. an asyncio.Lock or a boolean flag set on self) would prevent the recursion at the source rather than relying on it raising — but the current approach is already sufficient for the stated bug.
♻️ Optional narrower exception handling
-        try:
-            response, _ = await self.v11.user_state()
-            return response['userState']
-        except Exception:
-            return 'normal'
+        try:
+            response, _ = await self.v11.user_state()
+            return response['userState']
+        except (TwitterException, RecursionError):
+            # Nested call hit a rate limit / recursion; assume normal so the
+            # outer request() raises the original TooManyRequests.
+            return 'normal'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@twikit/client/client.py` around lines 4332 - 4348, The current broad except
in _get_user_state swallows unrelated errors; change the handler in async def
_get_user_state(self) to catch only expected exceptions (e.g., TwitterException,
httpx.HTTPError, KeyError, RecursionError) instead of bare Exception so schema
or programmer errors surface; locate the call to self.v11.user_state() inside
_get_user_state and replace the blanket except with an except tuple listing
those specific exception types and return 'normal' only for those cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@twikit/client/client.py`:
- Around line 1633-1650: get_tweet_by_id added defensive cursor extraction for
two cursor shapes but _get_more_replies still assumes the legacy shape and
unconditionally accesses entries[-1]['content']['itemContent']['value'], causing
KeyError for the newer flat shape; update _get_more_replies to mirror the same
logic used in get_tweet_by_id: inspect entries[-1].get('content') or {}, check
for content.get('itemContent') being a dict with 'value' and fall back to
content['value'] if present, then use that extracted value as next_cursor before
proceeding (refer to _get_more_replies, get_tweet_by_id, entries,
reply_next_cursor).

In `@twikit/x_client_transaction/transaction.py`:
- Around line 20-22: The regexes require a leading comma and the hash pattern
only allows double quotes; update ON_DEMAND_FILE_REGEX and
ON_DEMAND_HASH_PATTERN so the leading comma is optional (so entries at the start
like {123:"ondemand.s"} match) and both single and double quotes are accepted
for the hash; specifically change ON_DEMAND_FILE_REGEX to make the leading comma
optional and to use ['"] for quotes around ondemand.s, and change
ON_DEMAND_HASH_PATTERN to make its leading comma optional/allow start-of-object
and to accept either single or double quotes around the captured hex group
(while still interpolating the id placeholder {}).

---

Nitpick comments:
In `@twikit/client/client.py`:
- Around line 4332-4348: The current broad except in _get_user_state swallows
unrelated errors; change the handler in async def _get_user_state(self) to catch
only expected exceptions (e.g., TwitterException, httpx.HTTPError, KeyError,
RecursionError) instead of bare Exception so schema or programmer errors
surface; locate the call to self.v11.user_state() inside _get_user_state and
replace the blanket except with an except tuple listing those specific exception
types and return 'normal' only for those cases.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 677cb0db-3165-4a71-a268-5a09f8e82874

📥 Commits

Reviewing files that changed from the base of the PR and between c3b7220 and 0b1152d.

📒 Files selected for processing (4)
  • twikit/client/client.py
  • twikit/client/gql.py
  • twikit/constants.py
  • twikit/x_client_transaction/transaction.py

Follow-up to the earlier `get_tweet_by_id` fix. The new flatter
`content.value` cursor shape bypasses the parent call just fine, but
`_fetch_more_replies` is still bound to `_get_more_replies`, and that
method did an unconditional legacy-shape access — so the very first
`await tweet.replies.next()` would reintroduce
`KeyError: 'itemContent'`, defeating the point of the earlier commit.

Mirror the same two-shape handling here, and also harden the entryId
read on the main call with `.get('entryId', '')` for good measure.

Two robustness issues with the initial webpack-layout fix, both raised
in review:

- The leading-boundary was a hard comma, which misses valid chunk
  maps where `ondemand.s` happens to be the first key
  (`{123:"ondemand.s",...}`). Widening to `[,{]` covers both.
- The hash-lookup pattern only accepted double-quoted values, while
  the file-label pattern already accepted both quote styles. A
  mismatch would have silently skipped the hash on single-quoted
  builds. Accept `["']` on both sides.

Also escape the literal `{` in the hash pattern as `{{` and use a
named placeholder (`{chunk_id}`) so `str.format()` doesn't try to
parse the character class as an unnamed field (which raises
`ValueError: unexpected '{' in field name`).

Re-verified against live x.com: KEY_BYTE indices still extracted
correctly (`[22, 37, 13]`).
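The str.format() pitfall described above is easy to reproduce. The pattern below is illustrative rather than the exact one in transaction.py; it shows why an unescaped `{` in a character class breaks formatting, and how the `{{` escape plus a named placeholder fixes it.

```python
import re

# An unescaped '{' inside the character class collides with
# str.format() field syntax and raises at format time.
try:
    r'[,{]{}:"x"'.format(123)
    format_error = None
except ValueError as exc:
    format_error = str(exc)  # e.g. unexpected '{' in field name

# Escaped version: '{{' becomes a literal '{' after formatting, and
# the named placeholder {chunk_id} is the only real field.
ON_DEMAND_HASH_PATTERN = r'''[,{{]{chunk_id}:["']([0-9a-f]+)["']'''
pattern = ON_DEMAND_HASH_PATTERN.format(chunk_id=123)

# Accepts either quote style and a '{' boundary, per the review fixes.
match = re.search(pattern, "{123:'7a3c9e1b'}")
```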

Review feedback: the bare `except Exception` was too wide — it would
also swallow genuine bugs (unexpected JSON shape, programming errors,
auth failures), making them indistinguishable from the rate-limit
scenario the guard is supposed to address.

Keep the trap targeted at the actual recursion-causing paths:

  * `TooManyRequests` — the nested v11 call hits the same
    account-level throttle and would otherwise re-enter this branch.
  * `RecursionError` — belt-and-braces if a different code path ever
    reintroduces the loop.
  * `HTTPError` — transport-level failures on the nested request
    should also fall back to 'normal' rather than masking the caller's
    original 429 with a networking traceback.

Unexpected exceptions keep propagating, so real bugs surface instead
of being silently converted into a 'normal' user state.
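A stdlib-only sketch of the narrowed guard. TooManyRequests and HTTPError here are stand-in classes for twikit's and httpx's exceptions, and fetch_state stands in for the nested v11.user_state(check_user_state=False) call; the real method lives on the Client class.

```python
import asyncio

# Stand-ins: the real types are twikit.errors.TooManyRequests and
# httpx.HTTPError.
class TooManyRequests(Exception): pass
class HTTPError(Exception): pass

async def get_user_state(fetch_state) -> str:
    try:
        response = await fetch_state()
        return response['userState']
    except (TooManyRequests, RecursionError, HTTPError):
        # The nested call hit the same account-level throttle (or a
        # transport failure): report 'normal' so the outer request()
        # raises the caller's original TooManyRequests instead of
        # recursing. Anything else propagates so real bugs surface.
        return 'normal'

async def throttled():
    raise TooManyRequests()

async def healthy():
    return {'userState': 'suspended'}

rate_limited_state = asyncio.run(get_user_state(throttled))
healthy_state = asyncio.run(get_user_state(healthy))
```

A KeyError from schema drift (no 'userState' key) is deliberately not in the tuple, so it would still raise loudly rather than masquerade as a normal account.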

@sakenuGOD
Author

Thanks for the reviews, both points were valid. Pushed three follow-up commits:

  • 49c178b — addresses @coderabbitai's note that _get_more_replies still had the legacy-only cursor access, so await tweet.replies.next() would have re-raised the same KeyError: 'itemContent' the parent commit was meant to eliminate. Mirrored the two-shape handling there, and switched both call sites to entries[-1].get('entryId', '') so a cursor-like entry missing the key falls back cleanly instead of throwing.
  • b9c5e26 — addresses @coderabbitai's other note: the ondemand regexes required a leading comma and accepted different quote styles on the two halves. Widened the leading boundary to [,{] so we also match when ondemand.s is the first entry in a webpack chunk map, and accept both quote styles on both halves. Also escaped the literal { in the hash pattern as {{ and switched to a named {chunk_id} placeholder so str.format() stops parsing the character class as a format field (otherwise ValueError: unexpected '{' in field name). Re-verified against live x.com — KEY_BYTE indices still extract correctly ([22, 37, 13]).
  • 5aa7493 — addresses @sourcery-ai's note on the broad except Exception in _get_user_state. Narrowed to TooManyRequests | RecursionError | HTTPError — these are the actual recursion-causing and transport paths; everything else (shape drift, programming errors, auth) now propagates so real bugs surface.

Didn't adopt the {**FEATURES, **overrides} suggestion for SEARCH_TIMELINE_FEATURES — the new endpoint's required set diverges quite a bit from the global FEATURES (different Grok flags, jetfuel_frame, content-disclosure, etc.), so a delta-over-base would be almost as long as the full list and would also need to remove keys from the base rather than just override. A flat constant felt clearer here, but happy to restructure if you'd prefer.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 18, 2026


@sakenuGOD: Thanks for the thorough follow-up — all three commits look good:

  • 49c178b — The mirrored two-shape cursor logic in _get_more_replies and the .get('entryId', '') fallback at both call sites are exactly the right fix. That was the last place where KeyError: 'itemContent' could still surface during paginated reply fetching.

  • b9c5e26 — The [,{] boundary widening, symmetric quote-style handling, and the {{ / named {chunk_id} placeholder are all correct. That ValueError: unexpected '{' in field name from str.format() parsing the regex character class would have been a hard-to-diagnose runtime crash, so catching it here is important.

  • 5aa7493 — Narrowing to TooManyRequests | RecursionError | HTTPError is the right trade-off. Letting auth errors, shape drift, and programming mistakes propagate keeps debugging tractable.
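The `str.format()` pitfall flagged above for b9c5e26 is easy to reproduce in isolation; the pattern below is illustrative, not the actual twikit regex:

```python
import re

# Unescaped: the `{` inside the character class [,{] opens a format
# field, and the `{` that follows then triggers
# "ValueError: unexpected '{' in field name".
broken = r'[,{]"ondemand\.s":"{}"'
try:
    broken.format('somehash')
    raise AssertionError('expected str.format() to fail')
except ValueError:
    pass

# Fixed: double the literal braces and use a named placeholder.
template = r'[,{{]"ondemand\.s":"{chunk_id}"'
pattern = template.format(chunk_id='vendor')  # -> [,{]"ondemand\.s":"vendor"
assert re.search(pattern, '{"ondemand.s":"vendor"}')
```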

On SEARCH_TIMELINE_FEATURES: your reasoning holds. When a delta would require removing keys from the base (not just overriding values), a flat constant is unambiguously clearer than {k: v for k, v in {**FEATURES, **overrides}.items() if k not in remove_set}. The current approach is fine.
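As a toy illustration of that trade-off — the flag names here are hypothetical, not the real feature set:

```python
# Hypothetical flags, purely to show the shape of the two options.
FEATURES = {'a': True, 'b': False, 'legacy_flag': True}

# Delta-over-base: override some keys AND remove others from the base.
overrides = {'b': True, 'new_flag': True}
removed = {'legacy_flag'}
delta_style = {
    k: v for k, v in {**FEATURES, **overrides}.items() if k not in removed
}

# Flat constant: state the required set directly.
flat_style = {'a': True, 'b': True, 'new_flag': True}

assert delta_style == flat_style
```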

[approve]

Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
twikit/client/client.py (1)

195-198: ⚠️ Potential issue | 🟠 Major

Avoid the recursive 429 loop before it reaches RecursionError.

Line 4360 still calls V11Client.user_state(), which routes through Client.request(). If that nested request also returns 429, Line 196 re-enters _get_user_state() repeatedly; the new RecursionError catch only recovers after the recursion limit and many extra HTTP calls.

🐛 Proposed fix: disable account-state rechecks for the nested user-state request
 async def request(
     self,
     method: str,
     url: str,
     auto_unlock: bool = True,
     raise_exception: bool = True,
+    check_user_state: bool = True,
     **kwargs
 ) -> tuple[dict | Any, Response]:
@@
             elif status_code == 429:
-                if await self._get_user_state() == 'suspended':
+                if check_user_state and await self._get_user_state() == 'suspended':
                     raise AccountSuspended(message, headers=response.headers)
                 raise TooManyRequests(message, headers=response.headers)
@@
         try:
-            response, _ = await self.v11.user_state()
+            response, _ = await self.v11.user_state(check_user_state=False)
             return response['userState']
-        except (TooManyRequests, RecursionError, HTTPError):
+        except (TooManyRequests, HTTPError):
             return 'normal'

And pass the flag through the v11 wrapper:

-async def user_state(self):
+async def user_state(self, **kwargs):
     return await self.base.get(
         Endpoint.USER_STATE,
-        headers=self.base._base_headers
+        headers=self.base._base_headers,
+        **kwargs
     )

Also applies to: 4359-4363

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@twikit/client/client.py` around lines 195 - 198, The error path in
Client.request -> status_code 429 calls self._get_user_state(), which itself
uses V11Client.user_state() that routes back through Client.request and can
cause a recursive loop of 429-handling calls; modify _get_user_state (and the
V11Client.user_state wrapper) to accept and pass a flag like
allow_account_state_recheck=False so that when you call V11Client.user_state
from within the 429 handling path you pass allow_account_state_recheck=False to
prevent re-invoking Client.request’s 429 handler; update the V11 wrapper
signature and calls accordingly so nested user-state lookups do not trigger
another account-state recheck and thus avoid recursion.
🧹 Nitpick comments (1)
twikit/x_client_transaction/transaction.py (1)

75-87: Consider distinguishing the "no hash found" failure mode.

When on_demand_file matches but hash_match is None (e.g., the bundle layout shifts again or the chunk id appears in the name map but its hash is emitted from a different chunk file), execution silently falls through to the generic raise Exception("Couldn't get KEY_BYTE indices") at line 87. Since this exception propagates from init() straight up through request() in client.py without any fallback, a more specific message (e.g., including the resolved chunk_id) would make future bundle-layout regressions much easier to diagnose from user bug reports.

♻️ Suggested diagnostic improvement
             if hash_match:
                 on_demand_file_url = (
                     f"https://abs.twimg.com/responsive-web/client-web/"
                     f"ondemand.s.{hash_match.group(1)}a.js"
                 )
                 on_demand_file_response = await session.request(
                     method="GET", url=on_demand_file_url, headers=headers)
                 key_byte_indices_match = INDICES_REGEX.finditer(
                     str(on_demand_file_response.text))
                 for item in key_byte_indices_match:
                     key_byte_indices.append(item.group(2))
+            else:
+                raise Exception(
+                    f"Couldn't resolve ondemand.s hash for chunk id {chunk_id}; "
+                    "webpack bundle layout may have changed again."
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@twikit/x_client_transaction/transaction.py` around lines 75 - 87, The current
flow can silently hit the generic "Couldn't get KEY_BYTE indices" when
on_demand_file exists but hash_match is None; update the logic in init() where
hash_match is used (around on_demand_file_url and key_byte_indices collection)
to detect the hash_match==None case and raise a more specific exception that
includes the chunk_id (and optionally the on_demand_file_url) so callers like
request() in client.py receive a clear diagnostic; ensure you still raise the
original "Couldn't get KEY_BYTE indices" only if no key_byte_indices were
collected after attempting to fetch and parse with INDICES_REGEX.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dbc5109c-0e1b-48c8-b654-e1261f89c8dc

📥 Commits

Reviewing files that changed from the base of the PR and between 0b1152d and 5aa7493.

📒 Files selected for processing (2)
  • twikit/client/client.py
  • twikit/x_client_transaction/transaction.py

ctrl-alt-raccoon pushed a commit to ctrl-alt-raccoon/twikit that referenced this pull request Apr 18, 2026
- PATCHES.md documents the cherry-picks that diverge this branch from
  upstream d60/twikit:main. Anyone landing here can see at a glance
  which commits are ours and why each exists.
- .github/workflows/drift-check.yml runs weekly, checks whether any of
  the upstream PRs listed in PATCHES.md have been merged, and opens an
  issue here when one has so we know to retire the cherry-pick.

Cherry-picks currently carried (see PATCHES.md for detail):
  - d60#419 (7 commits) — SearchTimeline queryId refresh +
    defensive parsing + 429 recursion guard + ondemand.s extractor.
  - d60#418 (1 commit)  — .get() for optional fields in
    User.__init__ and Client.request.
Follow-up to review feedback. The earlier `RecursionError` trap
catches the crash, but only *after* Python has walked the whole
recursion limit — so we still do ~1000 redundant HTTP calls against
X every time an account gets rate-limited. That's both slow and a
great way to earn an extra throttling window.

Cleaner: add a `check_user_state` keyword to `Client.request()`,
default `True`. The 429 recovery branch consults it before calling
`_get_user_state()`. When `_get_user_state()` itself dispatches the
nested `v11.user_state()` GET, it sets `check_user_state=False` so
that if the nested call also returns 429, `request()` raises
`TooManyRequests` straight away — no recursion, no retries,
correct exception bubbles up on the first round trip.

The `HTTPError` trap stays: a transport-level failure on the
nested GET still shouldn't mask the original 429 for the caller.
The `RecursionError` trap is gone — that code path no longer
exists once the flag breaks the loop at the source.

Threading the flag through required widening `V11Client.user_state`
to accept and forward `**kwargs` to `base.get`.
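Condensed to a runnable toy, the flag-threading pattern looks like this — method names mirror the PR description, but the bodies are stand-ins, not the real client:

```python
import asyncio

class TooManyRequests(Exception): ...
class AccountSuspended(Exception): ...

class Client:
    """Toy model of the 429 path; the responder stands in for HTTP."""
    def __init__(self, responder):
        self._responder = responder
        self.calls = 0  # instrumented to show the loop is broken

    async def request(self, url, check_user_state=True):
        self.calls += 1
        status, body = self._responder(url)
        if status == 429:
            # Only the outermost request consults the account state.
            if check_user_state and await self._get_user_state() == 'suspended':
                raise AccountSuspended(url)
            raise TooManyRequests(url)
        return body

    async def _get_user_state(self):
        try:
            # check_user_state=False breaks the cycle: a nested 429 raises
            # TooManyRequests here instead of re-entering this method.
            body = await self.request('/user_state', check_user_state=False)
            return body['userState']
        except TooManyRequests:
            return 'normal'
```

With a responder that always returns 429, the outer call makes exactly one nested probe and then raises `TooManyRequests` — two round trips total instead of a recursion-limit's worth.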
Nitpick follow-up. The previous two-step flow silently fell through
to "Couldn't get KEY_BYTE indices" when the label regex matched but
the hash pattern didn't, conflating two very different failure modes:

- label missing → webpack bundle layout drifted, both regexes need a
  look.
- label found, hash missing → only the hash pattern needs adjusting
  (typically a quote-style or chunk-id format change).

Splitting the two lets a future X-side rotation be diagnosed from
the traceback alone, instead of having to instrument `get_indices`.
The chunk id is included in the message so it's clear which mapping
to inspect in the page source.

The generic "Couldn't get KEY_BYTE indices" stays for the case where
indices parsing itself fails (bundle contents changed).
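A condensed sketch of the split failure modes — the regexes and messages are simplified stand-ins for the real transaction.py patterns:

```python
import re

def resolve_ondemand_hash(page: str) -> str:
    # Stage 1: locate the ondemand.s chunk id in the webpack name map.
    label = re.search(r'"ondemand\.s":\s*"(\w+)"', page)
    if label is None:
        raise Exception("Couldn't find ondemand.s label; bundle layout drifted")
    chunk_id = label.group(1)
    # Stage 2: resolve that chunk id to its content hash; the chunk id is
    # interpolated via a named placeholder, with literal braces doubled.
    hash_match = re.search(
        r'[,{{]"{chunk_id}":"([a-f0-9]+)"'.format(chunk_id=chunk_id), page)
    if hash_match is None:
        # Distinct message: the layout is fine, only hash extraction broke.
        raise Exception(f"Couldn't find ondemand.s hash for chunk id {chunk_id}")
    return hash_match.group(1)
```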
@sakenuGOD
Author

Thanks @coderabbitai. Both valid, pushed:

  • 4615568 addresses the 429 recursion note. Added check_user_state: bool = True to Client.request(); the 429 recovery branch consults it before calling _get_user_state(), and _get_user_state() passes check_user_state=False into the nested v11.user_state() GET. So if the nested call also returns 429, request() raises TooManyRequests immediately instead of recursing. Dropped the RecursionError trap — the loop is broken at the source now, no reason to catch the crash it was catching. Threaded the kwarg through V11Client.user_state via **kwargs.

  • e9b5acf addresses the ondemand-hash nitpick. When the label regex matches but the hash regex doesn't, we now raise a distinct "Couldn't find ondemand.s hash for chunk id {id}" instead of falling through to the generic "KEY_BYTE indices" error — the two failure modes call for different fixes (full layout drift vs. just a quote-style/chunk-id tweak), and a specific message makes that obvious from the traceback alone. The generic error stays for the case where the indices parsing itself fails.

Both re-verified live: ClientTransaction.init() still returns indices correctly after the regex restructure ([34, 19, 12] on this run — values rotate per page load, that's expected).

@coderabbitai
Contributor

coderabbitai Bot commented Apr 18, 2026


@sakenuGOD: Both commits are clean and well-reasoned:

  • 4615568 — The check_user_state=False flag is the right architectural fix. Breaking the cycle at the source (preventing _get_user_state from ever making a nested request() call that re-enters the 429 path) is strictly better than catching RecursionError after the fact. Dropping the RecursionError trap is correct now that the loop can't form in the first place. Threading via **kwargs keeps the change minimal.

  • e9b5acf — Separate error messages for the two distinct failure modes is exactly right: "couldn't find hash for chunk id X" tells you the webpack layout is fine but the hash extraction broke, while the generic "KEY_BYTE indices" message tells you the file was fetched but the indices format changed. A single generic message would obscure which of the two steps failed.

Both look good — approving.

[approve]

@zainulabideen1122

Can someone merge these fixes?
