Skip to content

fix: add transport retries to hosted_api_client to handle SQLite lock contention#21118

Draft
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1773432765-fix-flaky-work-queue-sqlite-lock
Draft

fix: add transport retries to hosted_api_client to handle SQLite lock contention#21118
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1773432765-fix-flaky-work-queue-sqlite-lock

Conversation

@devin-ai-integration
Copy link
Contributor

Fixes intermittent httpx.ReadError failures in tests using hosted_api_client, such as test_get_runs_in_queue_concurrency_limit_and_limit[1].

Root cause: The hosted_api_client fixture connects to a uvicorn server running in a subprocess that shares the same SQLite database as the test process. During parallel test execution (pytest-xdist), SQLite lock contention (database is locked) can cause the server's error handler to close the HTTP connection before the response is fully sent, resulting in httpx.ReadError on the client side.

Fix: Add httpx.AsyncHTTPTransport(retries=3) to the hosted_api_client fixture. This uses httpx's built-in transport retry mechanism, which retries only on connection-level errors (e.g., ReadError, ConnectError) — not on HTTP error responses (4xx/5xx). On retry, httpx establishes a new connection, recovering from the broken one.

This is a broader fix that addresses the issue for all tests using hosted_api_client rather than switching individual tests to ephemeral_client_with_lifespan.

Important review considerations

  • Scope: This affects all ~160 test references using hosted_api_client across many test files. Transport retries should be safe since they only fire on connection-level failures, but worth considering if any tests intentionally validate connection behavior.
  • Non-idempotent requests: If a POST request is processed by the server but the connection breaks before the response, the retry will re-send the POST, potentially causing a duplicate/conflict error instead of a ReadError.

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json.
  • If this pull request adds functions or classes, it includes helpful docstrings.

Link to Devin session: https://app.devin.ai/sessions/62533fe09ad943fa936cf77f7de1abcb
Requested by: bot_apk (apk@cognition.ai)

… contention

The hosted_api_client fixture connects to a server running in a subprocess
that shares the same SQLite database as the test process. During parallel
test execution, SQLite lock contention can cause the server to close the
HTTP connection, resulting in httpx.ReadError on the client side.

Add transport-level retries (retries=3) to the httpx client, which
automatically retries on connection-level errors like ReadError by
establishing a new connection. This only retries transport-level failures,
not HTTP error responses, so it won't mask real test failures.

This is a broader fix that addresses SQLite lock contention for all tests
using hosted_api_client, rather than switching individual tests to
ephemeral_client_with_lifespan.

Co-authored-by: apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@devin-ai-integration devin-ai-integration bot added the development Tech debt, refactors, CI, tests, and other related work. label Mar 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

development Tech debt, refactors, CI, tests, and other related work.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants