feat: batch job pulling for native workers [ee] #8244

Draft: hugocasa wants to merge 6 commits into main from batch-pulling

@hugocasa (Collaborator) commented Mar 5, 2026

Summary

Batch job pulling for native workers: the server fetches N jobs with a single SELECT...FOR UPDATE SKIP LOCKED LIMIT N query instead of N subworkers each issuing their own query.
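To illustrate the single-query fetch, here is a minimal Rust sketch that builds one such statement. It assumes a hypothetical `job_queue` table with `running`, `tag`, `priority` and `scheduled_for` columns, and it wraps the `SELECT ... FOR UPDATE SKIP LOCKED LIMIT N` in a CTE that also marks the picked rows as running in the same statement; the actual batch_pull() query in this PR may be shaped differently.

```rust
/// Hypothetical batch-pull query builder. `job_queue` and its columns are
/// placeholders, not Windmill's real schema.
pub fn build_batch_pull_sql(limit: usize) -> String {
    // One statement locks up to `limit` runnable rows. SKIP LOCKED makes a
    // concurrent caller pass over rows another transaction already holds
    // instead of waiting for them. The outer UPDATE marks the picked rows as
    // running so they stay claimed after the transaction commits.
    // `limit` is an integer, so formatting it into the string cannot inject SQL.
    format!(
        "WITH picked AS (\n\
         SELECT id FROM job_queue\n\
         WHERE running = false AND tag = ANY($1) AND scheduled_for <= now()\n\
         ORDER BY priority DESC NULLS LAST, scheduled_for\n\
         LIMIT {limit}\n\
         FOR UPDATE SKIP LOCKED\n\
         )\n\
         UPDATE job_queue q SET running = true\n\
         FROM picked WHERE q.id = picked.id\n\
         RETURNING q.id"
    )
}
```

With 24 subworkers the server would call `build_batch_pull_sql(24)` once per refill instead of running 24 separate `LIMIT 1` queries.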

  • Server batch-fetches N jobs in one query (N = number of native subworkers, taken from worker_ping)
  • Native workers pop jobs from an in-memory buffer through the existing pull_job HTTP endpoint (no DB query per pull)
  • Dedicated BATCH_PULL_URL env var (auto-detected in standalone mode)
  • Self-signed JWT with an exp claim for native worker auth
  • ScriptLang::as_worker_tag() helper that centralizes the bunnative→nativets tag logic
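The server-side buffer can be pictured roughly as follows. This is a minimal sketch, not the PR's code: `BatchBuffer`, `Job` and `pop_or_refill` are hypothetical names, and the `fetch_batch` closure stands in for the one batched query, called with n = the number of native subworkers. Each HTTP pull pops from memory, and only an empty buffer triggers a database round trip.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical job handle; a real queued job carries many more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub id: u64,
    pub tag: String,
}

/// Hypothetical in-memory buffer shared by all pull_job HTTP requests.
pub struct BatchBuffer {
    jobs: Mutex<VecDeque<Job>>,
}

impl BatchBuffer {
    pub fn new() -> Self {
        Self { jobs: Mutex::new(VecDeque::new()) }
    }

    /// Pops one job. Only when the buffer is empty is `fetch_batch` called,
    /// once, to pull up to `n` jobs (standing in for the batched SQL query).
    pub fn pop_or_refill<F>(&self, n: usize, fetch_batch: F) -> Option<Job>
    where
        F: FnOnce(usize) -> Vec<Job>,
    {
        let mut jobs = self.jobs.lock().unwrap();
        if jobs.is_empty() {
            jobs.extend(fetch_batch(n));
        }
        jobs.pop_front()
    }
}
```

Holding the mutex during `fetch_batch` keeps two concurrent pulls from refilling at the same time, at the cost of serializing refills; a production server would more likely refill in the background.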

Benchmark Results (3 native workers = 24 subworkers)

nativets — 10,000 fast jobs:

| Metric | Direct SQL | Batch Pull |
|---|---|---|
| Throughput | 253 jobs/s | 291 jobs/s (+15%) |
| Duration | 39.5 s | 34.3 s |

nativets — 1,000 fast jobs:

| Metric | Direct SQL | Batch Pull |
|---|---|---|
| Throughput | 288 jobs/s | 272 jobs/s |
| Pull queries | 20,367 | 4,801 (-76%) |
| Tuples returned | 28.7M | 3.4M (-88%) |
| Cache hits | 1.3M | 280K (-78%) |

nativets_sleep (500ms avg) — 1,000 jobs:

| Metric | Direct SQL | Batch Pull |
|---|---|---|
| Throughput | 43.8 jobs/s | 43.8 jobs/s (same; execution time dominates) |
| Disk reads | 204 blocks | 115 blocks (-44%) |

Projected scaling (from a model calibrated on the measured data):

| Native workers (subworkers) | Batch advantage (fast jobs) |
|---|---|
| 1 (8) | +23% |
| 5 (40) | +13% |
| 10 (80) | +25% |
| 20 (160) | +88% |

In the model, SQL throughput plateaus at around 15-20 native workers because of O(N²) SKIP LOCKED contention among the polling subworkers, while batch pull keeps scaling linearly.
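A back-of-the-envelope way to see where an O(N²) term can come from, under a deliberately simplified assumption that is not the PR's calibrated model: N pollers run their `LIMIT 1 ... SKIP LOCKED` queries at the same moment, and each one visits every row already locked by the pollers ahead of it before locking its own. A batched query locks N rows in one pass. The function names are hypothetical.

```rust
/// Toy contention model (an illustration, not the PR's calibrated model).
/// Assumes `n` pollers issue `LIMIT 1 ... SKIP LOCKED` at the same instant and
/// the k-th poller (1-based) visits the k - 1 rows already locked ahead of it
/// before locking one of its own: 1 + 2 + ... + n = n(n + 1) / 2 rows, O(n^2).
pub fn rows_visited_per_round_sql(n: u64) -> u64 {
    n * (n + 1) / 2
}

/// A single batched `LIMIT n ... SKIP LOCKED` query locks `n` rows in one pass.
pub fn rows_visited_per_round_batch(n: u64) -> u64 {
    n
}

/// Queries issued per round: one per poller, versus one in total when batched.
pub fn queries_per_round(n: u64, batched: bool) -> u64 {
    if batched { n.min(1) } else { n }
}
```

For the 24 subworkers in the benchmark this toy model gives 300 row visits per round for per-subworker SQL pulls versus 24 for one batched query, and the gap widens quadratically as workers are added.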

Test plan

  • Benchmark nativets with 1K and 10K jobs (1 and 3 native workers)
  • Benchmark nativets_sleep with 1K jobs
  • Capture pg_stat_statements and pg_stat_database metrics
  • Build a throughput model and validate it against the measured data
  • Test worker-only mode with an explicit BATCH_PULL_URL
  • Kill the server while jobs are buffered → verify the zombie monitor requeues them after 60s
  • Verify the 30s reaper requeues buffered jobs that were never claimed
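The 30 s reaper from the test plan can be sketched as a sweep over buffered entries. The names here (`Buffered`, `reap_stale`) are hypothetical, and the sketch only removes and returns the ids of jobs that sat unclaimed for at least the TTL; actually requeuing them in the database is left to the caller.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical buffered entry: a job id plus when it entered the buffer.
pub struct Buffered {
    pub job_id: u64,
    pub buffered_at: Instant,
}

/// Drops every entry that has waited at least `ttl` and returns their ids
/// so the caller can requeue those jobs.
pub fn reap_stale(buffer: &mut VecDeque<Buffered>, now: Instant, ttl: Duration) -> Vec<u64> {
    let mut stale = Vec::new();
    buffer.retain(|b| {
        // saturating_duration_since avoids a panic if `now` is earlier.
        let expired = now.saturating_duration_since(b.buffered_at) >= ttl;
        if expired {
            stale.push(b.job_id);
        }
        !expired
    });
    stale
}
```

Taking `now` as a parameter keeps the sweep deterministic and easy to test; a periodic task would call it with `Instant::now()` and a 30 s TTL.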

🤖 Generated with Claude Code

Reduce DB polling overhead for native workers by batch-fetching jobs
server-side and serving them from an in-memory buffer via HTTP.

- Add batch_pull() in windmill-queue: single SELECT...FOR UPDATE SKIP LOCKED LIMIT N
- Add batch pull SQL helpers (make_batch_pull_query, format_batch_pull_query)
- OSS stubs for agent-workers accept batch_buffer parameter (4-tuple return)
- Native workers self-sign a JWT and pull jobs via HTTP when co-located with the server
- Add uses_batch_http_pull column to worker_ping for server-side tracking
- Worker pull loop: HTTP batch pull when client available, SQL otherwise
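The pull-loop decision can be reduced to a small pure function. In this hypothetical sketch, `batch_pull_url` is the value of BATCH_PULL_URL and `local_server_url` is the co-located server address that standalone mode would auto-detect; an explicit BATCH_PULL_URL wins, and with neither set the worker falls back to direct SQL. `PullSource` and `choose_pull_source` are illustrative names, not identifiers from the PR.

```rust
/// Hypothetical: where a worker pulls its next job from.
#[derive(Debug, PartialEq)]
pub enum PullSource {
    /// Pull through the server's pull_job HTTP endpoint (served from the buffer).
    Http(String),
    /// Fall back to the direct SQL pull.
    Sql,
}

pub fn choose_pull_source(
    is_native: bool,
    batch_pull_url: Option<&str>,
    local_server_url: Option<&str>,
) -> PullSource {
    // Only native workers take part in batch pulling.
    if !is_native {
        return PullSource::Sql;
    }
    // An explicit, non-empty BATCH_PULL_URL takes precedence over auto-detection.
    let url = batch_pull_url
        .map(str::trim)
        .filter(|u| !u.is_empty())
        .or(local_server_url);
    match url {
        Some(u) => PullSource::Http(u.trim_end_matches('/').to_string()),
        None => PullSource::Sql,
    }
}
```

A caller would read the variable with `std::env::var("BATCH_PULL_URL").ok()` and pass `.as_deref()` of the result.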

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 5, 2026

Deploying windmill with Cloudflare Pages

Latest commit: e033c73
Status: ✅  Deploy successful!
Preview URL: https://b75f9115.windmill.pages.dev
Branch Preview URL: https://batch-pulling.windmill.pages.dev


hugocasa and others added 4 commits March 5, 2026 16:33
Native workers in Mode::Worker (no co-located server) can now use HTTP
batch pull when BASE_INTERNAL_URL is explicitly set to the remote
server's URL. The batch buffer itself only runs on the server side.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Replace BASE_INTERNAL_URL overloading with dedicated BATCH_PULL_URL
  env var for native workers' HTTP pull endpoint
- Add exp claim to JWT token (required by jsonwebtoken validation)
- Token expires after 30 days and is renewed when the worker restarts
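The expiry part of the token can be sketched without any JWT library. `sub`, `iat` and `exp` are registered claim names from RFC 7519, and, as the commit notes, jsonwebtoken's validation requires `exp`; the `WorkerClaims` struct and `worker_claims` function are hypothetical, and signing the claims is omitted.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Token lifetime from the commit message: 30 days.
pub const TOKEN_TTL: Duration = Duration::from_secs(30 * 24 * 60 * 60);

/// Hypothetical claim set; a real token may carry different fields.
#[derive(Debug)]
pub struct WorkerClaims {
    pub sub: String,
    pub iat: u64, // issued-at, seconds since the Unix epoch
    pub exp: u64, // expiry, seconds since the Unix epoch
}

/// Builds claims that expire TOKEN_TTL after `now`.
pub fn worker_claims(worker_name: &str, now: SystemTime) -> WorkerClaims {
    let iat = now
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs();
    WorkerClaims {
        sub: worker_name.to_string(),
        iat,
        exp: iat + TOKEN_TTL.as_secs(),
    }
}
```

Renewal on restart falls out naturally: a restarted worker calls `worker_claims` again with a fresh `now` and signs a new token.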

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Extract bunnative→nativets tag logic into ScriptLang::as_worker_tag()
- Add benchmark results for batch pull vs direct SQL (1W and 3W)
- Add throughput model script comparing batch vs SQL at scale
- Add nativets_sleep benchmark script support
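A helper like ScriptLang::as_worker_tag() might look like the following. The enum below is a reduced, hypothetical stand-in with only a few variants, and the exact signature in the PR may differ; the point is that the bunnative→nativets routing lives in one match arm instead of being repeated at each call site.

```rust
/// Reduced, hypothetical stand-in for the real ScriptLang enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScriptLang {
    Nativets,
    Bunnative,
    Bun,
    Deno,
    Python3,
}

impl ScriptLang {
    /// Lowercase language name (assumed to match the worker tag names).
    pub fn as_str(&self) -> &'static str {
        match self {
            ScriptLang::Nativets => "nativets",
            ScriptLang::Bunnative => "bunnative",
            ScriptLang::Bun => "bun",
            ScriptLang::Deno => "deno",
            ScriptLang::Python3 => "python3",
        }
    }

    /// Worker tag for this language: bunnative jobs are routed to the
    /// nativets tag, every other language uses its own name.
    pub fn as_worker_tag(&self) -> &'static str {
        match self {
            ScriptLang::Bunnative => ScriptLang::Nativets.as_str(),
            other => other.as_str(),
        }
    }
}
```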

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@hugocasa changed the title from "feat: batch job pulling for native workers" to "feat: batch job pulling for native workers [ee]" on Mar 6, 2026