
Fix/latency issues #150

Draft
chrishagglund-ship-it wants to merge 2 commits into gated-metrics-standardization from fix/latency-issues


chrishagglund-ship-it commented May 14, 2026

Thread starvation and broken async patterns (async void)

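This is the core latency problem. The methods on the worker's hot path (WorkOnce, ProcessTasks, ProcessTask) were all declared async void instead of async Task. In C#, async void methods are fire-and-forget: the caller has no way to await them, and any exception that escapes one cannot be caught by the caller. Instead it is rethrown on the SynchronizationContext that was current when the method started or, if there was none, on the thread pool, where it typically crashes the process. The consequences were: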

  • The poll loop re-entered immediately without waiting for the current batch of tasks to complete, because WorkOnce returned void and couldn't be awaited. As a result, the RunningWorkerDone() monitor count drifted away from the number of workers actually running.
  • Thread.Sleep was used for every wait (poll interval, error backoff, retry backoff), which blocks thread-pool threads instead of releasing them. Under load, this could starve the thread pool, which is the very thing async/await exists to prevent.
  • The PollTask and UpdateTask API calls in TaskResourceApi were fake-async: they wrapped synchronous HTTP calls in Task.FromResult(...), so even the nominally async path blocked a thread.
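To illustrate the fake-async point above, here is a minimal sketch (not the actual TaskResourceApi code). BlockingCall, PollFakeAsync and PollTrueAsync are hypothetical names; Thread.Sleep stands in for a synchronous HTTP call and Task.Delay for a truly asynchronous one.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FakeAsyncSketch
{
    // Hypothetical stand-in for a synchronous HTTP call.
    private static string BlockingCall(int milliseconds)
    {
        Thread.Sleep(milliseconds); // holds the calling thread for the full duration
        return "ok";
    }

    // Fake-async: BlockingCall runs to completion on the caller's thread
    // BEFORE Task.FromResult wraps the value, so the returned task is
    // already completed and nothing was actually asynchronous.
    public static Task<string> PollFakeAsync(int milliseconds) =>
        Task.FromResult(BlockingCall(milliseconds));

    // Truly async: the method returns an incomplete task at the first await
    // and no thread is held while the delay is pending.
    public static async Task<string> PollTrueAsync(int milliseconds)
    {
        await Task.Delay(milliseconds).ConfigureAwait(false);
        return "ok";
    }
}
```

Calling PollFakeAsync returns a task that is already completed, because the caller has paid for the whole wait synchronously. PollTrueAsync hands back a pending task right away and frees the thread.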

The fix converts all async void to async Task, replaces all Thread.Sleep with await Task.Delay, and introduces truly async PollTaskAsync/UpdateTaskAsync/CallApiAsync methods that use RestClient.ExecuteAsync end-to-end.
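The shape of that conversion can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: PollLoopSketch, WorkOnceAsync, RunAsync, and the processBatch delegate are hypothetical names, and the maxIterations bound exists only so the example terminates.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollLoopSketch
{
    // Returns Task (not void), so the poll loop can await completion of the batch.
    public static async Task WorkOnceAsync(
        Func<CancellationToken, Task> processBatch, CancellationToken token)
    {
        await processBatch(token).ConfigureAwait(false);
    }

    // Runs up to maxIterations polls and returns how many batches completed.
    public static async Task<int> RunAsync(
        Func<CancellationToken, Task> processBatch,
        TimeSpan pollInterval,
        int maxIterations,
        CancellationToken token)
    {
        int completed = 0;
        while (completed < maxIterations && !token.IsCancellationRequested)
        {
            // The next poll cannot start until this batch has finished.
            await WorkOnceAsync(processBatch, token).ConfigureAwait(false);
            completed++;

            // Task.Delay releases the thread while waiting; Thread.Sleep would block it.
            await Task.Delay(pollInterval, token).ConfigureAwait(false);
        }
        return completed;
    }
}
```

Because each batch is awaited before the next poll, at most one batch is in flight per loop. That is what keeps a running-worker count in step with reality.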

Miscellaneous bugs

  • Cancellation check was inverted: In ProcessTask's finally block, the condition was == CancellationToken.None instead of !=, meaning the cancellation token was never checked when an actual token was provided.
  • Shutdown hot-loop: When OperationCanceledException was caught in the worker loop, it slept 10ms and re-entered while(true), where the exception was immediately thrown again. This created an infinite tight loop on shutdown. Now it cleanly breaks out of the loop.
  • RecordUncaughtException() lost the exception type: It was a bare counter increment with no labels, so you couldn't tell what kind of exception was occurring.
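A rough sketch of the shutdown and exception-labelling fixes, under stated assumptions: WorkerLoopSketch, RunAsync, and the UncaughtByType dictionary are hypothetical, and the dictionary is only an in-memory stand-in for a labelled metrics counter. The real worker loop and metrics API may differ.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class WorkerLoopSketch
{
    // Hypothetical stand-in for a counter labelled by exception type.
    public static readonly ConcurrentDictionary<string, int> UncaughtByType =
        new ConcurrentDictionary<string, int>();

    public static void RecordUncaughtException(Exception ex) =>
        UncaughtByType.AddOrUpdate(ex.GetType().Name, 1, (_, count) => count + 1);

    // Runs workOnce until the token is cancelled; returns the number of iterations started.
    public static async Task<int> RunAsync(
        Func<CancellationToken, Task> workOnce, CancellationToken token)
    {
        int iterations = 0;
        while (true)
        {
            try
            {
                token.ThrowIfCancellationRequested();
                iterations++;
                await workOnce(token).ConfigureAwait(false);
            }
            catch (OperationCanceledException) when (token.IsCancellationRequested)
            {
                // Shutdown was requested: leave the loop instead of re-entering it.
                break;
            }
            catch (Exception ex)
            {
                // Keep the exception type so the metric shows what is failing.
                RecordUncaughtException(ex);
            }
        }
        return iterations;
    }
}
```

The exception filter on the first catch limits the break to cancellations that come from the worker's own token. Other cancellations, such as a timeout inside workOnce, fall through to the general handler and are counted. Error backoff is left out of this sketch for brevity.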
