[BugFix] Fix use-after-free in ThreadPool::do_submit when thread creation fails#71276
[BugFix] Fix use-after-free in ThreadPool::do_submit when thread creation fails#71276sevev wants to merge 2 commits intoStarRocks:mainfrom
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e9c7dfe97c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR addresses a use-after-free scenario in the BE ThreadPool::do_submit() path when min_threads=0 and thread creation fails, which could leave an orphaned task in the queue and later execute it against already-destroyed state.
Changes:
- Modify
ThreadPool::do_submit()to attempt thread creation before enqueuing when creating the pool’s first worker thread. - Add a
SyncPointhook inThreadPool::create_thread()to deterministically inject thread-creation failures in tests. - Add regression tests covering failed submit when no threads exist and verifying no UAF/crash scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| be/src/common/thread/threadpool.cpp | Adds “sole thread” pre-create logic in do_submit() and a SyncPoint injection point in create_thread(). |
| be/test/common/thread/threadpool_test.cpp | Adds regression tests using SyncPoint to simulate thread creation failures and validate post-fix behavior. |
…tion fails When a ThreadPool has min_threads=0 and all worker threads have exited, a new submit() must create a thread. The old code enqueued the task first, then attempted thread creation. If creation failed (e.g. EAGAIN due to system thread exhaustion), it returned an error WITHOUT removing the task from the queue. The caller, seeing the error, would perform its own cleanup (e.g. counting down a latch). A later successful submit could then create a thread that picks up the orphaned task, executing it against already-destroyed state, causing a use-after-free crash (SIGSEGV in butil::LinkNode::InsertBefore). The fix creates the thread BEFORE enqueuing the task when the pool has zero existing threads (sole_thread path). If thread creation fails, the task is never enqueued and the caller can safely handle the error. When the pool already has threads, the original enqueue-first behavior is preserved since submit always returns OK in that path. Signed-off-by: sevev <qiangzh95@gmail.com>
e9c7dfe to
5e67b46
Compare
luohaha
left a comment
There was a problem hiding this comment.
LGTM. The fix correctly addresses the use-after-free by creating the thread before enqueuing the task when the pool has zero threads (sole_thread path).
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 12 / 12 (100.00%) file detail
|
Why I'm doing:
When ThreadPool has min_threads=0, all worker threads can idle-exit, leaving _num_threads == 0. A new submit() would:
The caller receives the error and performs cleanup (e.g. counting down a latch). But the task remains in the queue. A subsequent successful submit() creates a thread that picks up the orphaned task, executing it against already-destroyed state — causing a SIGSEGV crash in butil::LinkNode::InsertBefore due to use-after-free on a stack-allocated BThreadCountDownLatch.
Key Logs
What I'm doing:
Introduce a sole_thread flag: when _num_threads + _num_threads_pending_start == 1 after incrementing (meaning we're creating the pool's only thread), attempt thread creation before enqueuing. If it fails, return error immediately — the task was never enqueued.
When the pool already has other threads, the original behavior is preserved (enqueue first, then create thread), since that path always returns OK.
Fixes https://github.com/StarRocks/StarRocksTest/issues/11218
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: