-
Notifications
You must be signed in to change notification settings - Fork 751
fix: clear exec_env_tls when destroying exec_env #4774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: clear exec_env_tls when destroying exec_env #4774
Conversation
When an exec_env is destroyed, check if it matches the current thread's exec_env_tls and clear it to avoid dangling pointer issues. Without this fix, in daemon-style execution where the same thread runs multiple WASM modules sequentially (like Cloudflare Workers), the exec_env_tls can point to freed memory after an exec_env is destroyed, causing crashes on subsequent executions when the signal handler tries to access it. This is critical for AOT mode with hardware bounds checking enabled, where signal handlers rely on exec_env_tls to handle SIGSEGV properly.
|
We are hoping to get more details about how the host side is using WAMR's APIs, especially regarding getting/setting From my perspective, if you follow the pattern mentioned here, every call to a WASM function would have the proper #include "core/iwasm/common/wasm_runtime_common.h"
call_worker ()
{
exec_env_backup = wasm_runtime_get_exec_env_tls();
wasm_runtime_set_exec_env_tls(NULL); // clear
call worker wasm module // pass local exec_env, then call_wasm_with_hw_bound_check() will wasm_runtime_set_exec_env_tls(exec_env), and clean it like wasm_runtime_set_exec_env_tls(NULL) when finished execution.
wasm_runtime_set_exec_env_tls(exec_env_backup ); // restore
} |
thanks for the feedback! I've added a reproducible test case that demonstrates the bug. The BugThe issue is in invoke_native_with_hw_bound_check (both aot_runtime.c and wasm_runtime.c): When the native stack overflow check fails, the function returns early without clearing exec_env_tls. If the application then destroys the exec_env and creates a new one, subsequent WASM calls fail with "invalid exec env" because exec_env_tls still points to the destroyed exec_env. Reproducible Test CaseAdded tests/standalone/test-exec-env-tls/ with a test that:
About the save/restore pattern The save/restore pattern you mentioned would work if the application explicitly manages TLS. However, in this case:
The fix is defensive cleanup in wasm_exec_env_destroy() |
Add test case that reproduces the bug where exec_env_tls is not cleared on early return paths in invoke_native_with_hw_bound_check. The test triggers native stack overflow check failure, which causes wasm_runtime_call_wasm to return early after setting exec_env_tls but without clearing it. This leaves exec_env_tls pointing to a destroyed exec_env, causing subsequent calls to fail with "invalid exec env". Test confirms the fix in wasm_exec_env_destroy correctly clears exec_env_tls when destroying the exec_env it points to.
0f18a9c to
9f73f59
Compare
|
The test/standalone is about to be archived. It's better to use a few unit test cases. Here are the reference I believe it would be beneficial to clear exec_env_tls in the early return branch as well, for both wasm_runtime and aot_runtime. diff --git a/core/iwasm/interpreter/wasm_runtime.c b/core/iwasm/interpreter/wasm_runtime.c
index a59bc9257..cac3730bf 100644
--- a/core/iwasm/interpreter/wasm_runtime.c
+++ b/core/iwasm/interpreter/wasm_runtime.c
@@ -3618,6 +3618,7 @@ call_wasm_with_hw_bound_check(WASMModuleInstance *module_inst,
native stack to run the following codes before actually calling
the aot function in invokeNative function. */
if (!wasm_runtime_detect_native_stack_overflow(exec_env)) {
+ wasm_runtime_set_exec_env_tls(NULL);
return;
} |
…check Move the fix to clear exec_env_tls at the source - in the early return path of invoke_native_with_hw_bound_check when native stack overflow check fails. Changes: - aot_runtime.c: Clear exec_env_tls before early return on stack overflow - wasm_runtime.c: Clear exec_env_tls before early return on stack overflow - Remove defensive fix from wasm_exec_env_destroy (no longer needed) - Move test from standalone to unit tests (runtime-common) The bug: When wasm_runtime_call_wasm sets exec_env_tls but returns early due to native stack overflow check failure, TLS was not cleared. This caused subsequent calls with a different exec_env to fail with "invalid exec env" error.
Thanks. I've updated the PR. |
Problem
When
wasm_exec_env_destroy()is called,exec_env_tls(thread-local storage used by signal handlers for hardware bounds checking) may still point to the exec_env being destroyed. On subsequent WASM executions in the same thread, if a signal occurs (e.g., SIGSEGV for bounds checking), the signal handler accesses freed memory and crashes.Solution
Clear
exec_env_tlsif it points to the exec_env being destroyed. This is a simple defensive check that prevents dangling pointer issues.Use Case
Daemon-style execution patterns (like Cloudflare Workers) where the same thread runs multiple WASM modules sequentially without forking. Each module creates its own exec_env, runs, then destroys it. Without this fix, the TLS can point to a destroyed exec_env, causing crashes on subsequent runs.
Testing
Related