fix: clear exec_env_tls when destroying exec_env #4774

teamchong · 2025-12-29T18:21:26Z

Problem

When wasm_exec_env_destroy() is called, exec_env_tls (thread-local storage used by signal handlers for hardware bounds checking) may still point to the exec_env being destroyed. On subsequent WASM executions in the same thread, if a signal occurs (e.g., SIGSEGV for bounds checking), the signal handler accesses freed memory and crashes.

Solution

Clear exec_env_tls if it points to the exec_env being destroyed. This is a simple defensive check that prevents dangling pointer issues.

#ifdef OS_ENABLE_HW_BOUND_CHECK
    WASMExecEnv *current_tls = wasm_runtime_get_exec_env_tls();
    if (current_tls == exec_env) {
        wasm_runtime_set_exec_env_tls(NULL);
    }
#endif
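
The same idea can be shown outside WAMR. The following is a minimal sketch, not WAMR code: MockExecEnv, mock_exec_env_tls, mock_exec_env_create and mock_exec_env_destroy are hypothetical stand-ins, used only to illustrate clearing a thread-local pointer before its target is freed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for WASMExecEnv; not the real WAMR struct. */
typedef struct MockExecEnv {
    int id;
} MockExecEnv;

/* Models exec_env_tls: one slot per thread, read by the signal handler. */
static _Thread_local MockExecEnv *mock_exec_env_tls = NULL;

static MockExecEnv *
mock_exec_env_create(int id)
{
    MockExecEnv *env = malloc(sizeof(*env));
    if (env)
        env->id = id;
    return env;
}

/* Destroy an exec_env. If the TLS slot still points at it, clear the slot
   first so that no later reader dereferences freed memory. */
static void
mock_exec_env_destroy(MockExecEnv *env)
{
    if (mock_exec_env_tls == env)
        mock_exec_env_tls = NULL;
    free(env);
}
```

Destroying an exec_env that is not the one recorded in the slot leaves the slot untouched, so a nested or unrelated exec_env on the same thread is not affected.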

Use Case

Daemon-style execution patterns (like Cloudflare Workers) where the same thread runs multiple WASM modules sequentially without forking. Each module creates its own exec_env, runs, then destroys it. Without this fix, the TLS can point to a destroyed exec_env, causing crashes on subsequent runs.

Testing

- Tested in a production daemon-style deployment: 100+ consecutive AOT runs without crashes.
- No regression expected in existing tests (the fix only adds a pointer comparison and a conditional clear of exec_env_tls).

When an exec_env is destroyed, check if it matches the current thread's exec_env_tls and clear it to avoid dangling pointer issues. Without this fix, in daemon-style execution where the same thread runs multiple WASM modules sequentially (like Cloudflare Workers), the exec_env_tls can point to freed memory after an exec_env is destroyed, causing crashes on subsequent executions when the signal handler tries to access it. This is critical for AOT mode with hardware bounds checking enabled, where signal handlers rely on exec_env_tls to handle SIGSEGV properly.

lum1n0us · 2026-01-05T02:28:18Z

We are hoping to get more details about how the host side is using WAMR's APIs, especially regarding getting/setting exec_env_tls and calling WASM functions. A reproducible case would be great.

From my perspective, if you follow the pattern mentioned here, every call to a WASM function would have the proper exec_env_tls, and runtime_signal_handler() will not encounter a dangling pointer.

#include "core/iwasm/common/wasm_runtime_common.h"

void
call_worker(void)
{
    /* Save whatever exec_env the calling thread currently has in TLS. */
    WASMExecEnv *exec_env_backup = wasm_runtime_get_exec_env_tls();
    wasm_runtime_set_exec_env_tls(NULL); /* clear */

    /* Call the worker wasm module here, passing its local exec_env.
       call_wasm_with_hw_bound_check() then calls
       wasm_runtime_set_exec_env_tls(exec_env) and resets it with
       wasm_runtime_set_exec_env_tls(NULL) when execution finishes. */

    wasm_runtime_set_exec_env_tls(exec_env_backup); /* restore */
}
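
To make the save/restore pattern concrete without linking against WAMR, here is a simplified model. Everything in it (MockExecEnv, mock_exec_env_tls, mock_call_wasm, mock_call_worker) is hypothetical, and the entry check is only an approximation of the "invalid exec env" behaviour discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for WASMExecEnv. */
typedef struct MockExecEnv {
    int id;
} MockExecEnv;

/* Models the per-thread exec_env_tls slot. */
static _Thread_local MockExecEnv *mock_exec_env_tls = NULL;

/* Models a guarded call: refuse to run if TLS names a different exec_env,
   otherwise set TLS for the duration of the call and clear it on exit.
   Returns 1 on success, 0 on "invalid exec env". */
static int
mock_call_wasm(MockExecEnv *env)
{
    if (mock_exec_env_tls && mock_exec_env_tls != env)
        return 0;
    mock_exec_env_tls = env;
    /* ... guest code would run here ... */
    mock_exec_env_tls = NULL;
    return 1;
}

/* Save/restore wrapper: hide the caller's TLS value while the worker runs,
   then put it back regardless of the worker's result. */
static int
mock_call_worker(MockExecEnv *worker_env)
{
    MockExecEnv *backup = mock_exec_env_tls;
    int ok;

    mock_exec_env_tls = NULL;
    ok = mock_call_wasm(worker_env);
    mock_exec_env_tls = backup;
    return ok;
}
```

In this model a direct mock_call_wasm() on a second exec_env is rejected while the outer one is still recorded in TLS, whereas going through mock_call_worker() succeeds and leaves the outer value in place afterwards.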

teamchong · 2026-01-08T12:22:02Z

Thanks for the feedback! I've added a reproducible test case that demonstrates the bug.

The Bug

The issue is in invoke_native_with_hw_bound_check (aot_runtime.c) and its interpreter counterpart call_wasm_with_hw_bound_check (wasm_runtime.c):

// exec_env_tls is SET here
wasm_runtime_set_exec_env_tls(exec_env);

// Early return WITHOUT clearing exec_env_tls!
if (!wasm_runtime_detect_native_stack_overflow(exec_env)) {
    return false;  // BUG: TLS never cleared
}

When the native stack overflow check fails, the function returns early without clearing exec_env_tls. If the application then destroys the exec_env and creates a new one, subsequent WASM calls fail with "invalid exec env" because exec_env_tls still points to the destroyed exec_env.

Reproducible Test Case

Added tests/standalone/test-exec-env-tls/ with a test that:

1. Creates exec_env_A
2. Sets native_stack_boundary high to trigger overflow check failure
3. Calls WASM → fails with "native stack overflow", but TLS is not cleared
4. Destroys exec_env_A → TLS is now a dangling pointer
5. Creates exec_env_B
6. Calls WASM → fails with "invalid exec env" (without fix)

About the save/restore pattern

The save/restore pattern you mentioned would work if the application explicitly manages TLS. However, in this case:

- The bug is inside WAMR's invoke_native_with_hw_bound_check function
- The application just calls the public wasm_runtime_call_wasm API
- The early return path doesn't clear TLS, leaving it in an inconsistent state

The fix is defensive cleanup in wasm_exec_env_destroy(): if TLS points to the exec_env being destroyed, clear it. This handles any case where TLS wasn't properly cleared.

Add test case that reproduces the bug where exec_env_tls is not cleared on early return paths in invoke_native_with_hw_bound_check. The test triggers native stack overflow check failure, which causes wasm_runtime_call_wasm to return early after setting exec_env_tls but without clearing it. This leaves exec_env_tls pointing to a destroyed exec_env, causing subsequent calls to fail with "invalid exec env". Test confirms the fix in wasm_exec_env_destroy correctly clears exec_env_tls when destroying the exec_env it points to.

lum1n0us · 2026-01-12T06:11:46Z

The tests/standalone directory is about to be archived, so it's better to use a few unit test cases instead. Here is the reference.

I believe it would be beneficial to clear exec_env_tls in the early return branch as well, for both wasm_runtime and aot_runtime.

diff --git a/core/iwasm/interpreter/wasm_runtime.c b/core/iwasm/interpreter/wasm_runtime.c
index a59bc9257..cac3730bf 100644
--- a/core/iwasm/interpreter/wasm_runtime.c
+++ b/core/iwasm/interpreter/wasm_runtime.c
@@ -3618,6 +3618,7 @@ call_wasm_with_hw_bound_check(WASMModuleInstance *module_inst,
        native stack to run the following codes before actually calling
        the aot function in invokeNative function. */
     if (!wasm_runtime_detect_native_stack_overflow(exec_env)) {
+        wasm_runtime_set_exec_env_tls(NULL);
         return;
     }
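
The effect of this one-line change can be modelled in isolation. The sketch below is an illustration under assumptions, not the WAMR implementation: MockExecEnv, mock_exec_env_tls and mock_call_wasm are hypothetical, the stack_overflow flag stands in for a failing wasm_runtime_detect_native_stack_overflow(), and the clear_on_early_return parameter exists only to compare behaviour with and without the added line.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for WASMExecEnv. */
typedef struct MockExecEnv {
    bool stack_overflow;   /* simulates a failing native stack check */
    const char *exception; /* last error message, NULL if none */
} MockExecEnv;

/* Models the per-thread exec_env_tls slot. */
static _Thread_local MockExecEnv *mock_exec_env_tls = NULL;

/* Simplified model of a call guarded by hardware bounds checking. */
static bool
mock_call_wasm(MockExecEnv *env, bool clear_on_early_return)
{
    /* A stale TLS value naming another exec_env blocks the call. */
    if (mock_exec_env_tls && mock_exec_env_tls != env) {
        env->exception = "invalid exec env";
        return false;
    }

    mock_exec_env_tls = env;

    if (env->stack_overflow) {
        env->exception = "native stack overflow";
        /* The added line: reset TLS before leaving early. */
        if (clear_on_early_return)
            mock_exec_env_tls = NULL;
        return false;
    }

    /* ... guest code would run here ... */
    mock_exec_env_tls = NULL;
    return true;
}
```

With clear_on_early_return set to false, a failed call on one exec_env leaves the slot populated and the next call on a different exec_env is rejected with "invalid exec env". With it set to true, the slot is empty after the failure and the next call proceeds.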

…check

Move the fix to clear exec_env_tls at the source, in the early return path of invoke_native_with_hw_bound_check when native stack overflow check fails.

Changes:
- aot_runtime.c: Clear exec_env_tls before early return on stack overflow
- wasm_runtime.c: Clear exec_env_tls before early return on stack overflow
- Remove defensive fix from wasm_exec_env_destroy (no longer needed)
- Move test from standalone to unit tests (runtime-common)

The bug: When wasm_runtime_call_wasm sets exec_env_tls but returns early due to native stack overflow check failure, TLS was not cleared. This caused subsequent calls with a different exec_env to fail with "invalid exec env" error.

teamchong · 2026-01-12T22:56:38Z

Thanks. I've updated the PR.

teamchong requested review from TianlongLiang, loganek, lum1n0us, no1wudi and yamt as code owners December 29, 2025 18:21

teamchong force-pushed the fix-exec-env-tls-dangling-pointer branch from 0f18a9c to 9f73f59 Compare January 8, 2026 12:27

lum1n0us approved these changes Jan 19, 2026

lum1n0us merged commit 5664589 into bytecodealliance:main Jan 19, 2026
429 of 498 checks passed

lum1n0us added the bug-fix label Jan 23, 2026
