
Conversation

@djgaven588

A bit ago I spent some time optimizing Rhai for my project.

These changes helped out a lot for my use case, and are mostly focused on calling lots of scripts, and those scripts calling lots of methods.

Most of these changes involve reducing function calls (replacing map_or_else, for instance), always using hashbrown, or avoiding excessive iteration (e.g. in drains).

@schungx
Collaborator

schungx commented Sep 7, 2025

Thanks for the contribution! There seem to be conflicts. Can you resolve them?

@schungx
Collaborator

schungx commented Sep 7, 2025

Also, I see that you have replaced map_or etc. with if let, and replaced iterators with for loops. Theoretically speaking, these two styles should compile down to very similar machine code under release builds with LTO. For example, map_or etc. are usually inlined and then turned into essentially if let, and most iterators are lowered into their equivalent for-loop implementations. That's what I've mostly found when working in Rust.
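For illustration, here is a minimal sketch (hypothetical functions, not Rhai code) of the two styles that should compile down to the same branch under optimization:

```rust
// Two semantically identical ways to handle an Option.
// Under release builds the closure version usually inlines
// down to the same branch as the `if let` version.
fn via_map_or_else(x: Option<i64>) -> i64 {
    x.map_or_else(|| -1, |v| v * 2)
}

fn via_if_let(x: Option<i64>) -> i64 {
    if let Some(v) = x { v * 2 } else { -1 }
}

fn main() {
    // Both forms agree on every input.
    assert_eq!(via_map_or_else(Some(21)), via_if_let(Some(21)));
    assert_eq!(via_map_or_else(None), via_if_let(None));
    println!("both forms agree");
}
```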

Do you have benchmarks to show clear improvements in runtime?

Also can you elaborate a bit on what you intend to do with hashbrown?

@djgaven588
Author

I need to set aside some time to circle back to this, but yes, I can resolve them and clean up the PR a bit when I do some performance comparisons.

In my experience optimizing it, I found that replacing things such as map_or reduced function calls. I just use a regular release build, so I'm unsure of the impact of LTO settings (I think they default to thin, which seems reasonable).

For hashbrown, the goal is to use it over the std hashmap and hashset, as it is dramatically faster.

@schungx
Collaborator

schungx commented Sep 11, 2025

In my experience optimizing it, I found that replacing things such as map_or reduced function calls. I just use a regular release build, so I'm unsure of the impact of LTO settings (I think they default to thin, which seems reasonable).

Then I would strongly suggest you try with a full LTO build. Sometimes Rust would not inline across crate boundaries and LTO forces it to do so, resulting in drastically reduced code. I never have release builds without LTO these days.
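As a sketch, full LTO can be enabled in Cargo.toml like this (illustrative profile settings, not from this PR):

```toml
# Hypothetical Cargo.toml profile settings, for illustration only.
[profile.release]
lto = "fat"        # full cross-crate LTO; slower builds, more inlining
codegen-units = 1  # a single codegen unit helps inlining further

# A separate custom profile can keep the dev loop fast while still optimized.
[profile.release-dev]
inherits = "release"
lto = "thin"
```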

For hashbrown, the goal is to use it over the std hashmap and hashset, as it is dramatically faster.

I have heard similar, but it seems the Rust standard HashMap is the hashbrown one for std builds. And Rhai already uses a fast hashing implementation, so I wouldn't expect performance to differ.
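One way to see the distinction: std's HashMap accepts any BuildHasher, so swapping the default SipHash for a faster function doesn't require depending on hashbrown directly. A minimal sketch with a toy FNV-1a hasher (illustration only, standing in for crates like ahash or rustc-hash):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// A toy FNV-1a hasher (non-cryptographic, illustration only),
// standing in for the faster hashers that crates like `rustc-hash`
// or `ahash` provide.
#[derive(Default)]
struct Fnv1a(u64);

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        // Start from the FNV offset basis on the first write.
        let mut h = if self.0 == 0 { 0xcbf29ce484222325 } else { self.0 };
        for &b in bytes {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
        self.0 = h;
    }
}

fn main() {
    // Same std HashMap type, different hash function; no hashbrown
    // dependency needed to change the hasher.
    let mut map: HashMap<&str, i64, BuildHasherDefault<Fnv1a>> = HashMap::default();
    map.insert("answer", 42);
    assert_eq!(map["answer"], 42);
    println!("{}", map["answer"]);
}
```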

I would appreciate some benchmarks.

@djgaven588
Author

Finally making it back to this @schungx

Then I would strongly suggest you try with a full LTO build. Sometimes Rust would not inline across crate boundaries and LTO forces it to do so, resulting in drastically reduced code. I never have release builds without LTO these days.

This doesn't seem to improve performance (mostly just file size), and it requires 12+ minutes of compilation (I'd like to be able to play my game as I'm developing it; builds are currently 40 seconds). I've done a bit more testing with some changes; I'll need to make a new PR since I restarted, but it doesn't seem hashbrown itself was what gave me the performance improvements.

I have been wondering how to cut the stack depth down, as Rhai sends spikes up my profiler. My current idea is to make the statement walking iterative instead of recursive, as well as converting things like .map(||), .map_or_else(||, ||), .find(||), .or_else(), etc., as these seem to eat into stack depth as well, with some performance overhead (some of this may just be without LTO, but changing them to simpler primitives doesn't seem to hurt either).

Here's an example of what it looks like when profiling:
(profiler screenshot)

@djgaven588
Author

This is a bit messy, but if you look at the right graph, as well as the "Rate" in the top left, you can see there are notable improvements.

(screenshots: Retest Base, Retest Third)

@schungx
Collaborator

schungx commented Dec 21, 2025

I have been wondering how to cut the stack depth down,

I don't think you can. That's a limitation of Rhai's engine architecture, which is a recursive AST walker. Each level of code pushes a few new stack frames during evaluation.

If the stack is what you're worried about, then the only way is to move to an engine that compiles the AST down to bytecode.

And the stack issue really has nothing to do with using closures in things like map_or_else, as those stack frames are quite small. Also, I have been quite careful to make sure stack frames are minimized in most recursive calls.

If you're looking into performance, I'd suggest you look into turning off features. For example, did you use unchecked? Or only_i32 and f32_float? Turn off decimal or use no_custom_syntax if you don't need it?
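For reference, such a feature selection might look like this in Cargo.toml (a hypothetical stanza; check Rhai's documentation for the exact feature list):

```toml
# Illustrative dependency stanza showing the feature flags mentioned above.
[dependencies]
rhai = { version = "1", default-features = false, features = [
    "std",              # standard-library build
    "unchecked",        # skip runtime safety checks
    "only_i32",         # single 32-bit integer type
    "f32_float",        # 32-bit floats
    "no_custom_syntax", # drop custom syntax support
] }
```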

These would yield more performance than counting stack levels, which really relates to memory limits rather than run speed.

Also, in your release builds, a lot of map_or_else etc. should be optimized away and inlined. If you're seeing your code littered with them at runtime, then the optimizer has not been doing its job.

In fact, Clippy generally advises you to use iterators and mapping functions instead of if statements and loops, due to better optimization. So your results are quite strange from where I see it...

@djgaven588
Author

I don't think you can. That's a limitation of Rhai's engine architecture, which is a recursive AST walker. Each level of code pushes a few new stack frames during evaluation.

The main idea I'd have for this is effectively pushing Stmts (maybe a StmtBlock?) onto a queue that a higher level can loop back over. It wouldn't really need to hold much; maybe even a &mut Option would do, just so it wouldn't tunnel down the stack.
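The idea could be sketched like this (hypothetical Stmt type, not Rhai's actual AST), using an explicit work list so native stack depth stays flat:

```rust
// A minimal sketch of walking nested statement blocks iteratively
// with an explicit work list instead of recursion. The Stmt type
// here is hypothetical, not Rhai's actual AST.
enum Stmt {
    Print(i64),
    Block(Vec<Stmt>),
}

fn eval_iterative(root: Stmt, out: &mut Vec<i64>) {
    let mut work = vec![root];
    while let Some(stmt) = work.pop() {
        match stmt {
            Stmt::Print(v) => out.push(v),
            // Push children in reverse so they pop in source order.
            Stmt::Block(stmts) => work.extend(stmts.into_iter().rev()),
        }
    }
}

fn main() {
    let ast = Stmt::Block(vec![
        Stmt::Print(1),
        Stmt::Block(vec![Stmt::Print(2), Stmt::Print(3)]),
        Stmt::Print(4),
    ]);
    let mut out = Vec::new();
    eval_iterative(ast, &mut out);
    assert_eq!(out, vec![1, 2, 3, 4]);
    println!("{:?}", out);
}
```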

And the stack issue really has nothing to do with using closures in things like map_or_else, as those stack frames are quite small. Also, I have been quite careful to make sure stack frames are minimized in most recursive calls.

For things like map_or_else, I mainly found performance gains because these still execute closures, which are more expensive than simpler constructs like if or match, so the cost adds up. They also still have a cost in the less optimized builds I have to use during development. Replacing a try_fold, for instance, improved performance.

If you're looking into performance, I'd suggest you look into turning off features. For example, did you use unchecked? Or only_i32 and f32_float? Turn off decimal or use no_custom_syntax if you don't need it?

I already run Rhai with default features disabled, plus internals, no_custom_syntax, only_i64, std, and unchecked, which is why I'm looking to improve the underlying systems.

Also, in your release builds, a lot of map_or_else etc. should be optimized away and inlined. If you're seeing your code littered with them at runtime, then the optimizer has not been doing its job.

In fact, Clippy generally advises you to use iterators and mapping functions instead of if statements and loops, due to better optimization. So your results are quite strange from where I see it...

Clippy gives no such advice for the changes I've made. I've found these functions consistently add overhead, even in full LTO mode, and I have applied these kinds of changes to my own project with success. If you use the non-closure variants this may be zero-cost, but the closures definitely have a cost.

You can view the initial changes I did here: djgaven588@852a75e

@schungx
Collaborator

schungx commented Dec 23, 2025

The main idea I'd have for this is effectively pushing Stmts (maybe a StmtBlock?) onto a queue that a higher level can loop back over. It wouldn't really need to hold much; maybe even a &mut Option would do, just so it wouldn't tunnel down the stack.

I tried something similar but it ran afoul of Rust's borrow checker.

There are certain things that need to be &mut (for example the first parameter). If you keep multiple statements in eval state (instead of only one), then there will be too many mutable references to the scope.

The only way you can have multiple mutable references to a single piece of data in Rust is to keep them in separate function call frames. That's why it is difficult to get rid of recursive function calls without going unsafe.
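A minimal sketch of the constraint (hypothetical Scope type): re-borrowing once per step compiles fine, while holding several mutable borrows in one queue does not:

```rust
// Sketch of the borrow problem described above (hypothetical types).
// Recursion naturally gives each call its own short-lived `&mut Scope`;
// storing several `&mut` borrows in a queue at once is rejected.
struct Scope(Vec<i64>);

fn eval_one(scope: &mut Scope, v: i64) {
    scope.0.push(v);
}

fn main() {
    let mut scope = Scope(Vec::new());

    // Fine: each iteration re-borrows `scope` mutably,
    // so only one live mutable borrow exists at a time.
    for v in [1, 2, 3] {
        eval_one(&mut scope, v);
    }

    // Rejected by the borrow checker (E0499) if uncommented: two
    // simultaneous mutable borrows of the same scope held in one queue.
    // let queue: Vec<&mut Scope> = vec![&mut scope, &mut scope];

    assert_eq!(scope.0, vec![1, 2, 3]);
    println!("{:?}", scope.0);
}
```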

@schungx
Collaborator

schungx commented Dec 23, 2025

For things like map_or_else, I mainly found performance gains because these still execute closures, which are more expensive than simpler constructs like if or match, so the cost adds up. They also still have a cost in the less optimized builds I have to use during development. Replacing a try_fold, for instance, improved performance.

This is exactly the point I'm getting at. The closures should no longer exist after optimization. map_or_else, for example, should have the closure inlined and the whole thing optimized down to a simple if-else. If it doesn't, and keeps those closures around, then the optimization is not happening correctly. The same goes for try_fold, which should optimize down to a simple loop.
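The expected lowering can be sketched as follows (hypothetical checked-sum functions): in release mode both versions should compile to the same loop:

```rust
// A `try_fold` and a hand-written loop computing the same checked sum.
// Under optimization the compiler typically lowers the former into the
// latter, with the closure inlined away.
fn sum_try_fold(xs: &[i64]) -> Option<i64> {
    xs.iter().try_fold(0i64, |acc, &x| acc.checked_add(x))
}

fn sum_loop(xs: &[i64]) -> Option<i64> {
    let mut acc = 0i64;
    for &x in xs {
        acc = acc.checked_add(x)?;
    }
    Some(acc)
}

fn main() {
    let xs = [1, 2, 3, 4];
    assert_eq!(sum_try_fold(&xs), Some(10));
    assert_eq!(sum_try_fold(&xs), sum_loop(&xs));
    // Both report overflow the same way.
    assert_eq!(sum_loop(&[i64::MAX, 1]), None);
    println!("ok");
}
```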

If I replace all these with simple constructs, I'm afraid Clippy will give me a ton of complaints about not using them.

@schungx
Collaborator

schungx commented Dec 23, 2025

Clippy gives no such advice for the changes I've made. I've found these functions consistently add overhead, even in full LTO mode, and I have applied these kinds of changes to my own project with success. If you use the non-closure variants this may be zero-cost, but the closures definitely have a cost.

This is very strange indeed. I'm quite sure I checked it in Compiler Explorer before and Rust optimizes away the closures. Let me check again.

EDIT: I just checked with Compiler Explorer and all those closures optimize away, leaving no trace whatsoever. map_or_else no longer exists when compiled in release mode.
