| Benchmark | Perry | Node | Factor |
|---|---|---|---|
| sort | 1ms | 31ms | 31x faster |
| prime-sieve | 2ms | 6ms | 3x faster |
| array-ops | <1ms | 3ms | >3x faster |
| json-parse | <1ms | 0.5ms | parity |

| Benchmark | Perry | Node | Gap | Priority |
|---|---|---|---|---|
| string-ops | 1ms | 0.1ms | 10x slower | P0 (runtime crash) |
| fibonacci | 1271ms | 1049ms | 1.2x slower | P1 |
| object-create | 12ms | 5ms | 2.4x slower | P2 |
| matrix-multiply | 43ms | 15ms | 2.8x slower | P2 |

Impact: Unblocks 10x improvement in string-ops, enables idiomatic TypeScript patterns
Reproduction:

```typescript
const s = "ABCDEFABC";
const idx = s.indexOf("ABC"); // SIGSEGV
const idx2 = s.indexOf("ABC", 1); // SIGSEGV
```

Location: crates/perry-runtime/src/string.rs (js_string_index_of and js_string_index_of_from)
Likely cause: The runtime function exists (uses Rust's str::find()), but the codegen may be passing incorrect arguments or the NaN-boxing of the return value (-1 for not found) may be broken.
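To make the return-value concern concrete, the sketch below checks whether a number's bit pattern falls in the NaN space, which is where NaN-boxing schemes conventionally keep tagged values. It assumes a generic layout (exponent bits all ones, non-zero mantissa). Perry's actual tag encoding is not described here, and `isInNaNSpace` is a hypothetical helper written only for illustration.

```typescript
// Hypothetical helper: reports whether a double's bit pattern lies in the
// NaN space (exponent all ones, mantissa non-zero). Generic NaN-boxing
// schemes store tagged values there; Perry's exact layout is an assumption.
function isInNaNSpace(x: number): boolean {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  const hi = view.getUint32(0); // sign, exponent, top 20 mantissa bits
  const lo = view.getUint32(4); // low 32 mantissa bits
  const exponent = (hi >>> 20) & 0x7ff;
  const mantissaNonZero = (hi & 0xfffff) !== 0 || lo !== 0;
  return exponent === 0x7ff && mantissaNonZero;
}

const notFound = -1; // what indexOf returns when there is no match
console.log(isInNaNSpace(notFound)); // false: -1 is an ordinary double
console.log(isInNaNSpace(NaN)); // true
```

Because -1 has an ordinary exponent, it should be representable as a plain f64 with no tag bits. If the crash turns out to be in return handling, that would suggest the value is being interpreted as a tagged value or pointer somewhere along the way.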
Fix approach:
- Add a standalone test case in Perry's test suite
- Check codegen for `indexOf` calls: verify argument marshaling
- Check return value handling for -1 (should be a valid f64, not a special NaN-boxed value)
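A standalone test case along the lines of the first step might look like the sketch below. The expected values follow standard JavaScript `String.prototype.indexOf` semantics; the `check` helper and the variable names are hypothetical, since Perry's test harness is not shown in this document.

```typescript
// Hypothetical helper: logs PASS/FAIL and returns whether the values matched.
function check(label: string, actual: number, expected: number): boolean {
  const ok = actual === expected;
  console.log(`${ok ? "PASS" : "FAIL"} ${label}: got ${actual}, expected ${expected}`);
  return ok;
}

const haystack = "ABCDEFABC";

const indexOfResults: boolean[] = [
  check("match at start", haystack.indexOf("ABC"), 0),
  check("match from index 1", haystack.indexOf("ABC", 1), 6),
  check("no match", haystack.indexOf("XYZ"), -1),
  check("no match after fromIndex", haystack.indexOf("ABC", 7), -1),
  check("empty needle", haystack.indexOf(""), 0),
];

// True only if every case above passed.
const allIndexOfPassed = indexOfResults.every((r) => r);
```

The cases cover both runtime entry points (with and without a start index) as well as the -1 return path, so a crash in any one of them narrows down where the fault lies.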
Reproduction:

```typescript
const s = "ABC";
const chars = s.split(''); // SIGSEGV
```

Location: crates/perry-runtime/src/string.rs (js_string_split)
Likely cause: Empty string as delimiter is a special case (split into individual characters). The runtime may not handle this pattern.
Fix approach:
- Check if `js_string_split` handles an empty delimiter
- If not, add a special case: when the delimiter is an empty string, iterate UTF-8 chars and create an array
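As a reference for what the special case should produce, here is a minimal TypeScript sketch of empty-delimiter splitting. `splitIntoCodeUnits` is a hypothetical name, not a Perry runtime function. One detail worth deciding explicitly: JavaScript's `split('')` splits by UTF-16 code unit, so a character outside the Basic Multilingual Plane becomes two elements, whereas iterating Rust `chars()` would yield one element per Unicode scalar value.

```typescript
// Reference behaviour for an empty delimiter (illustrative, not Perry code).
function splitIntoCodeUnits(s: string): string[] {
  const parts: string[] = [];
  for (let i = 0; i < s.length; i++) {
    parts.push(s.charAt(i)); // one UTF-16 code unit per element
  }
  return parts;
}

console.log(JSON.stringify(splitIntoCodeUnits("ABC"))); // ["A","B","C"]
console.log(splitIntoCodeUnits("").length); // 0
```

Note that an empty input yields an empty array, matching `"".split("")` in JavaScript.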
Current: 1271ms Perry vs 1049ms Node (was 2778ms before optimizations)
The 2.2x improvement from compiler optimizations was significant, but a gap of roughly 21% remains (1271ms vs 1049ms).
The remaining overhead is likely from:
- Function call overhead: each recursive call still has a prologue/epilogue
- NaN-boxing on return: even with optimizations, the return value needs boxing

Options for closing the gap:
- Tail-call optimization: not directly applicable to `fib(n-1) + fib(n-2)`, but could help with accumulator-style rewrites.
- Auto-memoization: the compiler could detect pure recursive functions and memoize them automatically. Complex to implement.
- Partial inlining: inline the first 2-3 levels of recursion to reduce call overhead.
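For reference, the accumulator-style rewrite mentioned above can be sketched as follows. This is an illustration of the two shapes, not the benchmark's actual source; the function names are hypothetical.

```typescript
// Naive doubly recursive form: the addition happens after both calls
// return, so neither call is in tail position.
function fibNaive(n: number): number {
  return n < 2 ? n : fibNaive(n - 1) + fibNaive(n - 2);
}

// Accumulator-style rewrite: the single recursive call is in tail position,
// so a compiler with tail-call optimization could turn it into a loop.
function fibAcc(n: number, a: number = 0, b: number = 1): number {
  return n === 0 ? a : fibAcc(n - 1, b, a + b);
}

console.log(fibNaive(20), fibAcc(20)); // 6765 6765
```

The rewrite also changes the algorithm from exponential to linear in `n`, so it would not be a like-for-like comparison against the existing benchmark numbers.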
Recommendation: Accept the 1.2x gap for now. Fibonacci is a micro-benchmark that doesn't reflect real-world patterns, and the improvement from 2778ms to 1271ms is already excellent.

Current: 12ms Perry vs 5ms Node (was 45ms before optimizations)
The benchmark creates objects like:

```typescript
{
  id: i,
  name: 'item',
  value: i * 2,
  nested: { a: i, b: i * 2 }
}
```

The 3.75x improvement suggests `js_object_alloc_fast` is being used for some cases, but not all.
File: crates/perry-codegen/src/codegen.rs
Check:
- Is `js_object_alloc_fast` used for the outer object?
- Is it used for the nested `{ a: i, b: i * 2 }` object?
- Are there other allocation paths being taken?
The nested object literal may be falling back to js_object_alloc (slow path). Ensure the codegen uses js_object_alloc_fast for ALL object literals where every field has an initializer.
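One way to test this hypothesis without reading the generated code is to time each object shape separately. The sketch below is a hypothetical micro-test, not part of the existing benchmark suite: the builder names are made up, and the count of 50000 simply mirrors the size used in the object-create benchmark run.

```typescript
// Each builder allocates one object shape so a slow allocation path,
// if there is one, can be timed in isolation.
function buildFlat(count: number): object[] {
  const out: object[] = [];
  for (let i = 0; i < count; i++) {
    out.push({ id: i, name: "item", value: i * 2 });
  }
  return out;
}

function buildInner(count: number): object[] {
  const out: object[] = [];
  for (let i = 0; i < count; i++) {
    out.push({ a: i, b: i * 2 }); // the inner shape, allocated at top level
  }
  return out;
}

function buildFull(count: number): object[] {
  const out: object[] = [];
  for (let i = 0; i < count; i++) {
    out.push({ id: i, name: "item", value: i * 2, nested: { a: i, b: i * 2 } });
  }
  return out;
}

// Returns elapsed wall-clock milliseconds for one run of the builder.
function timeBuild(build: (count: number) => object[], count: number): number {
  const start = Date.now();
  build(count);
  return Date.now() - start;
}

console.log(`flat:  ${timeBuild(buildFlat, 50000)}ms`);
console.log(`inner: ${timeBuild(buildInner, 50000)}ms`);
console.log(`full:  ${timeBuild(buildFull, 50000)}ms`);
```

If `flat` and `inner` are close to Node but `full` lags well behind their sum, that would point at the nested literal taking a different allocation path.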
Current: 43ms Perry vs 15ms Node
The benchmark is a triple-nested loop with array indexing:

```typescript
for (let i = 0; i < size; i++) {
  for (let j = 0; j < size; j++) {
    let sum = 0;
    for (let k = 0; k < size; k++) {
      sum = sum + a[i * size + k] * b[k * size + j];
    }
    c[i * size + j] = sum;
  }
}
```

Expected Perry to win here due to loop unrolling + BCE, but it's 2.8x slower.
File: crates/perry-codegen/src/codegen.rs — loop optimization passes
Check:
- Is bounds check elimination (BCE) triggering for `a[i * size + k]`?
- Is loop unrolling happening for the inner `k` loop?
- Is the `i * size + k` multiplication being hoisted/optimized?

Likely causes:
- BCE not triggering: the index expression `i * size + k` may be too complex for BCE to prove safe
- No loop unrolling: the inner loop may not be unrolled because `size` is a runtime value
- Repeated index calculation: `i * size` could be hoisted out of the `j` and `k` loops

Fix approach:
- Add a loop-invariant code motion (LICM) pass to hoist the `i * size` computation
- Consider partial BCE for patterns where the index is bounded by the loop counter
- Profile the generated assembly to identify bottlenecks
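Before investing in a compiler pass, the effect of hoisting `i * size` can be approximated by doing it by hand at the source level. The sketch below shows the benchmark's loop shape next to a manually hoisted variant. Both function names and the `rowBase` variable are hypothetical, and this is only a source-level stand-in for what an LICM pass would do on the IR.

```typescript
// Baseline: same loop shape as the benchmark, recomputing i * size
// on every iteration of the innermost loop.
function multiplyNaive(a: number[], b: number[], size: number): number[] {
  const c: number[] = new Array<number>(size * size);
  for (let i = 0; i < size; i++) {
    for (let j = 0; j < size; j++) {
      let sum = 0;
      for (let k = 0; k < size; k++) {
        sum = sum + a[i * size + k] * b[k * size + j];
      }
      c[i * size + j] = sum;
    }
  }
  return c;
}

// Hand-hoisted variant: i * size is computed once per row.
function multiplyHoisted(a: number[], b: number[], size: number): number[] {
  const c: number[] = new Array<number>(size * size);
  for (let i = 0; i < size; i++) {
    const rowBase = i * size; // invariant across the j and k loops
    for (let j = 0; j < size; j++) {
      let sum = 0;
      for (let k = 0; k < size; k++) {
        sum = sum + a[rowBase + k] * b[k * size + j];
      }
      c[rowBase + j] = sum;
    }
  }
  return c;
}

// 2x2 check: [[1,2],[3,4]] * [[5,6],[7,8]]
console.log(JSON.stringify(multiplyHoisted([1, 2, 3, 4], [5, 6, 7, 8], 2))); // [19,22,43,50]
```

If the hoisted variant is noticeably faster when compiled with Perry, that would suggest the invariant multiplication is not currently being moved out of the inner loops. If the two run at the same speed, the bottleneck is more likely bounds checks or the lack of unrolling.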
| Priority | Task | Expected Impact | Effort |
|---|---|---|---|
| P0 | Fix `indexOf` crash | Unblocks 10x string-ops improvement | Medium |
| P0 | Fix `split('')` crash | Enables idiomatic string patterns | Low |
| P1 | Accept fibonacci gap | N/A (already 2.2x better) | None |
| P2 | Debug object-create alloc path | Could close 2.4x → 1.5x gap | Medium |
| P2 | Investigate matrix-multiply codegen | Could close 2.8x → 1.5x gap | High |

After each fix:

```bash
# Rebuild Perry compiler
cd /Users/amlug/projects/perry && cargo build --release

# Rebuild demo
cd /Users/amlug/projects/perry-demo
/Users/amlug/projects/perry/target/release/perry src/perry-server.ts -o dist/perry-server

# Start server and test
PORT=3003 PERRY_RUNTIME=1 ./dist/perry-server &
curl -s 'http://localhost:3003/api/benchmarks/run/string-ops?iterations=100&size=10000'
curl -s 'http://localhost:3003/api/benchmarks/run/object-create?iterations=20&size=50000'
curl -s 'http://localhost:3003/api/benchmarks/run/matrix-multiply?iterations=10&size=200'
```