update doc

gfx · gfx · commit 99d10a6a861a · 2025-12-28T13:18:20.000+09:00
diff --git a/package.json b/package.json
@@ -25,7 +25,7 @@
   "sideEffects": false,
   "scripts": {
     "build": "npm publish --dry-run",
-    "prepare": "npm run clean && webpack --bail && tsgo --build tsconfig.dist.cjs.json tsconfig.dist.esm.json && tsimp tools/fix-ext.mts --mjs dist.esm/*.js dist.esm/*/*.js dist.esm/*.d.ts dist.esm/*/*.d.ts && tsimp tools/fix-ext.mts --cjs dist.cjs/*.js dist.cjs/*/*.js dist.cjs/*.d.ts dist.cjs/*/*.d.ts",
+    "prepare": "npm run clean && ./wasm/build.sh && webpack --bail && tsgo --build tsconfig.dist.cjs.json tsconfig.dist.esm.json && tsimp tools/fix-ext.mts --mjs dist.esm/*.js dist.esm/*/*.js dist.esm/*.d.ts dist.esm/*/*.d.ts && tsimp tools/fix-ext.mts --cjs dist.cjs/*.js dist.cjs/*/*.js dist.cjs/*.d.ts dist.cjs/*/*.d.ts",
     "prepublishOnly": "npm run test:dist",
     "clean": "rimraf build dist dist.*",
     "test": "mocha 'test/**/*.test.ts'",
diff --git a/wasm/README.md b/wasm/README.md
@@ -51,6 +51,38 @@ Three-tier dispatch based on string/byte length:
 | 51-1000 | WASM | Optimal for medium strings |
 | > 1000 | TextEncoder/TextDecoder | SIMD-optimized for bulk |
 
+## Optimization Attempts (2025)
+
+Three optimization approaches were tested for `utf8Count`. None of them beat the existing scalar `charCodeAt` loop:
+
+### 1. Bulk Array Copy (intoCharCodeArray)
+
+**Hypothesis**: Replace N `charCodeAt` calls with one bulk `intoCharCodeArray` call plus N array reads.
+
+**Result**: 17-29% slower. The cost of allocating the GC array outweighs the savings from fewer JS/WASM boundary crossings.
+
+### 2. codePointAt Instead of charCodeAt
+
+**Hypothesis**: Simplify surrogate pair handling with `codePointAt`.
+
+**Result**: Slightly slower. `codePointAt` does more internal work to decode surrogates.
+
+### 3. SIMD Processing
+
+**Hypothesis**: Copy the characters to linear memory, then use SIMD to process 8 chars at once.
+
+**Result**: 23-49% slower. The O(n) copy from GC array to linear memory negates SIMD gains.
+
+```
+JS String → GC Array (1 call) → Linear Memory (N scalar ops) → SIMD
+                                       ↑
+                         This copy cancels the SIMD gain
+```
+
+### Conclusion
+
+The scalar `charCodeAt` loop is already near-optimal. The `js-string-builtins` implementation is highly optimized, so per-character calls are very cheap, and every alternative tested added more overhead (allocation or copying) than it saved. The 2-3x speedup over pure JS is probably close to the ceiling with current WASM capabilities.
+
 ## References
 
 - [js-string-builtins proposal](https://github.com/WebAssembly/js-string-builtins)
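
For readers unfamiliar with the "scalar `charCodeAt` loop" that the README's conclusion refers to, the following is a minimal pure-TypeScript sketch of that kind of loop. It is not the code from this repository or its WASM module. The name `utf8CountScalar` is hypothetical, and the handling of unpaired surrogates (counted as 3 bytes, matching what `TextEncoder` produces for U+FFFD) is an assumption.

```typescript
// Hypothetical illustration: count the UTF-8 byte length of a JS string by
// reading one UTF-16 code unit at a time with charCodeAt.
function utf8CountScalar(str: string): number {
  let bytes = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit < 0x80) {
      // ASCII: 1 byte
      bytes += 1;
    } else if (unit < 0x800) {
      // U+0080..U+07FF: 2 bytes
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < str.length) {
      // High surrogate: check whether a low surrogate follows.
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Valid pair encodes one code point above U+FFFF: 4 bytes.
        bytes += 4;
        i++; // skip the low surrogate
      } else {
        // Unpaired high surrogate: assumed to be replaced by U+FFFD (3 bytes).
        bytes += 3;
      }
    } else {
      // Remaining BMP code units, including unpaired surrogates: 3 bytes.
      bytes += 3;
    }
  }
  return bytes;
}

console.log(utf8CountScalar("héllo")); // 6
```

The loop does one `charCodeAt` call per code unit and allocates nothing, which is the property the README credits for outperforming the bulk-copy and SIMD variants.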