Commit 670dad5

YoanSallami and claude committed
v0.2.0: remove decompose_* ops
- Drop decompose_json / decompose_schema from the Rust crate (breaking).
- Drop them from bench harness and refresh the side-by-side numbers.
- Bench compares against synalinks _py_<op> reference implementations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 8bc2be1 commit 670dad5

15 files changed

Lines changed: 62 additions & 539 deletions

Cargo.lock

Lines changed: 1 addition & 1 deletion

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "synaops"
-version = "0.1.0"
+version = "0.2.0"
 edition = "2021"
 license = "Apache-2.0"
 description = "Native Rust implementations of synalinks JSON/schema operations."

README.md

Lines changed: 16 additions & 20 deletions
@@ -11,7 +11,7 @@ callers do not need to know there is Rust underneath.
 
 Parity with the Python reference is asserted on every op and payload size
 (see `bench/test_parity.py`). Headline speedups on realistic payloads:
-**~470× on `factorize_schema`**, **~290× on `factorize_json`** at 600 keys,
+**~485× on `factorize_schema`**, **~280× on `factorize_json`** at 600 keys,
 4–8× on masking ops, 2–4× on simple key rewrites. Full table below.
 
 ## Build
@@ -34,8 +34,7 @@ import synaops
 | `prefix_json` | `(json, prefix)` | Prepend `prefix_` to every top-level key. |
 | `suffix_json` | `(json, suffix)` | Append `_suffix` to every top-level key. |
 | `concatenate_json` | `(json1, json2)` | Merge two objects; on key collision append `_1`, `_2`, … to disambiguate. |
-| `factorize_json` | `(json)` | Group keys sharing a singular base into a single array under the plural key. Inverse of `decompose_json`. |
-| `decompose_json` | `(json)` | Expand plural-keyed array properties into individual singular-keyed properties with numerical suffixes. Inverse of `factorize_json`. |
+| `factorize_json` | `(json)` | Group keys sharing a singular base into a single array under the plural key. |
 | `out_mask_json` | `(json, mask=None, pattern=None, recursive=True)` | Drop keys whose base name is in `mask` or whose base name matches the regex `pattern`. Numerical suffixes are ignored when matching. |
 | `in_mask_json` | `(json, mask=None, pattern=None, recursive=True)` | Keep only the keys whose base name is in `mask` or matches `pattern`. In recursive mode, arrays are preserved and their object items are filtered in place. |
 
@@ -49,7 +48,6 @@ Operate on JSON-Schema-shaped dicts (`properties`, `required`, `$defs`, `type`,
 | `suffix_schema` | `(schema, suffix)` | Append `_suffix` to every property key and update `title` / `required` accordingly. |
 | `concatenate_schema` | `(schema1, schema2)` | Merge two schemas (properties, `required`, `$defs`); on key collision append numeric suffixes and regenerate titles. |
 | `factorize_schema` | `(schema)` | Group similar singular-keyed properties into array-typed plural-keyed properties; folds heterogeneous `items` into `anyOf`. |
-| `decompose_schema` | `(schema)` | Expand plural-keyed array properties into a single singular-keyed property carrying the `items` schema. |
 | `out_mask_schema` | `(schema, mask=None, pattern=None, recursive=True)` | Remove properties whose base name is in `mask` or matches `pattern`. With `recursive=True`, descends into nested object/array properties and `$defs`, then prunes `$defs` entries no longer referenced. |
 | `in_mask_schema` | `(schema, mask=None, pattern=None, recursive=True)` | Keep only properties whose base name is in `mask` or matches `pattern`. Same recursive/`$defs`-pruning behavior as `out_mask_schema`. |
 | `standardize_schema` | `(schema)` | Placeholder for schema normalization (currently identity). |
@@ -58,7 +56,7 @@ Operate on JSON-Schema-shaped dicts (`properties`, `required`, `$defs`, `type`,
 
 ## Matching semantics
 
-Both `*_mask_*` families and `factorize_*` / `decompose_*` rely on the NLP
+Both `*_mask_*` families and `factorize_*` rely on the NLP
 helpers in `nlp_utils.rs`: they strip trailing numerical suffixes
 (`answer_3` → `answer`) and normalize singular/plural forms
 (`answers` → `answer`) before comparing keys. The `pattern` argument is a
@@ -80,21 +78,19 @@ Ratio `py_median / rs_median` per op. Higher is better; dashed line is parity (1
 
 | Operation | small (12) | medium (96) | large (600) |
 |---|---:|---:|---:|
-| `factorize_schema` | 9.43× | 68.0× | 472× |
-| `factorize_json` | 10.3× | 48.9× | 291× |
-| `in_mask_json` | 8.11× | 7.37× | 7.75× |
-| `out_mask_json_pattern` | 4.16× | 4.27× | 4.49× |
-| `out_mask_json` | 4.37× | 4.17× | 4.42× |
-| `in_mask_schema` | 5.14× | 4.21× | 4.31× |
-| `out_mask_schema` | 4.67× | 3.92× | 4.16× |
-| `prefix_schema` | 3.91× | 3.79× | 4.16× |
-| `suffix_schema` | 3.90× | 3.80× | 4.10× |
-| `decompose_schema` | 2.65× | 2.44× | 3.33× |
-| `concatenate_schema` | 2.25× | 2.08× | 2.89× |
-| `decompose_json` | 2.74× | 2.48× | 2.56× |
-| `prefix_json` | 2.41× | 2.41× | 2.46× |
-| `suffix_json` | 2.63× | 2.40× | 2.42× |
-| `concatenate_json` | 2.44× | 2.46× | 2.30× |
+| `factorize_schema` | 8.78× | 64.8× | 485× |
+| `factorize_json` | 9.73× | 46.2× | 282× |
+| `in_mask_json` | 7.75× | 7.11× | 7.47× |
+| `out_mask_json` | 4.23× | 4.12× | 4.29× |
+| `out_mask_json_pattern` | 3.77× | 4.15× | 4.20× |
+| `in_mask_schema` | 4.83× | 4.12× | 4.15× |
+| `out_mask_schema` | 4.25× | 3.78× | 4.11× |
+| `prefix_schema` | 3.63× | 3.62× | 3.89× |
+| `suffix_schema` | 3.66× | 3.64× | 3.85× |
+| `concatenate_schema` | 2.21× | 2.15× | 2.85× |
+| `suffix_json` | 2.36× | 2.22× | 2.28× |
+| `concatenate_json` | 2.25× | 2.18× | 2.27× |
+| `prefix_json` | 2.26× | 2.25× | 2.26× |
 
 `factorize_*` scales super-linearly because the Python reference does
 repeated O(n) key scans per group; the Rust path groups in a single pass.
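
Reviewer note: the scaling claim in the README's closing lines can be illustrated with a small sketch. This is not the synalinks reference nor the synaops Rust code. `base_name`, `group_by_rescanning`, and `group_single_pass` are hypothetical names, and the sketch only strips numerical suffixes (it ignores singular/plural normalization). It contrasts one full key scan per group with a single pass over the keys.

```python
import re
from collections import defaultdict

# Assumption: a base name is the key with a trailing "_<digits>" removed.
_SUFFIX = re.compile(r"_\d+$")


def base_name(key: str) -> str:
    """Strip a trailing numerical suffix: 'answer_3' -> 'answer'."""
    return _SUFFIX.sub("", key)


def group_by_rescanning(keys):
    """One full scan of `keys` per distinct base name (groups x n work)."""
    groups = {}
    for key in keys:
        base = base_name(key)
        if base not in groups:
            groups[base] = [k for k in keys if base_name(k) == base]
    return groups


def group_single_pass(keys):
    """Each key is visited once and appended to its bucket (n work)."""
    groups = defaultdict(list)
    for key in keys:
        groups[base_name(key)].append(key)
    return dict(groups)


keys = ["answer_1", "item", "answer_2", "item_1", "note"]
# Both strategies produce the same grouping; only the cost differs.
assert group_by_rescanning(keys) == group_single_pass(keys)
```

With many distinct base names the rescanning variant does roughly groups × n comparisons, which matches the growth of the `factorize_*` ratios with payload size in the table.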

bench/README.md

Lines changed: 2 additions & 2 deletions
@@ -46,5 +46,5 @@ under `.benchmarks/` and renders a grouped speedup chart.
 Keys rotate through six realistic variants (`answer_*`, `item_*`,
 `items_*`, `nested_*` with depth-3 objects, `person_*` via `$ref`,
 `people_*` arrays of `$ref`-based objects) so every operation — including
-`factorize_*` / `decompose_*` / `*_mask_*` and the schema paths that walk
-`$defs` — has realistic work to do.
+`factorize_*` / `*_mask_*` and the schema paths that walk `$defs` — has
+realistic work to do.

bench/before_after.png

-19.8 KB

bench/conftest.py

Lines changed: 8 additions & 5 deletions
@@ -19,7 +19,6 @@
     ("suffix_json", lambda f, d, **kw: f(d, "s")),
     ("concatenate_json", lambda f, d, **kw: f(d, d)),
     ("factorize_json", lambda f, d, **kw: f(d)),
-    ("decompose_json", lambda f, d, **kw: f(d)),
     (
         "out_mask_json",
         lambda f, d, **kw: f(d, mask=["answer", "item"], recursive=True),
@@ -39,7 +38,6 @@
     ("suffix_schema", lambda f, s, **kw: f(s, "s")),
     ("concatenate_schema", lambda f, s, **kw: f(s, s)),
     ("factorize_schema", lambda f, s, **kw: f(s)),
-    ("decompose_schema", lambda f, s, **kw: f(s)),
     (
         "out_mask_schema",
         lambda f, s, **kw: f(s, mask=["answer", "item"], recursive=True),
@@ -51,8 +49,13 @@
 ]
 
 
-def _resolve(name: str, module):
+def _resolve_py(name: str, module):
     # strip "_pattern" variant suffix used only to pick a different call shape
+    base = name.removesuffix("_pattern")
+    return getattr(module, f"_py_{base}")
+
+
+def _resolve_rs(name: str, module):
     base = name.removesuffix("_pattern")
     return getattr(module, base)
 
@@ -61,15 +64,15 @@ def _resolve(name: str, module):
 def ops_json():
     """Returns list of (name, py_fn, rs_fn, call_shape)."""
     return [
-        (name, _resolve(name, py_json), _resolve(name, rs), call)
+        (name, _resolve_py(name, py_json), _resolve_rs(name, rs), call)
         for name, call in JSON_FNS
     ]
 
 
 @pytest.fixture
 def ops_schema():
     return [
-        (name, _resolve(name, py_sch), _resolve(name, rs), call)
+        (name, _resolve_py(name, py_sch), _resolve_rs(name, rs), call)
         for name, call in SCHEMA_FNS
     ]
 
bench/speedup.png

-23.6 KB

bench/test_bench.py

Lines changed: 2 additions & 2 deletions
@@ -27,7 +27,7 @@ def test_json_py(benchmark, op, size_name, json_payload_by_size):
     from synalinks.src.backend.common import json_utils as py_json
 
     base = op_name.removesuffix("_pattern")
-    fn = getattr(py_json, base)
+    fn = getattr(py_json, f"_py_{base}")
     payload = json_payload_by_size[size_name]
 
     benchmark.group = f"{op_name}[{size_name}]"
@@ -57,7 +57,7 @@ def test_schema_py(benchmark, op, size_name, schema_payload_by_size):
     from synalinks.src.backend.common import json_schema_utils as py_sch
 
     base = op_name.removesuffix("_pattern")
-    fn = getattr(py_sch, base)
+    fn = getattr(py_sch, f"_py_{base}")
     payload = schema_payload_by_size[size_name]
 
     benchmark.group = f"{op_name}[{size_name}]"
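
Reviewer note: the README reports each table entry as `py_median / rs_median`. The repository's plotting script is not shown in this commit, so the following is only a rough sketch of how such a ratio could be computed from raw timings. `sample` and `speedup` are hypothetical helpers, not part of the bench harness.

```python
import statistics
import time


def sample(fn, payload, rounds=5):
    """Time `fn(payload)` `rounds` times and return the elapsed seconds."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn(payload)
        timings.append(time.perf_counter() - start)
    return timings


def speedup(py_samples, rs_samples):
    """Ratio of medians; values above 1.0 mean the second path is faster."""
    return statistics.median(py_samples) / statistics.median(rs_samples)


# Fixed numbers instead of live timings, so the result is deterministic.
assert speedup([4.0, 2.0, 6.0], [1.0, 2.0, 3.0]) == 2.0
```

Using the median rather than the mean keeps a single slow round (for example, a garbage-collection pause) from skewing the ratio.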

bench/test_parity.py

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ def test_json_parity(op, size_name, json_payload_by_size):
 
     base = op_name.removesuffix("_pattern")
     payload = json_payload_by_size[size_name]
-    py_out = call(getattr(py_json, base), payload)
+    py_out = call(getattr(py_json, f"_py_{base}"), payload)
     rs_out = call(getattr(rs, base), payload)
     assert py_out == rs_out, f"{op_name}[{size_name}]: py != rs"
 
@@ -33,6 +33,6 @@ def test_schema_parity(op, size_name, schema_payload_by_size):
 
     base = op_name.removesuffix("_pattern")
     payload = schema_payload_by_size[size_name]
-    py_out = call(getattr(py_sch, base), payload)
+    py_out = call(getattr(py_sch, f"_py_{base}"), payload)
     rs_out = call(getattr(rs, base), payload)
     assert py_out == rs_out, f"{op_name}[{size_name}]: py != rs"
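
Reviewer note: the lookup convention this commit introduces (`_py_<op>` on the Python reference module, bare `<op>` on the native module) can be exercised without either package installed. In the sketch below, `fake_py` and `fake_rs` are hypothetical stand-ins built with `types.SimpleNamespace`. Their `prefix_json` behavior follows the README description ("Prepend `prefix_` to every top-level key") and is not the real implementation.

```python
from types import SimpleNamespace


def resolve_py(name, module):
    # "_pattern" only selects a different call shape, so strip it first.
    base = name.removesuffix("_pattern")
    return getattr(module, f"_py_{base}")


def resolve_rs(name, module):
    base = name.removesuffix("_pattern")
    return getattr(module, base)


def _prefix(d, prefix):
    # Illustrative stand-in for prefix_json, not the real op.
    return {f"{prefix}_{k}": v for k, v in d.items()}


# Hypothetical modules: the reference exposes _py_<op>, the native one <op>.
fake_py = SimpleNamespace(_py_prefix_json=_prefix)
fake_rs = SimpleNamespace(prefix_json=_prefix)


def call(f, d, **kw):
    # Same call shape as the ("prefix_json", ...) style entries in conftest.py.
    return f(d, "p")


payload = {"answer": 1, "item": 2}
py_out = call(resolve_py("prefix_json", fake_py), payload)
rs_out = call(resolve_rs("prefix_json", fake_rs), payload)
assert py_out == rs_out, "prefix_json: py != rs"
```

`str.removesuffix` requires Python 3.9 or later, which the bench code already assumes.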

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "maturin"
 
 [project]
 name = "synaops"
-version = "0.1.0"
+version = "0.2.0"
 description = "Native Rust implementations of synalinks JSON/schema operations."
 readme = "README.md"
 license = "Apache-2.0"
