Commit ffeea1a (1 parent: cf4b145)

feat: add comprehensive ERRATA document outlining Williams Bound and hotpath architecture

1 file changed: ERRATA.md (+363, -0)
# ERRATA

## Williams Bound - Comprehensive Hotpath Architecture

### TL;DR

Apply the Williams 2025 result S = O(sqrt(t log t)) as a universal sublinear growth law everywhere the system trades space against time: the resident hotpath index, per-tier hierarchy quotas, per-community graph budgets, and Daydreamer maintenance batch sizing. Define t = |V| + |E| (total graph mass). Derive the resident representative capacity H(t) = ceil(c * sqrt(t * log2(1 + t))). Hebbian-derived node salience drives promotion and eviction, but representative selection also enforces hierarchical tier quotas and graph-community coverage quotas, so the hotpath is both hot and diverse.

---
### Phase A - Theoretical Foundation

#### A1. Formalize the theorem mapping

- Define t = |V| + |E| (pages + Hebbian edges + Metroid edges).
- Define H(t) = ceil(c * sqrt(t * log2(1 + t))), the resident hotpath capacity.
- State the design principle: every subsystem that can trade space for time must target sublinear growth at this rate.
- List what counts toward resident capacity: promoted pages, tier prototypes, and active Metroid neighbor entries.
- Define the three-zone model:
  - HOT: resident index, capacity H(t)
  - WARM: indexed in IndexedDB but not memory-resident
  - COLD: vector bytes in OPFS, metadata in IndexedDB, no index entry
- Note that all data stays local; zones affect lookup cost, not retention.
- Reference Williams 2025 as the source and state that c is an empirically tuned constant, not a theorem output.
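A minimal sketch of the capacity function described above. The default value of c is an assumption of this sketch (the plan leaves it as an empirically tuned constant):

```typescript
// Resident hotpath capacity H(t) = ceil(c * sqrt(t * log2(1 + t))).
// DEFAULT_C is a placeholder; c is tuned empirically, not given by the theorem.
const DEFAULT_C = 1.0;

export function computeCapacity(graphMass: number, c: number = DEFAULT_C): number {
  if (graphMass <= 0) return 0; // empty graph keeps nothing resident
  return Math.ceil(c * Math.sqrt(graphMass * Math.log2(1 + graphMass)));
}
```

With c = 1, a corpus of t = 10,000 nodes plus edges yields a resident capacity of a few hundred entries, which is the sublinear behavior the design targets.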
#### A2. Define node salience

The current schema has edge-level Hebbian weights but no node-level score. Define node salience sigma(v) for a page v:

sigma(v) = alpha * H_in(v) + beta * R(v) + gamma * Q(v)

Where:

- H_in(v) = sum of incident Hebbian edge weights
- R(v) = recency score using exponential decay from createdAt or lastUpdatedAt
- Q(v) = query-hit count for the node
- alpha, beta, gamma are tunable weights summing to 1.0

This requires lightweight per-page activity metadata such as queryHitCount and lastQueryAt.
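A sketch of the salience computation. The default weights mirror the Decisions section; the one-week half-life and the hit-count squashing are illustrative assumptions, since the plan only specifies "exponential decay" and a raw count:

```typescript
// sigma(v) = alpha * H_in(v) + beta * R(v) + gamma * Q(v)
export interface SalienceWeights { alpha: number; beta: number; gamma: number }

const DEFAULT_WEIGHTS: SalienceWeights = { alpha: 0.5, beta: 0.3, gamma: 0.2 };
const HALF_LIFE_MS = 7 * 24 * 60 * 60 * 1000; // assumed one-week recency half-life

// R(v): exponential decay from the last touch timestamp, in [0, 1]
export function recencyScore(lastTouchedAt: number, now: number = Date.now()): number {
  const age = Math.max(0, now - lastTouchedAt);
  return Math.pow(0.5, age / HALF_LIFE_MS);
}

export function computeSalience(
  hebbianIn: number,  // H_in(v), assumed pre-normalized to [0, 1]
  recency: number,    // R(v) from recencyScore
  queryHits: number,  // raw queryHitCount
  w: SalienceWeights = DEFAULT_WEIGHTS,
): number {
  const q = queryHits / (1 + queryHits); // squash hit count into [0, 1)
  return w.alpha * hebbianIn + w.beta * recency + w.gamma * q;
}
```

Squashing the hit count keeps sigma(v) bounded, so a frequently queried page cannot dominate purely through Q(v).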
#### A3. Define hierarchical tier quotas

Partition H(t) across the 4-level hierarchy so no single tier monopolizes the hotpath:

- Shelf quota: q_s * H(t), example q_s = 0.10 for routing prototypes
- Volume quota: q_v * H(t), example q_v = 0.20 for cluster prototypes
- Book quota: q_b * H(t), example q_b = 0.20 for book medoids
- Page quota: q_p * H(t), example q_p = 0.50 for individual page representatives

Subject to:

q_s + q_v + q_b + q_p = 1.0

Each tier's quota holds the highest-salience representatives of that tier's entities. Shelf, Volume, and Book representatives are selected by medoid statistic within their cluster and then ranked by salience for admission.
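A sketch of the tier split. Giving the rounding remainder to the page tier is an assumption of this sketch; it guarantees the quotas sum exactly to the capacity:

```typescript
// Split H(t) across the four hierarchy tiers using the example ratios above.
export interface TierQuotas { shelf: number; volume: number; book: number; page: number }

const DEFAULT_RATIOS = { shelf: 0.10, volume: 0.20, book: 0.20, page: 0.50 };

export function deriveTierQuotas(capacity: number, r = DEFAULT_RATIOS): TierQuotas {
  const shelf = Math.floor(capacity * r.shelf);
  const volume = Math.floor(capacity * r.volume);
  const book = Math.floor(capacity * r.book);
  const page = capacity - shelf - volume - book; // page tier absorbs rounding remainder
  return { shelf, volume, book, page };
}
```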
#### A4. Define graph-community coverage quotas

Within each tier's budget, allocate slots proportionally across detected communities so one dense topic cannot consume all capacity. Community detection uses the existing Metroid neighbor graph through connected components or lightweight label propagation during Daydreamer idle passes.

For community C_i with n_i pages out of N total:

community_quota(C_i) = max(1, ceil(tier_budget * n_i / N))

Because the ceil and the 1-slot minimum can push the total above tier_budget, trim surplus slots from the largest allocations until the quotas sum to the budget. This dual constraint, tier plus community, ensures both vertical coverage across hierarchy levels and horizontal coverage across topics.
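One way to implement the allocation so it respects both the 1-slot floor and the fixed tier budget. Trimming surplus from the largest allocations first is an assumption of this sketch:

```typescript
// Proportional community quotas with a one-slot floor. ceil() and the floor can
// overshoot the tier budget, so surplus slots are trimmed largest-first.
export function deriveCommunityQuotas(tierBudget: number, communitySizes: number[]): number[] {
  const total = communitySizes.reduce((a, b) => a + b, 0);
  if (total === 0 || tierBudget <= 0) return communitySizes.map(() => 0);
  const quotas = communitySizes.map((n) => Math.max(1, Math.ceil((tierBudget * n) / total)));
  let surplus = quotas.reduce((a, b) => a + b, 0) - tierBudget;
  while (surplus > 0) {
    // shrink the currently largest quota, but never below the 1-slot floor
    const i = quotas.indexOf(Math.max(...quotas));
    if (quotas[i] <= 1) break; // cannot trim further without violating the floor
    quotas[i] -= 1;
    surplus -= 1;
  }
  return quotas;
}
```

With a budget of 10 and community sizes [85, 10, 5], the raw allocation [9, 1, 1] overshoots by one, and the trim reduces the dominant community to 8 while the small communities keep their guaranteed slot.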
---

### Phase B - Core Policy Module

#### B1. Create core/HotpathPolicy.ts

This becomes the central source of truth. It should export:

- computeCapacity(graphMass: number): number
- computeSalience(hebbianIn: number, recency: number, queryHits: number, weights?): number
- deriveTierQuotas(capacity: number, quotaRatios?): TierQuotas
- deriveCommunityQuotas(tierBudget: number, communitySizes: number[]): number[]

All numeric constants such as c, alpha, beta, gamma, q_s, q_v, q_b, and q_p should live here as a frozen default policy object, analogous to the existing routing-policy and model-derivation defaults.
#### B2. Add tests for HotpathPolicy

Write tests first for:

- H(t) grows sublinearly
- H(t) is monotonically non-decreasing
- Tier quotas sum to capacity
- Community quotas sum to the tier budget and each remains at least 1
- Salience is deterministic for the same inputs
#### B3. Extend core/types.ts

Add:

- PageActivity interface with queryHitCount and lastQueryAt
- HotpathEntry interface with entityId, tier, salience, and optional communityId
- MetadataStore hotpath methods such as putHotpathEntry, getHotpathEntries, evictWeakest, and getResidentCount
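The proposed interfaces might look like this. Field types (string ids, epoch-millisecond timestamps) are assumptions; only the field names come from the plan:

```typescript
// Proposed additions to core/types.ts; a proposal, not an existing interface.
export interface PageActivity {
  pageId: string;
  queryHitCount: number;
  lastQueryAt: number; // epoch milliseconds
}

export type HotpathTier = "shelf" | "volume" | "book" | "page";

export interface HotpathEntry {
  entityId: string;
  tier: HotpathTier;
  salience: number;
  communityId?: string; // optional until community detection has labeled the node
}
```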
#### B4. Extend storage/IndexedDbMetadataStore.ts

Add:

- hotpath_index object store keyed by entityId
- page_activity object store or equivalent page metadata extension
- persistence methods for the new hotpath interfaces
- storage tests covering hotpath persistence and resident counts

---
### Phase C - Salience Engine and Promotion Lifecycle

#### C1. Create core/SalienceEngine.ts

Add helpers such as:

- computeNodeSalience(pageId, metadataStore)
- batchComputeSalience(pageIds, metadataStore)
- shouldPromote(candidateSalience, weakestResidentSalience, capacityRemaining)
- selectEvictionTarget(tier, communityId, metadataStore)
#### C2. Promotion and eviction lifecycle

Bootstrap phase:

- While hotpath size is below H(t), admit the highest-salience node not yet resident.

Steady-state phase:

- When a new or updated node has salience greater than the weakest resident in its tier and community bucket, evict the weakest and promote the candidate.
- Break ties by recency.

Trigger points:

- On ingest: newly ingested pages become candidates
- On query: queryHitCount increases and salience is recomputed
- On Daydreamer pass: after LTP or LTD, recompute salience and run a promotion sweep
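The steady-state rule for a single full bucket can be sketched as follows. The `Resident` shape is a simplified stand-in for the planned HotpathEntry:

```typescript
// Steady-state promotion for one tier + community bucket: evict the weakest
// resident when a candidate beats it, breaking ties by recency.
interface Resident { entityId: string; salience: number; lastTouchedAt: number }

export function promoteIntoFullBucket(
  bucket: Resident[],   // residents of the same tier + community, at quota
  candidate: Resident,
): { admitted: boolean; evicted?: Resident } {
  if (bucket.length === 0) { bucket.push(candidate); return { admitted: true }; }
  let weakest = bucket[0];
  for (const r of bucket) {
    if (r.salience < weakest.salience ||
        (r.salience === weakest.salience && r.lastTouchedAt < weakest.lastTouchedAt)) {
      weakest = r; // lower salience loses; older entry loses ties
    }
  }
  const beats = candidate.salience > weakest.salience ||
    (candidate.salience === weakest.salience && candidate.lastTouchedAt > weakest.lastTouchedAt);
  if (!beats) return { admitted: false };
  bucket[bucket.indexOf(weakest)] = candidate;
  return { admitted: true, evicted: weakest };
}
```

Because both the eviction target and the tie-break are total orders over (salience, recency), the outcome is deterministic for the same state, matching the C3 test requirement.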
#### C3. Add tests for promotion and eviction

- Promotion during bootstrap fills to H(t)
- Promotion in steady state evicts the weakest resident
- Community quotas prevent topic collapse
- Tier quotas prevent one hierarchy level from dominating
- Eviction is deterministic under the same state

---
### Phase D - Hierarchical Quota Integration

#### D1. Upgrade hippocampus/HierarchyBuilder.ts

After building Books, Volumes, and Shelves, compute the medoid or prototype for each and attempt hotpath admission:

- Book medoids -> book-tier quota
- Volume prototypes -> volume-tier quota
- Shelf routing prototypes -> shelf-tier quota

If a tier is full, evict the weakest-salience entry in that tier.
#### D2. Upgrade cortex/Ranking.ts

The ranking cascade should search the resident hotpath first:

- Hot shelves first
- Then hot volumes
- Then hot books
- Then hot pages

Only spill to warm or cold lookup when resident coverage is insufficient. This makes H(t) the primary latency-control mechanism.
#### D3. Apply the bound to per-level fanout

Max children per hierarchy node should also respect a Williams-derived limit:

- Max volumes per shelf: O(sqrt(|volumes| * log |volumes|))
- Max books per volume: O(sqrt(|books_in_volume| * log |books_in_volume|))

When exceeded, trigger a split through HierarchyBuilder or ClusterStability.

---
### Phase E - Graph-Community Quota Integration

#### E1. Add community detection to Daydreamer

Use lightweight label propagation on the Metroid neighbor graph during idle passes. Store community labels in page activity metadata or a dedicated community-label store. Rerun when dirty-volume flags indicate meaningful structural change.
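Label propagation on an adjacency map could look like the sketch below. The adjacency shape, the iteration cap, and the lexicographic tie-break (chosen so results are deterministic) are assumptions of this sketch:

```typescript
// Synchronous label propagation: each node repeatedly adopts the most common
// label among its neighbors until labels stabilize or the iteration cap hits.
export function labelPropagation(
  adjacency: Map<string, string[]>, // pageId -> neighbor pageIds
  maxIterations = 10,
): Map<string, string> {
  const labels = new Map<string, string>();
  for (const id of adjacency.keys()) labels.set(id, id); // each node starts in its own community
  for (let iter = 0; iter < maxIterations; iter++) {
    let changed = false;
    for (const [id, neighbors] of adjacency) {
      if (neighbors.length === 0) continue; // isolated nodes keep their own label
      const counts = new Map<string, number>();
      for (const n of neighbors) {
        const l = labels.get(n) ?? n;
        counts.set(l, (counts.get(l) ?? 0) + 1);
      }
      let best = labels.get(id)!;
      let bestCount = 0;
      for (const [l, c] of counts) {
        // lexicographic tie-break keeps the pass deterministic
        if (c > bestCount || (c === bestCount && l < best)) { best = l; bestCount = c; }
      }
      if (best !== labels.get(id)) { labels.set(id, best); changed = true; }
    }
    if (!changed) break; // converged early
  }
  return labels;
}
```

Running it on two disconnected triangles collapses each triangle to a single shared label while keeping the two components distinct.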
#### E2. Wire community labels into promotion

- If a community has remaining quota, promote freely.
- If a community is at quota, the candidate must beat the weakest resident in that community.
- If the community is unknown, place the node into a temporary pending pool that borrows from the page-tier budget.

#### E3. Add community-aware eviction tests

- Dense communities do not consume all slots
- New communities get at least one slot
- Empty communities release their slots

---
### Phase F - Metroid Maintenance Under the Bound

#### F1. Upgrade hippocampus/FastMetroidInsert.ts

- Derive max neighbors per page from H(t) or a related hotpath policy constant instead of a hardcoded K
- If a page is already at max degree, evict the neighbor with the lowest Hebbian edge weight
- After insertion, check whether the new page qualifies for hotpath admission
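The degree-bounded insertion rule can be sketched as below. The edge shape and the decision to let the incoming edge itself be evicted when it is the weakest are assumptions of this sketch:

```typescript
// Degree-bounded neighbor insertion: when a page is at its policy-derived max
// degree, the lowest-weight Hebbian edge is evicted first.
interface NeighborEdge { neighborId: string; hebbianWeight: number }

export function insertNeighborBounded(
  neighbors: NeighborEdge[],
  incoming: NeighborEdge,
  maxDegree: number,
): NeighborEdge[] {
  const next = [...neighbors, incoming];
  if (next.length <= maxDegree) return next;
  // evict the weakest edge, which may be the incoming one itself
  let weakest = 0;
  for (let i = 1; i < next.length; i++) {
    if (next[i].hebbianWeight < next[weakest].hebbianWeight) weakest = i;
  }
  next.splice(weakest, 1);
  return next;
}
```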
#### F2. Upgrade daydreamer/FullMetroidRecalc.ts

- Bound dirty-volume recalc batch size by an H(t)-derived maintenance budget
- Process at most O(sqrt(t log t)) pairwise comparisons per idle cycle
- Prioritize dirtiest volumes first
- Recompute salience for affected nodes and run a promotion sweep after recalculation

#### F3. Upgrade daydreamer/HebbianUpdater.ts

- After LTP or LTD, recompute sigma(v) for all nodes whose incident edges changed
- Run a promotion and eviction sweep for changed nodes
- Prune edges whose weight falls below threshold while keeping Metroid degree within bounds

#### F4. Upgrade daydreamer/PrototypeRecomputer.ts

- After recomputing volume or shelf prototypes, recompute salience for affected representative entries
- Run tier-quota promotion or eviction for volume and shelf tiers

---
### Phase G - Retrieval Path Under the Bound

#### G1. Upgrade cortex/Query.ts

Full query flow:

1. Embed query
2. Score against resident shelf prototypes
3. Score against resident volume prototypes within top shelves
4. Score against resident book medoids within top volumes
5. Score against resident pages within top books
6. Expand subgraph via getInducedMetroidSubgraph(seeds, maxHops)
7. Solve coherent path via OpenTSPSolver
8. Return result with provenance

The key constraint is that steps 2 through 5 operate on the resident set of size H(t), not the full corpus. Step 6 may touch warm or cold storage but remains bounded by maxHops and degree limits derived from the same policy.

Add a query cost meter that counts vector operations. If cost exceeds a Williams-derived budget, early-stop and return best-so-far.
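The cost meter could be a small counter tied to the same H(t) formula. The budget multiplier is an assumed tuning knob, not a value from the plan:

```typescript
// Query cost meter: count vector operations and early-stop once a
// Williams-derived budget is exhausted.
export class QueryCostMeter {
  private ops = 0;
  constructor(private readonly budget: number) {}

  // Budget derived from H(t); the multiplier is an assumed tuning knob.
  static forGraphMass(t: number, multiplier = 4): QueryCostMeter {
    const h = t <= 0 ? 0 : Math.ceil(Math.sqrt(t * Math.log2(1 + t)));
    return new QueryCostMeter(multiplier * h);
  }

  charge(vectorOps = 1): void { this.ops += vectorOps; }

  get exhausted(): boolean { return this.ops >= this.budget; }

  get spent(): number { return this.ops; }
}
```

A ranking cascade would call charge() once per similarity computation and break out of steps 2 through 6 when exhausted flips, returning the best-so-far results.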
#### G2. Apply the bound to subgraph expansion

Replace the fixed <30 node target with a dynamic bound:

- maxSubgraphSize = min(30, floor(sqrt(t * log2(1 + t)) / log2(t)))
- maxHops = ceil(log2(log2(1 + t)))
- perHopBranching = floor(maxSubgraphSize^(1 / maxHops))

These formulas shrink gracefully as the graph grows and keep expansion cost sublinear.
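The three formulas above translate directly into code, assuming t >= 2 so the logarithms are positive:

```typescript
// Dynamic expansion bounds from the G2 formulas, computed from graph mass t.
export function subgraphBounds(t: number): { maxSubgraphSize: number; maxHops: number; perHopBranching: number } {
  const maxSubgraphSize = Math.min(30, Math.floor(Math.sqrt(t * Math.log2(1 + t)) / Math.log2(t)));
  const maxHops = Math.ceil(Math.log2(Math.log2(1 + t)));
  const perHopBranching = Math.floor(Math.pow(maxSubgraphSize, 1 / maxHops));
  return { maxSubgraphSize, maxHops, perHopBranching };
}
```

For t = 10,000 this yields a 27-node cap over 4 hops with a branching factor of 2, comfortably under the old fixed <30 target.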
---

### Phase H - Verification and Benchmarks

#### H1. Unit tests per phase

- HotpathPolicy tests for capacity, quotas, and salience
- SalienceEngine tests for promotion, eviction, and determinism
- Hierarchy quota tests for tier budgets, fanout bounds, and spill behavior
- Community quota tests for label propagation, proportional allocation, and minimum guarantees
- Metroid tests for bounded degree and maintenance batch limits
- Query tests for cost metering and subgraph size bounds
#### H2. Scaling benchmarks

Add tests/benchmarks/HotpathScaling.bench.ts with synthetic graphs at 1K, 10K, 100K, and 1M node-plus-edge counts.

Measure:

- resident set size vs H(t)
- query latency vs corpus size
- promotion and eviction throughput

Assert:

- resident count never exceeds H(t)
- query cost scales sublinearly

#### H3. Guard extension

Treat c and the quota ratios as policy-derived, not model-derived. Keep them in core/HotpathPolicy.ts and consider adding a separate guard or lint rule to prevent hotpath constants from being hardcoded elsewhere.
#### H4. CI gate commands

- npm run guard:model-derived
- npm run build
- npm run lint
- npm run test:unit
- npm run benchmark
- npm run test:browser
- npm run test:electron

---
### Relevant Files

- DESIGN.md for theorem mapping, three-zone model, salience, quotas, fanout, and subgraph bounds
- PLAN.md for rescoping Hippocampus, Cortex, and Daydreamer around the hotpath lifecycle
- TODO.md for concrete tasks covering HotpathPolicy, SalienceEngine, community detection, and upgrades to ingest, retrieval, and maintenance
- core/types.ts for PageActivity, HotpathEntry, and MetadataStore hotpath methods
- core/HotpathPolicy.ts for central hotpath policy
- core/SalienceEngine.ts for per-node salience and promotion logic
- storage/IndexedDbMetadataStore.ts for hotpath persistence and resident metadata
- Policy.ts for interaction points with routing policy
- core/ModelDefaults.ts remains unchanged and separate from hotpath policy
- hippocampus/FastMetroidInsert.ts for bounded degree and hotpath admission
- hippocampus/HierarchyBuilder.ts for medoid admission and fanout bounds
- cortex/Query.ts for resident-first retrieval and dynamic query limits
- cortex/Ranking.ts for hot, warm, and cold spill logic
- daydreamer/HebbianUpdater.ts for post-LTP or LTD salience recomputation and promotion sweeps
- daydreamer/FullMetroidRecalc.ts for bounded maintenance batches and salience-aware recalculation
- daydreamer/PrototypeRecomputer.ts for tier-quota promotion after prototype updates
- daydreamer/ClusterStability.ts for community detection and split or merge triggers
- tests/Persistence.test.ts for hotpath persistence and bounded graph behavior
- tests/benchmarks/HotpathScaling.bench.ts for scaling validation

---
### Decisions

- t = |V| + |E| (pages + all edge types)
- H(t) = ceil(c * sqrt(t * log2(1 + t)))
- c is empirically tuned, not theorem-given
- sigma(v) = alpha * H_in(v) + beta * R(v) + gamma * Q(v)
- Default salience weights: alpha = 0.5, beta = 0.3, gamma = 0.2
- Tier quotas: Shelf 10%, Volume 20%, Book 20%, Page 50%
- Community quotas: proportional to community size with a minimum of 1 slot
- Bootstrap rule: fill the hotpath greedily by salience until H(t)
- Steady-state rule: promote only if candidate salience exceeds the weakest resident in the same tier and community bucket
- Preserve the existing 4-level hierarchy, but bound fanout using Williams-derived limits and trigger split or merge through ClusterStability
- Keep model-derived numerics entirely separate from hotpath policy
- Apply the bound wherever space-time tradeoffs exist: resident index size, per-tier fanout, subgraph expansion, Metroid degree, and Daydreamer batch size

---
### Dependency Graph

A1 theorem docs
A2 salience definition
A3 tier quotas
A4 community quotas
-> B1 HotpathPolicy
-> B2 HotpathPolicy tests
-> B3 core types extension
-> B4 IndexedDB extension
-> C1 SalienceEngine
-> C2 promotion lifecycle
-> C3 promotion tests
-> D1-D3 hierarchy integration
-> E1-E3 community integration
-> F1-F4 Metroid maintenance integration
-> G1-G2 retrieval integration
-> H1-H4 verification and benchmarks

D, E, and F can proceed in parallel once the policy and salience foundations are in place. Retrieval depends on hierarchy and community integration. Verification runs continuously.
