[fm] Add disk diagnoser with typed fact tables#10541
Conversation
| ) | ||
| .execute_async(&conn) | ||
| .await?; | ||
| // Delete every child row by sitrep_id. Driving this off |
There was a problem hiding this comment.
This looks a little involved, but it's worth noting: This is only used in tests? So doing some one-time complexity to avoid a need for future updates on each new fact type seems like a nice tradeoff.
…lerate duplicate zpools - The per-kind CHECK constraint on fm_fact_physical_disk is now an implication (kind != 'zpool_unhealthy' OR columns present), so future fact kinds add their own constraint instead of rewriting this one - The fm_analysis in-service disk projection skips soft-deleted physical_disk rows, matching what its comment already claimed - A duplicate physical disk in the zpool listing now logs a warning and keeps the first zpool instead of panicking the background task
SitrepBuilder::cases is seeded from the open cases of the Input the builder was constructed from, but diagnosis::analyze() previously took the Input as a separate argument; nothing tied the two together, and an engine handed a mismatched pair would panic looking up a case that was never seeded. Store the Input on the builder and have engines read it from there, making the mismatch unrepresentable.
Facts are only read during sitrep load, which paginates over the (sitrep_id, id) primary key; nothing queries by case directly. Easy to re-add with a migration if omdb grows a case-scoped fact query.
Replace the per-disk linear scan of parent cases with a one-time inverse index. First case ID wins on (pathological) collisions, same as the scan it replaces.
| /// The analysis input this builder was constructed from. `cases` is | ||
| /// seeded from this input's open cases, so diagnosis engines must read | ||
| /// their inputs from here rather than taking a separate `Input` argument | ||
| /// that might disagree. | ||
| input: &'a analysis_input::Input, |
There was a problem hiding this comment.
So: I'm shoving this in here because we previously called analyze with SitrepBuilder and input as two distinct arguments.
Realistically, they would come from the same input, but I want to prevent them from ever diverging, so... now they're both part of SitrepBuilder.
(This is a very tiny change but it does change the API of analyze)
| -- UUID of the case this fact attaches to. | ||
| case_id UUID NOT NULL, |
There was a problem hiding this comment.
originally I had an index for "lookup fact by case ID" but I actually removed it; it's not necessary with how we do sitrep loading currently.
This PR provides an alternative to #10460 , using strongly-typed "enum-ish" tables for each DE's set of facts, rather than a JSON blob. It should be mutually exclusive with that PR (we either do JSON, or we don't).
The first fault management diagnosis engine: opens a case for any non-Online zpool whose backing physical disk is currently in service in the control plane, and closes it on recovery or expungement.
Supporting infrastructure introduced along the way: