[Feature] Two-level impl granularity in artefact catalog (module narrative + symbol-grounded) with authoring heuristic #20

@mchamier

Description

Skill

pharaoh-author, pharaoh-req-from-code, plus catalog schema (shared/artefact-catalog.md). Potentially a new sibling atom pharaoh-impl-granularity-decide.

Use Case

When a project authors impl records that map 1:1 to source files, individual file-scope records can grow to ~200-300 LOC of behaviour described by a single narrative paragraph. pharaoh-req-code-grounding-check runs cleanly at file scope but cannot mechanically verify that a specific function inside the file matches the slice of body text describing it. Reviewers fall back to reading the whole module against the whole body.

The reverse extreme is just as bad: a project that emits one impl record per public symbol, including trivial wrappers and one-liners, produces RST boilerplate that carries no audit signal.

The audit half of this problem is already solved: pharaoh-req-code-grounding-check supports symbol scope via :source_symbol: (already merged in #9). What remains is a catalog-level decision about how to express the two scopes as first-class artefact types, plus a mechanical heuristic for deciding which scope a given public symbol belongs to.
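
To make symbol scope concrete, here is a minimal Python sketch of how a dotted :source_symbol: value could be resolved to the exact slice of source that a symbol-scope record is audited against. The function name resolve_source_symbol and the dotted Class.method convention are assumptions for illustration. This is not the actual pharaoh-req-code-grounding-check implementation.

```python
import ast
from typing import Optional

# Definitions that can be named by one segment of a dotted symbol path.
_DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def resolve_source_symbol(source: str, symbol: str) -> Optional[str]:
    """Return the source text of a dotted symbol such as "Converter.to_rst".

    Hypothetical helper: walks top-level definitions and nested class bodies;
    returns None if any segment of the path is not found.
    """
    tree = ast.parse(source)
    scope = tree.body
    node = None
    for part in symbol.split("."):
        node = next((n for n in scope if isinstance(n, _DEFS) and n.name == part), None)
        if node is None:
            return None
        # Only classes provide a namespace to descend into for the next segment.
        scope = node.body if isinstance(node, ast.ClassDef) else []
    return ast.get_source_segment(source, node)
```

With the slice in hand, a grounding check only has to compare one function's body against the paragraph describing it, instead of the whole module against the whole record.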

This proposal arises from a planned adoption in useblocks/ubconnect. The local adoption-only work (declaring :source_symbol: and authoring four symbol-scope impl records as a worked example) is tracked under useblocks/ubconnect#274. That issue explicitly defers the catalog evolution upstream rather than inventing it unilaterally, to avoid divergent vocabulary across useblocks projects.

Proposed Behavior

Two pieces, plus a documentation update.

1. Catalog evolution

Add a second artefact-catalog type representing module-level narrative, distinct from the symbol-grounded impl type. Working name task (open to bikeshedding). Concrete shape:

task:
  required_fields: [id, title, status, source_doc]
  optional_fields: [tags, rationale]
  required_links: [satisfies]
  optional_links: []
  lifecycle: [open, in_progress, implemented]

impl:
  required_fields: [id, title, status, source_doc, source_symbol]
  optional_fields: [tags, rationale]
  required_links: [implements]
  optional_links: []
  lifecycle: [open, in_progress, implemented]

  • task carries the module-level narrative ("RST converter for Jira wiki and Cloud ADF inputs") and is audited structurally (file exists, parent resolves).
  • impl becomes symbol-scoped (one record per public symbol that warrants one; see the heuristic in part 2), with its body written in audit-grounded prose and audited by pharaoh-req-code-grounding-check at symbol scope.
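
As a quick check that the proposed schema is mechanically enforceable, here is a minimal sketch. It assumes each need is available as a plain Python dict keyed by field and link names. CATALOG and catalog_violations are hypothetical names. The real catalog lives in shared/artefact-catalog.md, and its loader is not shown. The optional_* keys are omitted because they impose no checks.

```python
from typing import Any, Dict, List

# Hypothetical in-memory mirror of the two proposed catalog entries.
CATALOG: Dict[str, Dict[str, List[str]]] = {
    "task": {
        "required_fields": ["id", "title", "status", "source_doc"],
        "required_links": ["satisfies"],
        "lifecycle": ["open", "in_progress", "implemented"],
    },
    "impl": {
        "required_fields": ["id", "title", "status", "source_doc", "source_symbol"],
        "required_links": ["implements"],
        "lifecycle": ["open", "in_progress", "implemented"],
    },
}


def catalog_violations(need_type: str, need: Dict[str, Any]) -> List[str]:
    """List the catalog rules a need breaks; an empty list means it conforms."""
    spec = CATALOG[need_type]
    problems = [f"missing field: {f}" for f in spec["required_fields"] if not need.get(f)]
    problems += [f"missing link: {link}" for link in spec["required_links"] if not need.get(link)]
    if need.get("status") not in spec["lifecycle"]:
        problems.append(f"status not in lifecycle: {need.get('status')!r}")
    return problems
```

A need with :source_doc: but no :source_symbol: conforms as a task but not as an impl. That is the distinction the two-type shape makes explicit in the catalog rather than in each audit atom.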

This is one possible shape. Alternatives the maintainers may prefer:

  • Keep impl as the only type, parameterised by the presence or absence of :source_symbol:. The catalog declares both shapes as valid under the same type. This means less vocabulary and less catalog churn. However, the two shapes have different audit policies, and a single type conflates them in pharaoh-link-completeness-check.
  • Use a different name for the module-level type (task is overloaded with project-management vocabulary). Candidates: module, impl_module, narrative.
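
The cost of the single-type alternative shows up as per-need branching in every consumer. Here is a minimal sketch, assuming a plain dict per need. The check labels and the function name required_checks are hypothetical and are not existing pharaoh identifiers.

```python
from typing import Any, Dict, List


def required_checks(need: Dict[str, Any]) -> List[str]:
    """Checks an impl need must pass when one type covers both shapes (hypothetical)."""
    # Structural checks apply to both shapes.
    checks = ["source_doc exists", "parent link resolves"]
    if need.get("source_symbol"):
        # Symbol-scoped shape: the body must also be grounded against one symbol.
        checks.append("symbol-scope grounding")
    return checks
```

With two catalog types, the need type alone would select the policy, and this branch would disappear from each check.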

2. Authoring heuristic skill

A new atom (working name pharaoh-impl-granularity-decide) that takes one public symbol from a source file and returns one of two decisions: emit a symbol-scoped impl RST record, or emit only a codelinks marker. The skill is read-only and mechanical.

Proposed heuristic axes (any one triggers "emit RST impl"):

  • Two or more alternative control-flow paths, excluding pure error handling.
  • Raises at least one project-defined exception, i.e. the exception class is defined in the project source tree (a file that a sphinx-needs :source_doc: can resolve to), not in the standard library or a dependency.
  • Type-transforming signature (input type differs from output type at the language level).
  • Touches an external boundary (network, filesystem, subprocess).

Symbols that trigger none of the four axes (wrappers, one-liners, constants) get a codelinks marker only.
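
To make the four axes concrete, here is a minimal sketch of the decision for Python sources using the standard-library ast module (Python 3.9+ for ast.unparse). Several interpretation choices are assumptions, not part of the proposal:

  • "Alternative paths" is approximated as one plus the number of if / conditional-expression / match branch points, and an if whose body only raises counts as pure error handling.
  • The caller passes in the set of project exception names.
  • Unannotated signatures and -> None returns never count as type-transforming.
  • BOUNDARY_CALLS and BOUNDARY_MODULES are a hypothetical starting vocabulary.

The sketch does not reuse the actual extraction passes of pharaoh-req-code-grounding-check or pharaoh-api-coverage-check.

```python
import ast
from typing import List, Set, Tuple, Union

FuncNode = Union[ast.FunctionDef, ast.AsyncFunctionDef]
_MATCH = getattr(ast, "Match", ())  # match statements exist only on Python 3.10+

# Hypothetical boundary vocabulary; a real project would configure its own.
BOUNDARY_CALLS = {"open"}
BOUNDARY_MODULES = {"subprocess", "socket", "requests", "urllib", "http", "shutil"}


def _dotted(node: ast.AST) -> str:
    """Render Name/Attribute chains such as subprocess.run; '' for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted(node.value)
        return f"{base}.{node.attr}" if base else ""
    return ""


def _is_error_guard(node: ast.If) -> bool:
    """An if with no else whose body only raises counts as pure error handling."""
    return not node.orelse and all(isinstance(s, ast.Raise) for s in node.body)


def _paths(func: FuncNode) -> int:
    """Approximate alternative paths as 1 + non-error-handling branch points."""
    paths = 1
    for node in ast.walk(func):
        if isinstance(node, ast.If) and not _is_error_guard(node):
            paths += 1
        elif isinstance(node, ast.IfExp):
            paths += 1
        elif isinstance(node, _MATCH):
            paths += len(node.cases) - 1
    return paths


def _raises_project_exception(func: FuncNode, project_exceptions: Set[str]) -> bool:
    """True if the body raises a class whose name is in project_exceptions."""
    for node in ast.walk(func):
        if isinstance(node, ast.Raise) and node.exc is not None:
            target = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            if _dotted(target).split(".")[-1] in project_exceptions:
                return True
    return False


def _type_transforming(func: FuncNode) -> bool:
    """True if annotated inputs exist and none shares the annotated return type."""
    if func.returns is None:
        return False
    out = ast.unparse(func.returns)
    if out == "None":  # assumption: procedures returning None are not transforms
        return False
    args = func.args.posonlyargs + func.args.args + func.args.kwonlyargs
    inputs = [ast.unparse(a.annotation) for a in args
              if a.annotation is not None and a.arg not in ("self", "cls")]
    return bool(inputs) and out not in inputs


def _touches_boundary(func: FuncNode) -> bool:
    """True if the body calls a configured filesystem/network/subprocess API."""
    for node in ast.walk(func):
        if isinstance(node, ast.Call):
            name = _dotted(node.func)
            if name in BOUNDARY_CALLS or name.split(".")[0] in BOUNDARY_MODULES:
                return True
    return False


def decide(func: FuncNode, project_exceptions: Set[str]) -> Tuple[str, List[str]]:
    """Return ("impl", triggered_axes) if any axis fires, else ("codelinks", [])."""
    axes = {
        "control_flow": _paths(func) >= 2,
        "project_exception": _raises_project_exception(func, project_exceptions),
        "type_transform": _type_transforming(func),
        "external_boundary": _touches_boundary(func),
    }
    hits = [name for name, fired in axes.items() if fired]
    return ("impl" if hits else "codelinks"), hits


if __name__ == "__main__":
    src = "def load(path: str) -> str:\n    with open(path) as fh:\n        return fh.read()\n"
    fn = ast.parse(src).body[0]
    print(decide(fn, set()))  # ('impl', ['external_boundary'])
```

Returning the triggered axes alongside the decision keeps the gate reviewable: an author can see why a symbol was promoted to an RST record rather than left as a codelinks marker.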

This is intentionally narrow. The axes are all already extracted by pharaoh-req-code-grounding-check and pharaoh-api-coverage-check; this skill just reuses the same extraction passes to gate authoring rather than to verify finished records.

3. Documentation

Update the relevant guidance in pharaoh-author and pharaoh-req-from-code to surface the two-level option when the project's catalog declares both types.

Alternatives Considered

  • Keep impl as file-scope only. This is the status quo, and its audit gap (symbol-level grounding) is exactly what motivated :source_symbol: support in #9 (Plan-driven orchestration, atomic skills, review + grounding axes, diagrams); reverting that is not an option.

  • Make impl symbol-scope only. Forces every project to author one RST record per public method, even for wrappers and constants. This produces high RST boilerplate density for low audit signal and was discarded in early ubconnect adoption planning.

  • Carry both shapes in a single type, parameterised by :source_symbol: presence. Less catalog churn but conflates two audit policies under one identifier. pharaoh-link-completeness-check would need per-need branching, which the current atom design avoids.

  • Skip the heuristic entirely; rely on author judgement. Workable but produces inconsistent corpora across projects. The heuristic is mechanical and reuses extraction logic already shipping in other audit atoms, so the marginal cost is low.

Pilot adopter

useblocks/ubconnect plans to adopt symbol-scope impl on one module (markup.py) as a worked example, tracked under useblocks/ubconnect#274. That work uses :source_symbol: directly and authors per-symbol records ad-hoc; once this upstream proposal lands, the same records can be relabelled to whatever final type names this issue settles on.
