Agent Project: Tool Extensions #2197

@disconcision

Description

Hazel Agentic Assistant: Edit Actions & Read Queries

A guide for contributors working on edit actions and read queries for the Hazel Agentic Assistant. Covers goals, framing, concrete directions, and process. These are rough notes from me (Andrew) cleaned up via LLM, so WARNING: SLOP AHEAD. This document is a work in progress... it's meant to help seed ideas for what people might want to work on. If you're interested in working on something here, a good first step is to pick and choose from this document and make a concrete plan for a direction of investigation: What do you want to make? How are you going to test it? I list some specific ideas at the end; a concrete project in this space could address one of these, some subset, or something else entirely.

Framing: The Meta-Programming REPL

A big aspect of the agentic assistant project is working toward a kind of meta-programming REPL: a system for writing programs to help write programs. This is an increasingly standard generalization of the tool-call paradigm, where an LLM's ability to generate code is used not just to produce the final artifact but to operate on it at each step, reading, querying, and editing through programmatic interfaces.

Individual tools are the atomic operations in this REPL. The potential lies in how they compose and sequence into larger processes, an edit action language and a query language that let the agent (or a human-agent pair) express programming workflows at a semantic level.

Why Semantic Tools?

Standard agentic coding operates at the level of text: read line ranges, replace line ranges. This works, but:

  1. For agents: the hypothesis is that semantic tools, operating on definitions, bindings, types, case arms rather than line ranges, reduce the gap between what the agent intends ("update the definition of x") and what it must express ("replace lines 14-17 with ..."), reducing errors. Whether this actually helps is what we're investigating.

  2. For trace legibility: "updated definition of translate; added case arm for Reset; queried type of result" tells a story. A sequence of line-range edits doesn't. This matters for trust, auditability, and collaborative human-agent workflows.

  3. For composition: semantic actions are natural building blocks for higher-level programming strategies (see below). "Rename variable x to y" should bottom out at a semantic rename, not a pile of regex substitutions.

It's an open question how much these advantages matter in practice!

Compositionality and API Design

There's a useful analogy to Vim's compositional model, where verbs, nouns, and modifiers compose into edit commands. Our existing actions are often combinations of a syntactic edit with some semantic gating (e.g. "update the definition at this path" = navigate semantically + replace syntactically, gating on internal type errors). We should think about whether and how to factor these apart: having a compositional internal vocabulary even if the individual tools exposed to the agent are higher-level packages.

This raises a batteries-included design question: what level of abstraction should we expose to agents? Options range from low-level composable primitives (more flexible, harder to use) to high-level packaged actions (easier to use, less flexible). Agents are already fairly adept with basic text tools (bash, grep, sed), possibly due to training data, but we can compensate for unfamiliarity with our tools by providing many concrete examples. The right answer likely involves both: a compositional internal language and higher-level convenience actions for common operations, with the agent's system prompt teaching it when to use which.

Concrete Scenarios

It's generally not effective to simply define a tool and make it available to an agent. At minimum the tool needs documentation with concrete examples. But the most productive direction here is to develop tools in tandem with concrete scenarios of use.

Write a scenario, a story of an agent (or human-agent pair) using this tool in a realistic task. This serves triple duty:

  1. It justifies the tool: if you can't tell a compelling story, reconsider building it.
  2. It becomes the documentation: the scenario goes into the system prompt to teach the agent when and how to use the tool. The exercise of "selling" a tool to a paper reader and "selling" it to an agent via system prompt examples are substantially the same exercise.
  3. It seeds evaluation: the scenario is a test case. As the tool matures, replace the mocked-up trace with an actual one.

The narrative scenario is simultaneously design document, documentation, and test spec... writing it is not overhead.

Connection to Programming Strategies

LaToza et al. (2020), "Explicit Programming Strategies," describe structured, executable strategies that scaffold programming processes: programs whose steps are human/agent actions, not machine instructions. Strategies decompose complex tasks into explicit steps with decision points.

Semantic edit and read actions are natural leaf operations in such strategies. For example, "add a new variant to a sum type":

  1. Query the type definition (select)
  2. Update the type to add the constructor (selector_update)
  3. Find all case expressions over that type (select with wildcard arms)
  4. Insert a new arm in each (selector_insert_after)
  5. Query for remaining type errors (get_completeness)

This is the aspirational vision: semantic tools compose into semantic strategies. This is more future-oriented, but it should ideally inform how you design individual tools: they should compose.

Evaluation

Three dimensions:

  • Ablation: provide different tool sets to agents, measure relative performance on fixed tasks
  • Trace legibility: can a human reconstruct what the agent did and why?
  • Composability: can tools chain into multi-step workflows reliably?

A basic bar for individual contributions: show that an agent can meaningfully use your tool / tool modification. Given documentation and examples, the agent selects and applies it appropriately.

Current System Overview

The infrastructure (src/haz3lcore/CompositionCore/) currently provides ~41 tools in 5 categories. As time goes on we'll probably want to be more selective/intentional about how many tools we expose simultaneously; existing work shows that adding more tools fills up context windows with docs without yielding commensurate benefits. Good results here are probably going to involve some combination of (1) tight sets of expressive tools to be used compositionally, and/or (2) an initial small seed set of tools, with a system for contextually suggesting appropriate tools so we can load in docs/examples dynamically. Existing tools:

  • View: expand, collapse (code visibility)
  • Probe: place_probe, remove_probe, toggle_probe (runtime values)
  • Edit (14): definition/pattern/type/body updates, binding insert/delete, selector-based edits (selector_update, selector_delete, selector_insert_before, selector_insert_after)
  • Read (8): get_syntax, get_statics, get_context, select, get_canonical, get_completeness, view operations
  • Workbench (14): task/subtask management for plan-act-verify workflows

The Selector Language

Selectors are a DSL for addressing code by structure rather than position. Full spec in plans/selector-calculus.md. Key ideas:

  • Match form structure: let x = % focuses on x's definition, \| Inc => % focuses on Inc arm's body
  • \... descends into descendants; _ and _... are wildcards
  • Chains (App/update/) navigate binding structure; #N indexes positionally
  • Multiple matches for queries, unique match required for edits

The selector language is central to both reads and writes. The spec has gaps, and the current implementation is rough and buggy. Many of the directions below extend or improve it.
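To make the match semantics concrete, here is a toy model of structural matching in Python. This is not the Hazel implementation (that lives in Selector.re, per the spec in plans/selector-calculus.md); the node encoding and selector constructors are invented for illustration. What it does model is the rule above: queries may return many matches, while edits require a unique one.

```python
# Toy model of structural selectors -- NOT the Hazel implementation.
# AST nodes are tuples: ("int", n), ("var", x), ("tuple", [children]),
# ("let", name, defn, body). All names here are invented for illustration.

def children(e):
    tag = e[0]
    if tag in ("int", "var"):
        return []
    if tag == "tuple":
        return e[1]
    if tag == "let":
        return [e[2], e[3]]
    raise ValueError(tag)

def matches(sel, e):
    """Collect every subexpression matching the selector.

    sel is ("exact", node)   -- match this exact expression,
           ("wild",)         -- match any single node (the `_` wildcard),
           ("descend", sub)  -- match sub anywhere in the subtree.
    """
    if sel[0] == "wild":
        return [e]
    if sel[0] == "exact":
        return [e] if e == sel[1] else []
    if sel[0] == "descend":
        found = matches(sel[1], e)
        for c in children(e):
            found += matches(sel, c)
        return found
    raise ValueError(sel)

def query(sel, e):
    """Reads: any number of matches is fine."""
    return matches(sel, e)

def query_unique(sel, e):
    """Writes: require exactly one match, as the current edit tools do."""
    ms = matches(sel, e)
    return ms[0] if len(ms) == 1 else None
```

Against a program like `let x = 4 in (x, 4)`, a descendant query for the literal 4 finds two matches, so an edit addressed that way would be rejected, while a query for the variable reference `x` is unique.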


Directions: Edit Actions

Each direction is scoped for an individual contributor, though some are more ambitious.

1. Robust Insert Before/After

Now: handles case arms, list/tuple elements, module items, and a let-binding fallback.

To do:

  • Generalize across definition/declaration divide (module item vs. top-level let wrapping should feel uniform)
  • Support test expressions and index-based selectors
  • Decide scope: all selectors, or only variable-length forms (lists, tuples, case arms, modules, let sequences)? Inserting "before" an if condition has no obvious semantics, but the let-binding fallback blurs the line. Articulate a clear policy.

Scenario to develop: agent building up a case expression incrementally, or populating a module one definition at a time.

2. Multi-Cursor / Bulk Edits

Now: write actions require unique selector match (query_unique).

To do:

  • Selectively relax uniqueness: \... 4 with update_all could replace every 4 with 5
  • Which operations allow multi-match? Bulk replace/delete seem safe; bulk insert is trickier
  • Explicit opt-in (e.g. update_all variant) rather than changing defaults

Scenario to develop: renaming a constant, or updating all case arms to use a new helper.

3. Semantic Attribute Filters

Now: selectors match syntactic structure only.

To do: extend selectors with semantic attribute filters. (Very!!) tentative syntax:

  • node[sort: Pat] (match by sort)
  • node[type.expected: (String, Bool)] (match by expected type)
  • node[type.expected: consistent_with(? -> MyType)] (type consistency)
  • node[values: include(10)] (match by live values)

Minimal version: sort filtering only. Ambitious version: full type system + live evaluation. Key test: can you write a scenario where an attribute filter lets the agent express something it can't with purely syntactic selectors?

Scenario to develop: "find all expressions of type Color" for a refactor, or "find all case arms with type errors" for debugging.
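The minimal (sort-only) version can be sketched as a semantic predicate layered over structural traversal. The node shape below (dicts with "sort", "label", "children") is invented for illustration and is not Hazel's AST; it just models one reading of the tentative node[sort: ...] syntax.

```python
# Toy sketch of attribute filters: a structural traversal refined by a
# semantic predicate. Node representation is invented for illustration.

def filter_nodes(pred, node):
    """Collect node and all its descendants satisfying the predicate."""
    found = [node] if pred(node) else []
    for child in node["children"]:
        found += filter_nodes(pred, child)
    return found

def by_sort(sort):
    """A toy reading of node[sort: ...]: filter by the sort attribute.
    Richer filters (expected type, live values) would be predicates of
    the same shape over statics/dynamics information."""
    return lambda n: n["sort"] == sort
```

Framing filters as predicates keeps them composable: a type-consistency or live-value filter is just another predicate handed to the same traversal.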

4. Non-Linear Patterns / Reference Selection

Now: single focus (%), no cross-position constraints.

To do: investigate scope-respecting variable reference selection. Speculative:

let x[id] = _ in \... x[id]

where [id] constrains both occurrences to the same binding. Half-baked, but the question matters: can we express scope-aware refactors compositionally, rather than building them in as primitives? Even a partial answer like @refs(x) (all references to binding x) would be valuable.

Scenario to develop: variable rename as "select all references" + "bulk update".

5. Composite Actions / Simple Scripts

Now: each tool call is atomic; multi-step edits need multiple round-trips.

To do: allow a sequence of actions as a single compound operation. Simplest: ordered list of (selector, edit) pairs. More sophisticated: conditional branching on query results. Even just sequencing would let "rename variable" be one compound action instead of N updates.
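The simplest version can be sketched as a fold over (selector, edit) pairs. Everything below is invented for illustration (toy tuple trees, a predicate-free "match this exact node" selector) and is not the actual tool-call format; it just shows why sequencing alone already collapses N round-trips into one compound action.

```python
# Toy sketch of compound actions: an ordered list of (target, replacement)
# pairs applied in sequence over a tuple-encoded tree.

def rewrite(node, target, replacement):
    """Replace every occurrence of `target` in `node` (bulk-update flavor;
    a unique-match version would count matches first and refuse more than
    one, as the current edit tools do)."""
    if node == target:
        return replacement
    if isinstance(node, tuple):
        return tuple(rewrite(c, target, replacement) for c in node)
    return node

def apply_script(script, node):
    """A compound action is a left-to-right fold over its steps, so
    'rename variable' becomes one compound call instead of N updates."""
    for target, replacement in script:
        node = rewrite(node, target, replacement)
    return node
```

Later steps see the results of earlier ones, which is exactly the property conditional branching on query results would build on.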

Scenario to develop: multi-step refactor (rename, extract function, add annotation) as a script.


Directions: Read Queries

6. Path-Augmented Code Maps

Now: get_syntax returns pretty-printed code; expand/collapse control visibility.

To do: investigate whether annotating code with selector paths helps agents make better edits.

  • Do path comments (/* App/update */ let update = ...) improve targeting accuracy?
  • Absolute vs. relative paths?
  • Think of these as materialized views: query results with code alongside addressing selectors
  • Could forms advertise available actions (REST-like affordances)?

Scenario to develop: agent reads annotated code map, then edits using those paths, vs. same task without annotations.

7. Semantic Query Expansion

Now: get_statics for types, get_context for in-scope bindings.

To do: richer queries leveraging Hazel's type system and live evaluation:

  • Type-based: "all expressions of type T" or "consistent with T -> ?"
  • Error-focused: "all nodes with type errors" or "all holes"
  • Value-based: "all expressions evaluating to 0" (via dynamics)

Overlaps with Direction 3 but oriented toward read: building understanding before acting.

Scenario to develop: debugging a type error by narrowing down through semantic queries.

8. Canonical Selectors in Read Output

Now: select returns code; get_canonical returns selectors separately.

To do: should read results systematically include canonical selectors? When querying \... let _ = %, annotate each result with its canonical selector for immediate use in follow-up edits. This is about read output feeding directly into write input.

Scenario to develop: wildcard query to survey definitions, then edit one using its canonical selector from the result.

9. Results Within Results

Now: flat list of matches; sub-matches not distinguished.

To do: decide on nesting policy. (_, 2) applied to ((1, 2), 2): match the outer tuple, the inner, or both? For reads, nested results seem useful. For edits, nesting complicates things (editing an outer match may invalidate inner ones). Articulate a policy, potentially different for reads vs. writes.
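The two candidate policies can be sketched over toy tuple trees, modeling the pattern (_, 2) as "a pair whose second element is 2" (representation invented for illustration):

```python
# Toy illustration of the nesting question for a pattern like (_, 2)
# applied to ((1, 2), 2).

def is_match(node):
    """Does the pattern (_, 2) match this node directly?"""
    return isinstance(node, tuple) and len(node) == 2 and node[1] == 2

def all_matches(node):
    """Policy A: report every match, including matches nested inside
    other matches (plausible for reads)."""
    found = [node] if is_match(node) else []
    if isinstance(node, tuple):
        for child in node:
            found += all_matches(child)
    return found

def outermost_matches(node):
    """Policy B: stop descending once a node matches (safer for edits,
    since editing an outer match can invalidate inner ones)."""
    if is_match(node):
        return [node]
    if isinstance(node, tuple):
        return [m for child in node for m in outermost_matches(child)]
    return []
```

On ((1, 2), 2), Policy A reports both the outer tuple and the inner (1, 2); Policy B reports only the outer one.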


How to Approach This Work

Process

  1. Pick a direction and scope it. Subsets are fine.
  2. Write the scenario first. What task, what tool calls, what does success look like?
  3. Implement. Key files in CompositionCore/:
    • Action types: CompositionActions.re
    • Execution: CompositionGo.re (edits) or read dispatch
    • Selectors: Selector.re
    • Tool JSON: ToolJsonDefinitions/
    • Registration: CompositionUtils.re
    • Tests: Test_AgentTools.re
  4. Document. Ground it in your scenario with concrete examples. Goes into both the tool JSON (for the agent) and your write-up (for humans).
  5. Evaluate. Demonstrate the agent uses your tool appropriately. Ideally, ablation against the same task without the tool.
  6. Think holistically. How does your tool interact with others? Does it enable new workflows?

Branching and Collaboration

Create a feature branch off coding-agent-actions and open a draft pull request. Use the PR description as a living notebook: share in-progress notes, link to documentation or design files, describe your current status. This lets others check in on your work and see what's happening across the project.

Running the Assistant

To experiment with the in-editor assistant, get an OpenRouter API key from Cyrus or Andrew. A good default is Gemini 3.0 Flash, which balances quality and cost.

Bug Reports and Feedback

The assistant is a work in progress. If you encounter issues (bugs, usability problems, confusing behavior), file an issue or bring it up in the assistant-agent Slack channel. Even if you're not sure whether something is a bug or a UX issue, reporting it is valuable. Don't hesitate.

CLI / Claude Code Integration

If you're interested in using semantic edit actions from an existing harness like Claude Code (rather than the in-editor assistant), Andrew has a branch extending the Hazel CLI for this. Ask him about it.


References

  • Selector calculus: plans/selector-calculus.md
  • Implementation: src/haz3lcore/CompositionCore/
  • System prompt: src/haz3lcore/CompositionCore/prompt_factory/CompositionPrompt.re
  • Tests: Test_AgentTools.re
  • LaToza, T. D., et al. "Explicit Programming Strategies." Empirical Software Engineering, 2020.
