ContextSchema

Validate retrieved context before AI agents act.

Experimental pre-0.1 project. The core API is usable for examples and early review, but public API stability, package publishing, and integrations are still pending. Current package version: 0.0.1.

ContextSchema is a small Python library for checking whether retrieved context is fresh, provenance-backed, event-valid, source-appropriate, and complete enough for a specific decision before an agent acts.

It is also a design-time contract: it forces PMs and engineers to declare what context is enough, expected, stale, or unusable for a decision, instead of assuming the agent will always receive perfect context.

It sits after retrieval and before action:

retriever / memory / tool output -> ContextSchema -> proceed | soft_flag | retry_recommended | hard_gate

No runtime dependencies are required.
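
As a rough illustration of where the check sits, the sketch below places a freshness gate between a retrieval call and an action using only the standard library. It is not the ContextSchema API: `gate_on_freshness` and `run_step` are hypothetical names, and a real schema checks more than TTL.

```python
from datetime import datetime, timedelta, timezone


def gate_on_freshness(valid_at: datetime, ttl: timedelta, now: datetime) -> str:
    """Return 'proceed' while the context is within its TTL, else 'retry_recommended'."""
    age = now - valid_at
    return "proceed" if age <= ttl else "retry_recommended"


def run_step(retrieve, act, ttl: timedelta, now: datetime) -> str:
    """Retrieve, validate, and only then act (hypothetical orchestration)."""
    text, valid_at = retrieve()
    decision = gate_on_freshness(valid_at, ttl, now)
    if decision == "proceed":
        act(text)  # the action runs only after the gate passes
    return decision


now = datetime(2026, 5, 25, 16, 5, tzinfo=timezone.utc)
ttl = timedelta(minutes=5)
performed = []

# Context that is two minutes old passes; context that is an hour old does not.
fresh = run_step(lambda: ("2 units available", now - timedelta(minutes=2)), performed.append, ttl, now)
stale = run_step(lambda: ("9 units available", now - timedelta(hours=1)), performed.append, ttl, now)

print(fresh, stale, performed)
```

The action callback never sees the stale item, which is the point of validating before acting rather than after.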

At A Glance

Question	Short Answer
What problem does it solve?	Retrieved context can be stale, incomplete, weakly sourced, or invalidated before an agent acts.
What design habit does it enforce?	Declare the context sufficiency boundary before relying on agent recommendations.
Where does it run?	After retrieval/tool output, before action/tool execution.
What does it return?	Field confidence, schema confidence, reasons, evidence, and an action recommendation.
What does it depend on?	Nothing at runtime. It accepts plain Python objects.
Does it replace my agent framework?	No. It is a small validation layer you call from your existing workflow.

Where It Fits

flowchart LR
    A[Retriever, memory, or tool output] --> B[RetrievedItem metadata]
    B --> C[ContextSchema]
    D[EventRecord invalidations] --> C
    E[TTLs, sources, reliability] --> C
    C --> F{ActionPolicy}
    F --> G[proceed]
    F --> H[soft_flag]
    F --> I[retry_recommended]
    F --> J[hard_gate]

What This Is

A deterministic post-retrieval validation layer.
A way to declare required decision fields with TTLs, sources, criticality, and invalidation events.
A scoring and evidence layer for field confidence and whole-schema confidence.
A small action recommendation surface for orchestrators and agent runtimes.
A local JSONL-compatible evidence record that avoids storing raw retrieved text by default.
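
To make the evidence-record idea concrete, here is a minimal standard-library sketch of an append-only JSONL log that leaves raw text out unless asked. The record layout, the `text_sha256` field, and the helper names are assumptions made for illustration; they are not the format written by `DecisionEvidenceLog`.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_evidence(decision_id, item_id, text, action, store_raw_text=False):
    """Build one evidence record; raw text is omitted unless explicitly requested."""
    record = {
        "decision_id": decision_id,
        "item_id": item_id,
        "action": action,
        # Hypothetical field: a digest lets a reviewer match the item later
        # without the log itself holding the retrieved text.
        "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
    if store_raw_text:
        record["text"] = text
    return record


def append_jsonl(path, record):
    """Append a single JSON object as one line."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


log_path = Path(tempfile.mkdtemp()) / "evidence.jsonl"
append_jsonl(log_path, build_evidence("retail-buy-001", "inv-1", "2 units available", "hard_gate"))
append_jsonl(log_path, build_evidence("retail-buy-002", "inv-2", "7 units available", "proceed"))

records = [json.loads(line) for line in log_path.read_text(encoding="utf-8").splitlines()]
print(len(records), "text" in records[0])
```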

What This Is Not

Not a vector database.
Not a retriever.
Not an agent framework.
Not a memory store.
Not a data catalog.
Not an observability platform.
Not a policy engine like OPA/Cedar.
Not an LLM classifier or extractor.

Those systems can feed or consume ContextSchema. The core library only answers:

Is this retrieved context valid enough for this decision right now?

Who Should Use This

Use ContextSchema if you are building:

AI agents that take actions based on retrieved documents, memory, or tool output.
RAG applications where stale or weakly sourced context can cause bad decisions.
Customer-service, coding, finance, procurement, HR, sales, security, or ops agents that need a pre-action validity check.
Evaluation or observability pipelines that need a compact, replayable evidence record for why an agent proceeded, retried, soft-flagged, or stopped.

You probably do not need it if your app only summarizes text, chats over static documents, or never lets an agent take consequential actions.

For a scenario-driven explanation of context sufficiency, provenance, relevance, timeliness, retrieval metadata, and prompt-vs-contract tradeoffs, see WHY_CONTEXTSCHEMA.md.

When To Use It

Use ContextSchema When	Use Something Else When
You already have retrieved context and need to decide whether it is safe enough to act on.	You need to retrieve, rank, embed, or store documents.
Context freshness, provenance, or event invalidation affects the decision.	You only need output moderation or prompt guardrails.
You want deterministic reasons and evidence before an action runs.	You need a full tracing, dashboarding, or eval platform.
You want a small Python core that can sit inside an existing stack.	You want an end-to-end agent framework.

Compatibility With Existing Tools

ContextSchema is designed to be called from other systems, not to replace them.

flowchart TB
    R[Retrievers and memory systems] --> CS[ContextSchema validation]
    C[Data catalogs and freshness checks] --> CS
    EV[Business or system events] --> CS
    CS --> O[Agent orchestrators]
    CS --> P[Policy engines]
    CS --> T[Tracing and eval tools]

Tool Category	How It Fits
LangChain / LangGraph	Call `validate_retrieved()` in middleware, before a tool call, or before committing an agent action.
LlamaIndex	Validate retrieved nodes/documents after retrieval and before response synthesis or tool execution.
Redis / Zep / Mem0	Treat memory or context-engine output as upstream context; pass timestamps, source refs, and reliability through metadata.
dbt / Great Expectations / Tecton / Feast	Use freshness, feature, or data-quality metadata as input signals for context fields.
OPA / Cedar / custom policy	Feed `result.to_policy_input()` into a policy layer if you want external allow/deny rules.
Langfuse / Braintrust / Phoenix / OpenTelemetry	Export or attach the evidence record after validation; ContextSchema is not an observability backend.
Agent scaffolds and internal platforms	Use it as a plain Python pre-action gate because it has no runtime dependencies.
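
One way to wire a pre-action gate into an existing stack is a small decorator around each tool function. The sketch below is framework-agnostic and uses only the standard library; `require_action`, `ActionBlocked`, and `place_order` are hypothetical names, and the action string passed in stands in for whatever the validation step returned.

```python
from functools import wraps


class ActionBlocked(Exception):
    """Raised when the validation outcome does not allow the tool to run."""


def require_action(allowed=("proceed",)):
    """Wrap a tool so it runs only for the listed validation outcomes."""
    def decorator(tool):
        @wraps(tool)
        def wrapper(validation_action, *args, **kwargs):
            if validation_action not in allowed:
                raise ActionBlocked(f"{tool.__name__} blocked: {validation_action}")
            return tool(*args, **kwargs)
        return wrapper
    return decorator


@require_action(allowed=("proceed", "soft_flag"))
def place_order(sku, quantity):
    return f"ordered {quantity} x {sku}"


ok = place_order("proceed", "COAT-742", 2)

try:
    place_order("hard_gate", "COAT-742", 2)
    blocked = False
except ActionBlocked:
    blocked = True

print(ok, blocked)
```

Whether `soft_flag` should allow the tool to run is a per-tool decision; the decorator argument keeps that choice visible at the call site.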

Multi-Decision Agents

A super-agent that handles many decision types should usually use multiple schemas, not one giant schema.

For example, a merchandising agent may:

recommend markdowns
recommend price changes
suggest store-to-store inventory transfers
answer root-cause questions

Each decision has a different validity contract. Markdown decisions may require margin guardrails and pricing policy. Store-transfer decisions may require source-store inventory, destination-store demand, replenishment state, and transfer constraints. Root-cause analysis may allow a qualified answer when competitor pricing is missing, while automated action should hard-gate.

Use DecisionRegistry and SchemaRouter to make that routing explicit:

from contextschema import DecisionRegistry, SchemaRouter

registry = DecisionRegistry(
    {
        "markdown": MarkdownDecision,
        "store_transfer": StoreTransferDecision,
        "root_cause": RootCauseDecision,
    }
)
router = SchemaRouter(registry)

result = router.validate(
    decision_type,
    retrieved_items,
    decision_id="agent-run-123",
    events=events,
)

Pattern:

agent intent / decision type
-> selected ContextSchema
-> context validity check
-> proceed | soft_flag | retry_recommended | hard_gate

See examples/merchandising_super_agent_router.py for a runnable example.
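
The routing pattern can also be pictured without the library. The sketch below is a plain-Python stand-in, not `DecisionRegistry` or `SchemaRouter`: each decision type gets its own small validity contract, and an unknown decision type hard-gates because no contract exists for it. The validator functions, field names, and the fail-closed choice are assumptions made for illustration.

```python
def validate_markdown(fields):
    """Markdown actions need both guardrail fields before proceeding."""
    required = {"margin_guardrail", "pricing_policy"}
    return "proceed" if required <= fields else "hard_gate"


def validate_root_cause(fields):
    """Root-cause answers may go out qualified when competitor pricing is missing."""
    return "proceed" if "competitor_pricing" in fields else "soft_flag"


VALIDATORS = {
    "markdown": validate_markdown,
    "root_cause": validate_root_cause,
}


def route(decision_type, fields):
    """Pick the contract for this decision type; fail closed if none is registered."""
    validator = VALIDATORS.get(decision_type)
    if validator is None:
        return "hard_gate"
    return validator(fields)


markdown_result = route("markdown", {"margin_guardrail"})
root_cause_result = route("root_cause", {"sales_history"})
unknown_result = route("loyalty_offer", set())

print(markdown_result, root_cause_result, unknown_result)
```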

Installation

From this repository:

git clone https://github.com/Novice-ninja/contextschema-py.git
cd contextschema-py
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Directly from GitHub:

pip install "git+https://github.com/Novice-ninja/contextschema-py.git"

For local test runs without installing:

PYTHONPATH=src python3 -m unittest discover -s tests -v

Quickstart

# datetime.UTC was added in Python 3.11; on older versions use timezone.utc instead.
from datetime import UTC, datetime, timedelta

from contextschema import ActionPolicy, ContextField, ContextSchema, EventRecord, RetrievedItem


class BuyDecision(ContextSchema):
    schema_version = "1"
    action_policy = ActionPolicy(
        retry_below=0.75,
        soft_flag_below=0.75,
        hard_gate_on_event_invalidated_required_field=True,
    )

    inventory_available = ContextField(
        source=["inventory_api", "warehouse_snapshot"],
        ttl=timedelta(minutes=5),
        criticality=1.0,
        required=True,
        invalidates_on=["inventory_adjusted"],
    )

    current_price = ContextField(
        source="pricing_api",
        ttl=timedelta(minutes=15),
        criticality=1.0,
        required=True,
    )


items = [
    RetrievedItem(
        id="inv-1",
        text="2 units available",
        metadata={
            "context_field": "inventory_available",
            "source": "warehouse_snapshot",
            "source_ref": "warehouse:EWR-1:COAT-742",
            "valid_at": "2026-05-25T16:00:00Z",
        },
    ),
    RetrievedItem(
        id="price-1",
        text="$139.00",
        metadata={
            "context_field": "current_price",
            "source": "pricing_api",
            "source_ref": "pricebook:US:COAT-742",
            "valid_at": "2026-05-25T16:04:00Z",
        },
    ),
]

events = [
    EventRecord(
        event_id="evt-1",
        event_type="inventory_adjusted",
        occurred_at=datetime(2026, 5, 25, 16, 3, tzinfo=UTC),
        affected_fields=["inventory_available"],
        affected_sources=["warehouse_snapshot"],
        source_ref="warehouse:EWR-1:COAT-742",
    )
]

result = BuyDecision.validate_retrieved(
    items,
    decision_id="retail-buy-001",
    events=events,
    evaluated_at=datetime(2026, 5, 25, 16, 5, tzinfo=UTC),
)

print(result.action)
print(result.schema_confidence.score)
print(result.field("inventory_available").status)
print(result.to_policy_input())

Expected output starts with:

hard_gate
0.0
event_invalidated

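The inventory evidence carries a valid_at timestamp of 16:00, and an inventory_adjusted event occurred at 16:03. Because the event postdates the evidence and the field is required, inventory_available is no longer valid for the buy decision and the policy hard-gates.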
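
The timestamp arithmetic behind this outcome can be checked by hand with the standard library. The sketch below reuses the Quickstart timestamps and TTLs but does not call ContextSchema; `parse_utc` is a hypothetical helper, and the comparisons are a simplified reading of the rules rather than the library's scoring.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(value):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


evaluated_at = datetime(2026, 5, 25, 16, 5, tzinfo=timezone.utc)
event_at = datetime(2026, 5, 25, 16, 3, tzinfo=timezone.utc)

inventory_valid_at = parse_utc("2026-05-25T16:00:00Z")
price_valid_at = parse_utc("2026-05-25T16:04:00Z")

# Age of each piece of evidence at evaluation time.
inventory_age = evaluated_at - inventory_valid_at
price_age = evaluated_at - price_valid_at

# An event that happened after the evidence timestamp invalidates that evidence.
inventory_invalidated = event_at > inventory_valid_at

# The price evidence is one minute old against a 15-minute TTL.
price_within_ttl = price_age <= timedelta(minutes=15)

print(inventory_age, price_age, inventory_invalidated, price_within_ttl)
```

The price field is fresh and untouched by the event, so the hard gate comes entirely from the invalidated inventory field.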

API Surface

API	Purpose
`ContextSchema`	Base class for decision schemas. Define `ContextField` attributes and call `validate_retrieved()`.
`ContextField`	Field contract: source, TTL, requiredness, criticality, invalidation events, event policy, source reliability, privacy tags.
`RetrievedItem`	Normalized retrieved context item with `id`, optional `text`, and metadata.
`EventRecord`	Business or system event that may invalidate context after retrieval.
`ActionPolicy`	Maps field/schema confidence into `proceed`, `soft_flag`, `retry_recommended`, or `hard_gate`.
`DecisionRegistry`	Registers multiple `ContextSchema` subclasses by decision type for multi-capability agents.
`SchemaRouter`	Routes validation to the schema registered for a decision type.
`ValidationResult`	Top-level result with field confidence, schema confidence, action, warnings, retry recommendations, and evidence.
`FieldConfidence`	Per-field score, status, component factors, reasons, matched item IDs, source refs, timestamps, invalidating events.
`SchemaConfidence`	Whole-decision score, aggregation method, weakest fields, missing required fields, event-invalidated fields.
`DecisionEvidenceLog`	Append-only JSONL writer for validation evidence records.
`ContextSchemaError`	Public exception raised for invalid schemas, fields, retrieved items, or event records.
`load_events_jsonl()`	Load event records from JSONL with strict or skip-bad-record behavior.
`explain_evidence()`	Lightweight human-readable explanation for a stored evidence record.

Useful result helpers:

result.field("field_name")
result.weak_fields(threshold=0.75)
result.to_policy_input()
result.to_dict()
result.to_json()
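
A threshold helper such as `weak_fields` lends itself to targeted retries: re-retrieve only the fields that scored low instead of repeating the whole retrieval. The sketch below shows that idea on a plain dictionary of made-up scores. It assumes a field is weak when its score is strictly below the threshold, which may differ from the library's rule, and `refresh` and `fake_refetch` are hypothetical.

```python
def weak_fields(scores, threshold=0.75):
    """Return the names of fields scoring strictly below the threshold."""
    return sorted(name for name, score in scores.items() if score < threshold)


def refresh(scores, refetch, threshold=0.75):
    """Re-retrieve only the weak fields and return the updated scores."""
    updated = dict(scores)
    for name in weak_fields(scores, threshold):
        updated[name] = refetch(name)
    return updated


# Illustrative scores, not output from the library.
scores = {"inventory_available": 0.0, "current_price": 0.96, "shipping_estimate": 0.7}
calls = []


def fake_refetch(name):
    """Stand-in for a retriever call; records which fields were requested."""
    calls.append(name)
    return 0.9


refreshed = refresh(scores, fake_refetch)
print(calls, weak_fields(refreshed))
```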

Stability notes for early users:

ContextSchemaError is intentionally exported as the public package error.
ContextSchema.schema_definition() is intended as the public schema export shape.
ValidationResult.to_policy_input() is intended as the public pre-policy handoff shape.
Because this is 0.0.1, future releases may add fields or refine scoring behavior, but these names are the current public API boundary.

Core Behavior

Case	Behavior
Missing required field	Field score `0.0`, schema score capped at `0.0`, retry or hard gate depending on policy.
Missing optional field	Field status `missing_optional`, score `1.0`, schema not penalized.
Multiple candidates	The highest-scoring candidate is selected; all candidate IDs are reported with `ambiguous_multiple_candidates`.
Malformed timestamps	Ignored for matching; TTL-bound fields receive timestamp metadata penalties.
Event comparison impossible	Optional event policy warns only; required event policy reduces confidence.
Event invalidation	Matching events after evidence timestamp set event factor to `0.0`.
Weak source reliability	Field status becomes `weak_source`; retry can be recommended.
Raw retrieved text	Excluded from evidence by default; stored only with `store_raw_text=True`.
Empty schema	Raises `ContextSchemaError`.
Malformed event JSONL	Raises in strict mode; skips bad records with `strict=False`.
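
The strict versus skip-bad-record distinction in the last row can be sketched with the standard library alone. This is not `load_events_jsonl()`: the `load_jsonl` helper, the `event_type` requirement, and the use of `ValueError` are assumptions made for illustration.

```python
import json


def load_jsonl(lines, strict=True):
    """Parse JSONL lines; raise on the first bad record when strict, else skip and count it."""
    records, skipped = [], 0
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are not records
        try:
            record = json.loads(line)  # JSONDecodeError is a ValueError subclass
            if not isinstance(record, dict) or "event_type" not in record:
                raise ValueError("missing event_type")
        except ValueError as exc:
            if strict:
                raise ValueError(f"line {line_number}: {exc}") from exc
            skipped += 1
            continue
        records.append(record)
    return records, skipped


lines = [
    '{"event_id": "evt-1", "event_type": "inventory_adjusted"}',
    "{not json",
    '{"event_id": "evt-3"}',
]

records, skipped = load_jsonl(lines, strict=False)

try:
    load_jsonl(lines, strict=True)
    strict_raised = False
except ValueError:
    strict_raised = True

print(len(records), skipped, strict_raised)
```

Returning the skipped count alongside the records keeps lenient loading from hiding how much of the event stream was dropped.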

Examples

The examples/ folder covers common enterprise agent use cases where an agent may act on stale, incomplete, or invalidated context.

Example	File
Customer service refund decision	`examples/customer_service_refund.py`
Coding agent code-change decision	`examples/coding_agent_change.py`
Merchandising super-agent schema router	`examples/merchandising_super_agent_router.py`
Sales opportunity next step	`examples/sales_opportunity_next_step.py`
Procurement vendor onboarding	`examples/procurement_vendor_onboarding.py`
Finance invoice approval	`examples/finance_invoice_approval.py`
HR leave-policy guidance	`examples/hr_employee_policy.py`
Security access review	`examples/security_access_review.py`

Run all examples:

for file in examples/*.py; do
  echo "$file"
  PYTHONPATH=src python3 "$file"
done

More context: examples/README.md

Project Status And Releases

Current publishing target: public GitHub repository only.

PyPI publishing is deferred until the project reaches a tagged 0.1.0 release candidate with a more stable public API, passing CI, and a clearer changelog. Until then, install from GitHub or local checkout.

Helpful repo docs: CHANGELOG.md, CONTRIBUTING.md, CODE_OF_CONDUCT.md, and WHY_CONTEXTSCHEMA.md.

Research Notes

The long research ledgers are intentionally not part of this public package repository. The public summary is in RESEARCH.md: it explains the current positioning, what this package is not trying to replace, and why the first release is a small post-retrieval validation library.

Run Tests

PYTHONPATH=src python3 -m unittest discover -s tests -v

The current suite covers core scoring, event invalidation, ambiguity, weak sources, evidence privacy defaults, JSONL loading, and all examples.

License

MIT. See LICENSE.

Parked Scope

These are intentionally not part of the current core:

Area	Current Decision
LangChain/LangGraph adapters	Deferred
LlamaIndex adapters	Deferred
Redis/Zep/Mem0 adapters	Deferred
dbt/GX/Tecton freshness imports	Deferred
OPA/Cedar export	Deferred
Langfuse/Braintrust/Phoenix export	Deferred
OpenTelemetry mapping	Deferred
LLM fallback extractor/classifier	Deferred
CLI	Deferred
YAML/JSON schema export	Deferred
Hosted service/dashboard	Rejected for MVP
PyPI release	Deferred
