Do we need an LLM-focused AP2 conformance test? Or is AP2 simple enough that any LLM can handle it? #98

pad01g · 2025-10-21T02:44:02Z

Hi AP2 maintainers, and thanks for all the work so far.

I’m exploring conformance testing for LLM-driven AP2 agents. The A2A TCK validates transport/API compliance, but it doesn’t tell us whether an LLM actually follows AP2’s flow rules.

Why this matters (examples):

- Older GPT models don’t “know” AP2. To run AP2 accurately we must inject long prompts/RAG; as AP2 becomes richer, this gets brittle.
- We’d like a test that answers “Is this model/agent AP2-capable?” E.g., can it:
  - keep the Intent → Cart → Payment order,
  - avoid issuing a CartMandate until all price-impacting info is collected,
  - place Intent/Payment in Messages and Cart in an Artifact,
  - re-price when a coupon or invite code is added before Payment (i.e., understand the whole flow, not just local steps).
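To make the idea concrete, here is a rough sketch of what one such check could look like if the agent's run is recorded as a flat trace of events. Everything here is an assumption for illustration: the `Event` shape, the `kind`/`carrier` values, and `check_trace` are hypothetical and are not taken from the AP2 or A2A schemas. It covers ordering, carrier placement, and re-pricing; it only approximates the "all price-impacting info collected" rule by flagging a Payment that follows a stale Cart.

```python
from dataclasses import dataclass


# Hypothetical trace event, NOT an AP2/A2A type.
@dataclass(frozen=True)
class Event:
    kind: str     # "intent", "cart", "payment", or "price_input" (coupon, invite code, ...)
    carrier: str  # "message" or "artifact"; "user" for price_input events


# Where each mandate is expected to travel, per the rules listed above.
EXPECTED_CARRIER = {"intent": "message", "cart": "artifact", "payment": "message"}


def check_trace(events: list[Event]) -> list[str]:
    """Return rule violations for one recorded run; an empty list means it passed."""
    violations = []
    seen_intent = False
    cart_is_current = False  # True only if a cart was issued after the latest price input

    for i, ev in enumerate(events):
        expected = EXPECTED_CARRIER.get(ev.kind)
        if expected is not None and ev.carrier != expected:
            violations.append(
                f"event {i}: {ev.kind} carried in {ev.carrier}, expected {expected}"
            )

        if ev.kind == "intent":
            seen_intent = True
        elif ev.kind == "price_input":
            # New price-impacting info makes any earlier cart stale.
            cart_is_current = False
        elif ev.kind == "cart":
            if not seen_intent:
                violations.append(f"event {i}: cart issued before intent")
            cart_is_current = True
        elif ev.kind == "payment":
            if not cart_is_current:
                violations.append(f"event {i}: payment without an up-to-date cart")

    return violations


# A run where a coupon arrives after the cart and the agent pays without re-pricing.
stale_run = [
    Event("intent", "message"),
    Event("cart", "artifact"),
    Event("price_input", "user"),
    Event("payment", "message"),
]
print(check_trace(stale_run))  # → ['event 3: payment without an up-to-date cart']
```

A real suite would need to derive these events from actual A2A Messages and Artifacts, which is exactly the part I am unsure how to standardize.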

Questions

- Is an LLM-oriented AP2 conformance test suite needed, or is that out of scope for AP2?
- Related doubt: is AP2 intentionally simple enough that almost any LLM can be guided with minimal prompts, so that a separate conformance suite wouldn’t add value?
- If it is needed, where should such a spec live: in the AP2 repo, or in a separate repo such as “AP2-TCK”?

Thanks for your guidance.

Replies: 0 comments
