Hi AP2 maintainers—thanks for all the work so far.
I’m exploring conformance testing for LLM-driven AP2 agents. The A2A TCK validates transport/API compliance, but it doesn’t tell us whether an LLM actually follows AP2’s flow rules.
Why this matters (examples):
Older GPT models don’t “know” AP2. To run AP2 accurately we must inject long prompts or RAG context, and as AP2 grows richer that approach gets brittle.
We’d like a test that answers “Is this model/agent AP2-capable?” E.g., can it:
- keep the Intent → Cart → Payment order,
- avoid issuing a CartMandate until all price-impacting info is collected,
- place Intent/Payment in Messages and Cart in an Artifact,
- re-price when a coupon or invite code is added before Payment (i.e., understand the whole flow, not just local steps).
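To make the checks above concrete, here is a rough sketch of how they might be run against a recorded agent trace. Everything in it is an assumption for illustration: the `Event` type, the `kind` and `channel` labels, and `check_trace` are hypothetical names, not AP2 or A2A types, and a real suite would need to parse actual A2A Messages and Artifacts instead.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical trace record; NOT an AP2 or A2A type.
@dataclass
class Event:
    kind: str          # "intent", "cart", "payment", or "price_input" (coupon, invite code, ...)
    channel: str = ""  # "message" or "artifact" for mandates; unused for price_input

ORDER = ["intent", "cart", "payment"]
EXPECTED_CHANNEL = {"intent": "message", "cart": "artifact", "payment": "message"}

def check_trace(events: List[Event]) -> List[str]:
    """Return rule violations found in a trace; an empty list means it passes."""
    violations = []
    seen = set()           # mandate kinds observed so far
    cart_is_stale = False  # True when a price input arrived after the latest cart
    for i, ev in enumerate(events):
        if ev.kind == "price_input":
            # A coupon added after a cart invalidates that cart's price.
            if "cart" in seen:
                cart_is_stale = True
            continue
        if ev.kind not in ORDER:
            violations.append(f"event {i}: unknown kind '{ev.kind}'")
            continue
        pos = ORDER.index(ev.kind)
        # Ordering rule: each mandate needs its predecessor to have appeared.
        if pos > 0 and ORDER[pos - 1] not in seen:
            violations.append(f"event {i}: {ev.kind} before {ORDER[pos - 1]}")
        # Placement rule: Intent/Payment in Messages, Cart in an Artifact.
        if ev.channel != EXPECTED_CHANNEL[ev.kind]:
            violations.append(
                f"event {i}: {ev.kind} sent as {ev.channel}, "
                f"expected {EXPECTED_CHANNEL[ev.kind]}"
            )
        if ev.kind == "cart":
            cart_is_stale = False  # a fresh cart counts as a re-price
        if ev.kind == "payment" and cart_is_stale:
            violations.append(f"event {i}: payment without re-pricing")
        seen.add(ev.kind)
    return violations

good_trace = [
    Event("intent", "message"),
    Event("cart", "artifact"),
    Event("price_input"),        # coupon added after the first cart
    Event("cart", "artifact"),   # agent re-prices
    Event("payment", "message"),
]
bad_trace = [
    Event("intent", "message"),
    Event("cart", "message"),    # wrong placement
    Event("price_input"),
    Event("payment", "message"), # no re-price after the coupon
]
print(check_trace(good_trace))
print(check_trace(bad_trace))
```

A checker like this only inspects the recorded trace, so it would sit on top of transport-level validation and score the model's flow behavior separately from API compliance.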
Questions
1. Is an LLM-oriented AP2 conformance test suite needed, or is that out of scope for AP2? Related doubt: is AP2 intentionally simple enough that almost any LLM can be guided with minimal prompts, so that a separate conformance suite wouldn’t add value?
2. If it is needed, where should such a spec live: in the AP2 repo, or in a separate repo like “AP2-TCK”?
Thanks for your guidance.