Smart AIPI provides access to the full OpenAI model family at 75% less than OpenAI's direct pricing (you pay 25% of list rates). All models are accessed through standard OpenAI-compatible endpoints with no code changes required.
- API base URL: https://api.smartaipi.com/v1
- Anthropic-compatible base: https://api.smartaipi.com
- WebSocket: wss://api.smartaipi.com/v1/realtime
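Because the endpoints are OpenAI-compatible, any OpenAI client works once you point it at the base URL above. A minimal sketch using only the Python standard library (the API key is a placeholder):

```python
import json
import urllib.request

BASE_URL = "https://api.smartaipi.com/v1"
API_KEY = "YOUR_SMARTAIPI_KEY"  # placeholder

# Standard OpenAI-style Chat Completions payload; nothing beyond the
# base URL changes compared to calling OpenAI directly.
payload = {
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment with a real key
```

The official `openai` SDK works the same way: pass `base_url="https://api.smartaipi.com/v1"` when constructing the client.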
| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
|---|---|---|---|---|
| gpt-5.4 | $0.625 | $0.0625 | $3.75 | State of the art: SWE-Bench Pro 57.7%, FrontierMath 47.6% |
| gpt-5.4-mini | $0.1875 | $0.01875 | $1.125 | 60% Terminal-Bench 2.0; best price-performance small model |
| gpt-5.4-nano | $0.05 | $0.005 | $0.3125 | Ultra-cheap routing, classification, summarization |
GPT-5.4 is OpenAI's most capable model. Benchmark highlights vs previous generation:
- OSWorld (computer use): 75.0% (up from 74.0%)
- SWE-Bench Pro (software engineering): 57.7% (up from 56.8%)
- GPQA Diamond (expert science reasoning): 92.8%
- FrontierMath (advanced math): 47.6%
- GDPval (knowledge work): 83.0% (up from 70.9%)
- Toolathlon (agentic tool use): 54.6%
GPT-5.4 also supports a 1M-token context window (up from 272K); tokens beyond 272K are billed at 2x. Priority processing is available via `service_tier: "priority"` at 1.5x standard rates.
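The long-context billing rule above reduces to simple arithmetic. A hypothetical helper (the 2x multiplier past 272K tokens and the $0.625/1M input rate come from this page; applying the rule uniformly per token type is an assumption):

```python
def gpt54_token_cost(tokens: int, rate_per_m: float, threshold: int = 272_000) -> float:
    """Dollar cost for one token type, assuming tokens past the 272K
    context threshold are billed at 2x the listed per-million rate."""
    base = min(tokens, threshold)
    extra = max(tokens - threshold, 0)
    return (base + 2 * extra) * rate_per_m / 1_000_000

# 1M input tokens at gpt-5.4's $0.625/1M rate:
# 272K billed at 1x, the remaining 728K at 2x.
cost = gpt54_token_cost(1_000_000, 0.625)
```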
GPT-5.4 Mini scores 60% on Terminal-Bench 2.0 — outperforming Gemini 3 Flash (47.7%) while costing less. Ideal for coding subagents and parallelized development workflows.
GPT-5.4 Nano supports Chat Completions only (not Responses API). Use for lightweight tasks: classification, summarization, routing, content moderation.
| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
|---|---|---|---|---|
| gpt-5.3-codex | $0.4375 | $0.045 | $3.50 | Previous frontier; excellent for coding agents |
| gpt-5.3-codex-mini | $0.0375 | $0.005 | $0.15 | Fast, cheap coding subagent |
| gpt-5.3-mini | $0.0375 | $0.005 | $0.15 | Fast small model |
GPT-5.3 Codex is OpenAI's previous frontier model and remains a top choice for coding agents (Codex CLI, OpenCode, Claude Code via Anthropic endpoint). Strong on SWE-Bench (56.8%) and all agentic benchmarks.
| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
|---|---|---|---|---|
| gpt-5.2-codex | $0.4375 | $0.045 | $3.50 | Solid coding frontier |
| gpt-5.2-codex-mini | $0.0375 | $0.005 | $0.15 | Fast coding model |
| gpt-5.2 | $0.4375 | $0.045 | $3.50 | General purpose |
| gpt-5.2-mini | $0.0375 | $0.005 | $0.15 | Lightweight tasks |
| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
|---|---|---|---|---|
| gpt-5.1 | $0.375 | $0.0375 | $3.00 | Balanced capability and cost |
| gpt-5.1-codex-mini | $0.0375 | $0.005 | $0.15 | |
| gpt-5.1-mini | $0.0375 | $0.005 | $0.15 | |
| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
|---|---|---|---|---|
| gpt-5 | $0.3125 | $0.03125 | $2.50 | Solid general-purpose model |
| gpt-5-mini | $0.0625 | $0.0075 | $0.50 | Low-cost, fast |
| gpt-5-nano | $0.0125 | $0.0025 | $0.10 | Ultra-cheap for high-throughput |
| gpt-5-codex-mini | $0.0375 | $0.005 | $0.15 | Coding-optimized small model |
| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
|---|---|---|---|---|
| gpt-4.1 | $0.50 | $0.125 | $2.00 | Reliable, well-tested |
| gpt-4.1-mini | $0.10 | $0.025 | $0.40 | Efficient mid-tier |
| gpt-4.1-nano | $0.025 | $0.0075 | $0.10 | Near-free for simple tasks |
| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
|---|---|---|---|---|
| gpt-4o | $0.625 | $0.3125 | $2.50 | Multimodal, broad support |
| gpt-4o-mini | $0.0375 | $0.02 | $0.15 | Low-cost multimodal |
Image generation is billed per image based on resolution and quality, not per token.
| Model | Notes |
|---|---|
| gpt-image-1.5 | Frontier image generation and editing |
| gpt-image-latest | Always resolves to the latest image model |
| gpt-image-1-mini | Faster, lower-cost image generation |
Endpoint: `POST /v1/images/generations`
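A request sketch for the endpoint above, again using only the standard library (the `size` value is an assumption; per-image billing depends on resolution and quality as noted above):

```python
import json
import urllib.request

payload = {
    "model": "gpt-image-1.5",
    "prompt": "A watercolor lighthouse at dusk",
    "size": "1024x1024",  # assumed value; billing varies by resolution/quality
}
req = urllib.request.Request(
    "https://api.smartaipi.com/v1/images/generations",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_SMARTAIPI_KEY",  # placeholder
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment with a real key
```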
| Model | Notes |
|---|---|
| sora-2 | Up to 20 seconds of video from a text prompt or image |
| sora-2-pro | Higher-quality video, longer duration |
Endpoint: `POST /v1/videos`
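A sketch for the video endpoint. The `seconds` parameter name is an assumption not confirmed by this page; check the API reference before relying on it:

```python
import json
import urllib.request

payload = {
    "model": "sora-2",
    "prompt": "A paper boat drifting down a rainy street",
    "seconds": "8",  # assumed parameter; up to 20s per the table above
}
req = urllib.request.Request(
    "https://api.smartaipi.com/v1/videos",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_SMARTAIPI_KEY",  # placeholder
        "Content-Type": "application/json",
    },
)
# Video generation is typically asynchronous: expect the response to
# contain a job id to poll rather than the finished video itself.
```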
When using the Anthropic-compatible endpoint (`/v1/messages`), Claude model names are automatically routed to the closest GPT equivalent:
| You Request | Routes To | Output / 1M |
|---|---|---|
| claude-opus-4-6 | gpt-5.3-codex | $3.50 |
| claude-sonnet-4-5 | gpt-5.3-codex | $3.50 |
| claude-haiku-4-5 | gpt-5.3-codex-mini | $0.15 |
| Any other claude-* model | gpt-5.3-codex | $3.50 |
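A request sketch for the Anthropic-compatible endpoint, following the Anthropic Messages API shape (`max_tokens` is required there; the `anthropic-version` header value shown is an assumption):

```python
import json
import urllib.request

payload = {
    "model": "claude-sonnet-4-5",  # routed to gpt-5.3-codex per the table
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Refactor this function."}],
}
req = urllib.request.Request(
    "https://api.smartaipi.com/v1/messages",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "x-api-key": "YOUR_SMARTAIPI_KEY",        # placeholder
        "anthropic-version": "2023-06-01",        # assumed version string
        "content-type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment with a real key
```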
Prompt caching is automatic — no configuration needed. Requests with long, repeated system prompts (common in agentic workflows) benefit automatically. Cache hit rates of 30–50% are typical for agent loops with consistent system prompts.
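One way to confirm caching is kicking in is to inspect the `usage` object returned with each response. A sketch assuming the standard OpenAI `prompt_tokens_details.cached_tokens` field is passed through:

```python
def cached_fraction(usage: dict) -> float:
    """Share of prompt tokens served from the cache, given a
    Chat Completions usage object."""
    details = usage.get("prompt_tokens_details") or {}
    total = usage.get("prompt_tokens", 0)
    return details.get("cached_tokens", 0) / total if total else 0.0

# Example usage object from one agent-loop turn:
usage = {
    "prompt_tokens": 12_000,
    "prompt_tokens_details": {"cached_tokens": 4_800},
    "completion_tokens": 300,
}
share = cached_fraction(usage)  # 0.4, within the typical 30-50% range
```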
| Use Case | Recommended Model | Why |
|---|---|---|
| Best quality coding | gpt-5.4 | State-of-the-art SWE-Bench, complex multi-file tasks |
| Coding subagents | gpt-5.4-mini | 60% Terminal-Bench at 70% less cost than gpt-5.4 |
| Routing, classification, summaries | gpt-5.4-nano | Near-free at $0.05/1M input |
| Coding agents (proven) | gpt-5.3-codex | Excellent track record, slightly lower cost than 5.4 |
| Fast lightweight tasks | gpt-5-mini | Good balance of speed and capability |
| High-volume pipelines | gpt-5-nano / gpt-5.4-nano | Lowest cost per token |
| Image generation | gpt-image-1.5 | Frontier image quality |
| Video generation | sora-2 | Text-to-video up to 20s |
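The recommendations above fold into a trivial routing helper. The task labels are illustrative only, not an API feature:

```python
# Illustrative mapping of the use-case table above; not an API feature.
MODEL_FOR_TASK = {
    "best-coding": "gpt-5.4",
    "coding-subagent": "gpt-5.4-mini",
    "classification": "gpt-5.4-nano",
    "coding-agent-proven": "gpt-5.3-codex",
    "fast-lightweight": "gpt-5-mini",
    "high-volume": "gpt-5-nano",
    "image": "gpt-image-1.5",
    "video": "sora-2",
}

def pick_model(task: str, default: str = "gpt-5.4-mini") -> str:
    """Return the recommended model for a task label, falling back to a
    cheap general-purpose default for unrecognized tasks."""
    return MODEL_FOR_TASK.get(task, default)
```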