Smart AIPI — Model List and Pricing

Smart AIPI provides access to the full OpenAI model family at 75% less than OpenAI's direct pricing (you pay 25% of list rates). All models are accessed through standard OpenAI-compatible endpoints with no code changes required.
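The 25%-of-list pricing is easy to sanity-check against the tables below. A minimal sketch (the `smartaipi_price` helper is illustrative, not part of any SDK):

```python
def smartaipi_price(openai_list_price: float) -> float:
    """Return the Smart AIPI rate for a given OpenAI list price.

    Smart AIPI charges 25% of OpenAI's direct (list) pricing,
    i.e. a 75% discount. Illustrative helper only.
    """
    return round(openai_list_price * 0.25, 6)

# OpenAI lists gpt-4o input at $2.50 per 1M tokens;
# 25% of that matches the $0.625 input rate in the table below.
print(smartaipi_price(2.50))  # 0.625
```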

  • API base URL: https://api.smartaipi.com/v1
  • Anthropic-compatible base: https://api.smartaipi.com
  • WebSocket: wss://api.smartaipi.com/v1/realtime
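The endpoints accept standard OpenAI-style JSON, so any OpenAI SDK pointed at the base URL above works unchanged. A stdlib-only sketch of what a Chat Completions request looks like on the wire (the `build_chat_request` helper is illustrative; the request is built but not sent):

```python
import json
import urllib.request

BASE_URL = "https://api.smartaipi.com/v1"

def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "sk-...",  # your Smart AIPI key
    "gpt-5.4-mini",
    [{"role": "user", "content": "Hello"}],
)
# Send with: urllib.request.urlopen(req)
```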

Text Generation Models

GPT-5.4 Series (Latest)

| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
| --- | --- | --- | --- | --- |
| gpt-5.4 | $0.625 | $0.0625 | $3.75 | State of the art: SWE-Bench Pro 57.7%, FrontierMath 47.6% |
| gpt-5.4-mini | $0.1875 | $0.01875 | $1.125 | 60% Terminal-Bench 2.0; best price-performance small model |
| gpt-5.4-nano | $0.05 | $0.005 | $0.3125 | Ultra-cheap routing, classification, summarization |

GPT-5.4 is OpenAI's most capable model. Benchmark highlights vs. the previous generation:

  • OSWorld (computer use): 75.0% (up from 74.0%)
  • SWE-Bench Pro (software engineering): 57.7% (up from 56.8%)
  • GPQA Diamond (expert science reasoning): 92.8%
  • FrontierMath (advanced math): 47.6%
  • GDPval (knowledge work): 83.0% (up from 70.9%)
  • Toolathlon (agentic tool use): 54.6%

GPT-5.4 also supports a 1M context window (up from 272K; tokens beyond 272K are billed at 2x). Priority processing is available via service_tier: "priority" at 1.5x standard rates.
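How the long-context surcharge and priority multiplier combine is worth spelling out. A sketch, assuming the 2x rate applies only to the input tokens beyond 272K and the 1.5x priority multiplier applies on top (the helper and these composition assumptions are illustrative):

```python
def gpt54_input_cost(input_tokens: int, priority: bool = False) -> float:
    """Estimate gpt-5.4 input cost in dollars for uncached tokens.

    Assumes: $0.625 per 1M tokens up to 272K, 2x ($1.25 per 1M)
    beyond 272K, and a 1.5x multiplier for service_tier="priority".
    """
    RATE = 0.625 / 1_000_000   # standard input rate, $ per token
    THRESHOLD = 272_000        # long-context boundary
    base = min(input_tokens, THRESHOLD) * RATE
    extra = max(input_tokens - THRESHOLD, 0) * RATE * 2
    cost = base + extra
    if priority:
        cost *= 1.5
    return round(cost, 6)

print(gpt54_input_cost(1_000_000))        # 1.08
print(gpt54_input_cost(1_000_000, True))  # 1.62
```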

GPT-5.4 Mini scores 60% on Terminal-Bench 2.0 — outperforming Gemini 3 Flash (47.7%) while costing less. Ideal for coding subagents and parallelized development workflows.

GPT-5.4 Nano supports Chat Completions only (not Responses API). Use for lightweight tasks: classification, summarization, routing, content moderation.
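Because GPT-5.4 Nano is Chat Completions-only, a caller that normally uses the Responses API needs a fallback path. A minimal sketch (the routing table and helper are assumptions for this example, not an SDK feature):

```python
# Models that, per this document, support Chat Completions only.
CHAT_ONLY_MODELS = {"gpt-5.4-nano"}

def endpoint_for(model: str) -> str:
    """Pick the API path: /responses where supported, else /chat/completions."""
    if model in CHAT_ONLY_MODELS:
        return "/v1/chat/completions"
    return "/v1/responses"

print(endpoint_for("gpt-5.4-nano"))  # /v1/chat/completions
print(endpoint_for("gpt-5.4"))       # /v1/responses
```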


GPT-5.3 Series

| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
| --- | --- | --- | --- | --- |
| gpt-5.3-codex | $0.4375 | $0.045 | $3.50 | Previous frontier; excellent for coding agents |
| gpt-5.3-codex-mini | $0.0375 | $0.005 | $0.15 | Fast, cheap coding subagent |
| gpt-5.3-mini | $0.0375 | $0.005 | $0.15 | Fast small model |

GPT-5.3 Codex is OpenAI's previous frontier model and remains a top choice for coding agents (Codex CLI, OpenCode, Claude Code via Anthropic endpoint). Strong on SWE-Bench (56.8%) and all agentic benchmarks.


GPT-5.2 Series

| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
| --- | --- | --- | --- | --- |
| gpt-5.2-codex | $0.4375 | $0.045 | $3.50 | Solid coding frontier |
| gpt-5.2-codex-mini | $0.0375 | $0.005 | $0.15 | Fast coding model |
| gpt-5.2 | $0.4375 | $0.045 | $3.50 | General purpose |
| gpt-5.2-mini | $0.0375 | $0.005 | $0.15 | Lightweight tasks |

GPT-5.1 Series

| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
| --- | --- | --- | --- | --- |
| gpt-5.1 | $0.375 | $0.0375 | $3.00 | Balanced capability and cost |
| gpt-5.1-codex-mini | $0.0375 | $0.005 | $0.15 | |
| gpt-5.1-mini | $0.0375 | $0.005 | $0.15 | |

GPT-5 Series

| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
| --- | --- | --- | --- | --- |
| gpt-5 | $0.3125 | $0.03125 | $2.50 | Solid general-purpose model |
| gpt-5-mini | $0.0625 | $0.0075 | $0.50 | Low-cost, fast |
| gpt-5-nano | $0.0125 | $0.0025 | $0.10 | Ultra-cheap for high-throughput |
| gpt-5-codex-mini | $0.0375 | $0.005 | $0.15 | Coding-optimized small model |

GPT-4.1 Series

| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
| --- | --- | --- | --- | --- |
| gpt-4.1 | $0.50 | $0.125 | $2.00 | Reliable, well-tested |
| gpt-4.1-mini | $0.10 | $0.025 | $0.40 | Efficient mid-tier |
| gpt-4.1-nano | $0.025 | $0.0075 | $0.10 | Near-free for simple tasks |

GPT-4o Series

| Model | Input / 1M | Cached Input / 1M | Output / 1M | Notes |
| --- | --- | --- | --- | --- |
| gpt-4o | $0.625 | $0.3125 | $2.50 | Multimodal, broad support |
| gpt-4o-mini | $0.0375 | $0.02 | $0.15 | Low-cost multimodal |

Image Generation Models

Image generation is billed per image based on resolution and quality, not per token.

| Model | Notes |
| --- | --- |
| gpt-image-1.5 | Frontier image generation and editing |
| gpt-image-latest | Always resolves to the latest image model |
| gpt-image-1-mini | Faster, lower-cost image generation |

Endpoint: POST /v1/images/generations
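A stdlib-only sketch of the image request (the `build_image_request` helper is illustrative, the payload fields follow the standard OpenAI Images API, and the `size` value is just an example; the request is built but not sent):

```python
import json
import urllib.request

def build_image_request(api_key: str, prompt: str,
                        model: str = "gpt-image-1.5") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/images/generations request."""
    body = json.dumps({"model": model, "prompt": prompt, "size": "1024x1024"}).encode()
    return urllib.request.Request(
        "https://api.smartaipi.com/v1/images/generations",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_image_request("sk-...", "a lighthouse at dusk")
# Send with urllib.request.urlopen(req); billing is per image, not per token.
```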


Video Generation Models

| Model | Notes |
| --- | --- |
| sora-2 | Up to 20 seconds of video from text prompt or image |
| sora-2-pro | Higher quality video, longer duration |

Endpoint: POST /v1/videos
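The video endpoint follows the same request shape. A minimal sketch (the `build_video_request` helper is illustrative; fields beyond `model` and `prompt` are not documented here, so only those two are sent, and the request is built but not dispatched):

```python
import json
import urllib.request

def build_video_request(api_key: str, prompt: str,
                        model: str = "sora-2") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/videos request."""
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    return urllib.request.Request(
        "https://api.smartaipi.com/v1/videos",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_video_request("sk-...", "a paper boat drifting down a rainy street")
```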


Anthropic Claude Model Aliases

When using the Anthropic-compatible endpoint (/v1/messages), Claude model names are automatically routed to the closest GPT equivalent:

| You Request | Routes To | Output / 1M |
| --- | --- | --- |
| claude-opus-4-6 | gpt-5.3-codex | $3.50 |
| claude-sonnet-4-5 | gpt-5.3-codex | $3.50 |
| claude-haiku-4-5 | gpt-5.3-codex-mini | $0.15 |
| Any claude-* model | gpt-5.3-codex | $3.50 |
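For client-side logging or cost estimation, the same routing can be reproduced locally (an illustrative helper mirroring the table above; the actual routing happens server-side):

```python
# Mirrors the alias table above; any other claude-* name
# falls through to gpt-5.3-codex.
CLAUDE_ALIASES = {
    "claude-opus-4-6": "gpt-5.3-codex",
    "claude-sonnet-4-5": "gpt-5.3-codex",
    "claude-haiku-4-5": "gpt-5.3-codex-mini",
}

def resolve_alias(model: str) -> str:
    """Return the GPT model a claude-* request is routed to."""
    if model.startswith("claude-"):
        return CLAUDE_ALIASES.get(model, "gpt-5.3-codex")
    return model  # non-Claude names pass through unchanged

print(resolve_alias("claude-haiku-4-5"))  # gpt-5.3-codex-mini
print(resolve_alias("claude-3-opus"))     # gpt-5.3-codex
```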

Prompt Caching

Prompt caching is automatic — no configuration needed. Requests with long, repeated system prompts (common in agentic workflows) benefit automatically. Cache hit rates of 30–50% are typical for agent loops with consistent system prompts.
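The effect of caching on input cost is easy to estimate. A sketch assuming a fraction `hit_rate` of input tokens is served from cache at the cached rate (the helper is illustrative):

```python
def effective_input_rate(rate: float, cached_rate: float, hit_rate: float) -> float:
    """Blended $/1M input rate given a cache hit fraction in [0, 1]."""
    return round(hit_rate * cached_rate + (1 - hit_rate) * rate, 6)

# gpt-5.4 at a 40% hit rate: the blended input rate drops
# from $0.625/1M to $0.40/1M.
print(effective_input_rate(0.625, 0.0625, 0.4))  # 0.4
```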


Model Selection Guide

| Use Case | Recommended Model | Why |
| --- | --- | --- |
| Best quality coding | gpt-5.4 | State-of-the-art SWE-Bench, complex multi-file tasks |
| Coding subagents | gpt-5.4-mini | 60% Terminal-Bench at 70% less cost than gpt-5.4 |
| Routing, classification, summaries | gpt-5.4-nano | Near-free at $0.05/1M input |
| Coding agents (proven) | gpt-5.3-codex | Excellent track record, slightly lower cost than 5.4 |
| Fast lightweight tasks | gpt-5-mini | Balanced between speed and capability |
| High-volume pipelines | gpt-5-nano / gpt-5.4-nano | Lowest cost per token |
| Image generation | gpt-image-1.5 | Frontier image quality |
| Video generation | sora-2 | Text-to-video up to 20s |
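The guide above can double as a lookup table in routing code (an illustrative mapping built from the recommendations above; tune the keys and fallback to your workload):

```python
# Illustrative default choices taken from the selection guide above.
RECOMMENDED = {
    "best_quality_coding": "gpt-5.4",
    "coding_subagent": "gpt-5.4-mini",
    "classification": "gpt-5.4-nano",
    "high_volume": "gpt-5-nano",
    "image": "gpt-image-1.5",
    "video": "sora-2",
}

def pick_model(use_case: str, default: str = "gpt-5.4-mini") -> str:
    """Map a use case to a recommended model, falling back to a cheap default."""
    return RECOMMENDED.get(use_case, default)

print(pick_model("classification"))  # gpt-5.4-nano
```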

Back to README | Quickstart guide | Pricing details

Get your free API key →