Here’s a number that should make every developer building with AI pause: GPT-4-class performance that cost $30 per million tokens in early 2024 now costs $2-3 per million. That’s a 10x price drop in about two years, and the trend isn’t slowing down.
If you’re building a SaaS product, chatbot, or any AI-powered feature in 2026, your LLM API costs can make or break your margins. Choose wrong, and you’ll burn through your budget before you hit product-market fit. Choose right, and you get frontier-level AI at prices that would have seemed impossible last year.
This guide breaks down the real costs across every major LLM provider as of April 2026. No marketing fluff. Just the numbers you need to make smart decisions.
What This Guide Covers
- Current pricing for 18 models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and Z AI
- Real cost calculations for common workloads (chatbots, document processing, code generation)
- Cost optimization strategies that can cut your bill by 40-60%
- When to pay for premium models vs. when budget options work just as well

The Complete LLM API Pricing Table (April 2026)
Prices are per 1 million tokens. Input = what you send to the model. Output = what the model generates. Context window = how much text the model can process at once.
| Model | Provider | Input | Output | Context |
|---|---|---|---|---|
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 1M |
| GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| o3 | OpenAI | $2.00 | $8.00 | 200K |
| o3-mini | OpenAI | $1.10 | $4.40 | 200K |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 1M |
| Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | 200K |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | 200K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | 1M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M |
| Llama 4 Maverick | Meta (hosted) | $0.20 | $0.60 | 1M |
| Llama 4 Scout | Meta (hosted) | $0.10 | $0.25 | 10M |
| Mistral Large 2 | Mistral | $2.00 | $6.00 | 128K |
| Mistral Small | Mistral | $0.10 | $0.30 | 32K |
| DeepSeek V3.2 | DeepSeek | $0.26 | $0.38 | 164K |
| GLM-5 | Z AI | $0.72 | $2.30 | 80K |
Source: Provider pricing pages as of April 2026. Volume discounts and batch processing can reduce costs by 25-50%.
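To turn the table into a bill, multiply each token count by its per-million rate and add the two halves. Here is a minimal sketch, assuming a hand-copied subset of the table above; `PRICES` and `request_cost` are illustrative names, not part of any provider SDK.

```python
# (input, output) prices in USD per 1 million tokens, copied from the table above.
PRICES = {
    "gpt-4.1": (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "claude-sonnet-4.5": (3.00, 15.00),
    "gemini-2.5-flash": (0.15, 0.60),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of a single request."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Compare one request of 1,000 input tokens and 1,000 output tokens per model.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 1_000, 1_000):.5f}")
```

Swap in your own average token counts to see how the ranking shifts for your workload.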
1. OpenAI: The Broadest Portfolio
OpenAI still runs the largest model portfolio in the market. Their April 2026 lineup spans from nano-class models at $0.10 per million input tokens up to the full o3 reasoning model.
GPT-4.1 Family: The New Default
GPT-4.1 has replaced GPT-4o as the default recommendation for most production workloads. It handles coding, analysis, and long-context tasks with a 1M token context window. The mini variant cuts cost by 80% with surprisingly small quality tradeoffs on structured tasks.
Real cost example: Processing 10,000 customer support tickets (average 500 tokens input, 200 tokens output each) means 5M input tokens and 2M output tokens. That costs roughly $26 with GPT-4.1, $5.20 with GPT-4.1 mini, and just $1.30 with GPT-4.1 nano.
Reasoning Models (o3, o3-mini)
OpenAI’s reasoning models think before answering. They consume more tokens internally (chain-of-thought tokens are billed as output), which means actual costs run 2-5x higher than the per-token price suggests. Use these for complex analysis, math, and multi-step reasoning—not for simple classification tasks.
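To see where that multiplier comes from, add the hidden reasoning tokens to the billed output. This is a rough sketch: the 2,000 reasoning tokens are a hypothetical figure chosen for illustration, and `reasoning_request_cost` is not a provider API.

```python
def reasoning_request_cost(
    input_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int,
    input_rate: float,
    output_rate: float,
) -> float:
    """Cost in USD when hidden reasoning tokens are billed at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_rate + billed_output * output_rate) / 1_000_000


# o3 list prices from the table: $2.00 input, $8.00 output per 1M tokens.
naive = reasoning_request_cost(1_000, 500, 0, 2.00, 8.00)
# Hypothetical: the model spends 2,000 reasoning tokens before answering.
actual = reasoning_request_cost(1_000, 500, 2_000, 2.00, 8.00)
print(f"naive ${naive:.4f}, actual ${actual:.4f}, ratio {actual / naive:.1f}x")
```

With these assumed numbers the real bill is roughly 3.7x the naive estimate. Measure the reasoning token counts your own prompts produce before budgeting.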
2. Anthropic: Premium Quality at Premium Prices
Anthropic prices on a three-tier system: Haiku (fast and cheap), Sonnet (balanced), and Opus (maximum capability). The gap between tiers is significant: at the prices listed above, Opus 4.6 costs about 1.7x as much as Sonnet 4.5, and Sonnet 4.5 costs 3.75x as much as Haiku 3.5.
When Claude Opus 4.6 Is Worth $30/Million
At $5.00/$25.00 per million tokens, Claude Opus 4.6 is the most expensive mainstream LLM. That price only makes sense for tasks where quality differences directly impact revenue:
- Legal document analysis
- Complex code generation and debugging
- Research synthesis across multiple sources
- Agentic workflows where errors cascade
For most applications, Sonnet 4.5 delivers roughly 80% of the quality at 60% of the price ($3.00/$15.00 versus $5.00/$25.00).
Claude Haiku 3.5: The Budget Sweet Spot
At $0.80/$4.00, Claude Haiku 3.5 fills the high-quality budget slot. It outperforms GPT-4o mini on many benchmarks while costing roughly double. The tradeoff is worth it when you need Anthropic’s safety characteristics or superior instruction following.
3. Google Gemini: The Aggressive Pricer
Google’s pricing strategy is aggressive. Gemini 2.5 Flash at $0.15/$0.60 per million tokens undercuts nearly everything except open-source models—and it includes a 1M token context window.
Gemini 2.5 Pro vs. The Competition
At $1.25/$10.00, Gemini 2.5 Pro offers strong reasoning and coding performance. The input pricing undercuts Claude Sonnet and GPT-4.1, but output tokens are priced at $10 per million, making generation-heavy workloads expensive. Use Gemini Pro when your prompts have high input-to-output ratios (document analysis, summarization).
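You can put a number on "high input-to-output ratio" by solving for the point where two models cost the same. A small sketch using the table prices; `breakeven_ratio` is an illustrative helper, and it only applies when one model is cheaper on input and pricier on output.

```python
def breakeven_ratio(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Input-to-output token ratio above which model `a` is cheaper than `b`.

    Each argument is (input_rate, output_rate) in USD per 1M tokens. Assumes
    `a` has the lower input rate and the higher output rate.
    """
    in_a, out_a = a
    in_b, out_b = b
    if not (in_a < in_b and out_a > out_b):
        raise ValueError("expected `a` to be cheaper on input and pricier on output")
    # Solve in_a*I + out_a*O = in_b*I + out_b*O for I/O.
    return (out_a - out_b) / (in_b - in_a)


gemini_25_pro = (1.25, 10.00)
gpt_41 = (2.00, 8.00)
ratio = breakeven_ratio(gemini_25_pro, gpt_41)
print(f"Gemini 2.5 Pro beats GPT-4.1 above {ratio:.2f} input tokens per output token")
```

At list prices the crossover sits near 2.7 input tokens per output token. An 8:1 summarization job lands on the Gemini side, and a 3:2 code generation job lands on the GPT-4.1 side.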
Flash Models: Price-Performance Leaders
Gemini 2.5 Flash and 2.0 Flash are the price-performance leaders. At $0.10-$0.15 per million input tokens, they compete directly with open-source model hosting costs while requiring zero infrastructure management.

4. Open-Source Models: Llama 4, Mistral, DeepSeek
Open-weight models don’t have a single price. Your cost depends on how you host them.
Hosted API Pricing
Providers like Together AI, Fireworks, Groq, and AWS Bedrock host open-source models and charge per token. Typical rates for Llama 4 Maverick range from $0.15-$0.30 per million input tokens depending on the provider.
Self-Hosting Economics
Running Llama 4 Maverick (400B+ parameters) requires multiple high-end GPUs. A typical setup costs $3-8/hour on cloud GPU instances. At sustained high throughput (100+ requests/minute), self-hosting breaks even with API pricing around the 50,000 requests/day mark. Below that, hosted APIs are cheaper.
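The break-even point moves a lot with request size and GPU price, so it is worth computing for your own numbers. A back-of-the-envelope sketch: it assumes the GPUs are rented around the clock and can actually serve the load, and it ignores engineering time. The request size and hourly rate below are illustrative inputs.

```python
def breakeven_requests_per_day(
    gpu_cost_per_hour: float,
    input_tokens: int,
    output_tokens: int,
    api_input_rate: float,
    api_output_rate: float,
) -> float:
    """Daily request volume at which a 24/7 GPU rental matches hosted API spend."""
    daily_gpu_cost = gpu_cost_per_hour * 24
    api_cost_per_request = (
        input_tokens * api_input_rate + output_tokens * api_output_rate
    ) / 1_000_000
    return daily_gpu_cost / api_cost_per_request


# Hosted Llama 4 Maverick at $0.20/$0.60 per 1M tokens (from the table),
# 3,000 input and 2,000 output tokens per request, at both ends of the
# $3-8/hour range quoted above.
for hourly in (3.0, 8.0):
    volume = breakeven_requests_per_day(hourly, 3_000, 2_000, 0.20, 0.60)
    print(f"${hourly:.0f}/hour: break-even near {volume:,.0f} requests/day")
```

Smaller requests push the break-even volume higher, because each API call costs less while the GPU bill stays fixed.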
Real Cost Comparison by Workload
Raw per-token pricing tells part of the story. Actual costs depend on your workload pattern.
Chatbot / Conversational AI
Average conversation: 2,000 tokens input (system prompt + history), 500 tokens output per turn, 5 turns per session.
| Model | Cost per Session | Cost per 10K Sessions/Month |
|---|---|---|
| GPT-4.1 | $0.040 | $400 |
| GPT-4.1 mini | $0.008 | $80 |
| Claude Sonnet 4.5 | $0.068 | $675 |
| Gemini 2.5 Flash | $0.003 | $30 |
| Llama 4 Maverick | $0.0035 | $35 |
Document Processing Pipeline
Average document: 8,000 tokens input, 1,000 tokens output (summary + extraction).
| Model | Cost per Document | Cost per 50K Docs/Month |
|---|---|---|
| GPT-4.1 | $0.024 | $1,200 |
| GPT-4.1 nano | $0.001 | $60 |
| Claude Haiku 3.5 | $0.010 | $520 |
| Gemini 2.5 Flash | $0.002 | $90 |
| Gemini 2.0 Flash | $0.001 | $60 |
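The figures in this table follow directly from the stated token counts and the list prices. A short sketch that reproduces them, so you can plug in your own document sizes and volumes (the constant and function names are illustrative):

```python
# (input, output) prices in USD per 1M tokens, from the pricing table.
PRICES = {
    "GPT-4.1": (2.00, 8.00),
    "GPT-4.1 nano": (0.10, 0.40),
    "Claude Haiku 3.5": (0.80, 4.00),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "Gemini 2.0 Flash": (0.10, 0.40),
}

INPUT_TOKENS = 8_000    # per document
OUTPUT_TOKENS = 1_000   # per document
DOCS_PER_MONTH = 50_000


def cost_per_document(input_rate: float, output_rate: float) -> float:
    """Cost in USD of one document at the token counts above."""
    return (INPUT_TOKENS * input_rate + OUTPUT_TOKENS * output_rate) / 1_000_000


for model, (input_rate, output_rate) in PRICES.items():
    per_doc = cost_per_document(input_rate, output_rate)
    print(f"{model}: ${per_doc:.4f}/doc, ${per_doc * DOCS_PER_MONTH:,.0f}/month")
```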
Code Generation / Analysis
Average request: 3,000 tokens input (code + instructions), 2,000 tokens output.
| Model | Cost per Request | Cost per 100K Requests/Month |
|---|---|---|
| GPT-4.1 | $0.022 | $2,200 |
| Claude Sonnet 4.5 | $0.039 | $3,900 |
| Claude Opus 4.6 | $0.065 | $6,500 |
| Gemini 2.5 Pro | $0.024 | $2,375 |
| Mistral Large 2 | $0.018 | $1,800 |
5 Cost Optimization Strategies That Actually Work
The cheapest model isn’t always the best value. Here’s how to optimize spend without sacrificing quality.
1. Tiered Model Routing
Route requests to different models based on complexity. Use a cheap classifier (GPT-4.1 nano or Gemini 2.0 Flash) to assess request difficulty, then route simple requests to budget models and complex ones to premium models. This typically cuts costs 40-60% compared to using a single model for everything.
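Here is a minimal sketch of the routing idea. The keyword classifier is a stand-in for the cheap LLM classifier call, and the model names in `ROUTES` and the 70% simple-request share are assumptions for illustration.

```python
from typing import Callable

# Hypothetical routing table: difficulty label -> model name.
ROUTES = {"simple": "gpt-4.1-mini", "complex": "gpt-4.1"}


def route(prompt: str, classify: Callable[[str], str]) -> str:
    """Pick a model for `prompt` using a caller-supplied difficulty classifier."""
    label = classify(prompt)
    # Fall back to the premium model when the classifier returns an unknown label.
    return ROUTES.get(label, ROUTES["complex"])


def keyword_classifier(prompt: str) -> str:
    """Stand-in for a cheap LLM classifier; a real one would query a small model."""
    hard_markers = ("debug", "prove", "analyze", "refactor")
    is_hard = len(prompt) > 500 or any(m in prompt.lower() for m in hard_markers)
    return "complex" if is_hard else "simple"


def blended_cost_fraction(simple_share: float, budget_price_ratio: float) -> float:
    """Spend relative to sending every request to the premium model."""
    return simple_share * budget_price_ratio + (1 - simple_share)


print(route("What are your opening hours?", keyword_classifier))
print(route("Debug this stack trace and refactor the handler", keyword_classifier))
# GPT-4.1 mini costs 20% of GPT-4.1; assume 70% of traffic is simple.
print(f"{1 - blended_cost_fraction(0.7, 0.2):.0%} saved")
```

Under those assumptions the blended bill is 44% of the single-model bill, which is a 56% saving before you count the classifier's own (small) cost.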
2. Prompt Caching
Both OpenAI and Anthropic offer prompt caching for system prompts and repeated context. Cached input tokens cost 50-90% less than fresh tokens. If your system prompt is 2,000+ tokens, caching pays for itself immediately. Anthropic’s prompt caching reduces cached input to $0.30/1M on Sonnet (90% discount).
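A quick way to size the savings is to price the repeated system prompt with and without the cached rate. This sketch is deliberately simplified: it assumes the first call pays the full input rate and every later call hits the cache, and it ignores cache-write surcharges and cache expiry. The 100,000 request volume is an illustrative input.

```python
from typing import Optional


def system_prompt_cost(
    prompt_tokens: int,
    requests: int,
    input_rate: float,
    cached_rate: Optional[float] = None,
) -> float:
    """USD spent on a repeated system prompt across `requests` calls."""
    if cached_rate is None:
        return prompt_tokens * requests * input_rate / 1_000_000
    first_call = prompt_tokens * input_rate
    later_calls = prompt_tokens * (requests - 1) * cached_rate
    return (first_call + later_calls) / 1_000_000


# Claude Sonnet: $3.00/1M fresh input, $0.30/1M cached input (figures from the text).
uncached = system_prompt_cost(2_000, 100_000, 3.00)
cached = system_prompt_cost(2_000, 100_000, 3.00, cached_rate=0.30)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}")
```

For a 2,000-token system prompt over 100,000 requests, that is roughly $600 without caching versus about $60 with it.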
3. Batch Processing
OpenAI’s Batch API charges 50% less for non-real-time workloads. If your use case can tolerate 24-hour turnaround (nightly report generation, weekly analysis runs), batch processing is the simplest cost reduction available.
4. Context Window Management
Stuffing the full context window costs money. A 100K token input to Claude Sonnet costs $0.30 per request. Trim your context to what’s actually needed. Use RAG to retrieve only relevant chunks instead of passing entire documents.
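One simple way to enforce this is a token budget on retrieved chunks. The sketch below assumes your retriever already returns chunks sorted by relevance, and it uses a rough 4-characters-per-token estimate; use your provider's tokenizer for exact counts. Both function names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def select_chunks(ranked_chunks: list[str], token_budget: int) -> list[str]:
    """Keep the highest-ranked chunks that fit within `token_budget`.

    `ranked_chunks` must already be sorted by relevance, best first.
    """
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > token_budget:
            continue  # skip chunks that do not fit; a smaller one may still fit
        selected.append(chunk)
        used += cost
    return selected


# Three chunks of roughly 1,000, 2,000, and 500 tokens against a 1,600-token budget.
chunks = ["a" * 4_000, "b" * 8_000, "c" * 2_000]
kept = select_chunks(chunks, token_budget=1_600)
print([estimate_tokens(c) for c in kept])
```

The middle chunk is dropped because it would blow the budget, while the smaller third chunk still fits.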
5. Output Token Optimization
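Output tokens cost more than input tokens for every model in the table above, typically 2-5x more (Gemini 2.5 Pro is the outlier at 8x, and DeepSeek V3.2 sits at the low end at about 1.5x). Request concise outputs. Use structured output formats (JSON) to avoid verbose prose. Set max_tokens limits to prevent runaway generation.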
Which Model Should You Choose? A Decision Framework
| Your Priority | Best Choice | Why |
|---|---|---|
| Lowest possible cost, acceptable quality | Gemini 2.0 Flash or GPT-4.1 nano | $0.10/1M input tokens |
| Best price-performance balance | GPT-4.1 mini or Gemini 2.5 Flash | 80% quality at 20% cost of flagship |
| Production quality, reasonable cost | GPT-4.1 or Claude Sonnet 4.5 | Reliable for most business use cases |
| Maximum quality, cost secondary | Claude Opus 4.6 or o3 | Best reasoning and complex tasks |
| High volume, cost-sensitive | Llama 4 Maverick (self-hosted) | Breaks even at 50K+ requests/day |
| Privacy/compliance requirements | Self-hosted Llama 4 or Mistral | Full data control |
FAQ: LLM API Pricing in 2026
What is the cheapest LLM API in 2026?
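Google’s Gemini 2.0 Flash and OpenAI’s GPT-4.1 nano are tied at $0.10 per million input tokens and $0.40 per million output tokens. Mistral Small matches that input price with cheaper output ($0.30). For open-source alternatives, Llama 4 Scout via hosted providers starts around $0.10/1M input tokens.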
How much does it cost to run a chatbot on GPT-4.1?
A typical chatbot session (5 turns, 2,000 token input and 500 token output per turn) costs about $0.04 on GPT-4.1. At 10,000 sessions per month, that’s roughly $400. Using GPT-4.1 mini drops the cost to $80/month with minimal quality loss.
Is Claude Opus 4.6 worth the higher price?
At the prices listed above, Claude Opus 4.6 costs about 1.7x as much as Claude Sonnet 4.5. It’s worth the premium for complex reasoning, legal document analysis, advanced code generation, and agentic workflows where errors are expensive. For standard chatbot and classification tasks, Sonnet delivers roughly 80% of the quality at 60% of the cost.
What’s the difference between input and output token pricing?
Input tokens are what you send to the model (your prompt, system instructions, context). Output tokens are what the model generates in response. Output tokens typically cost 2-5x more than input tokens because generation requires more compute than processing input; in the table above the multiple ranges from about 1.5x (DeepSeek V3.2) to 8x (Gemini 2.5 Pro).
Are LLM API prices still dropping?
Yes. LLM API prices have dropped roughly 10x over the past two years. Hardware improvements (new GPU architectures), model efficiency (mixture-of-experts), and competition continue pushing prices down. Expect another 2-3x price reduction over the next 12 months for equivalent quality levels.
Key Takeaways
- Prices have dropped 10x in 2 years. What cost $30/1M tokens in 2024 now costs $2-3/1M.
- Gemini 2.5 Flash is the value leader. At $0.15/$0.60 with 1M context, it’s hard to beat for most use cases.
- Tiered routing cuts costs 40-60%. Use cheap models for simple tasks, expensive ones for complex work.
- Prompt caching is free money. 50-90% savings on repeated context with zero quality loss.
- Output tokens are the expensive part. Optimize your prompts for concise responses.
Conclusion
Choosing the right LLM API in 2026 isn’t about finding the cheapest option—it’s about matching the model to your use case. A $0.10/1M token model is perfect for classification tasks. A $25/1M token model might be essential for complex reasoning where errors cost you customers.
The good news? You have more options than ever, and the economics keep getting better. Build smart, optimize ruthlessly, and reinvest those savings into features that matter.


