Inputs
Runs / day
Days / month
Input tokens / run
Output tokens / run
Model
Prices are configurable. Last updated: —.
⚙️ Advanced
Cache hit rate %
Retry rate %
Context overhead tokens
Results
—
Monthly burn
—
Input —
Output —
Tokens/month: — in + — out
Per run
—
—
—
What this calculator counts
Runs/month = runs/day × days/month
Paid input tokens/month = runs × (input + overhead) × (1 + retry%) × (1 − cache%)
Output tokens/month = runs × output × (1 + retry%)
Cost/month = input_tokens/1M × price_in + output_tokens/1M × price_out
If your provider bills cached tokens differently, update pricing.json (or set cache to 0%).
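As a minimal sketch, the formulas above can be written out in Python. The `Workload` class, the `monthly_cost` function, and the example prices are illustrative assumptions, not part of the calculator or of pricing.json. Retry and cache rates are passed as fractions (0.10 for 10%).

```python
from dataclasses import dataclass


@dataclass
class Workload:
    runs_per_day: float
    days_per_month: float
    input_tokens: float          # average input tokens per run
    output_tokens: float         # average output tokens per run
    overhead_tokens: float = 0.0
    retry_rate: float = 0.0      # fraction, e.g. 0.10 for 10%
    cache_hit_rate: float = 0.0  # fraction, e.g. 0.20 for 20%


def monthly_cost(w: Workload, price_in: float, price_out: float) -> dict:
    """Apply the formulas listed above; prices are in $ per 1M tokens."""
    runs = w.runs_per_day * w.days_per_month
    paid_input = (
        runs
        * (w.input_tokens + w.overhead_tokens)
        * (1 + w.retry_rate)
        * (1 - w.cache_hit_rate)
    )
    output = runs * w.output_tokens * (1 + w.retry_rate)
    input_cost = paid_input / 1_000_000 * price_in
    output_cost = output / 1_000_000 * price_out
    total = input_cost + output_cost
    return {
        "runs": runs,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total": total,
        "per_run": total / runs if runs else 0.0,
    }


# Placeholder workload and prices, chosen only to make the arithmetic easy to follow.
example = Workload(1000, 30, 2000, 500, overhead_tokens=500,
                   retry_rate=0.10, cache_hit_rate=0.20)
result = monthly_cost(example, price_in=3.0, price_out=15.0)
print(f"${result['total']:.2f} / month, ${result['per_run']:.5f} / run")
```

With these placeholder numbers, input comes to about $198 and output to about $247.50 per month, so the output side dominates even though it has fewer tokens.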
What if (sensitivity)
Runs/day ±
—
Output tokens ±
—
Retry rate ±
—
This changes monthly cost by
—
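A what-if check amounts to recomputing the monthly cost with one input changed and taking the difference. The sketch below assumes the same cost formula this page uses; the function names and the example workload and prices are hypothetical.

```python
def monthly_cost(runs_per_day, days_per_month, input_tokens, output_tokens,
                 overhead_tokens, retry_rate, cache_hit_rate, price_in, price_out):
    """Monthly cost in $; rates are fractions, prices are $ per 1M tokens."""
    runs = runs_per_day * days_per_month
    paid_input = (runs * (input_tokens + overhead_tokens)
                  * (1 + retry_rate) * (1 - cache_hit_rate))
    output = runs * output_tokens * (1 + retry_rate)
    return paid_input / 1e6 * price_in + output / 1e6 * price_out


def cost_delta(base: dict, **changes) -> float:
    """Change in monthly cost when the given inputs override the baseline."""
    return monthly_cost(**{**base, **changes}) - monthly_cost(**base)


# Placeholder baseline, not real provider pricing.
base = dict(runs_per_day=1000, days_per_month=30, input_tokens=2000,
            output_tokens=500, overhead_tokens=500, retry_rate=0.10,
            cache_hit_rate=0.20, price_in=3.0, price_out=15.0)

delta_runs = cost_delta(base, runs_per_day=1200)   # +20% runs/day
delta_output = cost_delta(base, output_tokens=600)  # +100 output tokens/run
delta_retry = cost_delta(base, retry_rate=0.20)     # retry rate 10% -> 20%
print(round(delta_runs, 2), round(delta_output, 2), round(delta_retry, 2))
```

Because retries multiply both input and output tokens, a higher retry rate moves the total more than its size suggests.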
Set a budget (OnceOnly bridge)
Budget: $ / month
Budget: $ / day
Alert threshold %
Max retries
Max tool calls
Policy snippet (copy/paste)
Compare models
Same workload, different prices. Useful for “GPT vs Claude cost” comparisons.
| Model | Input / 1M | Output / 1M | Total / month | $ / run |
|---|---|---|---|---|
| Updates automatically based on the inputs above. | | | | |
Showing the first 6 models from pricing.json. Add or reorder models there to change this table.
How to estimate AI agent cost
A useful estimate needs only a few numbers:
- Runs per month (runs/day × days/month)
- Average input tokens + context overhead (system prompt, tool schemas, instructions)
- Average output tokens (agent plans + tool-call arguments + final answers); tool results are fed back to the model and count as input
- Retry rate and cache hit rate, if you have them
Why agents get expensive
- Retries: network timeouts, rate limits, tool failures → same task repeats.
- Long context: tool schemas, instructions, memory, and traces add overhead every run.
- Tool loops: “search → browse → refine → browse again” can balloon both input and output tokens, because each step re-sends a growing context.
- Verbose agents: long rationales / step-by-step outputs increase output tokens, which are typically priced higher than input tokens.
Budgeting: daily caps & alerts
Budgets work best when they’re enforced at runtime, not after the invoice.
- Set $ / day caps to limit blast radius from a broken loop.
- Set $ / month caps to keep spend predictable.
- Alert at 80% so you can react before hitting hard limits.
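The cap-and-alert logic above can be sketched as a small check that runs before each call. This is an illustrative assumption about how such a check might look, not OnceOnly's implementation; `BudgetTracker` and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    daily_cap: float             # $ / day
    monthly_cap: float           # $ / month
    alert_threshold: float = 0.8  # alert at 80% of either cap
    spent_today: float = 0.0
    spent_month: float = 0.0

    def check(self, next_cost: float) -> str:
        """Return 'block', 'alert', or 'ok' for a run expected to cost next_cost."""
        day = self.spent_today + next_cost
        month = self.spent_month + next_cost
        if day > self.daily_cap or month > self.monthly_cap:
            return "block"
        if (day >= self.alert_threshold * self.daily_cap
                or month >= self.alert_threshold * self.monthly_cap):
            return "alert"
        return "ok"

    def record(self, cost: float) -> None:
        """Add the actual cost of a finished run to both counters."""
        self.spent_today += cost
        self.spent_month += cost


tracker = BudgetTracker(daily_cap=10.0, monthly_cap=200.0)
first = tracker.check(1.0)    # well under both caps
tracker.record(7.5)
second = tracker.check(0.5)   # would reach 80% of the daily cap
tracker.record(2.0)
third = tracker.check(1.0)    # would exceed the daily cap
print(first, second, third)
```

Checking before the call, rather than after, is what keeps a broken loop from spending past the cap.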
Cost per task vs cost per user
Two useful ways to model unit economics:
- Cost / task: $ per run, assuming one run per task; the monthly total is $ per run × tasks per month.
- Cost / user: (tasks per user × $ per run) + background maintenance runs.
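Both views reduce to a few multiplications. The sketch below assumes background maintenance runs cost the same per run as user-triggered tasks, which may not hold for your workload; the function names and numbers are hypothetical.

```python
def monthly_task_cost(cost_per_run: float, tasks_per_month: float) -> float:
    """Total monthly spend when each task is a single run."""
    return cost_per_run * tasks_per_month


def cost_per_user(cost_per_run: float, tasks_per_user: float,
                  maintenance_runs_per_user: float = 0.0) -> float:
    """Monthly cost of one user: their tasks plus background runs on their behalf.

    Assumes maintenance runs are priced like ordinary runs.
    """
    return (tasks_per_user * cost_per_run
            + maintenance_runs_per_user * cost_per_run)


# Placeholder numbers for illustration only.
total = monthly_task_cost(cost_per_run=0.015, tasks_per_month=30_000)
per_user = cost_per_user(cost_per_run=0.015, tasks_per_user=40,
                         maintenance_runs_per_user=10)
print(f"${total:.2f} / month, ${per_user:.2f} / user")
```

Comparing the per-user figure with what a user pays per month gives a quick margin check.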
How OnceOnly enforces budgets at runtime
OnceOnly sits between your agent and the outside world, enforcing guardrails before costs spiral:
- Hard budget caps (daily, monthly) + alert thresholds.
- Execution controls: max retries, max tool calls, and per-run limits.
- Stop “runaway loops” early, while keeping good runs fast.
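To make the execution controls concrete, here is a minimal sketch of a per-run guard that counts retries and tool calls and stops the run once a limit is crossed. It is a hypothetical illustration of the idea, not OnceOnly's API; `RunGuard` and `LimitExceeded` are invented names.

```python
class LimitExceeded(Exception):
    """Raised when a run crosses one of its configured limits."""


class RunGuard:
    def __init__(self, max_retries: int, max_tool_calls: int):
        self.max_retries = max_retries
        self.max_tool_calls = max_tool_calls
        self.retries = 0
        self.tool_calls = 0

    def on_tool_call(self) -> None:
        """Call before each tool invocation."""
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise LimitExceeded(f"tool calls exceeded {self.max_tool_calls}")

    def on_retry(self) -> None:
        """Call before each retry of a failed step."""
        self.retries += 1
        if self.retries > self.max_retries:
            raise LimitExceeded(f"retries exceeded {self.max_retries}")


guard = RunGuard(max_retries=2, max_tool_calls=3)
stopped = False
try:
    for _ in range(10):      # simulate a runaway tool loop
        guard.on_tool_call()
except LimitExceeded:
    stopped = True
print(stopped, guard.tool_calls)
```

A run that stays within its limits pays only the cost of incrementing two counters, so well-behaved runs are not slowed down.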
Generate a policy above, then enforce it automatically with OnceOnly →