AI Agent Cost Calculator (Tokens → $/month)

Inputs

Runs / day

Days / month

Input tokens / run

Output tokens / run

Model

Prices are configurable. Last updated: —.

⚙️ Advanced

Cache hit rate %

Retry rate %

Context overhead tokens

Results

—

Monthly burn

—

Input —

Output —

Tokens/month: — in + — out

Per run

—

What this calculator counts

Runs/month = runs/day × days/month

Paid input tokens/month = runs × (input + overhead) × (1 + retry%) × (1 − cache%)

Output tokens/month = runs × output × (1 + retry%)

Cost/month = (input_tokens / 1M) × price_in + (output_tokens / 1M) × price_out, where price_in and price_out are in $ per 1M tokens

The input formula treats cached tokens as free. If your provider bills cached tokens at a discounted rate instead, update pricing.json (or set cache to 0%).
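The formulas above can be sketched in Python as follows. This is a minimal sketch, not the calculator's actual source: the function name, the parameter names, and the example workload and prices ($3 and $15 per 1M tokens) are assumptions for illustration. Rates are passed as fractions (0.10 = 10%).

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    input_tokens: float   # paid input tokens per month
    output_tokens: float  # output tokens per month
    input_cost: float     # $ per month
    output_cost: float    # $ per month

    @property
    def total(self) -> float:
        return self.input_cost + self.output_cost


def monthly_cost(runs_per_day, days_per_month, input_tokens, output_tokens,
                 price_in, price_out, overhead_tokens=0,
                 retry_rate=0.0, cache_hit_rate=0.0) -> MonthlyCost:
    """Apply the four formulas; prices are $ per 1M tokens."""
    runs = runs_per_day * days_per_month
    # Retries repeat the whole run; cached input tokens are treated as free.
    paid_in = (runs * (input_tokens + overhead_tokens)
               * (1 + retry_rate) * (1 - cache_hit_rate))
    out = runs * output_tokens * (1 + retry_rate)
    return MonthlyCost(paid_in, out,
                       paid_in / 1e6 * price_in,
                       out / 1e6 * price_out)


# Hypothetical workload and prices, for illustration only.
cost = monthly_cost(runs_per_day=1000, days_per_month=30,
                    input_tokens=2000, output_tokens=500,
                    price_in=3.0, price_out=15.0,
                    overhead_tokens=500, retry_rate=0.10, cache_hit_rate=0.20)
runs = 1000 * 30
print(f"Monthly burn: ${cost.total:,.2f}")
print(f"Per run: ${cost.total / runs:.5f}")
```

With these made-up inputs the sketch gives roughly $445.50 per month, or about 1.5 cents per run.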

What if (sensitivity)

Runs/day ±

—

Output tokens ±

—

Retry rate ±

—

This changes monthly cost by

—
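A what-if figure of this kind can be reproduced by recomputing the monthly cost with one input changed at a time and subtracting the baseline. The sketch below assumes a simplified cost function (no cache or overhead terms), a made-up baseline, and step sizes of +20% and +5 percentage points; none of these values come from the calculator itself.

```python
def monthly_cost(runs_per_day, days, tokens_in, tokens_out,
                 price_in, price_out, retry=0.0):
    """Monthly cost in $; prices are $ per 1M tokens, retry is a fraction."""
    runs = runs_per_day * days
    scale = runs * (1 + retry) / 1e6
    return scale * (tokens_in * price_in + tokens_out * price_out)


# Hypothetical baseline workload and prices.
baseline = dict(runs_per_day=1000, days=30, tokens_in=2000, tokens_out=500,
                price_in=3.0, price_out=15.0, retry=0.10)


def delta(**changes):
    """Change in monthly cost ($) when the given inputs are overridden."""
    return monthly_cost(**{**baseline, **changes}) - monthly_cost(**baseline)


deltas = {
    "runs/day +20%": delta(runs_per_day=baseline["runs_per_day"] * 1.2),
    "output tokens +20%": delta(tokens_out=baseline["tokens_out"] * 1.2),
    "retry rate +5 pts": delta(retry=baseline["retry"] + 0.05),
}
for label, d in deltas.items():
    print(f"{label}: {d:+,.2f} $/month")
```

Changing one input at a time keeps each delta attributable to a single cause. Because the inputs multiply each other, combined changes produce a larger effect than the sum of the individual deltas.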

Set a budget (OnceOnly bridge)

Budget: $ / month

Budget: $ / day

Alert threshold %

Max retries

Max tool calls

Compare models

Same workload, different prices. Useful for “GPT vs Claude cost” comparisons.

Model	Input $ / 1M tokens	Output $ / 1M tokens	Total $ / month	$ / run
Updates automatically based on the inputs above.

Showing the first 6 models from pricing.json. Add or reorder models there to change this table.
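The comparison amounts to applying the same workload to each model's prices. The sketch below shows one way to compute the table rows. The structure of pricing.json is not shown on this page, so the dictionary keys, model names, and prices here are placeholders, not real values.

```python
# Placeholder pricing data; keys, model names, and prices are hypothetical.
PRICING = {
    "model-a": {"input_per_1m": 3.00, "output_per_1m": 15.00},
    "model-b": {"input_per_1m": 0.50, "output_per_1m": 1.50},
}


def compare(pricing, runs_per_month, tokens_in_per_run, tokens_out_per_run):
    """Return (model, input price, output price, total/month, $/run) rows,
    cheapest first."""
    rows = []
    for model, p in pricing.items():
        per_run = (tokens_in_per_run * p["input_per_1m"]
                   + tokens_out_per_run * p["output_per_1m"]) / 1e6
        rows.append((model, p["input_per_1m"], p["output_per_1m"],
                     per_run * runs_per_month, per_run))
    return sorted(rows, key=lambda r: r[3])


rows = compare(PRICING, runs_per_month=30_000,
               tokens_in_per_run=2000, tokens_out_per_run=500)
for model, p_in, p_out, total, per_run in rows:
    print(f"{model}\t{p_in:.2f}\t{p_out:.2f}\t{total:,.2f}\t{per_run:.5f}")
```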

How to estimate AI agent cost

A useful estimate needs only a few numbers:

Runs/day × days/month
Average input tokens + context overhead (system prompt, tool schemas, instructions)
Average output tokens (agent plans + tool calls + final answers); tool results fed back to the model count as input
Retry rate and cache hit rate, if you have them

Why agents get expensive

Retries: network timeouts, rate limits, tool failures → same task repeats.
Long context: tool schemas, instructions, memory, and traces add overhead every run.
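Tool loops: “search → browse → refine → browse again” can balloon output tokens, and each extra step typically re-sends the growing context as input.
Verbose agents: long rationales / step-by-step outputs increase output tokens, which are typically priced higher per token than input tokens.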

Budgeting: daily caps & alerts

Budgets work best when they’re enforced at runtime, not after the invoice.

Set $ / day caps to limit blast radius from a broken loop.
Set $ / month caps to keep spend predictable.
Alert at 80% so you can react before hitting hard limits.
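A runtime check along these lines can be sketched as a small guard that is consulted before each run. This is an illustrative sketch only: the class and method names are hypothetical, it is not OnceOnly's API, and it assumes you can estimate a run's cost before executing it.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run would push spend past a hard cap."""


@dataclass
class BudgetGuard:
    daily_cap: float              # $ / day
    monthly_cap: float            # $ / month
    alert_threshold: float = 0.80 # fraction of a cap that triggers an alert
    spent_today: float = 0.0
    spent_month: float = 0.0

    def charge(self, estimated_cost: float) -> list[str]:
        """Reserve budget for a run before it executes.

        Raises BudgetExceeded if either cap would be passed; otherwise
        records the spend and returns the caps that crossed the threshold.
        """
        if self.spent_today + estimated_cost > self.daily_cap:
            raise BudgetExceeded("daily cap reached")
        if self.spent_month + estimated_cost > self.monthly_cap:
            raise BudgetExceeded("monthly cap reached")
        self.spent_today += estimated_cost
        self.spent_month += estimated_cost
        alerts = []
        if self.spent_today >= self.alert_threshold * self.daily_cap:
            alerts.append("daily")
        if self.spent_month >= self.alert_threshold * self.monthly_cap:
            alerts.append("monthly")
        return alerts

    def start_new_day(self) -> None:
        self.spent_today = 0.0


guard = BudgetGuard(daily_cap=10.0, monthly_cap=200.0)
first = guard.charge(5.0)    # well under both caps: no alerts
second = guard.charge(3.5)   # $8.50 of $10.00 crosses the 80% daily threshold
try:
    guard.charge(2.0)        # would reach $10.50, so the run is refused
    blocked = False
except BudgetExceeded:
    blocked = True
```

Checking before the run rather than after is what limits the blast radius: a broken loop is refused at the cap instead of being discovered on the invoice.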

Cost per task vs cost per user

Two useful ways to model unit economics:

Cost / task: $ per run × runs per task. Multiply by tasks per month for the monthly total.
Cost / user: (tasks per user × cost per task) + the cost of background maintenance runs for that user.
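Both views reduce to a few multiplications. The sketch below assumes each task takes runs_per_task runs (1 by default, which treats a task and a run as the same thing) and that background maintenance runs cost the same as ordinary runs. The function names and example numbers are hypothetical.

```python
def cost_per_task(cost_per_run, runs_per_task=1.0):
    """$ per task, given the $ per run from the calculator."""
    return cost_per_run * runs_per_task


def cost_per_user(cost_per_run, tasks_per_user,
                  background_runs_per_user=0.0, runs_per_task=1.0):
    """$ per user per month: task runs plus background maintenance runs."""
    task_cost = tasks_per_user * cost_per_task(cost_per_run, runs_per_task)
    background_cost = background_runs_per_user * cost_per_run
    return task_cost + background_cost


# Hypothetical numbers: 1.5 cents per run, 40 tasks and 10 background
# runs per user per month.
per_user = cost_per_user(0.015, tasks_per_user=40, background_runs_per_user=10)
print(f"Cost per user: ${per_user:.2f}/month")
```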

How OnceOnly enforces budgets at runtime

OnceOnly sits between your agent and the outside world, enforcing guardrails before costs spiral:

Hard budget caps (daily, monthly) + alert thresholds.
Execution controls: max retries, max tool calls, and per-run limits.
Stop “runaway loops” early, while keeping good runs fast.
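To make the execution controls concrete, here is a generic sketch of a loop that enforces max retries and max tool calls. It does not use or represent OnceOnly's API: every name in it is hypothetical, and it assumes the agent exposes a step function that performs one tool call and reports whether the task is finished.

```python
from dataclasses import dataclass


class LimitExceeded(RuntimeError):
    """Raised when a run hits one of its execution limits."""


class TransientError(Exception):
    """Stand-in for a timeout, rate limit, or tool failure."""


@dataclass
class RunLimits:
    max_retries: int = 2
    max_tool_calls: int = 10


def run_agent(step, limits):
    """Call step(n) until it returns True; n is the 1-based tool call count.

    Failed attempts count as tool calls. Returns the number of tool calls
    used, or raises LimitExceeded once a limit is hit.
    """
    retries = 0
    tool_calls = 0
    while True:
        if tool_calls >= limits.max_tool_calls:
            raise LimitExceeded("max tool calls reached")
        tool_calls += 1
        try:
            if step(tool_calls):
                return tool_calls
        except TransientError:
            retries += 1
            if retries > limits.max_retries:
                raise LimitExceeded("max retries reached")


# A good run finishes on its third tool call and is not slowed down.
used = run_agent(lambda n: n >= 3, RunLimits())

# A runaway loop never finishes and is stopped at the tool call limit.
try:
    run_agent(lambda n: False, RunLimits(max_tool_calls=5))
    stopped = False
except LimitExceeded:
    stopped = True
```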

Generate a policy above, then enforce it automatically with OnceOnly.