Budgets
tako-runtime ships an in-memory BudgetTracker that's consulted both
before every provider call (using LlmProvider::estimate_cost_usd)
and after (reconciling against the response's actual usage).
budget = tako.Budget(
    max_usd_per_request=5.0,
    max_usd_per_day=500.0,
    max_usd_per_tenant_per_day={"acme": 100.0, "beta": 1000.0},
)
client = tako.Client(providers=[...], budget=budget)
A BudgetExhausted error propagates when:
- The estimated cost would push a single request over max_usd_per_request.
- The reconciled cost would push the day's spend over max_usd_per_day.
- The reconciled cost would push a tenant over their per-tenant cap.
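The three conditions above can be sketched as a small self-contained model. This is not tako's implementation: SketchTracker, check_estimate, and reconcile are hypothetical names, and the choice to record spend before raising (on the grounds that the money is already spent) is an assumption the source does not confirm.

```python
from dataclasses import dataclass, field


class BudgetExhausted(Exception):
    """Stand-in for the error named in the text; not tako's real class."""


@dataclass
class SketchTracker:
    # Hypothetical model of the caps; field names mirror tako.Budget's arguments.
    max_usd_per_request: float
    max_usd_per_day: float
    max_usd_per_tenant_per_day: dict = field(default_factory=dict)
    day_spend: float = 0.0
    tenant_spend: dict = field(default_factory=dict)

    def check_estimate(self, estimated_usd: float) -> None:
        # Pre-call check: only the per-request cap is compared to the estimate.
        if estimated_usd > self.max_usd_per_request:
            raise BudgetExhausted("estimate exceeds max_usd_per_request")

    def reconcile(self, tenant_id: str, actual_usd: float) -> None:
        # Post-call check: record the actual cost first (assumption), then
        # compare the running totals against the daily and per-tenant caps.
        self.day_spend += actual_usd
        self.tenant_spend[tenant_id] = self.tenant_spend.get(tenant_id, 0.0) + actual_usd
        if self.day_spend > self.max_usd_per_day:
            raise BudgetExhausted("day's spend exceeds max_usd_per_day")
        cap = self.max_usd_per_tenant_per_day.get(tenant_id)
        if cap is not None and self.tenant_spend[tenant_id] > cap:
            raise BudgetExhausted(f"tenant {tenant_id!r} exceeds its daily cap")
```

In this model a tenant without an entry in max_usd_per_tenant_per_day is limited only by the global daily cap; whether tako treats unlisted tenants the same way is not stated in the source.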
BudgetBackend
BudgetTracker delegates to a BudgetBackend:
#[async_trait]
pub trait BudgetBackend: Send + Sync + 'static + Debug {
    async fn current_usage(&self, tenant_id: &str) -> Result<TenantUsage, TakoError>;
    async fn record(&self, tenant_id: &str, usd: f64, tokens: u64) -> Result<(), TakoError>;
}
Two backends ship out of the box:
- InMemoryBudgetBackend: single-process deployments (default).
- RedisBudgetBackend: multi-replica deployments. Backed by a monotonic-write Lua script so a slow replica cannot clobber a higher water-mark. Behind the tako-runtime/redis cargo feature; the Python mirror is tako.Client(budget_backend=tako.RedisBudgetBackend(...)).
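The monotonic-write guarantee can be illustrated with a plain in-process model. The actual Lua script is not shown in this document, so the sketch below is only one reading of "a slow replica cannot clobber a higher water-mark": a write is applied only if it raises the stored total. WaterMarkStore and its methods are hypothetical names.

```python
class WaterMarkStore:
    """Toy model of a monotonic write: the stored total never decreases."""

    def __init__(self):
        self._usd_by_tenant = {}

    def write(self, tenant_id: str, total_usd: float) -> bool:
        # Apply the write only if it advances the water-mark; a stale,
        # lower total from a slow replica is ignored.
        current = self._usd_by_tenant.get(tenant_id, 0.0)
        if total_usd > current:
            self._usd_by_tenant[tenant_id] = total_usd
            return True
        return False

    def read(self, tenant_id: str) -> float:
        return self._usd_by_tenant.get(tenant_id, 0.0)


store = WaterMarkStore()
store.write("acme", 12.0)        # fast replica reports the newer total
stale_applied = store.write("acme", 9.0)  # slow replica arrives late with an older total
```

In a real multi-replica deployment the compare-and-set has to run atomically on the server, which is presumably why the backend uses a Lua script rather than a read followed by a write from the client.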
BudgetTracker is wired through tako.SingleAgent, tako.Conductor,
tako.Trinity, and tako.guards.LlmJudge, so every provider call counts
against the same caps, including the judge in SelfCaller.
Cost estimation
Providers expose estimate_cost_usd(messages: &[Message]). Most provider
crates use the per-million-token rates from their Capabilities
struct; you can override these at builder time:
provider = tako.providers.OpenAI(
    model="gpt-5",
    api_key="...",
    capabilities=tako.Capabilities(
        max_context_tokens=128_000,
        usd_per_input_mtok=2.50,
        usd_per_output_mtok=10.00,
        ...
    ),
)
Estimation is conservative: it assumes the output runs to the full
max_tokens (or 1024 tokens if max_tokens is unset).
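As a rough illustration of the arithmetic, the sketch below combines the per-million-token rates with the worst-case output assumption. It is not the provider's real method: the actual estimate_cost_usd takes messages, and how input tokens are counted is not described here, so this hypothetical function accepts an input token count directly.

```python
DEFAULT_MAX_TOKENS = 1024  # fallback stated in the text when max_tokens is unset


def sketch_estimate_cost_usd(
    input_tokens: int,
    usd_per_input_mtok: float,
    usd_per_output_mtok: float,
    max_tokens=None,
) -> float:
    # Worst case: assume the response uses every allowed output token.
    output_tokens = max_tokens if max_tokens is not None else DEFAULT_MAX_TOKENS
    input_cost = input_tokens / 1_000_000 * usd_per_input_mtok
    output_cost = output_tokens / 1_000_000 * usd_per_output_mtok
    return input_cost + output_cost


# 2,000 input tokens at the rates from the example above, max_tokens unset:
# input  = 2000 / 1e6 * 2.50  = 0.005
# output = 1024 / 1e6 * 10.00 = 0.01024
estimate = sketch_estimate_cost_usd(2_000, 2.50, 10.00)
```

Because the estimate charges for the full output allowance, it will usually exceed the reconciled cost, so a request can be rejected by max_usd_per_request even when its actual cost would have fit.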