Ask HN: What's your biggest LLM cost multiplier?
"Tokens per request" has been a misleading cost model for us in production. The real drivers seem to be multipliers: retries/429s, tool fanout, P95 context growth, and safety passes.
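To make the point concrete, here's a back-of-envelope sketch of how those multipliers compound on a naive per-request estimate. All the numbers (retry rate, fanout, context growth factor) are made up for illustration; plug in your own.

```python
def effective_cost(base_tokens: float,
                   price_per_1k: float,
                   retry_rate: float = 0.15,    # hypothetical: fraction of calls retried (429s, timeouts)
                   tool_fanout: float = 3.0,    # hypothetical: avg LLM calls per user request via tools
                   context_growth: float = 1.8, # hypothetical: P95 context vs. median, as a multiplier
                   safety_passes: int = 1) -> float:
    """Expected $ per user request once the multipliers stack."""
    calls = tool_fanout * (1 + retry_rate) + safety_passes
    return base_tokens * context_growth * calls * price_per_1k / 1000

naive = 2000 * 0.01 / 1000           # what "tokens per request" alone predicts
real = effective_cost(2000, 0.01)
print(f"naive ${naive:.4f} vs effective ${real:.4f} ({real / naive:.1f}x)")
# → naive $0.0200 vs effective $0.1602 (8.0x)
```

Even with modest-looking inputs, the compounding lands close to an order of magnitude over the naive estimate, which matches what we saw.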
What’s been the biggest cost multiplier in your prod LLM systems, and what policies worked (caps, degraded mode, fallback, hard fail)?