What one knowledge base answer really costs
A RAG (retrieval-augmented generation) query feels like one action, but you pay for three stages: embedding the user’s question, having the generation model read the question plus the retrieved context, and having that model write the answer. This calculator prices each stage so you can see where your spend goes and project it across your daily query volume.
How the per-query cost breaks down
embed_cost = query_tokens/1e6 × embed_price
gen_in_cost = (query_tokens + context_tokens)/1e6 × gen_in_price
gen_out_cost = answer_tokens/1e6 × gen_out_price
per_query = embed_cost + gen_in_cost + gen_out_cost
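The four formulas above map directly onto code. The following is a minimal Python sketch that assumes every price is quoted per million tokens. The names `rag_query_cost`, `daily_cost`, and `StageCosts`, along with the token counts and prices in the example, are hypothetical placeholders rather than real vendor rates.

```python
from dataclasses import dataclass


@dataclass
class StageCosts:
    """Cost of each RAG stage for a single query."""
    embed: float
    gen_in: float
    gen_out: float

    @property
    def per_query(self) -> float:
        # per_query = embed_cost + gen_in_cost + gen_out_cost
        return self.embed + self.gen_in + self.gen_out


def rag_query_cost(query_tokens: int, context_tokens: int, answer_tokens: int,
                   embed_price: float, gen_in_price: float,
                   gen_out_price: float) -> StageCosts:
    """Price one query. Assumes all prices are per 1M tokens."""
    embed = query_tokens / 1e6 * embed_price
    # The generation model reads the question plus the retrieved context.
    gen_in = (query_tokens + context_tokens) / 1e6 * gen_in_price
    gen_out = answer_tokens / 1e6 * gen_out_price
    return StageCosts(embed, gen_in, gen_out)


def daily_cost(costs: StageCosts, queries_per_day: int) -> float:
    """Project the per-query cost across a day's volume."""
    return costs.per_query * queries_per_day


# Hypothetical inputs: 50-token question, 2,000 tokens of context, 300-token answer.
costs = rag_query_cost(query_tokens=50, context_tokens=2000, answer_tokens=300,
                       embed_price=0.02, gen_in_price=3.00, gen_out_price=15.00)
print(f"per query: ${costs.per_query:.6f}")
print(f"per day at 10,000 queries: ${daily_cost(costs, 10_000):.2f}")
```

Keeping the three stage costs separate in `StageCosts`, instead of returning only the total, lets you see which stage dominates before you multiply by volume.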
Prices are quoted per million tokens, which is why each token count is divided by 1e6. The embedding step is almost free: embedding models are cheap and the query is short. The dominant costs are the retrieved context you inject as generation input, which you pay for again on every query, and the answer, which is billed at the higher output rate.
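To make the claim concrete, you can compute each stage's share of the per-query cost. This is a small sketch under the same per-million-token assumption. `stage_shares` is a hypothetical helper, and the inputs are illustrative placeholder values, not measured data.

```python
def stage_shares(query_tokens: int, context_tokens: int, answer_tokens: int,
                 embed_price: float, gen_in_price: float,
                 gen_out_price: float) -> dict[str, float]:
    """Return each stage's fraction of the per-query cost (prices per 1M tokens)."""
    costs = {
        "embed": query_tokens / 1e6 * embed_price,
        "gen_in": (query_tokens + context_tokens) / 1e6 * gen_in_price,
        "gen_out": answer_tokens / 1e6 * gen_out_price,
    }
    total = sum(costs.values())
    return {stage: cost / total for stage, cost in costs.items()}


# Illustrative inputs: 50-token question, 2,000 context tokens, 300-token answer.
shares = stage_shares(50, 2000, 300, 0.02, 3.00, 15.00)
for stage, share in shares.items():
    print(f"{stage:>7}: {share:6.2%}")
```

With these placeholder numbers, embedding accounts for roughly 0.01% of the cost, generation input for about 58%, and generation output for about 42%. Your own split will depend on your top-k, chunk size, and model prices.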
Tips to control RAG spend
Lower your top-k so you inject fewer chunks, rerank to keep only the most relevant ones, and cache answers to repeated questions. For high-volume, low-difficulty lookups, route to a cheap generation model and only escalate hard questions to a frontier model.
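To see how these levers combine, here is a rough expected-cost model. It is a sketch built on several simplifying assumptions. Context size is taken as `top_k × chunk_tokens`. A cache hit is treated as free, and the model ignores the cost of the cache lookup and of any reranker. A fraction `escalation_rate` of cache misses goes to the frontier model, while the rest use the cheap model. All function names, parameters, and prices here are hypothetical.

```python
def per_query(query_tokens: int, context_tokens: int, answer_tokens: int,
              prices: tuple[float, float, float]) -> float:
    """Per-query cost; prices = (embed, gen_in, gen_out) per 1M tokens."""
    embed_price, in_price, out_price = prices
    return (query_tokens / 1e6 * embed_price
            + (query_tokens + context_tokens) / 1e6 * in_price
            + answer_tokens / 1e6 * out_price)


def blended_cost(query_tokens: int, top_k: int, chunk_tokens: int,
                 answer_tokens: int,
                 cheap_prices: tuple[float, float, float],
                 frontier_prices: tuple[float, float, float],
                 cache_hit_rate: float = 0.0,
                 escalation_rate: float = 1.0) -> float:
    """Expected cost per incoming query under caching and routing.

    Hypothetical model: cache hits cost nothing; of the cache misses,
    `escalation_rate` go to the frontier model and the rest to the cheap one.
    """
    context_tokens = top_k * chunk_tokens  # fewer chunks -> less input to pay for
    cheap = per_query(query_tokens, context_tokens, answer_tokens, cheap_prices)
    frontier = per_query(query_tokens, context_tokens, answer_tokens, frontier_prices)
    miss_cost = escalation_rate * frontier + (1 - escalation_rate) * cheap
    return (1 - cache_hit_rate) * miss_cost


# Placeholder prices per 1M tokens: (embed, gen_in, gen_out).
CHEAP = (0.02, 0.15, 0.60)
FRONTIER = (0.02, 3.00, 15.00)

baseline = blended_cost(50, top_k=8, chunk_tokens=500, answer_tokens=300,
                        cheap_prices=CHEAP, frontier_prices=FRONTIER)
tuned = blended_cost(50, top_k=4, chunk_tokens=500, answer_tokens=300,
                     cheap_prices=CHEAP, frontier_prices=FRONTIER,
                     cache_hit_rate=0.2, escalation_rate=0.1)
print(f"baseline: ${baseline:.6f}  tuned: ${tuned:.6f}")
```

Because caching and routing act as multipliers on the per-query cost, their savings compound with any reduction in top-k. In practice, you would replace the zero-cost cache assumption with your actual lookup and reranking costs.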