What is faithfulness in RAGAS?

Faithfulness is the fraction of claims in the answer that can be inferred from the retrieved context. It equals supported claims divided by total claims, ranges from 0 to 1, and measures how well the answer is grounded in that context.

How is context precision calculated?

Context precision rewards ranking relevant chunks near the top. It averages precision-at-k over the positions of relevant chunks, so a relevant chunk at rank 1 contributes more than the same chunk at rank 5.
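As a rough illustration of that averaging, here is a minimal Python sketch. The function name `context_precision` and the list-of-booleans input are assumptions made for this example, not the RAGAS library's API, and returning 0.0 when nothing is relevant is a convention chosen here.

```python
from typing import Sequence


def context_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision-at-k over the ranks k that hold a relevant chunk.

    relevant[i] is your manual judgement for the chunk at rank i + 1.
    Returns 0.0 when no chunk is relevant (a convention assumed here).
    """
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            # precision-at-k = relevant chunks seen so far / current rank
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Same single relevant chunk, placed at rank 1 and then at rank 5.
top = context_precision([True, False, False, False, False])
bottom = context_precision([False, False, False, False, True])
print(top, bottom)
```

With one relevant chunk among five, the score is 1.0 when it sits at rank 1 and 0.2 when it sits at rank 5, which is the ranking effect described above.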

What does context recall measure?

Context recall is the fraction of ground-truth facts that are actually covered by the retrieved context. Low recall means the retriever missed information needed to answer correctly.

Is this calculator the same as running the RAGAS library?

No. RAGAS normally uses an LLM to extract claims and judge relevance automatically. This tool implements the same formulas but takes your manual judgements as input, so you can sanity-check or learn the metrics without a pipeline.

RAGAS Score Calculator

Compute RAGAS metrics by hand

RAGAS is a widely used scoring framework for retrieval-augmented generation (RAG), but its automated pipeline can feel like a black box. This calculator implements the formulas behind three core RAGAS metrics (faithfulness, context precision, and context recall) and lets you feed in your own judgements. It’s well suited to learning the metrics, spot-checking a pipeline, or scoring a small eval set without standing up the full library.

How the RAGAS formulas work

Faithfulness = supported claims ÷ total claims. You decompose the generated answer into atomic claims and count how many are entailed by the retrieved context. Context precision rewards good ranking: it is the mean of precision-at-k computed at every rank where a relevant chunk appears, so relevant chunks near the top score higher. Context recall = ground-truth facts covered by the context ÷ total ground-truth facts, measuring whether the retriever actually fetched the information needed. Each metric lands between 0 and 1.
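The two ratio metrics can be sketched in a few lines of Python. This is a minimal sketch that takes your hand counts as input. The names `ratio`, `faithfulness`, and `context_recall` are hypothetical helpers for this example, and rejecting a zero denominator is a choice made here, not something the formulas prescribe.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator after basic sanity checks."""
    if denominator <= 0:
        # Assumption: a score is undefined with no claims or facts to judge.
        raise ValueError("denominator must be positive")
    if not 0 <= numerator <= denominator:
        raise ValueError("numerator must be between 0 and the denominator")
    return numerator / denominator


def faithfulness(supported_claims: int, total_claims: int) -> float:
    """Supported claims divided by total claims in the answer."""
    return ratio(supported_claims, total_claims)


def context_recall(covered_facts: int, total_facts: int) -> float:
    """Ground-truth facts covered by the context divided by total facts."""
    return ratio(covered_facts, total_facts)


# Example counts (made up): 3 of 4 answer claims are supported,
# and 2 of 5 ground-truth facts appear in the retrieved context.
f_score = faithfulness(3, 4)
r_score = context_recall(2, 5)
print(f_score, r_score)
```

For these counts the sketch gives a faithfulness of 0.75 and a context recall of 0.4.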

Tips for using the scores

High faithfulness with low context recall means the model answered honestly from incomplete context: fix retrieval, not generation.
Low context precision with high recall means you’re retrieving the right facts but burying them under noise: improve ranking or re-ranking.
Score several examples and average; a single query’s metrics are noisy.
Pair this with an LLM judge (see the consistency and relevance tools) when you want the claim extraction and relevance labelling automated.
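
The averaging and diagnosis tips above can be combined in a short Python sketch. Everything in it is illustrative: the per-query scores are made up, `average_scores` and `diagnose` are hypothetical helper names, and the 0.7 cut-off between "high" and "low" is an arbitrary assumption you should tune to your own data.

```python
from statistics import mean

# Made-up per-query scores, as you might record them from the calculator.
examples = [
    {"faithfulness": 1.0, "context_precision": 0.5, "context_recall": 0.4},
    {"faithfulness": 0.9, "context_precision": 0.6, "context_recall": 0.5},
    {"faithfulness": 0.8, "context_precision": 0.7, "context_recall": 0.6},
]


def average_scores(rows: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric across all scored queries."""
    return {key: mean([row[key] for row in rows]) for key in rows[0]}


def diagnose(avg: dict[str, float], threshold: float = 0.7) -> list[str]:
    """Map averaged scores to the tips above; threshold is an assumption."""
    notes = []
    if avg["faithfulness"] >= threshold and avg["context_recall"] < threshold:
        notes.append("fix retrieval, not generation")
    if avg["context_precision"] < threshold and avg["context_recall"] >= threshold:
        notes.append("improve ranking or re-ranking")
    return notes


averages = average_scores(examples)
notes = diagnose(averages)
print(averages)
print(notes)
```

Here the averaged faithfulness is high while context recall is low, so the sketch flags retrieval as the place to look first.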