RAGAS Score Calculator

Calculate RAGAS faithfulness, context precision, and context recall scores.

Compute RAGAS metrics by hand

RAGAS is the de facto scoring framework for retrieval-augmented generation, but its automated pipeline can feel like a black box. This calculator implements the exact formulas behind three core RAGAS metrics — faithfulness, context precision, and context recall — and lets you feed in your own judgements. It’s ideal for learning the metrics, spot-checking a pipeline, or scoring a small eval set without standing up the full library.

How the RAGAS formulas work

Faithfulness = supported claims ÷ total claims. You decompose the generated answer into atomic claims and count how many are entailed by the retrieved context. Context precision rewards good ranking: it is the mean of precision-at-k computed at every rank where a relevant chunk appears, so relevant chunks near the top score higher. Context recall = ground-truth facts covered by the context ÷ total ground-truth facts, measuring whether the retriever actually fetched the information needed. Each metric lands between 0 and 1.
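The three formulas above can be sketched in a few lines of Python. This is a minimal illustration driven by hand-entered judgements, not the RAGAS library's own code: the function names are hypothetical, and returning 0.0 when no retrieved chunk is relevant is a convention assumed here.

```python
from typing import Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator after validating the counts."""
    if denominator <= 0:
        raise ValueError("total must be a positive integer")
    if not 0 <= numerator <= denominator:
        raise ValueError("count must lie between 0 and the total")
    return numerator / denominator


def faithfulness(supported_claims: int, total_claims: int) -> float:
    """Supported claims divided by total claims in the answer."""
    return _ratio(supported_claims, total_claims)


def context_recall(covered_facts: int, total_facts: int) -> float:
    """Ground-truth facts covered by the context divided by total facts."""
    return _ratio(covered_facts, total_facts)


def context_precision(relevance: Sequence[bool]) -> float:
    """Mean of precision-at-k over the ranks that hold a relevant chunk.

    `relevance` lists your judgement for each retrieved chunk in rank order.
    Assumption: returns 0.0 when no chunk is marked relevant.
    """
    hits = 0
    precisions = []
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / k)  # precision-at-k at this relevant rank
    if not precisions:
        return 0.0
    return sum(precisions) / len(precisions)


if __name__ == "__main__":
    print(faithfulness(3, 4))                      # 0.75
    print(context_recall(4, 5))                    # 0.8
    # Relevant chunks at ranks 1 and 3: (1/1 + 2/3) / 2, about 0.833
    print(context_precision([True, False, True]))
```

Moving the second relevant chunk from rank 3 to rank 2 raises context precision to 1.0, which is the ranking sensitivity the formula is meant to capture.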

Tips for using the scores

  • High faithfulness with low context recall means the model answered honestly from incomplete context: fix retrieval, not generation.
  • Low context precision with high recall means you’re retrieving the right facts but burying them under noise — improve ranking or re-ranking.
  • Score several examples and average; a single query’s metrics are noisy.
  • Pair this with an LLM judge (see the consistency and relevance tools) when you want the claim extraction and relevance labelling automated.
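
To act on the averaging tip, per-query scores can be summarised with a mean and a spread. The sketch below assumes each scored example is a dict of metric name to score; the `summarize` helper and the sample numbers are illustrative only, not real evaluation results.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize(rows: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample standard deviation) per metric.

    Assumes every row carries the same metric keys as the first row.
    The spread is reported as 0.0 when only one example is available.
    """
    if not rows:
        raise ValueError("need at least one scored example")
    summary = {}
    for metric in rows[0]:
        values = [row[metric] for row in rows]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[metric] = (mean(values), spread)
    return summary


if __name__ == "__main__":
    # Made-up scores for three queries, for illustration only.
    example_rows = [
        {"faithfulness": 1.00, "context_precision": 0.83, "context_recall": 0.60},
        {"faithfulness": 0.75, "context_precision": 0.50, "context_recall": 0.80},
        {"faithfulness": 0.90, "context_precision": 1.00, "context_recall": 0.40},
    ]
    for metric, (avg, spread) in summarize(example_rows).items():
        print(f"{metric}: mean={avg:.3f} stdev={spread:.3f}")
```

A large standard deviation relative to the mean suggests the eval set is too small, or too varied, for the average alone to be trusted.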