Question 1

What is RAG (Retrieval-Augmented Generation)?

Accepted Answer

RAG is a pattern where, before answering, the system retrieves relevant documents from an external knowledge source and inserts them into the prompt. The LLM then generates its answer grounded in that retrieved text rather than relying only on its training data.

Question 2

Why use RAG instead of fine-tuning?

Accepted Answer

RAG keeps knowledge in an external store that you can update at any time by adding or re-indexing documents, with no retraining, and it lets the model cite the passages it retrieved. Fine-tuning bakes knowledge into the weights, which is costly to update and harder to attribute. RAG is preferred when facts change often or must be verifiable.

Question 3

What are the two main components of a RAG system?

Accepted Answer

A retriever and a generator. The retriever (typically an embedding model plus a vector database) finds the most relevant chunks for a query. The generator is the LLM, which reads those chunks alongside the question and produces the final answer.
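To make the two roles concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words counter standing in for a real embedding model, the in-memory `CHUNKS` list stands in for a vector database, and `llm` is any callable you supply that maps a prompt string to an answer string. It is not any particular library's API.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: term counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Retriever: rank chunks by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, retrieved):
    """Insert the retrieved chunks into the prompt, numbered so they can be cited."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

def answer(query, chunks, llm, k=2):
    """Generator step: `llm` is a hypothetical callable (prompt -> text)."""
    return llm(build_prompt(query, retrieve(query, chunks, k)))

# Hypothetical knowledge base; a real system would store embeddings in a vector DB.
CHUNKS = [
    "The refund window is 30 days from the delivery date.",
    "Support is available by email on weekdays.",
    "Refunds are issued to the original payment method.",
]

query = "How many days is the refund window?"
top = retrieve(query, CHUNKS, k=2)
prompt = build_prompt(query, top)
```

Passing `llm` in as a callable keeps the retriever testable on its own: you can check which chunks come back before involving a model at all.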

Question 4

What is the difference between naive and advanced RAG?

Accepted Answer

Naive RAG embeds the query, retrieves the top-k most similar chunks, and stuffs them into the prompt unchanged. Advanced RAG adds steps like query rewriting, hybrid keyword-plus-vector search, reranking of retrieved chunks, and post-retrieval filtering to improve relevance and reduce hallucination.
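As one illustration of the hybrid-search step, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is just one common way to combine rankings, not the only one; the document IDs and both input rankings are made up for the example, and `k=60` is a commonly used smoothing constant rather than a required value.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a keyword search and a vector search for the same query.
keyword_ranking = ["doc_b", "doc_a", "doc_d"]
vector_ranking = ["doc_a", "doc_c", "doc_b"]

fused = rrf_fuse([keyword_ranking, vector_ranking])
```

Documents that appear near the top of both lists (`doc_a`, `doc_b`) rise above documents found by only one method. In a fuller pipeline, a reranker or a relevance filter would typically run on `fused` before the chunks reach the prompt.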

RAG — Retrieval-Augmented Generation (AI Glossary)

Definition

Why RAG exists

How it works: retriever and generator

Naive RAG vs advanced RAG

Why it matters