The shape of a modern recommender
A recommendation engine answers one question for each user: of everything we have, what should we show next? The strongest modern systems are hybrids that combine three techniques, each doing what it is best at. Collaborative filtering captures behaviour: what similar users actually did. Embedding-based retrieval captures content similarity and handles new items and users. An LLM adds the explanation layer that turns a ranked list into something users trust. Building all three, and blending the behavioural and content scores into one ranking, is often the difference between a demo and a system that lifts conversion.
How the pieces work together
Collaborative filtering for the core signal. From your interaction log (views, purchases, ratings), you compute item-item or user-user similarity, or factor the interaction matrix into latent features. The output is a behavioural score: “users who engaged with this item also tend to engage with those.” This captures patterns no content analysis can, and it is fast at query time once the similarities or factors are precomputed.
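As a minimal sketch of the item-item variant, assuming implicit feedback where each log entry is just a (user, item) pair, the similarities can be computed as a cosine over the sets of users who touched each item. The function names and the toy log are illustrative, not taken from any particular library.

```python
from collections import defaultdict
from math import sqrt

def item_similarity(interactions):
    """Cosine similarity between items, computed from (user, item) pairs."""
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)
    sims = defaultdict(dict)
    items = list(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(users_by_item[a] & users_by_item[b])
            if overlap:
                s = overlap / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[a][b] = s
                sims[b][a] = s
    return sims

def recommend(user_items, sims, k=3):
    """Score unseen items by summing their similarity to items the user engaged with."""
    scores = defaultdict(float)
    for seen in user_items:
        for other, s in sims.get(seen, {}).items():
            if other not in user_items:
                scores[other] += s
    # Break ties by item name so the output is deterministic.
    return sorted(scores, key=lambda it: (-scores[it], it))[:k]

# Toy interaction log (assumed data, for illustration only).
log = [("u1", "tent"), ("u1", "stove"), ("u2", "tent"), ("u2", "stove"),
       ("u2", "lamp"), ("u3", "tent"), ("u3", "lamp"), ("u4", "novel")]
sims = item_similarity(log)
print(recommend({"tent"}, sims))  # ['lamp', 'stove']
```

The pairwise loop is quadratic in the number of items, which is fine for a sketch; a production system would typically precompute this offline with sparse matrices or an approximate method.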
Embeddings for content and cold-start. Encode each item’s text (title, description, attributes) into a vector using an embedding model, and store the vectors in a vector database. For a new user or a new item with no interaction history, you retrieve nearest neighbours by content similarity. Embed a user’s recent or stated interests into the same space and you get content-based recommendations that work from day one.
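The retrieval step can be sketched as follows. Two stand-ins are assumed here so the example runs on its own: `toy_embed` is a bag-of-words counter in place of a real embedding model, and a plain dictionary with brute-force search takes the place of a vector database. Only the shape of the flow (embed items, embed the user, take nearest neighbours by cosine similarity) is meant to carry over.

```python
from math import sqrt

def toy_embed(text, vocab):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(query_vec, index, k=2):
    """Brute-force nearest neighbours; a vector database would do this at scale."""
    ranked = sorted(index, key=lambda item_id: -cosine(query_vec, index[item_id]))
    return ranked[:k]

# Toy catalogue (assumed data, for illustration only).
catalog = {
    "tent": "lightweight two person camping tent",
    "stove": "compact camping stove for hiking",
    "novel": "historical mystery novel paperback",
}
vocab = sorted({w for text in catalog.values() for w in text.lower().split()})
index = {item_id: toy_embed(text, vocab) for item_id, text in catalog.items()}

# A new user with no interaction history: embed their interests into the same space.
user_vec = toy_embed("hiking and camping stove", vocab)
print(nearest(user_vec, index, k=2))  # ['stove', 'tent']
```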
LLM-generated explanations. For the final ranked shortlist, pass each item plus the user’s context to an LLM and ask for a short, honest “why this is recommended for you”. Explanations can increase trust and click-through, but generate them only for the few items you actually show, because each call costs time and money.
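One way to keep this layer both cheap and honest is sketched below. It assumes the caller supplies `call_llm`, a hypothetical function that takes a prompt string and returns text (whichever client you use would be wrapped to fit), and that the ranking stage passes along the facts that actually drove each recommendation. The prompt wording and the `max_items` cap are illustrative choices.

```python
def build_prompt(item, user_context, evidence):
    """Ask for an explanation grounded only in the signals behind the ranking."""
    facts = "\n".join(f"- {fact}" for fact in evidence)
    return (
        "Write one short sentence explaining why this item is recommended.\n"
        "Use only the facts listed; do not invent reasons.\n"
        f"Item: {item}\n"
        f"User context: {user_context}\n"
        f"Facts:\n{facts}"
    )

def explain_shortlist(shortlist, user_context, evidence_by_item, call_llm, max_items=3):
    """Generate explanations only for the items that will actually be displayed."""
    explanations = {}
    for item in shortlist[:max_items]:
        prompt = build_prompt(item, user_context, evidence_by_item.get(item, []))
        explanations[item] = call_llm(prompt)
    return explanations

# Hypothetical stand-in for a real LLM client, so the sketch runs offline.
calls = []

def fake_llm(prompt):
    calls.append(prompt)
    return f"explanation #{len(calls)}"

shortlist = ["stove", "lamp", "tent", "novel"]
evidence = {
    "stove": ["Bought by users who also bought your tent"],
    "lamp": ["Description matches your interest in camping"],
}
explanations = explain_shortlist(
    shortlist, "recently bought a tent", evidence, fake_llm, max_items=2
)
print(explanations)  # {'stove': 'explanation #1', 'lamp': 'explanation #2'}
```

Only two calls are made for a four-item shortlist, which is the cost control described above. Passing the evidence explicitly gives the model true reasons to work from rather than leaving it to guess.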
Blending and ranking. Combine the collaborative score, the embedding-similarity score, and business rules (recency, availability, margin) into a single ranking. A weighted sum is a fine start; a learning-to-rank model is the upgrade once you have enough data.
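A weighted-sum blend might look like the sketch below. It assumes every signal has already been scaled to the 0 to 1 range, treats availability as a hard filter instead of a weighted term, and uses made-up weights and signal names; in practice the weights would be tuned against engagement data.

```python
def blend(candidates, weights, in_stock):
    """Rank in-stock items by a weighted sum of their (pre-normalised) signals."""
    ranked = []
    for item, signals in candidates.items():
        if item not in in_stock:
            continue  # business rule: never recommend unavailable items
        score = sum(weights[name] * signals.get(name, 0.0) for name in weights)
        ranked.append((score, item))
    # Highest score first; ties broken by item name for determinism.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in ranked]

# Illustrative signals per candidate, each assumed to be in [0, 1].
candidates = {
    "stove": {"cf": 0.8, "content": 0.6, "recency": 0.2},
    "lamp": {"cf": 0.8, "content": 0.1, "recency": 0.9},
    "tent": {"cf": 0.9, "content": 0.9, "recency": 0.5},
}
weights = {"cf": 0.6, "content": 0.3, "recency": 0.1}  # assumed starting weights

print(blend(candidates, weights, in_stock={"stove", "lamp"}))  # ['stove', 'lamp']
```

The tent has the best raw signals but is filtered out because it is not in stock, which shows why business rules are applied alongside the scores and not left to the weights alone.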
Building and evaluating it
Start simple: ship item-similarity collaborative filtering plus a popularity fallback, then add embeddings for cold-start, then LLM explanations on the shortlist. Handle cold-start by leaning on content and popularity signals and shifting weight to collaborative signals as interactions accumulate. Evaluate offline with precision@k, recall@k, and NDCG on held-out interactions, but trust online A/B tests on click-through and conversion above all — offline metrics reward predicting the past, not improving the future. Iterate on the blend weights against real engagement, and keep the explanation layer honest: never let the model justify a recommendation with a reason that is not actually true.
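The three offline metrics can be sketched for a single user as follows, assuming binary relevance (an item is relevant if it appears in that user's held-out interactions). The function names and sample data are illustrative; a real evaluation would average these over many users.

```python
from math import log2

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking over the best achievable DCG."""
    dcg = sum(1 / log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1 / log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

# One user's ranked recommendations and held-out interactions (assumed data).
ranked = ["stove", "novel", "lamp", "tent"]
held_out = {"stove", "lamp"}

print(round(precision_at_k(ranked, held_out, 3), 3))  # 0.667
print(recall_at_k(ranked, held_out, 3))               # 1.0
print(round(ndcg_at_k(ranked, held_out, 3), 3))       # 0.92
```

Precision and recall ignore order within the top k, while NDCG penalises the relevant lamp for appearing third instead of second, which is why the three are usually reported together.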