Moving past the single-prompt comfort zone
Once you can call an LLM and parse its output, the next leap is architecture: systems that retrieve from your own data, loop through multiple steps, measure their own quality, and stay affordable under load. That is where intermediate AI development lives, and where the real engineering problems appear — non-determinism, cost control, evaluation, and reliability. The projects below are deliberately chosen to drill those skills. Each one teaches a distinct, transferable pattern you will reuse across every serious AI product you build afterwards.
How it works
Use the picker to filter the ten projects by the skill you most want to develop: retrieval, agents, evaluation, or fine-tuning. Each entry gives you the core idea, the architecture you will need, and the hardest part to get right, which is almost always where the learning concentrates. The recommended sequence is to build the evaluation harness early, because it turns every other project from “seems to work” into “measurably works.” For anything agentic, add guardrails before you add capabilities.
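To make "measurably works" concrete, here is a minimal sketch of what an evaluation harness can look like. Everything in it is an assumption for illustration: `EvalCase`, `run_eval`, and `fake_system` are hypothetical names, the substring check is deliberately crude, and `fake_system` stands in for a real model call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str  # text the answer must contain (a deliberately crude check)


def run_eval(system: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output contains the expected text."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(
        1
        for case in cases
        if case.expected.lower() in system(case.prompt).lower()
    )
    return passed / len(cases)


def fake_system(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call, with one deliberate mistake."""
    canned = {
        "capital of France?": "The capital is Paris.",
        "2 + 2?": "That would be 5.",
    }
    return canned.get(prompt, "I don't know.")


CASES = [
    EvalCase("capital of France?", "paris"),
    EvalCase("2 + 2?", "4"),
]

score = run_eval(fake_system, CASES)
print(f"pass rate: {score:.0%}")  # one of the two cases passes
```

The useful property is that `run_eval` takes any callable, so the same cases can score a prompt change, a retrieval change, or a fine-tuned model against the same baseline. A real harness would replace the substring check with whatever grading fits the task.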
Example: how the patterns compound
These projects are not isolated; the patterns stack. A multi-source RAG system teaches you chunking, embeddings, retrieval, and grounded prompting. Layer an eval harness on top and you can now compare retrieval strategies objectively instead of by vibe. Add an agent loop and you have a system that retrieves, reasons over results, and acts — which is the skeleton of most production AI products. Fine-tuning, when your evals finally justify it, then squeezes the last quality and latency gains from a task you have already proven and measured.
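As a rough illustration of comparing retrieval strategies objectively, the sketch below scores two chunk sizes on the same labelled questions. It is a toy under stated assumptions: word overlap stands in for embedding similarity, the corpus and questions are invented, and `chunk_words`, `retrieve`, and `hit_rate` are hypothetical helper names rather than any library's API.

```python
import re


def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], k: int) -> list[str]:
    """Rank chunks by word overlap with the query (stand-in for embeddings)."""
    q = tokens(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]


def hit_rate(chunks: list[str], labelled: list[tuple[str, str]], k: int) -> float:
    """Fraction of questions whose answer text appears in the top-k chunks."""
    if not labelled:
        raise ValueError("need at least one labelled question")
    hits = sum(
        1
        for query, answer in labelled
        if any(answer.lower() in c.lower() for c in retrieve(query, chunks, k))
    )
    return hits / len(labelled)


# Invented corpus and questions, purely for illustration.
CORPUS = (
    "Refunds are issued within 14 days of purchase. "
    "Shipping to Europe takes 5 business days. "
    "Support is available by email on weekdays."
)
LABELLED = [
    ("How long do refunds take?", "14 days"),
    ("How can I reach support by email?", "email"),
    ("shipping time to Europe", "5 business days"),
]

for size in (8, 4):
    rate = hit_rate(chunk_words(CORPUS, size), LABELLED, k=1)
    print(f"chunk size {size}: hit rate {rate:.2f}")
```

On this toy data the 8-word chunks score 1.00 and the 4-word chunks score 0.33, because the smaller chunks separate the words a question matches from the words that answer it. The numbers mean nothing beyond this example. The point is that the comparison is a measurement you can rerun after every change.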
That is the meta-lesson for intermediate developers: applied AI is ordinary engineering wearing a probabilistic coat. The teams that win treat each capability as a small, testable change measured against a baseline, ship it behind a flag, and only escalate complexity when measurement — not excitement — demands it. Build two or three of these properly, with evals and cost limits, and you will be ahead of most people calling themselves AI engineers.
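One way to read "cost limits" and "guardrails before capability" in code is an agent loop that cannot run forever or overspend. The following is a minimal sketch, not a definitive design: `CostBudget`, `run_agent`, the flat per-step price, and the two toy step functions are all assumptions made up for illustration, and a real agent would call a model and tools inside `step`.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the limit."""


@dataclass
class CostBudget:
    limit_cents: int
    spent_cents: int = 0

    def charge(self, cents: int) -> None:
        if cents < 0:
            raise ValueError("charge must be non-negative")
        if self.spent_cents + cents > self.limit_cents:
            raise BudgetExceeded(f"would exceed {self.limit_cents} cents")
        self.spent_cents += cents


# A step returns ("final", answer) to stop or ("continue", note) to keep going.
Step = Callable[[list[str]], tuple[str, str]]


def run_agent(step: Step, budget: CostBudget, cents_per_step: int,
              max_steps: int = 5) -> str:
    """Run `step` until it finishes, the step cap is hit, or money runs out."""
    history: list[str] = []
    for _ in range(max_steps):
        try:
            budget.charge(cents_per_step)  # pay before acting: no overspend
        except BudgetExceeded:
            return "stopped: budget exhausted"
        kind, text = step(history)
        if kind == "final":
            return text
        history.append(text)
    return "stopped: step limit reached"


def never_finishes(history: list[str]) -> tuple[str, str]:
    """Toy step that always asks for another turn."""
    return ("continue", f"thought {len(history) + 1}")


def finishes_on_third(history: list[str]) -> tuple[str, str]:
    """Toy step that answers once it has two notes in its history."""
    if len(history) >= 2:
        return ("final", "done")
    return ("continue", "thinking")


capped = run_agent(never_finishes, CostBudget(limit_cents=100), cents_per_step=2)
broke = run_agent(never_finishes, CostBudget(limit_cents=5), cents_per_step=2)
ok = run_agent(finishes_on_third, CostBudget(limit_cents=100), cents_per_step=2)
print(capped, broke, ok, sep="\n")
```

Both limits exist before the agent can do anything interesting, which is the ordering the recommended sequence argues for. Amounts are integer cents so that the budget check never depends on floating-point rounding.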