What a prompt engineer actually does
A prompt engineer designs, tests, and maintains the instructions and context that make large language models behave reliably in production. The job is far more than typing clever prompts: it blends a writer’s precision with an engineer’s discipline. You define the task, supply the right context, constrain the output format, measure quality against a test set, and iterate until the system is dependable enough to ship. In 2025 the role increasingly overlaps with AI engineering — building retrieval pipelines, wiring tools into agents, and running evaluations — so the most employable people treat prompting as one layer of a larger stack.
The roadmap, stage by stage
Stage 1 — Fundamentals (weeks 1–3). Learn how tokenization, context windows, temperature, and system vs. user messages work. Understand why models hallucinate and how cost scales with tokens. Practise the core techniques: zero-shot, few-shot, chain-of-thought, role and format instructions, and explicit output schemas.
Stage 2 — Structured prompting (weeks 3–6). Move from one-off prompts to reusable templates. Learn to force structured output (JSON, tables), to decompose a task into steps, and to use delimiters and examples to reduce ambiguity. Start keeping a small library of prompts with notes on what failed and why.
Stage 3 — Evaluation (weeks 6–9). This is the stage that separates hobbyists from professionals. Build a labelled test set for a task and score outputs — by exact match, rubric, or an LLM-as-judge. Track regressions when you change a prompt. Being able to say “this version is 12% more accurate on 80 test cases” is what employers pay for.
Stage 4 — Specialise (weeks 9–16). Pick a track: RAG (retrieval over your own documents), agents and tool use (function calling, multi-step workflows), or code generation. Each has its own prompting patterns and failure modes. Pair your chosen track with light Python and one provider SDK.
Building a portfolio and landing the role
Employers hire on evidence, not certificates. Ship three to five public projects that each solve a concrete problem and include an evaluation: a customer-support classifier with measured accuracy, a RAG assistant over a real document set, or an agent that completes a multi-step task. Write up each one with the prompts, the test set, and the before/after metrics — this doubles as a portfolio and a writing sample.
When applying, target both dedicated prompt-engineering roles and the much larger pool of “AI engineer”, “applied AI”, and product roles that list prompting as a requirement. Tailor each application around a metric you improved. To estimate the running cost of any system you demo, use the LLM API Cost Calculator, and brush up on the underlying unit of billing with What Is a Token in AI?. Keep iterating on your portfolio after you’re hired — the field moves quickly, and a habit of measuring and improving prompts is the most transferable skill you can own.