What you are building
This tutorial builds a real-time streaming chat interface in Next.js using the
Vercel AI SDK. Instead of waiting for the model to finish a long answer and
then dumping it on screen, tokens appear as they are generated — the same
typewriter effect you see in ChatGPT and Claude. The AI SDK hides the awkward
parts (server-sent event framing, client buffering, abort wiring) behind a route
handler helper and the useChat hook, so a working streaming chat is a few dozen
lines.
How streaming works
On the server, an App Router route handler at app/api/chat/route.ts calls
the model with streamText and returns result.toDataStreamResponse(). That
response body is an ongoing stream of structured deltas rather than a single JSON
blob, so the connection stays open and tokens flow out as the model produces
them. On the client, a component calls useChat, which gives you
messages, the form's input value and handleSubmit handler, an isLoading flag, a stop
function, and an error value. As deltas arrive, the SDK appends them to the
in-progress assistant message and React re-renders, producing the live typing
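effect. Because an AbortController backs the request, calling stop aborts it and
the client stops receiving tokens at once. Generation, and billing for further
tokens, ends only if that abort also reaches the model call on the server; tokens
already produced are still billed.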
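The client-side half of this flow can be sketched without the SDK. The snippet below is a minimal illustration, not the AI SDK's actual implementation: ChatMessage, StopSignal, appendDelta, and consumeStream are hypothetical names, a plain array of strings stands in for the network stream, and a simple aborted flag (the same shape an AbortSignal exposes) stands in for stop.

```typescript
// Hypothetical message shape; the real SDK's message type has more fields.
type Role = "user" | "assistant";

interface ChatMessage {
  id: string;
  role: Role;
  content: string;
}

// Anything with an `aborted` flag fits here, including a real AbortSignal.
interface StopSignal {
  readonly aborted: boolean;
}

// Append one text delta to the in-progress assistant message.
// Returns a new array so a state setter would see a changed reference.
function appendDelta(messages: ChatMessage[], delta: string): ChatMessage[] {
  const last = messages[messages.length - 1];
  if (last !== undefined && last.role === "assistant") {
    const updated: ChatMessage = { ...last, content: last.content + delta };
    return [...messages.slice(0, -1), updated];
  }
  // First delta of a reply: start a new assistant message.
  const started: ChatMessage = {
    id: `assistant-${messages.length}`,
    role: "assistant",
    content: delta,
  };
  return [...messages, started];
}

// Apply deltas in order until the stream ends or the signal is aborted.
// onUpdate is where a UI would store the new array in state.
function consumeStream(
  deltas: readonly string[],
  initial: ChatMessage[],
  signal: StopSignal,
  onUpdate: (messages: ChatMessage[]) => void = () => {},
): ChatMessage[] {
  let messages = initial;
  for (const delta of deltas) {
    if (signal.aborted) break; // stop was requested: keep the partial text
    messages = appendDelta(messages, delta);
    onUpdate(messages);
  }
  return messages;
}

const conversation: ChatMessage[] = [{ id: "user-0", role: "user", content: "Say hi" }];
const finished = consumeStream(["Hel", "lo", " there"], conversation, { aborted: false });
console.log(finished[1].content); // "Hello there"
```

Returning a fresh array on every delta is what lets a state-driven UI re-render after each chunk. If the aborted flag flips partway through, the loop exits and the text received so far stays in the message list.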
Tips and the latency estimator
Always render an explicit stop button — long answers are exactly where users
want to bail early. Show the partial text even on error so a mid-stream failure
does not erase what already arrived; pair it with a retry. Consider setting
runtime = 'edge' on the chat route, which can lower time-to-first-token. And
remember that streaming improves perceived latency, not total latency. The model
still generates at the same tokens per second; you just reveal the output sooner. The estimator
below shows the difference — how long until the first word appears versus the full
answer — for a given model speed and answer length.
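The relationship between those two numbers is simple arithmetic, sketched here under stated assumptions. estimateLatency and its parameters are hypothetical names, not part of the AI SDK or of the estimator on this page. The sketch assumes the model emits tokens at a constant rate after a fixed time-to-first-token, which real models only approximate.

```typescript
interface LatencyEstimate {
  firstTokenSeconds: number; // when a streaming UI shows its first word
  fullAnswerSeconds: number; // when the whole answer exists, streamed or not
}

function estimateLatency(
  tokensPerSecond: number,
  answerTokens: number,
  timeToFirstTokenSeconds: number = 0.5, // assumed default, purely illustrative
): LatencyEstimate {
  if (tokensPerSecond <= 0 || answerTokens < 0 || timeToFirstTokenSeconds < 0) {
    throw new RangeError("tokensPerSecond must be positive; other inputs must be non-negative");
  }
  // Time spent generating the answer once the first token has arrived.
  const generationSeconds = answerTokens / tokensPerSecond;
  return {
    firstTokenSeconds: timeToFirstTokenSeconds,
    fullAnswerSeconds: timeToFirstTokenSeconds + generationSeconds,
  };
}

const estimate = estimateLatency(50, 500);
console.log(estimate.firstTokenSeconds); // 0.5
console.log(estimate.fullAnswerSeconds); // 10.5
```

With streaming, the user starts reading at firstTokenSeconds; without it, nothing appears until fullAnswerSeconds. The second number is the same either way, which is the sense in which streaming changes perceived rather than total latency.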