What does the Assistants API give me over the Chat Completions API?

It manages conversation state (threads), automatically truncates history to fit context, and provides hosted tools like code interpreter and file search that run on OpenAI's side. With Chat Completions you manage all of that yourself. The trade-off is less control in exchange for a higher-level abstraction.

What is a thread versus a run?

A thread is the persistent conversation — a list of messages that grows over time. A run is a single execution of an assistant against a thread; it processes the latest messages, may call tools, and appends the assistant's reply. One thread can have many runs over its lifetime.

How does file search work?

You attach files to a vector store and give the assistant the file_search tool. When relevant, the assistant retrieves passages from those files and grounds its answer in them — managed RAG without you running embeddings or a vector database yourself. You pay for storage and retrieval.
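As a rough illustration of the configuration described above, the sketch below builds the arguments for an assistant that uses file search. The helper name, the model name, and the vector store ID are placeholders chosen for this example; the `tools` and `tool_resources` shapes follow the Assistants API as currently documented, so check them against the version you use.

```python
def file_search_assistant_config(model, instructions, vector_store_ids):
    """Build keyword arguments for an assistant that can search attached files.

    Hypothetical helper for illustration; it only assembles a dict and makes
    no network calls.
    """
    return {
        "model": model,
        "instructions": instructions,
        # Enables the hosted retrieval tool.
        "tools": [{"type": "file_search"}],
        # Points the tool at the vector store(s) holding your files.
        "tool_resources": {
            "file_search": {"vector_store_ids": list(vector_store_ids)}
        },
    }


config = file_search_assistant_config(
    model="gpt-4o",  # placeholder model name
    instructions="Answer questions using the attached handbook.",
    vector_store_ids=["vs_example123"],  # hypothetical vector store ID
)
```

With the official Python SDK, a dict like this would typically be passed as `client.beta.assistants.create(**config)`.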

What is the code interpreter tool?

It is a sandboxed Python environment the assistant can write and run code in — for calculations, data analysis, chart generation, or processing uploaded files. The model decides when to use it, executes the code on OpenAI's side, and incorporates the results into its answer.
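Enabling the tool is mostly a matter of configuration. The sketch below assembles the arguments for an assistant with code interpreter and, optionally, uploaded files for it to process. The helper name, model name, and file ID are placeholders for this example, not values from any real account.

```python
def code_interpreter_assistant_config(model, instructions, file_ids=()):
    """Build keyword arguments for an assistant that can run Python code.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    config = {
        "model": model,
        "instructions": instructions,
        "tools": [{"type": "code_interpreter"}],
    }
    if file_ids:
        # Files listed here are made available inside the sandbox.
        config["tool_resources"] = {
            "code_interpreter": {"file_ids": list(file_ids)}
        }
    return config


analyst = code_interpreter_assistant_config(
    model="gpt-4o",  # placeholder model name
    instructions="Analyse the uploaded CSV and explain your steps.",
    file_ids=["file-example456"],  # hypothetical uploaded file ID
)
```

Note that code interpreter takes file IDs directly, whereas file search takes vector store IDs.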

Do I need to poll the run, or can I stream?

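You can do either. Streaming gives you events as the run progresses (text deltas, tool calls) for a responsive UI. If you do not stream, you create the run and poll its status until it is no longer queued or in_progress. When the status is completed, read the new messages from the thread; a run can also end as failed, cancelled, or expired, in which case there is no reply to read.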
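For the streaming option, a minimal sketch might look like the following. It assumes `client` is an `openai.OpenAI` instance from a recent version of the official Python SDK, which exposes a `runs.stream` helper and a `text_deltas` iterator; the function name and the `on_text` callback are this example's own.

```python
def _print_inline(text):
    # Default sink: print each chunk as it arrives, without newlines.
    print(text, end="", flush=True)


def stream_reply(client, thread_id, assistant_id, on_text=_print_inline):
    """Start a run on an existing thread and stream the reply text.

    Calls on_text for every text delta and returns the full reply.
    """
    chunks = []
    with client.beta.threads.runs.stream(
        thread_id=thread_id,
        assistant_id=assistant_id,
    ) as stream:
        for delta in stream.text_deltas:
            on_text(delta)
            chunks.append(delta)
    return "".join(chunks)
```

Passing the sink as a callback keeps the function usable from a terminal, a websocket handler, or a test.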

How to Use the OpenAI Assistants API

What the Assistants API is

The Assistants API is OpenAI’s higher-level, stateful interface for building agents. Instead of resending the entire conversation on every call and wiring up retrieval yourself, you create an assistant (a model plus instructions plus tools), keep the conversation in a thread, and execute it with a run. OpenAI manages the conversation state, context truncation, and hosted tools like code interpreter and file search. It trades some control for a lot less plumbing.

How the pieces fit together

There are four objects. An assistant is the reusable configuration: which model, which system instructions, and which tools it may use. A thread is a single conversation: an ordered list of messages that persists across calls. A message is one entry in a thread, from the user or the assistant. A run executes an assistant against a thread: it reads the latest messages, optionally calls tools, and appends the assistant’s reply. You either poll the run until it is no longer queued or in_progress, treating completed as success and failed, cancelled, or expired as errors, or stream events for a live UI.
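The polling option can be sketched as a small loop. To keep the example independent of any SDK, it takes a `fetch_status` callable that returns the run's current status string; the function name, the `SETTLED` set, and the default interval and timeout are choices made for this sketch.

```python
import time

# Statuses after which a run will not progress further on its own.
# "requires_action" is included because the run then waits for the caller.
SETTLED = {
    "completed", "failed", "cancelled", "expired",
    "incomplete", "requires_action",
}


def wait_for_run(fetch_status, interval=1.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the run settles, then return that status.

    Raises TimeoutError if the run is still active after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in SETTLED:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout} seconds")
        sleep(interval)
```

With the official Python SDK, `fetch_status` could be a lambda that calls `client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)` and returns its `status`. The caller should still check that the returned status is completed before reading messages.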

The two headline tools are managed for you. File search turns attached files into a vector store and retrieves relevant passages automatically: managed RAG with no embedding code on your side. Code interpreter gives the assistant a sandboxed Python environment to run calculations, analyse data, or process uploads. The generator below produces the complete create-assistant, create-thread, add-message, and run flow in Python from your choices.
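Written by hand, the create-assistant, create-thread, add-message, and run flow might look roughly like this. The sketch assumes `client` is an `openai.OpenAI` instance from a recent version of the official Python SDK (one that provides `create_and_poll`); the function name `run_once` is this example's own.

```python
def run_once(client, model, instructions, user_text, tools=()):
    """Create an assistant and a thread, send one message, return the reply."""
    # 1. Assistant: reusable configuration (model, instructions, tools).
    assistant = client.beta.assistants.create(
        model=model, instructions=instructions, tools=list(tools)
    )
    # 2. Thread: the conversation that will hold the messages.
    thread = client.beta.threads.create()
    # 3. Message: the user's input, appended to the thread.
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=user_text
    )
    # 4. Run: execute the assistant against the thread and wait for it.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant.id
    )
    if run.status != "completed":
        raise RuntimeError(f"run ended with status {run.status!r}")
    # Newest messages first; return the text of the latest assistant message.
    messages = client.beta.threads.messages.list(
        thread_id=thread.id, order="desc"
    )
    for message in messages.data:
        if message.role == "assistant":
            return "".join(
                block.text.value
                for block in message.content
                if block.type == "text"
            )
    return ""
```

In a real application the assistant would be created once and reused rather than created on every call; it is created inline here only to show all four steps together.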

Tips and pitfalls

Reuse one assistant across many threads — the assistant is configuration, the thread is the conversation, so you do not recreate the assistant per user. Keep a thread per user or per conversation rather than mixing contexts. Prefer streaming for any interactive UI so users see progress. Remember that hosted tools cost extra (file storage, code execution) on top of token usage, so clean up vector stores you no longer need. And if you need full control over context, caching, or non-OpenAI providers, the lower-level Chat Completions API may suit you better than this managed abstraction.
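One way to follow the first two tips is to keep a single assistant ID and look up a thread per user. The class below is a minimal in-memory sketch; the name `ThreadRegistry` is hypothetical, `client` is assumed to be an `openai.OpenAI` instance, and a real service would persist the mapping in a database rather than a dict.

```python
class ThreadRegistry:
    """Map each user to their own thread while sharing one assistant."""

    def __init__(self, client, assistant_id):
        self._client = client
        self.assistant_id = assistant_id  # shared configuration, created once
        self._threads = {}                # user_id -> thread_id

    def thread_for(self, user_id):
        """Return the user's thread ID, creating the thread on first use."""
        if user_id not in self._threads:
            thread = self._client.beta.threads.create()
            self._threads[user_id] = thread.id
        return self._threads[user_id]
```

Each incoming message is then added to `registry.thread_for(user_id)` and run against `registry.assistant_id`, so contexts never mix between users.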