Why run a model locally
Running an LLM locally with Ollama gives you three things a hosted API can’t: privacy (nothing leaves your machine), zero marginal cost (no per-token billing), and offline availability. The trade-off is that you run smaller, open-weight models than the largest hosted ones, and speed depends on your hardware. For coding help, drafting, summarisation, and RAG over private documents, a 7B–9B model on a laptop is genuinely useful. Use the builder above to match a model to your RAM and generate the exact commands.
How it works
Ollama installs a small background service that manages model weights and
exposes a local HTTP API on port 11434. You interact with it three ways:
ollama run <model>for an interactive terminal chat — the fastest way to try a model.- The native REST API at
localhost:11434/api/chat, which returns Ollama’s own JSON schema and supports streaming token-by-token. - The OpenAI-compatible API at
localhost:11434/v1/chat/completions, which lets you reuse code written for OpenAI by simply changing the base URL — the API key can be any non-empty string.
The first ollama pull downloads quantised weights (compressed to run in far
less memory than full precision) once; subsequent runs are instant. Models are
stored locally and shared across all three access methods.
Tips and notes
- Quantisation is your friend. Default Ollama models are 4-bit quantised, which is why a 7B model fits in ~5 GB on disk and runs in ~8–16 GB of RAM with little quality loss for most tasks.
- Use the OpenAI shim to migrate gradually. Develop against the local endpoint, then flip the base URL back to a hosted provider for production if you need a larger model — no other code changes.
- Keep one model loaded. Ollama unloads idle models after a few minutes;
set
OLLAMA_KEEP_ALIVEif you want a server to keep a model hot for low latency. - Watch the RAM check. If the tool above says your model won’t fit, it really won’t run well — swapping to disk makes generation painfully slow. Drop to the next smaller model instead.