Getting Started with the Cohere API

Enterprise NLP — Cohere Command R+ and Embed from scratch


Why Cohere

Cohere is an enterprise-focused NLP provider whose models are built around retrieval and search rather than open-ended chat. Its three workhorse endpoints map cleanly onto the parts of a production RAG system: Embed turns text into vectors, Rerank scores how relevant documents are to a query, and Chat (Command R and R+) generates grounded answers with citations and tool calls. If you are building search, knowledge assistants, or document processing at scale, Cohere’s separation of these concerns is a genuine advantage over a single chat-only API.

Use the builder above to assemble a real request for any of the three endpoints, then copy the curl and run it with your own key.

How the three endpoints fit together

A typical Cohere RAG flow looks like this. At ingestion time you call Embed with input_type: "search_document" to vectorise every chunk of your corpus, and you store the vectors. At query time you embed the user’s question with input_type: "search_query" and run an approximate-nearest-neighbour lookup to get a noisy top-k, then pass that candidate set to Rerank, which reorders it by true relevance. Finally you hand the best few chunks to Chat in the documents field, so that Command R+ produces an answer grounded in your data, with inline citations generated automatically.
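The query-time half of that flow can be sketched in a few lines of plain Python. This is a minimal illustration rather than Cohere's own code: the brute-force cosine search stands in for a real approximate-nearest-neighbour index, the helper names (top_k, rerank_body, chat_body) are hypothetical, and the JSON field names and model strings follow the v1 REST API as commonly documented, so check them against the current API reference before relying on them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k):
    """Brute-force stand-in for the ANN lookup: indices of the k closest vectors."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]

def rerank_body(query, candidates, top_n, model):
    """Request body for the Rerank endpoint (field names assumed from the v1 API)."""
    return {"model": model, "query": query, "documents": candidates, "top_n": top_n}

def chat_body(question, chunks, model):
    """Request body for Chat with grounding documents (field names assumed from the v1 API)."""
    docs = [{"title": f"chunk-{i}", "snippet": text} for i, text in enumerate(chunks)]
    return {"model": model, "message": question, "documents": docs}

# Toy corpus with made-up 2-d vectors; real vectors would come from Embed.
chunks = ["Refunds take 5 days.", "Our office is in Toronto.", "Refund policy overview."]
doc_vecs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.0]

candidate_ids = top_k(query_vec, doc_vecs, k=2)           # noisy top-k
candidates = [chunks[i] for i in candidate_ids]
rerank_req = rerank_body("How long do refunds take?", candidates,
                         top_n=1, model="rerank-english-v3.0")
chat_req = chat_body("How long do refunds take?", candidates[:1],
                     model="command-r-plus-08-2024")
```

Nothing here touches the network. Each dictionary is what you would serialise as the JSON body of the corresponding POST request, and in a real pipeline the chunks passed to Chat would be the ones Rerank returns rather than a simple slice.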

The key detail beginners miss is that v3 embeddings are asymmetric: documents and queries must be embedded with different input_type values or retrieval quality drops noticeably. Get that right and Cohere’s retrieval stack is hard to beat.
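One way to make the asymmetry hard to get wrong is to wrap the two cases in separate helpers, so the input_type is never typed by hand at the call site. The sketch below only builds request bodies. The function names are hypothetical, and the texts and model fields plus the embed-english-v3.0 model string are assumptions based on the v1 Embed API.

```python
EMBED_MODEL = "embed-english-v3.0"  # assumed v3 model name; pick whichever v3 model you use

def embed_body(texts, input_type, model=EMBED_MODEL):
    """Build an Embed request body, rejecting anything but the two retrieval input types."""
    if input_type not in ("search_document", "search_query"):
        raise ValueError(f"unexpected input_type for retrieval: {input_type!r}")
    return {"model": model, "texts": list(texts), "input_type": input_type}

def document_embed_body(chunks):
    """Ingestion time: corpus chunks are embedded as search_document."""
    return embed_body(chunks, "search_document")

def query_embed_body(question):
    """Query time: the user's question is embedded as search_query."""
    return embed_body([question], "search_query")

ingest_req = document_embed_body(["Refunds take 5 days.", "Our office is in Toronto."])
query_req = query_embed_body("How long do refunds take?")
```

Both bodies go to the same Embed endpoint with the same model; only input_type differs between the ingestion path and the query path.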

Tips and gotchas

  • Authenticate with a Bearer header. Cohere uses Authorization: Bearer $CO_API_KEY, not a custom key header.
  • Start with Rerank for cheap wins. If your RAG answers are mediocre, adding a Rerank pass over your existing retrieval is the highest-leverage change you can make and costs far less than the chat call.
  • Pin the model string. Always specify an exact, dated model identifier (e.g. command-r-plus-08-2024) rather than a moving alias such as command-r-plus, so a model update never silently changes your behaviour.
  • Move to the SDK once it works. Prototype with the curl above, then switch to the official cohere Python or TypeScript SDK for retries, streaming, and typed responses in production.
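As a bridge between curl and the SDK, the first and third tips can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions: the base URL https://api.cohere.com/v1, the helper name build_request, and the model string are not taken from the text above, and the request is built but deliberately not sent.

```python
import json
import os
import urllib.request

API_BASE = "https://api.cohere.com/v1"  # assumed base URL for the v1 REST API

def build_request(endpoint, body, api_key):
    """Build (but do not send) an authenticated POST request for a Cohere endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer auth, not a custom key header
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Read the key from the environment; the fallback is a placeholder, not a real key.
api_key = os.environ.get("CO_API_KEY", "YOUR_API_KEY")

req = build_request(
    "rerank",
    {
        "model": "rerank-english-v3.0",  # exact model string, pinned explicitly
        "query": "How long do refunds take?",
        "documents": ["Refunds take 5 days.", "Our office is in Toronto."],
        "top_n": 1,
    },
    api_key,
)

# To actually send it: urllib.request.urlopen(req), then json.load() the response.
```

Keeping request construction separate from sending makes the headers and body easy to inspect in a test, and the same function works for the embed and chat endpoints by changing the first argument.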