Question 1

What is a foundation model API?

Accepted Answer

A foundation model API is a web service that lets your application send text (or images, audio, and other inputs) to a large pre-trained AI model and receive a generated response. You do not download or train the model. The provider hosts it on its own hardware, and you call it over HTTPS, authenticating with an API key. This turns frontier AI into a metered, pay-as-you-go utility.
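
As a rough illustration of "call it over HTTPS with an API key", the sketch below builds such a request using only Python's standard library. The endpoint URL, the model name, the JSON field names, and the Bearer authorization scheme are all assumptions made for the example. Real providers each define their own, so check the provider's documentation before adapting it.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only (not a real provider URL).
API_URL = "https://api.example.com/v1/generate"


def build_request(prompt: str, api_key: str, model: str = "example-model") -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST request carrying the prompt."""
    # The field names "model" and "prompt" are assumed, not a real schema.
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("Summarize this paragraph.", api_key="YOUR_API_KEY")
# Sending it would be urllib.request.urlopen(request), followed by
# decoding the JSON response body returned by the provider.
```

Nothing is sent over the network here. The point is only that the whole interaction is an ordinary authenticated HTTP request with a JSON body.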

Question 2

How does per-token pricing work?

Accepted Answer

Providers charge per token, where a token is roughly three-quarters of an English word (the exact ratio varies by language and tokenizer). You typically pay one rate for input (prompt) tokens and a higher rate for output (generated) tokens, with both rates quoted per million tokens. Larger, more capable models cost more per token than smaller ones, so your bill scales with the size of your prompts, the length of the responses, and the model you pick.
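
The arithmetic can be sketched in a few lines of Python. The prices used here are placeholders chosen for the example, not any provider's actual rates, and the function names are hypothetical.

```python
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough token estimate from the three-quarters-of-a-word rule of thumb."""
    return round(word_count / words_per_token)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request, given separate per-million-token rates."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices: 3.00 per million input tokens, 15.00 per million output tokens.
cost = estimate_cost(
    input_tokens=2_000,
    output_tokens=500,
    input_price_per_million=3.00,
    output_price_per_million=15.00,
)
print(f"Estimated cost per request: {cost:.4f}")
```

With these placeholder rates, the 500 output tokens cost more than the 2,000 input tokens, which is why response length matters as much as prompt length when budgeting.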

Question 3

What are rate limits and why do they exist?

Accepted Answer

Rate limits cap how many requests or tokens you can send in a given time window, typically expressed as requests per minute and tokens per minute. They protect the shared infrastructure from overload and abuse, and they usually rise as your account builds spend history. If you exceed them you receive an HTTP 429 (Too Many Requests) error and should retry with exponential backoff, waiting longer after each failed attempt and honoring a Retry-After header if the provider sends one.
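
One common way to handle a 429 is to retry with exponentially growing delays. The sketch below is a minimal version under stated assumptions: `RateLimitError` is a hypothetical exception standing in for whatever your HTTP client raises on a 429, and the `sleep` parameter exists only so the waiting can be replaced in tests.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with delays of 1s, 2s, 4s, ..."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries, surface the error to the caller
            # Prefer the server's hint; otherwise double the delay each attempt.
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
    raise AssertionError("unreachable")
```

In production code it is also common to add random jitter to each delay so that many clients do not retry at the same instant. That is omitted here to keep the sketch deterministic.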

Question 4

Which foundation model API should I use?

Accepted Answer

There is no single best choice. OpenAI and Anthropic lead on general reasoning and coding, Google's Gemini integrates tightly with Google Cloud, Mistral offers strong open-weight and hosted European options, and Cohere focuses on enterprise retrieval and embeddings. Most teams pick based on quality on their specific task, price, latency, and data-handling terms, and many keep more than one provider for redundancy.
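
Keeping more than one provider for redundancy often amounts to trying them in a preferred order and falling back on failure. The sketch below shows that pattern with stub functions in place of real provider clients. The function name `generate_with_fallback` and the stubs are hypothetical, and a real implementation would likely catch narrower exception types than `Exception`.

```python
from typing import Callable, List, Sequence, Tuple

# Each provider is a (name, callable) pair; the callable takes a prompt
# and returns generated text, or raises if the provider is unavailable.
Provider = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, response) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose in this sketch
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real API clients.
def primary(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def secondary(prompt: str) -> str:
    return f"echo: {prompt}"


result = generate_with_fallback("hello", [("primary", primary), ("secondary", secondary)])
```

Because different models respond differently to the same prompt, a fallback provider should be evaluated on your task in advance rather than assumed to be interchangeable.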

What Is a Foundation Model API? Accessing AI Without Training

What a foundation model API actually is

How per-token pricing works

Rate limits and reliability

The main providers