What Is a Foundation Model API? Accessing AI Without Training

OpenAI, Anthropic, Google — how developers access frontier AI as a service

What a foundation model API actually is

A foundation model is a large AI model, usually a transformer, that has been pre-trained on a broad sweep of data so it can be adapted to many tasks. A foundation model API is simply the way you access one of these models over the internet: you send a request containing your prompt and a few settings, the provider runs the model on its own hardware (typically GPUs or similar accelerators), and you get the generated response back. You never download the weights or manage any hardware.
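
As a rough illustration of that round trip, the sketch below packages a prompt and a few settings into an HTTP request without sending it. The endpoint URL, the model name, and the JSON field names are hypothetical placeholders, not any particular provider's API; each real provider documents its own schema.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.example.com/v1/generate"


def build_request(api_key: str, prompt: str, model: str = "example-model",
                  max_output_tokens: int = 256) -> urllib.request.Request:
    """Package a prompt and a few settings as an authenticated JSON POST."""
    payload = {
        "model": model,                          # which model to run (assumed field name)
        "prompt": prompt,                        # the text you want a response to
        "max_output_tokens": max_output_tokens,  # cap on the length of the reply
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("YOUR_API_KEY", "Summarize this paragraph in one sentence.")
# Sending it would be a single call: urllib.request.urlopen(req)
```

Everything the provider needs travels in that one request: the key identifies and bills your account, and the body carries the prompt and settings. The weights never leave the provider's servers.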

This is how most software talks to AI today. Instead of buying GPUs and training a model, a developer signs up, gets an API key, and treats the model like any other cloud service: a metered utility billed by usage.

How per-token pricing works

Foundation model APIs are priced by the token, not by the request. A token is a chunk of text, roughly three-quarters of an English word on average (the exact ratio depends on the tokenizer and the language), and both your input and the model’s output are measured this way. Two things to know:

  • You usually pay separate rates for input and output, with output tokens costing more because the model generates them one at a time, while the input is processed in a single pass.
  • Prices are quoted per million tokens, and bigger, smarter models cost more than smaller, faster ones.

This means your bill is driven by how long your prompts are, how much the model writes back, and which model you choose. Trimming unnecessary context and picking the smallest model that does the job are the two biggest cost levers.
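
The arithmetic behind a bill can be sketched in a few lines. The function names and the prices used below are made up for illustration; real rates vary by provider and model. The three-quarters-of-a-word figure is only a rule of thumb.

```python
def approx_tokens(word_count: int) -> int:
    """Rough token estimate, assuming about 0.75 English words per token."""
    return round(word_count / 0.75)


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_million: float,
                      output_price_per_million: float) -> float:
    """Cost of one request when prices are quoted in dollars per million tokens."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
cost = estimate_cost_usd(input_tokens=2_000, output_tokens=500,
                         input_price_per_million=3.0,
                         output_price_per_million=15.0)
```

With these made-up rates, a 2,000-token prompt and a 500-token reply come to about $0.0135, and the shorter reply accounts for more than half of it. That is why capping output length matters as much as trimming the prompt.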

Rate limits and reliability

Because the underlying GPUs are shared across all customers, providers impose rate limits. These are typically expressed as requests per minute and tokens per minute. New accounts start with low limits that increase as you build a track record of usage and payment. If you go over, the API returns an HTTP 429 (“too many requests”), and the correct response is to retry after a short, exponentially increasing delay rather than hammering the endpoint.
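
A minimal sketch of that retry pattern follows. `RateLimitError` is a hypothetical stand-in for however your HTTP client reports a 429, and the delay values are arbitrary defaults, not any provider's recommendation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the provider."""


def call_with_backoff(send: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(), retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Wait 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")


# Demo: a fake call that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky_send() -> str:
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


waits = []
result = call_with_backoff(flaky_send, sleep=waits.append)  # record waits instead of sleeping
```

Passing `sleep` as a parameter is only a convenience for testing. In production the default `time.sleep` applies, and the cap on the delay keeps a long outage from producing multi-minute waits.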

The main providers

Several providers compete in this space, and most teams choose among them based on quality for their task, price, latency, and data terms:

  • OpenAI — the GPT family; broad capability and a large ecosystem.
  • Anthropic — the Claude family; known for strong reasoning, coding, and long context.
  • Google — Gemini models, tightly integrated with Google Cloud and Vertex AI.
  • Mistral — strong open-weight and hosted models, with a European footprint.
  • Cohere — enterprise focus, especially on embeddings and retrieval.

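Because all of them expose broadly similar HTTP interfaces, switching or running several in parallel is common practice, although request and response formats differ enough that each provider usually needs a thin adapter. Teams do this both to get the best model for each task and to stay resilient if one provider has an outage.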
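
One way to get that resilience is to wrap each provider in a function with the same signature and try them in order. The sketch below assumes such wrappers already exist; the provider names and the two stand-in functions are hypothetical and make no real API calls.

```python
from typing import Callable, Sequence, Tuple

# Each provider is a (name, function) pair; the function takes a prompt and returns text.
Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order and return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in real code, catch the client's specific error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-ins for real client wrappers (hypothetical).
def primary(prompt: str) -> str:
    raise ConnectionError("simulated outage")


def backup(prompt: str) -> str:
    return f"echo: {prompt}"


used, answer = complete_with_fallback("Hello", [("primary", primary), ("backup", backup)])
```

Here the simulated outage on the first provider is recorded and the second one answers. The same list can be reordered per task when different models suit different jobs.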
