What Is LLaMA? Meta's Open-Source Language Model Explained

The model that democratised LLM research—and sparked a thousand fine-tunes

The core idea

LLaMA (Large Language Model Meta AI) is a family of large language models that Meta began releasing in February 2023, initially in sizes from 7 billion to 65 billion parameters. Its significance was less about a single breakthrough and more about a strategy: build strong, efficient models and make their weights available to the wider community. LLaMA demonstrated that you did not need the very largest model to be competitive: a smaller model trained on far more data could match or beat bigger rivals, and the original paper reported the 13-billion-parameter model outperforming the 175-billion-parameter GPT-3 on most benchmarks. By opening the weights, Meta handed the research and developer community a powerful, modifiable foundation.

A refined decoder-only transformer

Architecturally, LLaMA is a decoder-only transformer, the same broad family as GPT, but with several practical refinements that improve efficiency and stability:

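  • RMSNorm replaces standard layer normalisation and is applied to the input of each sub-layer. It rescales activations by their root-mean-square only, skipping the mean-centring step, which is cheaper and helps stabilise training.
  • SwiGLU is used as the activation in the feed-forward blocks instead of the older ReLU/GELU. It adds a learned gate, and the hidden width is reduced to keep the parameter count comparable, giving better performance for the same parameter budget.
  • Rotary Positional Embeddings (RoPE) encode token positions by rotating the query and key vectors, replacing absolute position encodings. Attention scores then depend on the relative distance between tokens, which tends to generalise better across sequence lengths.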
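
The first two refinements are small enough to sketch in a few lines. The following is a minimal, pure-Python illustration that assumes a single token vector and plain nested lists for weights; the function names (rms_norm, swiglu_ffn, matvec) are hypothetical and are not taken from Meta's code, which operates on batched tensors.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """Divide x by its root-mean-square, then apply a learned per-feature gain.

    Unlike standard layer normalisation, the mean is not subtracted.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def silu(z):
    """SiLU (Swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Gated feed-forward block: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Illustrative values only.
x = [3.0, 4.0]
normed = rms_norm(x, gain=[1.0, 1.0])
out = swiglu_ffn(normed,
                 w_gate=[[0.5, -0.2], [0.1, 0.3], [0.0, 1.0]],
                 w_up=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
                 w_down=[[1.0, 1.0, 1.0], [0.2, -0.4, 0.6]])
```

After rms_norm with unit gain, the mean of the squared outputs is close to 1 whatever the scale of the input. The feed-forward block uses three weight matrices rather than the usual two, which is why the hidden width is typically reduced to keep the parameter count comparable.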

None of these change the fundamental attention mechanism — they are well-chosen tweaks that collectively make the model train more efficiently.
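
To see why RoPE leaves the attention score itself untouched, the sketch below rotates a query and a key and then takes an ordinary dot product. It is a simplified illustration: pairing consecutive elements and using a base of 10000 are assumptions made here for clarity, and real implementations may lay out the pairs differently.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate each consecutive pair of features by a position-dependent angle."""
    dim = len(vec)  # assumed to be even
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    """Plain dot product, as used for an attention score."""
    return sum(x * y for x, y in zip(a, b))

# Illustrative query and key vectors.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (3 positions) at two different absolute positions.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
```

The two scores agree up to floating-point error, because the rotations cancel except for the difference between the two positions. The score is still a plain dot product of query and key; only the inputs to it have been rotated.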

Compute-optimal training

A central lesson behind LLaMA was that many earlier models were under-trained: they had lots of parameters but had not seen enough data. Compute-optimal scaling work had already shown that, for a fixed training budget, a smaller model trained on more tokens can beat a larger one trained on fewer. LLaMA pushed this further and deliberately trained its smaller models on more tokens than a training-only compute-optimal recipe would suggest, because a smaller model is cheaper to run every time it is used at inference. This made LLaMA-class models practical to deploy on modest hardware, a key reason the open-weight community embraced them.
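
The trade-off can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes the widely used rules of thumb of about 6 FLOPs per parameter per training token and about 2 FLOPs per parameter per generated token; these are approximations rather than measurements, and the model sizes and token counts are illustrative.

```python
def train_flops(params, tokens):
    """Approximate training cost: ~6 FLOPs per parameter per token (rule of thumb)."""
    return 6 * params * tokens

def inference_flops_per_token(params):
    """Approximate forward-pass cost: ~2 FLOPs per parameter per token (rule of thumb)."""
    return 2 * params

# Illustrative small model: 7 billion parameters, 1 trillion training tokens.
small_params, small_tokens = 7e9, 1.0e12
budget = train_flops(small_params, small_tokens)

# Hypothetical model 10x larger, given exactly the same training budget.
large_params = 70e9
large_tokens = budget / (6 * large_params)

# How much more each generated token costs on the larger model.
inference_ratio = (inference_flops_per_token(large_params)
                   / inference_flops_per_token(small_params))

print(f"training budget:       {budget:.2e} FLOPs")
print(f"small model tokens:    {small_tokens:.2e}")
print(f"large model tokens:    {large_tokens:.2e}")
print(f"inference cost ratio:  {inference_ratio:.1f}x")
```

Under these assumptions, the larger model sees only a tenth as many tokens for the same training cost and is about ten times more expensive per generated token. The gap in inference cost grows with every request served, which is the argument for the smaller, longer-trained model.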

The ecosystem it unleashed

Releasing capable weights changed the landscape. Developers fine-tuned LLaMA into instruction-following chat models, applied quantisation to shrink it for laptops and phones, and built local-inference tools so the models could run without any cloud API. A vast ecosystem of derivatives, datasets, and training recipes grew up around it, accelerating open research into alignment, efficiency, and specialised domains. Much of today’s “run an LLM on your own machine” movement traces directly back to LLaMA.
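
As a rough illustration of why quantisation matters for laptops and phones, the sketch below estimates weight storage at different bit widths and shows a basic symmetric "absmax" scheme. This is a deliberately simple stand-in: the function names are hypothetical, and the local-inference tools that grew up around LLaMA generally use more elaborate block-wise formats.

```python
def weight_gigabytes(params, bits):
    """Storage needed for the weights alone, in gigabytes (10^9 bytes)."""
    return params * bits / 8 / 1e9

def quantize_absmax(weights, bits=8):
    """Map floats to signed integers using a single scale for the whole list."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0.0:                      # all-zero input: any scale works
        scale = 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate floats from the integers and the scale."""
    return [v * scale for v in q]

# Illustrative 7-billion-parameter model at 16-bit and 4-bit precision.
fp16_gb = weight_gigabytes(7e9, 16)
int4_gb = weight_gigabytes(7e9, 4)

# Round trip on a few illustrative weights.
weights = [0.5, -1.0, 0.25, 0.0, 0.8]
q, scale = quantize_absmax(weights, bits=8)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The storage estimate covers weights only. Activations and the attention cache add to the memory needed at run time, and lower bit widths trade some accuracy for the smaller footprint.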

Licensing and legacy

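LLaMA is best described as open-weight rather than strictly open source: the weights are available, but under Meta's own licence terms rather than a conventional open-source software licence. The first release was limited to non-commercial research use. From Llama 2 onwards, a custom community licence permitted commercial use subject to conditions, such as an acceptable use policy and a separate licence requirement for the very largest services. That was open enough to fine-tune, self-host, and build products on. Successive versions improved quality and broadened what the licence allowed, cementing LLaMA’s role as the reference point for open models: the counterweight to closed APIs like GPT and Claude, and the foundation on which a large share of independent AI development now stands.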
