What named entity recognition does
Named entity recognition (NER) is one of the oldest and most useful tasks in
natural language processing. Its job is narrow but powerful: given a piece of
text, find the spans that refer to named things in the world and label each
one with a type. Typical types include PERSON, ORGANIZATION, LOCATION,
DATE, MONEY, and PERCENT, though domain-specific schemes add types like
GENE, DRUG, or CASE_NUMBER. NER turns a sentence such as
“Apple opened a store in Paris in March” into a structured set of facts that
software can sort, filter, link, and reason about: Apple (organisation), Paris
(location), March (date). That conversion from prose to structure is
why NER sits underneath search engines, document pipelines, and knowledge graphs.
NER as a sequence-labelling task
Under the hood, classic NER is framed as sequence labelling: the text is
split into tokens, and the model assigns a label to every token in order. The
dominant labelling scheme is BIO tagging (also called IOB). B- marks the
first token of an entity, I- marks subsequent tokens inside the same entity,
and O marks tokens that belong to no entity. For “New York City is large”, the
tags would be B-LOC I-LOC I-LOC O O. This per-token scheme handles multi-word
entities and adjacent entities of the same type, because every B- tag opens a
new entity; a naive “highlight the entity words” approach would merge two
neighbouring names into one. Variants like BILOU add explicit tags for the last
token (L-) and for single-token entities (U-), which sometimes improves accuracy.
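The BIO scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names spans_to_bio and bio_to_spans are made up for this example, and spans are assumed to be non-overlapping (start, end, type) triples over token positions with an exclusive end.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, entity_type); an assumed convention
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping entity spans as one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into (start, end, type) spans."""
    spans: List[Span] = []
    start, label = None, None
    # The extra "O" is a sentinel that closes an entity ending on the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.append((start, i, label))  # close the entity in progress
        if tag == "O":
            start, label = None, None
        else:
            # B- opens a new entity; a stray I- is also treated as an opening.
            start, label = i, tag[2:]
    return spans


tokens = ["New", "York", "City", "is", "large"]
tags = spans_to_bio(len(tokens), [(0, 3, "LOC")])
print(tags)  # ['B-LOC', 'I-LOC', 'I-LOC', 'O', 'O']
```

Decoding B-LOC B-LOC I-LOC yields two separate spans, (0, 1) and (1, 3), which is how the B- tag keeps adjacent entities of the same type apart.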
How NER models evolved
Early NER systems were rule-based: hand-written patterns, gazetteers (lists of known names), and regular expressions. They were precise but brittle and expensive to maintain. Statistical models replaced them: Hidden Markov Models and especially Conditional Random Fields (CRFs) learned label sequences from annotated corpora and dominated for years because they model dependencies between adjacent labels. The deep-learning era brought BiLSTM-CRF architectures, which read the sentence in both directions before labelling, capturing context far better than fixed feature templates. Among dedicated taggers, the current state of the art is a fine-tuned transformer such as BERT, whose contextual embeddings let the same word be tagged differently depending on its surroundings, for example “Washington” the person versus “Washington” the place.
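To make the rule-based starting point concrete, here is a small sketch of a gazetteer-plus-regex tagger. Everything in it is an assumption chosen for illustration: the GAZETTEER entries, the single PERCENT pattern, and the function name rule_based_ner. It shows both why such systems are precise on what they know and why they are brittle.

```python
import re
from typing import List, Tuple

# Hypothetical gazetteer: a tiny list of known names, for illustration only.
GAZETTEER = {
    "Apple": "ORGANIZATION",
    "Paris": "LOCATION",
    "New York City": "LOCATION",
}

# Hypothetical hand-written pattern: a number followed by a percent sign.
PATTERNS = [(re.compile(r"\b\d+(?:\.\d+)?%"), "PERCENT")]


def rule_based_ner(text: str) -> List[Tuple[int, int, str]]:
    """Return (start_char, end_char, type) for every gazetteer or pattern hit."""
    found = []
    for name, label in GAZETTEER.items():
        # Whole-word, case-sensitive lookup of each known name.
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), m.end(), label))
    for pattern, label in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), label))
    return sorted(found)


text = "Apple grew 12% in Paris"
for start, end, label in rule_based_ner(text):
    print(text[start:end], label)
```

The same rules would label “Apple” in “Apple pie recipe” as an organisation and would miss any company not on the list, which is the brittleness that statistical models were introduced to address.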
NER in the LLM era
Large language models changed the workflow. Instead of training a dedicated tagger, you can simply prompt a model: “Extract all people, organisations, and locations from this text and return them as JSON.” For common entity types and reasonable volumes, modern LLMs do this well with zero or few examples, and they adapt instantly to new or unusual entity types just by describing them. The cost is real, though: per-call latency and price are higher than a small purpose-built model, and free-form generation can produce inconsistent formats or hallucinate entities that are not in the text. In practice, teams choose based on scale and stability — a fine-tuned encoder model for high-volume, fixed-schema extraction, and an LLM when flexibility, fast iteration, or rare entity types matter more than throughput.
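One way to contain the format and hallucination risks is to validate the model's reply before trusting it. The sketch below is deliberately model-agnostic: it builds a prompt and checks a reply string, but makes no API call. The names build_prompt, parse_entities, and ALLOWED_TYPES, as well as the JSON shape (a list of objects with "text" and "type" keys), are assumptions for this example rather than any provider's interface.

```python
import json
from typing import Dict, List

# Assumed fixed schema for this example.
ALLOWED_TYPES = {"PERSON", "ORGANIZATION", "LOCATION"}


def build_prompt(text: str) -> str:
    """Build an extraction prompt that pins down the output format."""
    return (
        "Extract all people, organisations, and locations from the text below. "
        'Return only a JSON list of objects with keys "text" and "type", '
        "where type is one of PERSON, ORGANIZATION, LOCATION.\n\n"
        f"Text: {text}"
    )


def parse_entities(raw_reply: str, source_text: str) -> List[Dict[str, str]]:
    """Keep only well-formed entities whose text really occurs in the source."""
    try:
        items = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []  # inconsistent format: reject the whole reply
    if not isinstance(items, list):
        return []
    kept = []
    for item in items:
        if not isinstance(item, dict):
            continue
        mention, etype = item.get("text"), item.get("type")
        if not isinstance(mention, str) or not isinstance(etype, str):
            continue
        if etype not in ALLOWED_TYPES:
            continue  # type outside the requested schema
        if mention and mention in source_text:
            kept.append({"text": mention, "type": etype})
        # Otherwise the mention is not in the text, so it is dropped as a
        # likely hallucination.
    return kept
```

In use, raw_reply would be whatever string the chosen model client returns for build_prompt(text). A substring check is a crude grounding test; it does not catch a real mention given the wrong type.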
Why NER still matters
It is tempting to think general-purpose LLMs make a dedicated task like NER obsolete, but the opposite is often true: NER is the structuring layer that makes text usable by everything downstream. Redacting personal data for privacy compliance, linking mentions to a knowledge base, populating a database from invoices, or building the entity index a retrieval system searches over — all of these depend on reliably knowing which words name what. Whether that labelling comes from a CRF, a BERT model, or a prompt to an LLM, the underlying task is the same, and understanding it helps you pick the right tool for the job.
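As a small illustration of a downstream consumer, the sketch below redacts selected entity types from a text. It assumes the entities arrive as (start_char, end_char, type) triples from any of the taggers discussed above; the function name redact and the bracketed placeholder format are choices made for this example.

```python
from typing import Iterable, List, Set, Tuple


def redact(text: str,
           entities: Iterable[Tuple[int, int, str]],
           types_to_hide: Set[str]) -> str:
    """Replace each entity of a hidden type with a [TYPE] placeholder."""
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(entities):
        # Skip types we keep, and spans overlapping one already redacted.
        if label not in types_to_hide or start < cursor:
            continue
        pieces.append(text[cursor:start])   # untouched text before the entity
        pieces.append(f"[{label}]")         # placeholder instead of the name
        cursor = end
    pieces.append(text[cursor:])            # remainder after the last entity
    return "".join(pieces)


text = "Alice met Bob in Paris"
entities = [(0, 5, "PERSON"), (10, 13, "PERSON"), (17, 22, "LOCATION")]
print(redact(text, entities, {"PERSON"}))  # [PERSON] met [PERSON] in Paris
```

The redaction is only as good as the spans it receives: a missed name stays in the output, which is why recall matters more than precision for this use.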