What Is Named Entity Recognition (NER)?

How AI finds people, places, dates, and organisations in raw text


What named entity recognition does

Named entity recognition (NER) is one of the oldest and most useful tasks in natural language processing. Its job is narrow but powerful: given a piece of text, find the spans that refer to named things in the world and label each one with a type. Typical types include PERSON, ORGANIZATION, LOCATION, DATE, MONEY, and PERCENT, though domain-specific schemes add types like GENE, DRUG, or CASE_NUMBER. NER turns a sentence such as “Apple opened a store in Paris in March” into a structured set of facts (Apple is an organisation, Paris is a location, March is a date) that software can sort, filter, link, and reason about. That conversion from prose to structure is why NER sits underneath search engines, document pipelines, and knowledge graphs.

NER as a sequence-labelling task

Under the hood, classic NER is framed as sequence labelling: the text is split into tokens, and the model assigns a label to every token in order. The dominant labelling scheme is BIO tagging (also called IOB). A B- prefix marks the first token of an entity, an I- prefix marks subsequent tokens inside the same entity, and O marks tokens that belong to no entity; the entity type follows the prefix, as in B-LOC. For “New York City is large”, the tags would be B-LOC I-LOC I-LOC O O. Because every entity opens with its own B- tag, this per-token scheme handles multi-word entities and adjacent entities of the same type, which a naive “highlight the entity words” approach cannot. Variants like BILOU (Begin, Inside, Last, Outside, Unit) add explicit tags for the last token and for single-token entities, which sometimes improves accuracy.
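
To make the tagging scheme concrete, here is a minimal sketch of the decoding step: collapsing per-token BIO tags back into entity spans. The function name bio_to_spans and the lenient handling of a stray I- tag are illustrative assumptions, not part of any particular library.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO tags into (entity_text, entity_type) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffer.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, even right after another one.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity currently being built.
            current_tokens.append(token)
        else:
            flush()
            if tag.startswith("I-"):
                # Assumption: a stray I- tag (no matching B-) starts a new entity.
                current_tokens, current_type = [token], tag[2:]
    flush()
    return spans


tokens = "New York City is large".split()
tags = ["B-LOC", "I-LOC", "I-LOC", "O", "O"]
print(bio_to_spans(tokens, tags))
```

Two adjacent single-token locations tagged B-LOC B-LOC come out as two separate spans, which is exactly the case the B- prefix exists to handle.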

How NER models evolved

Early NER systems were rule-based: hand-written patterns, gazetteers (lists of known names), and regular expressions. They were precise but brittle and expensive to maintain. Statistical models replaced them: Hidden Markov Models and especially Conditional Random Fields (CRFs) learned label sequences from annotated corpora. CRFs dominated for years because they score the whole label sequence jointly, modelling dependencies between adjacent labels while drawing on rich, overlapping features of the input. The deep-learning era brought BiLSTM-CRF architectures, which read the sentence in both directions before labelling, capturing context far better than fixed feature templates. Today the strongest dedicated taggers are transformer-based models such as fine-tuned BERT, where contextual embeddings let the same word be tagged differently depending on its surroundings: “Washington” the person versus “Washington” the place.
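
As a rough illustration of the rule-based approach, the sketch below combines a tiny gazetteer with a regular expression for month names. The gazetteer entries, the label names, and the function rule_based_ner are all made up for this example; real systems of that era used far larger lists and many more patterns.

```python
import re
from typing import List, Tuple

# Hypothetical gazetteer: known names mapped to an entity type.
GAZETTEER = {"Apple": "ORGANIZATION", "Paris": "LOCATION"}

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Case-sensitive on purpose, so the verb "may" is not tagged as a date.
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS})\b")


def rule_based_ner(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface_text, label) matches, sorted by position."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            found.append((match.start(), match.end(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.end(), match.group(), "DATE"))
    return sorted(found)


for start, end, surface, label in rule_based_ner("Apple opened a store in Paris in March"):
    print(surface, label)
```

The brittleness is easy to reproduce: the same rules label “Apple” as an organisation in “Apple pie is popular”, because a lookup has no notion of context.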

NER in the LLM era

Large language models changed the workflow. Instead of training a dedicated tagger, you can simply prompt a model: “Extract all people, organisations, and locations from this text and return them as JSON.” For common entity types and reasonable volumes, modern LLMs do this well with zero or few examples, and they can pick up a new or unusual entity type from nothing more than a description of it in the prompt. The cost is real, though: per-call latency and price are higher than for a small purpose-built model, and free-form generation can produce inconsistent formats or hallucinate entities that are not in the text. In practice, teams choose based on scale and stability: a fine-tuned encoder model for high-volume, fixed-schema extraction, and an LLM when flexibility, fast iteration, or rare entity types matter more than throughput.
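
The formatting and hallucination risks can be reduced with a validation pass over whatever the model returns. The sketch below makes no LLM call; it assumes the model was asked to reply with a JSON list of objects with "text" and "type" keys, which is a convention chosen for this example, and the names ALLOWED_TYPES and validate_entities are likewise hypothetical.

```python
import json
from typing import Dict, List

# Assumed schema for this example: only these labels are accepted.
ALLOWED_TYPES = {"PERSON", "ORGANIZATION", "LOCATION"}


def validate_entities(raw_response: str, source_text: str) -> List[Dict[str, str]]:
    """Keep only well-formed entities whose text literally occurs in the source."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return []  # The model did not return valid JSON.
    if not isinstance(data, list):
        return []

    kept = []
    for item in data:
        if not isinstance(item, dict):
            continue
        text, label = item.get("text"), item.get("type")
        if not isinstance(text, str) or not text:
            continue
        if not isinstance(label, str) or label not in ALLOWED_TYPES:
            continue
        if text not in source_text:
            continue  # Likely hallucinated: the span is not in the input.
        kept.append({"text": text, "type": label})
    return kept


source = "Apple opened a store in Paris in March"
# Stand-in for a model reply; the third entity does not appear in the source.
reply = (
    '[{"text": "Apple", "type": "ORGANIZATION"},'
    ' {"text": "Paris", "type": "LOCATION"},'
    ' {"text": "Tim Cook", "type": "PERSON"}]'
)
print(validate_entities(reply, source))
```

An exact substring check is deliberately strict. It will also reject entities the model has normalised (for example, expanding an abbreviation), so a production pipeline might prefer to ask for character offsets or apply fuzzy matching instead.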

Why NER still matters

It is tempting to think general-purpose LLMs make a dedicated task like NER obsolete, but the opposite is often true: NER is the structuring layer that makes text usable by everything downstream. Redacting personal data for privacy compliance, linking mentions to a knowledge base, populating a database from invoices, or building the entity index a retrieval system searches over — all of these depend on reliably knowing which words name what. Whether that labelling comes from a CRF, a BERT model, or a prompt to an LLM, the underlying task is the same, and understanding it helps you pick the right tool for the job.
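
Redaction shows how little the downstream code cares where the labels came from. The sketch below assumes the tagger, whichever kind it is, has already produced character-offset spans of the form (start, end, label) with an exclusive end; that span format and the function name redact are assumptions made for this example.

```python
from typing import FrozenSet, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def redact(
    text: str,
    spans: List[Span],
    labels_to_redact: FrozenSet[str] = frozenset({"PERSON"}),
) -> str:
    """Replace every span with a sensitive label by a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if label not in labels_to_redact:
            continue
        if start < cursor:
            # Overlaps a span already redacted: extend the redacted region
            # rather than risk leaking the tail of the overlapping span.
            cursor = max(cursor, end)
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Alice met Bob in Paris"
spans = [(0, 5, "PERSON"), (10, 13, "PERSON"), (17, 22, "LOCATION")]
print(redact(text, spans))
```

Only the PERSON spans are replaced here; passing a different labels_to_redact set widens or narrows what is hidden without touching the tagger.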
