What NLP is
Natural language processing (NLP) is the branch of artificial intelligence devoted to making computers work with human language: reading it, interpreting it, and producing it. The field is old, with roots in the machine-translation efforts of the 1950s, and much of the vocabulary that language-focused AI still uses (tokens, tagging, entities, parsing) comes from it. Long before large language models existed, NLP researchers were building systems to split sentences into words, identify names, and judge whether a review was positive or negative.
The classic pipeline
Traditional NLP often worked as a pipeline of stages, each feeding the next. Tokenization broke raw text into units — words, sub-words, or punctuation. Part-of-speech (POS) tagging then labelled each token as a noun, verb, adjective, and so on. Parsing worked out the grammatical structure, building a tree that showed how phrases related. Named-entity recognition (NER) picked out people, places, organisations, and dates. Downstream tasks like sentiment analysis or machine translation built on these foundations. Each stage was a distinct problem with its own models and evaluation benchmarks.
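The pipeline idea can be sketched in a few lines of Python. This is a toy under stated assumptions: the tokenizer is a single regular expression, the POS tagger is a hypothetical hand-written lexicon (`LEXICON`) with a capitalisation fallback, and the NER stage simply groups consecutive proper nouns. Parsing is left out, and the tag names are borrowed from the Universal Dependencies tag set. Real taggers and recognisers are trained models, but the shape is the same: each function consumes the previous stage's output.

```python
import re

# Hypothetical toy lexicon mapping lowercase words to part-of-speech tags.
LEXICON = {
    "the": "DET", "a": "DET",
    "visited": "VERB", "loves": "VERB",
    "city": "NOUN", "museum": "NOUN",
    "in": "ADP", "on": "ADP",
}

def tokenize(text: str) -> list[str]:
    """Stage 1: split raw text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Stage 2: tag each token from the lexicon; unknown capitalised words
    are guessed to be proper nouns, punctuation is PUNCT, anything else X."""
    tagged = []
    for tok in tokens:
        if tok.lower() in LEXICON:
            tagged.append((tok, LEXICON[tok.lower()]))
        elif tok[0].isupper():
            tagged.append((tok, "PROPN"))
        elif not tok.isalnum():
            tagged.append((tok, "PUNCT"))
        else:
            tagged.append((tok, "X"))
    return tagged

def find_entities(tagged: list[tuple[str, str]]) -> list[str]:
    """Stage 3: join runs of consecutive proper nouns into candidate entities."""
    entities, current = [], []
    for tok, tag in tagged:
        if tag == "PROPN":
            current.append(tok)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

tokens = tokenize("Ada Lovelace visited the museum in London.")
tagged = pos_tag(tokens)
print(find_entities(tagged))  # ['Ada Lovelace', 'London']
```

Because each stage sees only its predecessor's output, a mistake early on carries forward: a name the tagger fails to mark as `PROPN` can never be found by the entity stage.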
How early systems worked
The first generation of NLP was rule-based: linguists hand-wrote grammars and dictionaries, and the system applied them mechanically. This was precise but brittle, because language has endless exceptions and no set of rules could cover them all. The next wave was statistical: instead of fixed rules, systems learned probabilities from corpora, using techniques like hidden Markov models for tagging (trained on hand-annotated text) and n-gram models for predicting the next word (trained on raw text). Statistical methods were more robust to the messiness of real text, but tasks such as tagging and NER still needed large amounts of hand-labelled data, collected separately for each task.
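To make the statistical idea concrete, here is a minimal bigram (n = 2) model in Python, assuming lowercase, whitespace-split sentences and hypothetical boundary markers `<s>` and `</s>`. It estimates P(next | previous) by counting and dividing, which is the maximum-likelihood estimate, with no smoothing. Practical n-gram models add smoothing (for example add-one or Kneser-Ney) so that word pairs absent from training do not get probability zero. Note that the counts come from raw text; no labels are needed.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with boundary markers so sentence starts and ends are modelled too.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def prob(counts, prev, nxt):
    """Unsmoothed estimate: count(prev, nxt) / count(prev followed by anything)."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return following[nxt] / total if total else 0.0

def predict_next(counts, prev):
    """Most frequent word seen after prev, or None if prev never occurred."""
    following = counts.get(prev)
    if not following:
        return None
    return following.most_common(1)[0][0]

model = train_bigrams(["the cat sat", "the cat ran", "the dog sat"])
print(prob(model, "the", "cat"))   # 2 of the 3 words seen after "the" are "cat"
print(predict_next(model, "the"))  # cat
```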
The deep-learning turn
In the 2010s, neural networks reshaped NLP. Word embeddings such as word2vec and GloVe represented words as vectors so that similar words sat close together in space, capturing meaning numerically. Recurrent neural networks, especially the long short-term memory (LSTM) variant, then processed sequences one word at a time, improving translation and tagging. These models learned features from data rather than relying on hand-crafted rules, and they set new records on benchmark after benchmark, but each model was still usually trained for a single task.
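The "similar words sit close together" idea can be sketched with a handful of vectors. The values below are hypothetical and hand-picked for illustration; real word2vec or GloVe vectors are learned from large corpora and usually have a few hundred dimensions. Closeness is measured here with cosine similarity, a common choice for comparing embeddings.

```python
import math

# Hypothetical 3-dimensional vectors chosen by hand, not trained.
EMBEDDINGS = {
    "king":  [0.8, 0.7, 0.1],
    "queen": [0.8, 0.6, 0.2],
    "apple": [0.1, 0.2, 0.9],
    "pear":  [0.2, 0.1, 0.8],
}

def cosine(u, v):
    """Cosine of the angle between u and v: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word):
    """Return the other vocabulary word whose vector is most similar to word's."""
    target = EMBEDDINGS[word]
    return max(
        (w for w in EMBEDDINGS if w != word),
        key=lambda w: cosine(target, EMBEDDINGS[w]),
    )

print(nearest("king"))   # queen
print(nearest("apple"))  # pear
```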
How transformers superseded the old toolchain
The transformer architecture, introduced in 2017, and the BERT and GPT models that followed reorganised the field. Researchers found that by pre-training a single large model on vast amounts of unlabelled text and then adapting it, for example by fine-tuning on a smaller labelled set, one model could handle tagging, NER, classification, translation, and question answering, often beating the specialised systems built for each. The stage-by-stage pipeline gave way to end-to-end models that map input text directly to the final output. Classical NLP terms still appear everywhere because the tasks they name remain how problems are defined and benchmarked, even though the dedicated tools behind them are used less.
Why classical NLP still matters
Even in the LLM era, the older toolkit has its place. Lightweight tokenizers, regular expressions, and small classifiers are fast, cheap, and can run on a phone or in a browser without a giant model. They are explainable, which matters in regulated settings, and their output follows a fixed, predictable structure. Many production systems combine the two worlds, using a small classical component for speed and an LLM only where its flexibility is genuinely needed. Understanding NLP’s foundations therefore remains valuable for anyone building real language systems.
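As an illustration of the cheap, explainable kind of component described above, the sketch below pulls dates out of free text using one regular expression and Python's standard `datetime` module. It assumes, for simplicity, that only ISO-style dates such as `2024-03-15` matter; the `ISO_DATE` pattern and `extract_dates` function are hypothetical names. Every result is either a valid `datetime.date` or absent, so downstream code never has to parse free-form output.

```python
import re
from datetime import date

# Hypothetical pattern: only ISO-8601 calendar dates such as 2024-03-15.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def extract_dates(text: str) -> list[date]:
    """Return every valid ISO date in text as a datetime.date, in order of
    appearance. Matches that are not real calendar dates are skipped."""
    found = []
    for match in ISO_DATE.finditer(text):
        year, month, day = (int(g) for g in match.groups())
        try:
            found.append(date(year, month, day))
        except ValueError:
            continue  # the pattern matched, but no such date exists (e.g. Feb 30)
    return found

print(extract_dates("Issued 2024-03-15, due 2024-04-14; typo 2023-02-30."))
# [datetime.date(2024, 3, 15), datetime.date(2024, 4, 14)]
```

The trade-off is coverage: a date written as "15 March 2024" is silently missed, which is where a more flexible model might earn its cost.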