An n-gram is a contiguous sequence of n items from a text. Word bigrams are pairs of adjacent words; character trigrams are runs of three characters. They are a basic building block of language models and text classification.

What is the difference between character and word n-grams?

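Character n-grams slide over individual characters and capture spelling and sub-word patterns, while word n-grams slide over tokens and capture phrasing. A typo changes only the few character n-grams that overlap it, so character n-grams tolerate misspellings; word n-grams preserve word order and local context, which carries more of the meaning.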
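As a rough illustration of the typo point, the sketch below compares the character trigrams of a word with those of a misspelling that drops one letter. The helper name char_ngrams and the sample words are assumptions made for this example, not part of the tool.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Hypothetical sample pair: the second word is missing one "i".
correct = set(char_ngrams("classification", 3))
typo = set(char_ngrams("classifcation", 3))

# Trigrams that do not overlap the typo are unchanged.
shared = correct & typo
print(sorted(shared))
print(len(shared), "of", len(correct), "trigrams survive the typo")
```

As whole word tokens the two spellings do not match at all, while most of the character trigrams are still shared.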

How are word n-grams tokenized?

Text is split into Unicode word tokens — runs of letters, numbers, and apostrophes — so contractions stay intact and punctuation is dropped. The n-gram window then slides one token at a time.
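One way to approximate this tokenization is a single regular expression, sketched here in Python. The pattern is an assumption about what "letters, numbers, and apostrophes" means in practice (it also accepts the typographic apostrophe U+2019); the tool's actual JavaScript pattern may differ.

```python
import re

# Assumption: a word token is a run of Unicode letters, digits, and apostrophes.
# [^\W_] matches any Unicode letter or digit (that is, \w without the underscore).
WORD_TOKEN = re.compile(r"(?:[^\W_]|['\u2019])+")

def tokenize(text: str) -> list[str]:
    """Return the word tokens of text; punctuation and whitespace are dropped."""
    return WORD_TOKEN.findall(text)

tokens = tokenize("Don't panic: it's only a café!")
print(tokens)
```

Contractions such as "Don't" come out as one token, and the colon and exclamation mark disappear.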

Why are some n-grams missing?

A text of L items yields L - n + 1 n-grams, so a text with fewer than n items yields none. Lengthen the text or lower n.
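The count can be checked with a one-line helper. The function name expected_ngram_count is hypothetical, and the floor at zero is simply how this sketch expresses "you get none".

```python
def expected_ngram_count(length: int, n: int) -> int:
    """Number of n-grams in a sequence of `length` items: L - n + 1, never below 0."""
    return max(length - n + 1, 0)

print(expected_ngram_count(6, 2))  # six tokens, bigrams -> 5
print(expected_ngram_count(2, 3))  # text shorter than n -> 0
```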

Is my text uploaded to a server?

No. All n-gram extraction runs entirely in your browser with JavaScript, so your text never leaves your device.

N-gram Generator — Gera Tools

Break text into n-grams

This tool extracts n-grams — contiguous sequences of n characters or words — from any text and counts how often each one appears. N-grams are a foundational tool in natural language processing, used for language modeling, spelling correction, text classification, and similarity scoring.

How it works

Pick a mode and a window size n:

Word n-grams: the text is tokenized into Unicode word tokens, then a window of n consecutive tokens slides one position at a time.
Character n-grams: the window of n consecutive characters slides one position at a time over the raw text (optionally with whitespace collapsed).
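The two modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's JavaScript code: the function names are made up, the token pattern is a guess at "Unicode word tokens", and "whitespace collapsed" is read here as replacing each run of whitespace with a single space.

```python
import re

def word_ngrams(text: str, n: int) -> list[str]:
    """Slide a window of n tokens over the text, one token at a time."""
    # Assumption: tokens are runs of Unicode letters, digits, and apostrophes.
    tokens = re.findall(r"(?:[^\W_]|')+", text)
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text: str, n: int, collapse_whitespace: bool = False) -> list[str]:
    """Slide a window of n characters over the text, one character at a time."""
    if collapse_whitespace:
        # Assumption: runs of whitespace become one space; this also trims the ends.
        text = " ".join(text.split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(word_ngrams("to be or not", 2))
print(char_ngrams("a  b", 2, collapse_whitespace=True))
```

When the input has fewer than n items, the range is empty and both functions return an empty list.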

A text of L items produces L - n + 1 n-grams. Each distinct n-gram is tallied, and the results are listed most-frequent first. Case folding can be enabled to treat The and the as the same n-gram.
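The tallying step might look like the following sketch, which uses Python's collections.Counter. The function name count_ngrams and the fold_case flag are hypothetical, and how the tool orders n-grams with equal counts is not stated; here ties keep first-seen order.

```python
from collections import Counter

def count_ngrams(items: list[str], n: int, fold_case: bool = False) -> list[tuple[str, int]]:
    """Tally each distinct n-gram and return (n-gram, count) pairs, most frequent first."""
    if fold_case:
        # Case folding makes "The" and "the" compare equal.
        items = [item.casefold() for item in items]
    grams = [" ".join(items[i:i + n]) for i in range(len(items) - n + 1)]
    return Counter(grams).most_common()

tokens = "The cat saw the cat".split()
print(count_ngrams(tokens, 2))                  # "The cat" and "the cat" counted separately
print(count_ngrams(tokens, 2, fold_case=True))  # both folded into "the cat"
```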

Example

For the word bigrams (n = 2) of “to be or not to be”:

to be   2
be or   1
or not  1
not to  1

The pair to be appears twice, so it is listed first; the six tokens yield 6 - 2 + 1 = 5 bigrams, four of them distinct. Everything runs locally, so your text stays private.