What is accent folding?

Accent folding maps accented characters to their unaccented ASCII equivalents so that café and cafe are treated as the same term. Search engines apply it to both the index and the query, so users find results however they type accents.

How is it different from just stripping diacritics?

Stripping diacritics decomposes each character and removes the combining marks, so é becomes e. Accent folding goes further: it also expands ligatures and special letters that have no combining mark to remove, such as ß to ss, æ to ae, and ø to o, so the output is plain ASCII for every character the folding table covers.
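
The difference can be sketched in Python using the standard unicodedata module. The names strip_diacritics, fold_special, and SPECIAL are hypothetical, and the table is a small illustrative subset rather than a complete mapping.

```python
import unicodedata

# Illustrative subset of special letters; not an exhaustive table.
SPECIAL = {"ß": "ss", "æ": "ae", "ø": "o"}

def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop combining marks in U+0300-U+036F."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not "\u0300" <= ch <= "\u036f")

def fold_special(text: str) -> str:
    """Expand special letters first, then strip diacritics."""
    expanded = "".join(SPECIAL.get(ch, ch) for ch in text)
    return strip_diacritics(expanded)

print(strip_diacritics("straße"))  # ß has no combining mark, so it survives
print(fold_special("straße"))      # ß is expanded by the table
```

Stripping alone handles café but leaves straße untouched, which is why the expansion table is needed.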

Why fold the query and the index the same way?

Matching only works if both sides are normalised identically. If you fold the stored term to cafe but search for café unfolded, the comparison fails. Apply the exact same folding pipeline when building the index and when handling each query.
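
As a rough illustration, the Python sketch below folds terms at index time and folds the query with the same function. The helpers fold, build_index, and search are hypothetical names, and fold is deliberately simplified to diacritic stripping only.

```python
import unicodedata

def fold(text: str) -> str:
    """Simplified folding helper: NFD, then drop marks in U+0300-U+036F."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not "\u0300" <= ch <= "\u036f")

def build_index(terms):
    """Map each folded key to the set of original terms that produced it."""
    index = {}
    for term in terms:
        index.setdefault(fold(term), set()).add(term)
    return index

def search(index, query):
    """Fold the query with the same function used at index time."""
    return index.get(fold(query), set())

index = build_index(["café", "naïve", "resume"])
print(search(index, "cafe"))   # folded query finds the accented term
print("café" in index)         # an unfolded lookup misses the folded key
```

Keeping a single fold function and calling it from both build_index and search is one simple way to make sure the two sides cannot drift apart.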

Should I also lowercase?

For most case-insensitive search you should fold case as well, so Café, CAFÉ, and cafe all collapse to cafe. The optional lowercase toggle here lets you produce a single canonical key for comparison.
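
A small Python sketch of the effect of the toggle, assuming a hypothetical fold_key function whose lowercase parameter stands in for the option described above:

```python
import unicodedata

def fold_key(text: str, lowercase: bool = True) -> str:
    """Strip combining marks, then optionally lowercase the result."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not "\u0300" <= ch <= "\u036f")
    return stripped.lower() if lowercase else stripped

variants = ["Café", "CAFÉ", "cafe"]
print({fold_key(v) for v in variants})                   # one canonical key
print({fold_key(v, lowercase=False) for v in variants})  # case still differs
```

With the toggle on, all three spellings share one key; with it off, the accents are gone but the keys still differ by case.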

Accent Fold (Search Normaliser)

Accent folding normalises text so that searches match regardless of accents and special letters. It is the standard preprocessing step behind diacritic-insensitive search, where café and cafe are treated as identical.

How it works

Two stages produce a clean ASCII key, with an optional third step for case:

1. Expand special letters and ligatures that have no combining mark:
   ß → ss   æ → ae   œ → oe   ø → o   ð → d   þ → th   ł → l
2. Normalise to NFD and delete combining marks (U+0300–U+036F):
   é → e   ñ → n   ü → u
3. Optionally, lowercase the whole string.

Doing the ligature expansion before the NFD strip ensures characters that NFD cannot decompose still become plain ASCII. Apply the same pipeline to both your search index and the incoming query.
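
The stages listed above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: the names accent_fold and EXPANSIONS are hypothetical, and the table holds only the lowercase letters listed above, so uppercase forms such as Æ or Ł would need their own entries.

```python
import unicodedata

# Stage 1 table, limited to the lowercase letters listed in the text.
EXPANSIONS = {
    "ß": "ss", "æ": "ae", "œ": "oe", "ø": "o",
    "ð": "d", "þ": "th", "ł": "l",
}

def accent_fold(text: str, lowercase: bool = False) -> str:
    # Stage 1: expand letters that NFD cannot decompose.
    expanded = "".join(EXPANSIONS.get(ch, ch) for ch in text)
    # Stage 2: decompose, then delete combining marks U+0300-U+036F.
    decomposed = unicodedata.normalize("NFD", expanded)
    folded = "".join(ch for ch in decomposed if not "\u0300" <= ch <= "\u036f")
    # Optional: lowercase the whole string.
    return folded.lower() if lowercase else folded

print(accent_fold("smørrebrød"))
print(accent_fold("cœur"))
print(accent_fold("Niño", lowercase=True))
```

Calling the same accent_fold function, with the same lowercase setting, at index time and at query time keeps both sides normalised identically.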

Example and tips

With lowercasing enabled, Mötley Crüe folds to motley crue and Straße folds to strasse. Build your search index by folding every stored term with this exact pipeline, then fold each query the same way before comparing. Accented and unaccented spellings of the same term then match for every character the pipeline covers.