Unicode Character Inspector

Inspect every Unicode code point in a string — name, block, category.

See exactly what is in your text

Strings are not always what they look like. A “space” might be a non-breaking space, an “a” might be Cyrillic, and a single emoji can be five joined code points. This inspector iterates over a string by Unicode code point and shows, for every character, its U+XXXX value, decimal value, UTF-8 byte count, Unicode block, general category, and ready-to-paste HTML and JavaScript escapes. Everything runs locally in your browser.

How it works

The tool uses the string’s built-in iterator (for…of), which yields full code points rather than UTF-16 code units, so astral characters above U+FFFF are handled correctly. For each code point it reads the numeric value with codePointAt and formats it as hex, classifies the block by comparing the value against the standard Unicode block ranges (Basic Latin, Latin-1 Supplement, Cyrillic, Arabic, CJK, Emoji and more), and derives a general category using Unicode regular-expression property escapes (\p{Lu}, \p{Ll}, \p{Nd}, \p{Zs}, \p{Cf} and so on). UTF-8 byte length follows the standard ranges (1 byte below U+0080, 2 below U+0800, 3 below U+10000, 4 from U+10000 upward). Escapes are built as &#xHEX; for HTML and \uXXXX / \u{XXXXXX} for JavaScript.
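
The steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the names BLOCKS, CATEGORIES, utf8Length and inspect are hypothetical, the block table covers only a handful of ranges, and the category list is a small sample. The choice of \u{…} for code points above U+FFFF is likewise an assumption.

```javascript
// Hypothetical, deliberately incomplete block table: [first, last, name].
const BLOCKS = [
  [0x0000, 0x007f, "Basic Latin"],
  [0x0080, 0x00ff, "Latin-1 Supplement"],
  [0x0300, 0x036f, "Combining Diacritical Marks"],
  [0x0400, 0x04ff, "Cyrillic"],
  [0x0600, 0x06ff, "Arabic"],
  [0x4e00, 0x9fff, "CJK Unified Ideographs"],
  [0x1f300, 0x1f5ff, "Miscellaneous Symbols and Pictographs"],
  [0x1f600, 0x1f64f, "Emoticons"],
];

// Sample of general categories, each tested with a property escape.
const CATEGORIES = [
  ["Lu", /\p{Lu}/u],
  ["Ll", /\p{Ll}/u],
  ["Mn", /\p{Mn}/u],
  ["Nd", /\p{Nd}/u],
  ["Zs", /\p{Zs}/u],
  ["Cf", /\p{Cf}/u],
  ["Sk", /\p{Sk}/u],
  ["So", /\p{So}/u],
];

// UTF-8 byte count from the standard code point ranges.
function utf8Length(cp) {
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  return 4;
}

// Build one row per code point (not per UTF-16 code unit).
function inspect(text) {
  const result = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const hex = cp.toString(16).toUpperCase();
    const block = BLOCKS.find(([lo, hi]) => cp >= lo && cp <= hi);
    const category = CATEGORIES.find(([, re]) => re.test(ch));
    result.push({
      char: ch,
      codePoint: "U+" + hex.padStart(4, "0"),
      decimal: cp,
      utf8Bytes: utf8Length(cp),
      block: block ? block[2] : "(not in sample table)",
      category: category ? category[0] : "(not in sample list)",
      html: "&#x" + hex + ";",
      // Assumption: \uXXXX inside the BMP, \u{...} above it.
      js: cp <= 0xffff ? "\\u" + hex.padStart(4, "0") : "\\u{" + hex + "}",
    });
  }
  return result;
}

// Latin "a", Cyrillic "de", thumbs-up emoji.
console.table(inspect("a\u0434\u{1F44D}"));
```

Keeping the block and category tables as plain arrays makes them easy to extend. A production version would likely generate the block table from the Unicode data files rather than maintain it by hand.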

Example

Pasting the single visible emoji “👍🏽” reveals two code points: U+1F44D (thumbs up sign) followed by U+1F3FD (a medium skin-tone modifier). Pasting café with a decomposed é shows e (U+0065) followed by a combining acute accent (U+0301), which is invaluable when text comparisons mysteriously fail.
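
The two cases can be reproduced in a few lines of JavaScript. The helper name codePointsOf is hypothetical. The normalize("NFC") call is one common way to make such comparisons succeed; it is shown here as a suggestion, not as something the inspector itself does.

```javascript
// List the code points of a string in U+XXXX notation.
function codePointsOf(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Thumbs up followed by the medium skin-tone modifier.
const thumbs = "\u{1F44D}\u{1F3FD}";
console.log(codePointsOf(thumbs).join(" "));

// Precomposed vs. decomposed spellings of "café".
const composed = "caf\u00E9";
const decomposed = "cafe\u0301";
console.log(codePointsOf(decomposed).join(" "));

// The strings render identically but are not equal code unit by code unit.
console.log(composed === decomposed);

// Normalizing both sides to the same form makes the comparison succeed.
console.log(composed.normalize("NFC") === decomposed.normalize("NFC"));
```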

Tips and notes

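  • If your JavaScript string length is larger than the visible character count, the string contains astral characters stored as surrogate pairs, combining marks, or invisible characters such as zero-width joiners and variation selectors.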
  • A trailing U+FE0F (variation selector-16) forces emoji presentation of an otherwise text-style symbol.
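
A quick way to check both tips against a given string is sketched below. The function name lengthReport and its field names are hypothetical. It compares the UTF-16 length with the code point count and flags a trailing U+FE0F.

```javascript
// Summarize how a string's UTF-16 length relates to its code points.
function lengthReport(text) {
  const points = Array.from(text); // one entry per code point
  return {
    utf16Units: text.length,
    codePoints: points.length,
    // Code points above U+FFFF each occupy a surrogate pair.
    astral: points.filter((ch) => ch.codePointAt(0) > 0xffff).length,
    // True when the string ends with variation selector-16.
    endsWithVS16: text.endsWith("\uFE0F"),
  };
}

console.log(lengthReport("\u{1F44D}"));   // thumbs up: one astral code point
console.log(lengthReport("\u2764\uFE0F")); // heavy black heart + VS16
console.log(lengthReport("e\u0301"));      // e + combining acute accent
```

In the last two inputs the length exceeds the visible character count even though no surrogate pairs are involved, so a length mismatch alone does not prove that astral characters are present.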