Unicode Code Point Inspector

Show code point, category, block, and UTF-8 bytes per character


Mystery characters in text, such as an invisible control character, a look-alike Cyrillic letter, or an emoji that breaks a database column, are easy to misread. This inspector breaks any string into its individual Unicode code points and shows the full identity of each one.

How it works

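The tool iterates the string by code point rather than by UTF-16 code unit, so an emoji or other supplementary-plane (astral) character stored as a surrogate pair is treated as a single character. An emoji built from several code points, such as a ZWJ sequence, still appears as one row per code point. For each code point it reports: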

  • the code point in U+XXXX notation via codePointAt,
  • the general category (such as Lu, Nd, or So), derived from the browser’s Unicode property escapes like \p{Lu},
  • the Unicode block, matched against the standard range table,
  • the UTF-8 bytes, computed directly from the code point, and the UTF-16 units that make up the JavaScript string.
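The four columns above can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the tool's actual source: the function names (inspect, categoryOf, blockOf, utf8Bytes, utf16Units) are hypothetical, and BLOCKS is a four-entry excerpt standing in for the full block range table.

```javascript
// Hypothetical sketch of the inspection steps; all names are illustrative.

// The 30 General_Category short names, probed one by one with \p{...}.
const CATEGORIES = [
  "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl", "No",
  "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sm", "Sc", "Sk", "So",
  "Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co", "Cn",
];
const CATEGORY_TESTS = CATEGORIES.map((name) => [
  name,
  new RegExp("^\\p{" + name + "}$", "u"), // the u flag enables property escapes
]);

// Tiny excerpt of the block table (assumption: a real tool ships the full list).
const BLOCKS = [
  [0x0000, 0x007f, "Basic Latin"],
  [0x0080, 0x00ff, "Latin-1 Supplement"],
  [0x0400, 0x04ff, "Cyrillic"],
  [0x1f600, 0x1f64f, "Emoticons"],
];

const hex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");

function categoryOf(ch) {
  const hit = CATEGORY_TESTS.find(([, re]) => re.test(ch));
  return hit ? hit[0] : "??";
}

function blockOf(cp) {
  const hit = BLOCKS.find(([lo, hi]) => cp >= lo && cp <= hi);
  return hit ? hit[2] : "(not in sample table)";
}

// Encode one code point as UTF-8 by hand.
// Note: a lone surrogate would be emitted as three bytes here, which is not valid UTF-8.
function utf8Bytes(cp) {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// The UTF-16 code units JavaScript stores for this character (one or two).
function utf16Units(ch) {
  const units = [];
  for (let i = 0; i < ch.length; i++) units.push(ch.charCodeAt(i));
  return units;
}

function inspect(text) {
  const rows = [];
  // for...of uses the string iterator, which steps by code point, not by code unit.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    rows.push({
      char: ch,
      codePoint: "U+" + hex(cp, 4),
      category: categoryOf(ch),
      block: blockOf(cp),
      utf8: utf8Bytes(cp).map((b) => hex(b, 2)),
      utf16: utf16Units(ch).map((u) => hex(u, 4)),
    });
  }
  return rows;
}
```

Calling inspect("A\u0430\u{1F600}") yields three rows even though the string's length is 4, because the emoji occupies two UTF-16 units. The second row is the Cyrillic look-alike of Latin a: it reports U+0430 in the Cyrillic block with UTF-8 bytes D0 B0, which is how the two letters can be told apart.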

Tips and notes

Use the UTF-8 column to debug encoding bugs: when a single accented letter shows up as two or more characters from the Latin-1 range (é appearing as Ã©, for example), the text was probably encoded to UTF-8 twice. The category column helps when writing regular expressions, since \p{Nd} (used with the u flag) matches any decimal digit across scripts, not just 0-9. Watch for control characters (category Cc), which display as a ctrl marker here because they have no visible glyph but can still corrupt files and break parsers.
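The three tips can be reproduced in a few lines of JavaScript. This is a rough sketch for experimentation, assuming a runtime with TextEncoder and Unicode property escapes (current browsers and Node.js); the helper name findControls and the variable names are made up for the example.

```javascript
// 1. Double encoding: UTF-8 bytes misread as Latin-1, then encoded again.
const original = "\u00e9"; // é, one code point
const firstPass = new TextEncoder().encode(original); // bytes C3 A9
// Misread each byte as its own Latin-1 character: "Ã©" (U+00C3, U+00A9).
const misread = String.fromCharCode(...firstPass);
const secondPass = new TextEncoder().encode(misread); // bytes C3 83 C2 A9

// 2. \p{Nd} versus [0-9]: the property escape needs the u flag.
const asciiDigits = /^[0-9]+$/;
const anyDigits = /^\p{Nd}+$/u;
const arabicIndic = "\u0664\u0662"; // ARABIC-INDIC DIGIT FOUR, DIGIT TWO
const asciiMatches = asciiDigits.test(arabicIndic); // false
const unicodeMatches = anyDigits.test(arabicIndic); // true

// 3. Locate control characters (category Cc) that have no visible glyph.
function findControls(text) {
  return [...text.matchAll(/\p{Cc}/gu)].map((m) => ({
    index: m.index, // position in UTF-16 code units
    codePoint:
      "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}
```

For instance, findControls("a\u0000b\tc") reports a NUL at index 1 and a tab at index 3, two characters that are invisible in most editors.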
