See exactly what is in your text
Strings are not always what they look like. A “space” might be a non-breaking
space, an “a” might be Cyrillic, and a single emoji can be five joined code
points. This inspector iterates over a string by Unicode code point and
shows, for every character, its U+XXXX value, decimal value, UTF-8 byte count,
Unicode block, general category, and ready-to-paste HTML and JavaScript escapes.
Everything runs locally in your browser.
How it works
The tool uses the string’s built-in iterator (for...of), which yields full code
points rather than UTF-16 code units, so astral characters above U+FFFF are
handled correctly. For each code point it reads the numeric value with
codePointAt and formats it as hex, classifies the block by comparing the value against the
standard Unicode block ranges (Basic Latin, Latin-1 Supplement, Cyrillic,
Arabic, CJK, Emoji and more), and derives a general category using Unicode
regular-expression property escapes (\p{Lu}, \p{Ll}, \p{Nd}, \p{Zs},
\p{Cf} and so on). UTF-8 byte length follows the standard ranges (1 byte below
U+0080, 2 below U+0800, 3 below U+10000, 4 from U+10000 up). Escapes are built as
&#xHEX; for HTML and \uXXXX / \u{XXXXXX} for JavaScript.
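The steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names (inspect, utf8Length, blockOf, categoryOf) are hypothetical, the block table is a small assumed subset of the real Unicode block list, and the choice of \uXXXX up to U+FFFF and \u{...} above it is an assumption about how the two JavaScript forms are selected.

```javascript
// UTF-8 byte length from the standard code point ranges.
function utf8Length(cp) {
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  return 4;
}

// Deliberately incomplete block table (assumed subset, for illustration only).
const BLOCKS = [
  [0x0000, 0x007f, "Basic Latin"],
  [0x0080, 0x00ff, "Latin-1 Supplement"],
  [0x0300, 0x036f, "Combining Diacritical Marks"],
  [0x0400, 0x04ff, "Cyrillic"],
  [0x0600, 0x06ff, "Arabic"],
  [0x4e00, 0x9fff, "CJK Unified Ideographs"],
  [0x1f300, 0x1f5ff, "Miscellaneous Symbols and Pictographs"],
  [0x1f600, 0x1f64f, "Emoticons"],
];

function blockOf(cp) {
  const hit = BLOCKS.find(([lo, hi]) => cp >= lo && cp <= hi);
  return hit ? hit[2] : "Unknown";
}

// A few general categories, tested with Unicode property escapes.
// The `u` flag is required for \p{...} to be recognized.
const CATEGORIES = [
  ["Lu", /^\p{Lu}$/u],
  ["Ll", /^\p{Ll}$/u],
  ["Nd", /^\p{Nd}$/u],
  ["Zs", /^\p{Zs}$/u],
  ["Cf", /^\p{Cf}$/u],
  ["Mn", /^\p{Mn}$/u],
  ["So", /^\p{So}$/u],
];

function categoryOf(ch) {
  const hit = CATEGORIES.find(([, re]) => re.test(ch));
  return hit ? hit[0] : "Other";
}

function inspect(text) {
  const rows = [];
  // for...of walks the string by code point, so a surrogate pair arrives as one item.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const hex = cp.toString(16).toUpperCase();
    rows.push({
      char: ch,
      codePoint: "U+" + hex.padStart(4, "0"),
      decimal: cp,
      utf8Bytes: utf8Length(cp),
      block: blockOf(cp),
      category: categoryOf(ch),
      html: "&#x" + hex + ";",
      // Assumption: short form for the BMP, braced form for astral code points.
      js: cp <= 0xffff ? "\\u" + hex.padStart(4, "0") : "\\u{" + hex + "}",
    });
  }
  return rows;
}
```

With this sketch, inspect("a") would produce a single row with codePoint "U+0061", utf8Bytes 1, block "Basic Latin", and category "Ll". A table lookup keeps the block classification easy to extend, at the cost of having to maintain the ranges by hand.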
Example
Pasting the single visible emoji “👍🏽” reveals two code points: U+1F44D
(thumbs-up sign) followed by U+1F3FD (a medium skin-tone modifier). Pasting
café where the é is decomposed shows e (U+0065) followed by a combining
acute accent (U+0301) — invaluable when text comparisons mysteriously fail.
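Both examples can be reproduced in a few lines. The sketch below is illustrative only; codePoints and equalAfterNfc are hypothetical helper names, and using String.prototype.normalize("NFC") to make the comparison succeed is a suggested remedy, not something the inspector itself is stated to do.

```javascript
// List the code points of a string in U+XXXX notation.
function codePoints(text) {
  return [...text].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Compare two strings after normalizing both to NFC (composed form).
function equalAfterNfc(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

const thumbs = "\u{1F44D}\u{1F3FD}"; // thumbs-up plus medium skin-tone modifier
const composed = "caf\u00E9";        // é as one code point
const decomposed = "cafe\u0301";     // e followed by a combining acute accent

console.log(codePoints(thumbs));     // ["U+1F44D", "U+1F3FD"]
console.log(codePoints(decomposed)); // ["U+0063", "U+0061", "U+0066", "U+0065", "U+0301"]
console.log(composed === decomposed);             // false
console.log(equalAfterNfc(composed, decomposed)); // true
```

The two café strings render identically yet fail a plain === comparison, which is the kind of mismatch the per-code-point view makes visible.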
Tips and notes
- If your JavaScript string length is larger than the visible character count, the text contains astral characters stored as surrogate pairs, or extra code points such as combining marks and joiners.
- A trailing U+FE0F (variation selector-16) forces emoji presentation of an otherwise text-style symbol.
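A quick way to check both tips in code is sketched below. The helper names lengthReport and endsWithVs16 are hypothetical, and the sketch assumes the difference between code units and code points is a good enough signal for surrogate pairs.

```javascript
// Compare UTF-16 code units (what .length counts) with code points.
// Each surrogate pair adds one extra code unit, so the difference
// equals the number of surrogate pairs in the string.
function lengthReport(text) {
  const codeUnits = text.length;
  const codePoints = [...text].length;
  return { codeUnits, codePoints, surrogatePairs: codeUnits - codePoints };
}

// True when the string ends with U+FE0F (variation selector-16).
function endsWithVs16(text) {
  return text.endsWith("\uFE0F");
}

console.log(lengthReport("\u{1F44D}\u{1F3FD}"));
// { codeUnits: 4, codePoints: 2, surrogatePairs: 2 }

console.log(endsWithVs16("\u2764\uFE0F")); // true: heart with emoji presentation
console.log(endsWithVs16("\u2764"));       // false: bare text-style heart
```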