No. The inspector iterates over your string with the browser's own iterator and code point APIs. Nothing leaves your device.

Why does one emoji show as several code points?

Many emoji are sequences: a base symbol followed by a skin-tone modifier or variation selector, or several characters joined by a zero-width joiner (U+200D). The inspector lists each code point so you can see the full sequence.

What is the difference between a code point and a UTF-16 code unit?

A code point is the abstract Unicode value (up to U+10FFFF). UTF-16 stores values above U+FFFF as two 16-bit surrogate units, which is why JavaScript string length can exceed the number of visible characters.
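A minimal sketch of the difference, using only standard string methods. The sample string is an assumption chosen for illustration.

```javascript
// "a" fits in one UTF-16 code unit; U+1F600 (grinning face) needs a surrogate pair.
const sample = "a\u{1F600}";

const unitCount = sample.length;       // counts UTF-16 code units
const pointCount = [...sample].length; // the string iterator yields whole code points

// The two surrogate halves that encode U+1F600 in UTF-16.
const highSurrogate = sample.charCodeAt(1).toString(16).toUpperCase();
const lowSurrogate = sample.charCodeAt(2).toString(16).toUpperCase();

console.log(unitCount, pointCount, highSurrogate, lowSurrogate); // 3 2 D83D DE00
```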

What does the general category mean?

Unicode assigns each code point a two-letter category such as Lu (uppercase letter), Nd (decimal digit), Zs (space) or Cf (format control). It tells you the broad role of the character.
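One way to look up a category in JavaScript is with regular-expression property escapes. The sketch below is an illustration, not the tool's actual code: generalCategory and the short CATEGORIES list are hypothetical names, and a complete version would list every category.

```javascript
// Hypothetical, deliberately short list of categories to test against.
const CATEGORIES = ["Lu", "Ll", "Nd", "Zs", "Cf", "Mn", "Po", "So"];

// Returns the first category from the list that matches the character.
function generalCategory(char) {
  for (const name of CATEGORIES) {
    // The "u" flag is required for \p{...} property escapes.
    if (new RegExp(`^\\p{${name}}$`, "u").test(char)) return name;
  }
  return "other"; // not covered by this excerpt
}

console.log(generalCategory("A"));      // Lu
console.log(generalCategory("\u200D")); // Cf (zero-width joiner)
```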

How is the Unicode block determined?

Each code point falls in a named range — Basic Latin, Cyrillic, Arabic, Emoji and so on. The tool maps the numeric value to its block using the standard Unicode block boundaries.
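A range lookup of this kind can be sketched as follows. The BLOCKS table is a small hand-picked excerpt and blockOf is a hypothetical helper name; the full Unicode block list contains several hundred ranges.

```javascript
// Excerpt of the block table as [first, last, name]; assumed subset for illustration.
const BLOCKS = [
  [0x0000, 0x007f, "Basic Latin"],
  [0x0080, 0x00ff, "Latin-1 Supplement"],
  [0x0400, 0x04ff, "Cyrillic"],
  [0x0600, 0x06ff, "Arabic"],
  [0x2000, 0x206f, "General Punctuation"],
  [0x1f600, 0x1f64f, "Emoticons"],
];

// Finds the range that contains the numeric code point value.
function blockOf(codePoint) {
  const hit = BLOCKS.find(([first, last]) => codePoint >= first && codePoint <= last);
  return hit ? hit[2] : "Unknown (not in this excerpt)";
}

console.log(blockOf("о".codePointAt(0))); // Cyrillic (this "о" is U+043E)
```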

What is the Unicode Character Inspector?

It is a free Unicode character inspector on Gera Tools. Paste any text and see each character broken down into its code point (U+XXXX), UTF-8 and UTF-16 encoding, Unicode block, general category and HTML/JS escapes. It runs entirely in your browser, with nothing uploaded.

Unicode Character Inspector

Name: Unicode Character Inspector
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

See exactly what is in your text

Strings are not always what they look like. A “space” might be a non-breaking space, an “a” might be Cyrillic, and a single emoji can be five joined code points. This inspector iterates over a string by Unicode code point and shows, for every character, its U+XXXX value, decimal value, UTF-8 byte count, Unicode block, general category, and ready-to-paste HTML and JavaScript escapes. Everything runs locally in your browser.

What this inspector reveals that editors hide

Text editors render glyphs — the visual representation of a character. They hide the underlying code points, which means two strings that look identical can have completely different internal representations. This inspector exposes those differences. Common discoveries:

Homoglyphs: The Latin o (U+006F) and Cyrillic о (U+043E) are indistinguishable in most fonts but are entirely different characters. Phishing domains and spoofed usernames exploit this. Paste a suspicious string to see immediately whether “Latin” letters are actually Cyrillic.
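A rough programmatic check for mixed scripts might look like the sketch below. It relies on the Script property escape; scriptsIn, the SCRIPTS list and the spoofed string are hypothetical names made up for this example.

```javascript
// Hypothetical short list of scripts to look for.
const SCRIPTS = ["Latin", "Cyrillic", "Greek"];

// Returns the scripts from the list that occur in the text, in order of first appearance.
function scriptsIn(text) {
  const found = new Set();
  for (const char of text) {
    for (const script of SCRIPTS) {
      if (new RegExp(`\\p{Script=${script}}`, "u").test(char)) found.add(script);
    }
  }
  return [...found];
}

const spoofed = "g\u043E\u043Egle"; // both "o" letters are Cyrillic U+043E
console.log(scriptsIn(spoofed));    // [ 'Latin', 'Cyrillic' ]
console.log(scriptsIn("google"));   // [ 'Latin' ]
```

More than one script in a short identifier is a hint worth inspecting, not proof of spoofing; many legitimate strings mix scripts.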

Composition vs decomposition: é in NFC form is a single code point, U+00E9. In NFD form it is e (U+0065) followed by a combining acute accent (U+0301), which makes two code points. Both render identically, but "caf\u00E9" === "cafe\u0301" is false in JavaScript unless both sides are normalised first. The inspector reveals the difference at a glance.
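The comparison problem and its usual fix can be sketched with String.prototype.normalize. The variable names are illustrative.

```javascript
const composed = "caf\u00E9";    // NFC: é as one code point
const decomposed = "cafe\u0301"; // NFD: e followed by a combining acute accent

const rawEqual = composed === decomposed; // false: different code point sequences

// Normalising both sides to the same form makes the comparison meaningful.
const normalizedEqual =
  composed.normalize("NFC") === decomposed.normalize("NFC"); // true

console.log(composed.length, decomposed.length, rawEqual, normalizedEqual); // 4 5 false true
```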

Zero-width characters: U+200B (zero-width space), U+200C (zero-width non-joiner), U+200D (zero-width joiner), and U+FEFF (BOM/zero-width no-break space) are completely invisible in most editors. They cause unexpected sort behaviour, broken regex matches, and copy-paste content that differs from what was visible. The inspector labels them by name.
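To find these characters in code, a small scan over code points is enough. This is a sketch under the assumption that only the four characters named above matter; findInvisible and INVISIBLE are hypothetical names.

```javascript
// The four invisible characters discussed above, keyed by code point.
const INVISIBLE = new Map([
  [0x200b, "ZERO WIDTH SPACE"],
  [0x200c, "ZERO WIDTH NON-JOINER"],
  [0x200d, "ZERO WIDTH JOINER"],
  [0xfeff, "ZERO WIDTH NO-BREAK SPACE"],
]);

// Returns the position (counted in code points) and name of each invisible character.
function findInvisible(text) {
  const hits = [];
  let index = 0;
  for (const char of text) {
    const name = INVISIBLE.get(char.codePointAt(0));
    if (name) hits.push({ index, name });
    index += 1;
  }
  return hits;
}

console.log(findInvisible("ab\u200Bcd")); // [ { index: 2, name: 'ZERO WIDTH SPACE' } ]
```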

Emoji sequences: A single visible emoji can consist of several code points. Families and professions are typically 3 to 7 code points joined by zero-width joiners and variation selectors, while a country flag is a pair of regional indicator symbols. A JavaScript string's .length returns the number of UTF-16 code units, not the number of visible glyphs. The inspector lists every component.
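The three ways of counting can be compared side by side. The sketch assumes a runtime that provides Intl.Segmenter (current browsers and recent Node.js releases do); the family sequence is an illustrative input.

```javascript
// man + ZWJ + woman + ZWJ + girl: renders as one family emoji where supported.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

const familyUnits = family.length;       // UTF-16 code units
const familyPoints = [...family].length; // code points

// Grapheme segmentation approximates what a reader perceives as one character.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const familyGraphemes = [...segmenter.segment(family)].length;

console.log(familyUnits, familyPoints, familyGraphemes); // 8 5 1
```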

How it works

The tool uses the string’s built-in iterator (for…of), which yields full code points rather than UTF-16 code units, so astral characters above U+FFFF are handled correctly. For each code point it computes the hex value with codePointAt, classifies the block by comparing the value against the standard Unicode block ranges (Basic Latin, Latin-1 Supplement, Cyrillic, Arabic, CJK, Emoji and more), and derives a general category using Unicode regular-expression property escapes (\p{Lu}, \p{Ll}, \p{Nd}, \p{Zs}, \p{Cf} and so on). UTF-8 byte length follows the standard ranges (1 byte below U+0080, 2 below U+0800, 3 below U+10000, 4 from U+10000 upward). Escapes are built as &#xHEX; for HTML and \uXXXX / \u{XXXXXX} for JavaScript.
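The byte-length and escape steps described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's source: utf8Length and describe are hypothetical names, and the sketch assumes \uXXXX is used up to U+FFFF and \u{...} above it.

```javascript
// UTF-8 byte count from the standard code point ranges.
function utf8Length(cp) {
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  return 4;
}

// Builds one row of inspector-style output for a single code point.
function describe(char) {
  const cp = char.codePointAt(0);
  const hex = cp.toString(16).toUpperCase();
  return {
    codePoint: "U+" + hex.padStart(4, "0"),
    decimal: cp,
    utf8Bytes: utf8Length(cp),
    html: `&#x${hex};`,
    // Assumption: the short form is used whenever the value fits in four hex digits.
    js: cp <= 0xffff ? "\\u" + hex.padStart(4, "0") : `\\u{${hex}}`,
  };
}

// for...of and spread both iterate by code point, so the emoji stays in one piece.
const rows = [..."\u00E9\u{1F44D}"].map(describe);
console.log(rows);
```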

Worked examples

Emoji skin-tone: Pasting the single visible emoji “👍🏽” reveals two code points: U+1F44D (thumbs-up sign) followed by U+1F3FD (a medium skin-tone modifier). The two code points are joined at render time into a single glyph.

Decomposed é in a French word: Pasting café where the é is decomposed shows e (U+0065) followed by a combining acute accent (U+0301) — invaluable when text comparisons mysteriously fail in a form validator or database lookup.

Non-breaking space hidden in copy-pasted content: Text copied from a PDF or a website often contains U+00A0 (NO-BREAK SPACE) instead of U+0020 (SPACE). JavaScript's trim() removes a non-breaking space at either end of a string, but one in the middle survives, and a comparison, split(" ") or regex written with an ordinary space will not match it, so equality checks fail. The inspector shows it as U+00A0 with category Zs (space separator).
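A short sketch of how a non-breaking space behaves in JavaScript. Here trim() does treat U+00A0 as whitespace at the ends of a string, while trim functions in some other languages (Java's String.trim(), for example) do not. The pasted string is an assumed example.

```javascript
const pasted = "\u00A0hello\u00A0world\u00A0";

const trimmed = pasted.trim();                       // leading and trailing U+00A0 removed
const interiorRemains = trimmed.includes("\u00A0");  // the one between the words survives
const matchesPlain = trimmed === "hello world";      // false: U+00A0 is not U+0020

// Replacing every U+00A0 with an ordinary space makes the comparison succeed.
const cleaned = trimmed.replace(/\u00A0/g, " ");

console.log(interiorRemains, matchesPlain, cleaned === "hello world"); // true false true
```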

Tips and notes

If your JavaScript string .length is larger than the visible character count, the string contains astral characters stored as surrogate pairs (each 4-byte emoji counts as 2 in .length), combining marks, or joiners.
A trailing U+FE0F (variation selector-16) requests emoji presentation of a symbol that defaults to text style; removing it usually lets the glyph fall back to its text variant.
The HTML escape &#xXXXX; works for any printable code point and is the most portable form for XML and HTML content.
All processing runs locally; your text never leaves the browser.