Why are code points and UTF-16 units different?

Characters above U+FFFF, such as most emoji, are stored as two UTF-16 units (a surrogate pair) but count as one Unicode code point. The summary shows both so you can see where strings will behave unexpectedly.

How are invisible characters shown?

Spaces, tabs, line feeds, and carriage returns get a visible label, and control characters are shown with a placeholder glyph, so you can spot hidden characters that cause bugs.

Is my text uploaded anywhere?

No. The inspection runs entirely in your browser and your text never leaves your device.

What is a Unicode code point?

A code point is the numeric identity of a character in the Unicode standard, written as U+ followed by hexadecimal — for example the letter A is U+0041 and 😀 is U+1F600. The inspector iterates your text by code point, so it correctly counts emoji and other characters above U+FFFF as one.

Why does an emoji count as one code point but have a string length of two?

JavaScript stores strings in UTF-16, where any character above U+FFFF (most emoji) needs two 16-bit units called a surrogate pair. Iterating by code point counts it once, but the raw string length counts it twice — which is exactly the mismatch the summary highlights.
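The mismatch can be reproduced with built-in string methods alone. The following is a minimal sketch; the variable names are illustrative and not taken from the tool.

```javascript
const emoji = "😀"; // U+1F600, above U+FFFF

// The raw length counts UTF-16 units.
const unitCount = emoji.length;

// Spreading a string iterates it by code point.
const codePointCount = [...emoji].length;

// The two units are the high and low halves of the surrogate pair.
const highSurrogate = emoji.charCodeAt(0).toString(16);
const lowSurrogate = emoji.charCodeAt(1).toString(16);

// codePointAt(0) reads the whole pair as one value.
const codePointHex = emoji.codePointAt(0).toString(16);

console.log(unitCount, codePointCount, highSurrogate, lowSurrogate, codePointHex);
// prints: 2 1 d83d de00 1f600
```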

How do I find a hidden or invisible character?

Paste the text and look down the table for characters you did not expect — a non-breaking space (U+00A0), a zero-width space (U+200B), or a control character shown as a placeholder glyph. Each appears as its own row with a code point so you can locate and remove it.

How are character categories determined?

The tool uses Unicode property escapes (\p{L} for letters, \p{N} for numbers, \p{P} for punctuation, \p{S} for symbols, \p{M} for marks and \p{Z} for separators), plus special handling for spaces, tabs and control characters.

What is the difference between a code point and a grapheme?

A code point is a single numeric value in the Unicode code space, while a grapheme is what a reader perceives as a single character, which may be several code points (for example a base letter plus a combining accent, or an emoji with a skin-tone modifier). This tool lists individual code points, so a multi-code-point grapheme appears as several rows.
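To see the three counts side by side, graphemes can be counted separately. This sketch assumes Intl.Segmenter is available (current browsers and recent Node.js versions); countGraphemes is a hypothetical helper, and nothing here implies the inspector itself segments by grapheme.

```javascript
// Hypothetical helper: count user-perceived characters with Intl.Segmenter.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

const thumbsUp = "👍🏽";     // U+1F44D followed by skin-tone modifier U+1F3FD
const accented = "e\u0301"; // e followed by a combining acute accent

const thumbsUpCounts = {
  utf16Units: thumbsUp.length,         // 4
  codePoints: [...thumbsUp].length,    // 2
  graphemes: countGraphemes(thumbsUp), // 1
};

const accentedCounts = {
  utf16Units: accented.length,         // 2
  codePoints: [...accented].length,    // 2
  graphemes: countGraphemes(accented), // 1
};
```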

What is the Unicode Character Inspector?

The Unicode Character Inspector is a free tool on Gera Tools that breaks any text into its characters and shows the code point, hex value, decimal value, and category of each. It runs entirely in your browser, and nothing is uploaded.

Unicode Character Inspector

Name: Unicode Character Inspector
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Unicode character inspector

When text behaves unexpectedly (a string length is off by one, a regex misses an accented letter, or an invisible character silently breaks a comparison), the cause is often hidden at the code-point level. This inspector splits any text into its individual Unicode code points and shows the U+ hex value, decimal value, and category of each, making invisible and stray characters immediately visible.

Why string length lies in JavaScript (and many other languages)

JavaScript stores strings in UTF-16, where most characters occupy one 16-bit unit. Characters above U+FFFF — the majority of emoji, and some less common scripts — require two 16-bit units called a surrogate pair. The String.length property returns the number of UTF-16 units, not the number of characters a human sees.

For example:

"A".length → 1 (one code point, one UTF-16 unit)
"é".length → 1 (one code point, one UTF-16 unit if precomposed)
"😀".length → 2 (one code point, but two UTF-16 units)

This mismatch causes real bugs: a database column sized for 10 characters can silently truncate an emoji, a regex without the u flag can match half of a surrogate pair and produce garbage, and .slice() can cut an emoji in half because it works on UTF-16 unit offsets. The inspector’s summary shows both the code-point count and the UTF-16 unit count so you can see the gap immediately.
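The slicing bug is easy to reproduce. In the sketch below, truncateByCodePoint is a hypothetical helper written for illustration; it avoids splitting a surrogate pair, though it can still split a grapheme made of several code points.

```javascript
// Hypothetical helper: keep at most maxCodePoints code points.
function truncateByCodePoint(text, maxCodePoints) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(text).slice(0, maxCodePoints).join("");
}

const label = "Hi😀";

// slice() counts UTF-16 units, so this keeps only half of the emoji.
const naive = label.slice(0, 3);
const endsInLoneSurrogate = naive.charCodeAt(2) === 0xd83d; // true

// Counting code points keeps the emoji whole.
const safe = truncateByCodePoint(label, 3); // "Hi😀"
```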

How the inspector classifies characters

The tool iterates text by code point using JavaScript’s for...of loop (which is surrogate-pair-aware), so each emoji counts as one entry. For each code point it:

Reads the code point value with codePointAt(0)
Formats it as U+XXXX (zero-padded to at least four hex digits; code points above U+FFFF take five or six)
Classifies it with Unicode property escapes: \p{L} letter, \p{N} number, \p{P} punctuation, \p{S} symbol, \p{M} combining mark, \p{Z} separator
Labels control characters, spaces, tabs, line feeds, and carriage returns explicitly so they become visible
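The steps above can be sketched roughly as follows. This is an illustration, not the tool's source: the function names, the row shape, the label strings, the "Other" fallback, and the use of \p{Cc} to detect control characters are all assumptions.

```javascript
// Category patterns, tried in order. The first match wins.
const CATEGORIES = [
  [/\p{L}/u, "Letter"],
  [/\p{N}/u, "Number"],
  [/\p{P}/u, "Punctuation"],
  [/\p{S}/u, "Symbol"],
  [/\p{M}/u, "Mark"],
  [/\p{Z}/u, "Separator"],
  [/\p{Cc}/u, "Control"], // assumption: control characters via general category Cc
];

// Visible labels for characters that would otherwise render as blank.
const LABELS = new Map([
  [" ", "(space)"],
  ["\t", "(tab)"],
  ["\n", "(line feed)"],
  ["\r", "(carriage return)"],
]);

function formatCodePoint(cp) {
  // At least four hex digits; longer values are left unpadded.
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

function inspect(text) {
  const rows = [];
  for (const ch of text) { // for...of yields whole code points
    const cp = ch.codePointAt(0);
    const match = CATEGORIES.find(([pattern]) => pattern.test(ch));
    rows.push({
      display: LABELS.get(ch) ?? ch,
      codePoint: formatCodePoint(cp),
      decimal: cp,
      utf16Units: ch.length,
      category: match ? match[1] : "Other",
    });
  }
  return rows;
}
```

Calling inspect("A 😀") under these assumptions returns three rows, with the emoji reported as U+1F600, two UTF-16 units, category Symbol.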

Common debugging scenarios

String length mismatch. Paste the string and compare the code-point count with the UTF-16 unit count in the summary. Any discrepancy indicates astral characters (emoji or rare scripts) that take two units.

Hidden characters. Paste text that refuses to match a known value. The inspector reveals zero-width spaces (U+200B), non-breaking spaces (U+00A0), left-to-right marks (U+200E), and other characters that are either invisible or look like plain spaces but are not.
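The same check can be scripted when the text is too long to scan by eye. In this sketch, findHidden and the SUSPECT pattern are hypothetical; the pattern flags format characters (general category Cf, which includes U+200B, U+200E and U+00AD) and every space separator other than the plain space U+0020.

```javascript
// Format characters, plus space separators that are not the ordinary space.
const SUSPECT = /\p{Cf}|(?! )\p{Zs}/u;

// Hypothetical helper: report each suspect code point with its UTF-16 offset.
function findHidden(text) {
  const hits = [];
  let index = 0; // offset in UTF-16 units, usable with slice()
  for (const ch of text) {
    if (SUSPECT.test(ch)) {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint: "U+" + hex });
    }
    index += ch.length;
  }
  return hits;
}

const hits = findHidden("a\u200Bb\u00A0c");
// hits: [{ index: 1, codePoint: "U+200B" }, { index: 3, codePoint: "U+00A0" }]
```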

Combining marks. An accented character can be stored as one precomposed code point (é = U+00E9) or as a base letter plus a combining accent (e + ◌́ = U+0065 U+0301). The inspector shows the first form as a single row and the second as two rows, so you can see whether NFC normalisation is needed.
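If the two forms need to compare equal, normalising both sides before the comparison is one option. A minimal sketch using the built-in String.prototype.normalize; the variable names are illustrative.

```javascript
const precomposed = "\u00E9"; // é as a single code point
const decomposed = "e\u0301"; // e followed by a combining acute accent

// The strings render identically but are not equal as stored.
const equalRaw = precomposed === decomposed; // false

// NFC composes the pair into the single precomposed code point.
const equalAfterNfc = precomposed.normalize("NFC") === decomposed.normalize("NFC"); // true

// NFD goes the other way and splits é into two code points.
const nfdCodePoints = [...precomposed.normalize("NFD")].length; // 2
```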

Copy-paste artefacts. Text copied from PDFs or Word documents often contains smart quotes, soft hyphens (U+00AD), and various punctuation look-alikes. The inspector lists each one on its own row with its code point and category.

Worked example

Inspecting the string Café — 😀:

Character	Code point	UTF-16 units	Category
C	U+0043	1	Letter
a	U+0061	1	Letter
f	U+0066	1	Letter
é	U+00E9	1	Letter
(space)	U+0020	1	Separator
—	U+2014	1	Punctuation
(space)	U+0020	1	Separator
😀	U+1F600	2	Symbol

Code-point count: 8. UTF-16 unit count: 9. The emoji is the reason.

Everything runs in your browser — your text never leaves your device.