Unicode Character Explorer

Paste text and inspect every code point — name, category, hex, HTML entity and UTF-8 bytes.

The Unicode Character Explorer takes any text you paste and breaks it down one code point at a time, showing the Unicode name hint, general category, hex and decimal value, HTML entity and the exact UTF-8 bytes for every character. It is built for developers debugging encoding bugs, writers tracking down a stray character that breaks a layout, localisation teams checking non-Latin scripts, and anyone curious about what is really inside a string. Text on a screen hides a lot: combining accents, zero-width joiners and control codes are invisible or blend into their neighbours, and look-alike letters pass for ordinary ones. This tool makes that hidden structure visible.

How it works

When you type or paste, the explorer iterates your text by code point rather than by code unit, so surrogate pairs (the way characters above U+FFFF such as emoji are stored in JavaScript) are handled correctly and never split in half. For each code point it computes several things at once. The hex value is the familiar U+XXXX form, and the decimal value is the same number in base ten. The category comes from the browser’s built-in Unicode property escapes, mapping each character to labels like Letter uppercase, Number decimal digit, Symbol currency or Mark non-spacing. A block hint places the character in a rough region of Unicode — Basic Latin, Cyrillic, Hiragana, Currency Symbols, Emoticons and so on.
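The iteration and category lookup described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the names describeCodePoints, roughCategory and CATEGORY_TESTS are hypothetical, and only a handful of general categories are covered.

```javascript
// Hypothetical label table: each entry pairs a display label with a
// Unicode property escape (requires the "u" flag on the regex).
const CATEGORY_TESTS = [
  ["Letter uppercase", /^\p{Lu}$/u],
  ["Letter lowercase", /^\p{Ll}$/u],
  ["Number decimal digit", /^\p{Nd}$/u],
  ["Symbol currency", /^\p{Sc}$/u],
  ["Mark non-spacing", /^\p{Mn}$/u],
  ["Separator space", /^\p{Zs}$/u],
];

// Return the first matching label, or "Other" for categories not listed above.
function roughCategory(ch) {
  for (const [label, pattern] of CATEGORY_TESTS) {
    if (pattern.test(ch)) return label;
  }
  return "Other";
}

// Build one row per code point.
function describeCodePoints(text) {
  const rows = [];
  // for...of walks a string by code point, so a surrogate pair stays whole.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    rows.push({
      char: ch,
      hex: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      decimal: cp,
      category: roughCategory(ch),
    });
  }
  return rows;
}

// One emoji is two UTF-16 units but a single row here.
console.log(describeCodePoints("A\u{1F600}").map((row) => row.hex));
```

Indexing the string with text[i] or charCodeAt would instead visit UTF-16 units and split the emoji into two surrogate halves, which is the failure the code point iteration avoids.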

The HTML entity is generated as a numeric reference, and where a well-known named entity exists, such as the ones for ampersand, copyright or euro, the named form is shown instead. The UTF-8 bytes are produced with the standard encoding algorithm: code points up to U+007F (127) take one byte, up to U+07FF (2047) take two, up to U+FFFF (65535) take three, and everything above that takes four bytes. The summary strip tallies code points, unique characters, UTF-16 units, total UTF-8 bytes, non-ASCII characters and invisible or control characters. You can toggle between a card view and a compact table view, switch invisible characters on or off, copy a single code point, or export the whole string as code points, HTML entities, UTF-8 hex or JavaScript escapes.
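The byte-length thresholds and the entity rule above translate fairly directly into code. The sketch below is an assumption about how such a tool might be written, not its real implementation: utf8Bytes, toHex, htmlEntity, summarize and the three-entry NAMED_ENTITIES table are all hypothetical, the numeric reference is assumed to be decimal, and lone surrogates are not given special treatment.

```javascript
// Encode a single code point as UTF-8 bytes using the standard thresholds.
// Lone surrogates (U+D800 to U+DFFF) are not rejected in this sketch.
function utf8Bytes(cp) {
  if (cp <= 0x7f) return [cp];
  if (cp <= 0x7ff) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  if (cp <= 0xffff) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Format bytes as space-separated upper-case hex, e.g. "C3 A9".
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

// Illustrative subset only; a real table would hold many more names.
const NAMED_ENTITIES = { 0x26: "&amp;", 0xa9: "&copy;", 0x20ac: "&euro;" };

// Prefer a named entity when one is in the table, else a decimal reference.
function htmlEntity(cp) {
  return NAMED_ENTITIES[cp] || "&#" + cp + ";";
}

// Tally a few of the summary figures for a whole string.
function summarize(text) {
  const cps = Array.from(text, (ch) => ch.codePointAt(0));
  return {
    codePoints: cps.length,
    unique: new Set(cps).size,
    utf16Units: text.length,
    utf8Bytes: cps.reduce((total, cp) => total + utf8Bytes(cp).length, 0),
    nonAscii: cps.filter((cp) => cp > 0x7f).length,
  };
}

console.log(toHex(utf8Bytes(0xe9)), htmlEntity(0xe9));
console.log(summarize("Caf\u00E9 \u2615"));
```

In a browser the built-in TextEncoder produces the same bytes for well-formed text; the manual version is shown only because it makes the one, two, three and four byte boundaries explicit.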

Example

Paste Café ☕ and the explorer shows six entries. C, a and f are Basic Latin letters, each one UTF-8 byte. The é is U+00E9, a lowercase letter in Latin-1 Supplement, stored as the two UTF-8 bytes C3 A9, with the numeric HTML entity &#233;. The space is U+0020, a space separator. The coffee cup is U+2615 in the Miscellaneous Symbols block, three UTF-8 bytes long. If your é instead came in as a plain e followed by a separate combining acute accent (U+0301), the explorer would reveal it as two rows, the letter plus a non-spacing mark. That is the kind of hidden difference that causes search and comparison bugs.
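The comparison bug mentioned above is easy to reproduce. The following sketch builds both spellings of the word with escapes, so the difference is visible in the source, and uses the standard String.prototype.normalize method to reconcile them. The variable names and the codePointsOf helper are illustrative only.

```javascript
// Two strings that render identically but differ in code points.
const precomposed = "Caf\u00E9"; // single code point U+00E9
const decomposed = "Cafe\u0301"; // "e" followed by combining acute U+0301

// List the code points of a string in U+XXXX form.
function codePointsOf(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codePointsOf(precomposed)); // four entries
console.log(codePointsOf(decomposed)); // five entries

// A plain comparison treats them as different strings.
console.log(precomposed === decomposed);

// Normalizing both sides to the same form (NFC here) makes them compare equal.
console.log(precomposed.normalize("NFC") === decomposed.normalize("NFC"));
```

Normalizing both inputs to one form before searching or comparing is a common remedy, though whether NFC or NFD is the better choice depends on the surrounding system.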

Everything is calculated in your browser. No text is uploaded, logged or stored.
