What is a Unicode code point?

A code point is the unique number Unicode assigns to a single character, written as U+ followed by a hexadecimal value — for example U+0041 is the letter A and U+1F600 is a grinning face emoji. It is the character's identity, independent of how it is stored in bytes. This tool shows the code point for every character you paste, along with its decimal equivalent.
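As a rough illustration of the U+ notation, the sketch below formats one character as a hex code point plus its decimal value. The helper name describeCodePoint is hypothetical and is not taken from the tool's source.

```typescript
// Hypothetical helper: report the first code point of a string
// in U+XXXX form together with its decimal value.
function describeCodePoint(ch: string): { hex: string; decimal: number } {
  const cp = ch.codePointAt(0);
  if (cp === undefined) {
    throw new Error("expected a non-empty string");
  }
  // By convention the hex form is padded to at least four digits (U+0041).
  const hex = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  return { hex, decimal: cp };
}

console.log(describeCodePoint("A"));         // { hex: "U+0041", decimal: 65 }
console.log(describeCodePoint("\u{1F600}")); // { hex: "U+1F600", decimal: 128512 }
```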

How is a code point different from a byte?

A code point is the abstract number of a character, while bytes are how that number is actually stored. In UTF-8, one code point becomes one to four bytes — ASCII letters use a single byte, most accented and European characters use two, most other scripts use three, and emoji and rarer characters use four. The explorer lists the exact UTF-8 bytes for each character so you can see the difference.
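To see the one-to-four-byte pattern yourself, here is a minimal sketch using the standard TextEncoder API, which always encodes to UTF-8. The function name utf8Hex is an assumption made for this example.

```typescript
// TextEncoder is built into browsers and Node.js and always produces UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: return the UTF-8 bytes of a string as two-digit hex.
function utf8Hex(ch: string): string[] {
  return Array.from(encoder.encode(ch), (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  );
}

console.log(utf8Hex("A"));         // ["41"]                    one byte
console.log(utf8Hex("\u00E9"));    // ["C3", "A9"]              two bytes (é)
console.log(utf8Hex("\u2615"));    // ["E2", "98", "95"]        three bytes (coffee cup)
console.log(utf8Hex("\u{1F600}")); // ["F0", "9F", "98", "80"]  four bytes (emoji)
```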

Why does an emoji count as more than one character sometimes?

Many emoji are built from several code points joined together. A skin-tone variant is a base emoji followed by a modifier, a family emoji is several people joined by zero-width joiners, and a flag is a pair of regional indicator symbols. This tool iterates by code point, so it shows each piece of the sequence separately, which is why a single visible emoji can take up several rows.
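The sketch below splits two emoji sequences into their code points, roughly the way the rows would appear. The sample strings and the helper name codePoints are chosen for illustration only.

```typescript
// Family emoji: man + zero-width joiner + woman + zero-width joiner + girl.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
// French flag: regional indicator F followed by regional indicator R.
const flag = "\u{1F1EB}\u{1F1F7}";

// Hypothetical helper: list every code point of a string in U+XXXX form.
// Array.from walks the string by code point, not by UTF-16 code unit.
function codePoints(text: string): string[] {
  return Array.from(
    text,
    (ch) => "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codePoints(family)); // ["U+1F468", "U+200D", "U+1F469", "U+200D", "U+1F467"]
console.log(codePoints(flag));   // ["U+1F1EB", "U+1F1F7"]
console.log(family.length);      // 8 (UTF-16 code units, not visible characters)
```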

Can it find invisible or hidden characters?

Yes. Zero-width spaces, zero-width joiners, non-breaking spaces, byte-order marks and control codes are all invisible on screen but still present in your text. With the invisible-characters option enabled, each one gets its own row with a placeholder glyph and its code point, so you can spot stray characters pasted from documents or used to disguise text.
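One possible way to flag such characters is with Unicode property escapes, sketched below. The exact set the explorer treats as invisible is not documented here, so the rule used (control codes, format characters, and space separators other than the plain space) is an assumption. It also flags tabs and newlines, which are control codes.

```typescript
// Assumed rule: Cc (control), Cf (format, e.g. zero-width joiner and
// byte-order mark) and Zs (space separators such as the non-breaking space).
const INVISIBLE = /^[\p{Cc}\p{Cf}\p{Zs}]$/u;

function isInvisible(ch: string): boolean {
  if (ch === " ") return false; // the ordinary space is expected in text
  return INVISIBLE.test(ch);
}

// Hypothetical helper: report the position (counted in code points)
// and code point of every invisible character in the text.
function findInvisible(text: string): { index: number; codePoint: string }[] {
  const found: { index: number; codePoint: string }[] = [];
  Array.from(text).forEach((ch, index) => {
    if (isInvisible(ch)) {
      const hex = ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0");
      found.push({ index, codePoint: "U+" + hex });
    }
  });
  return found;
}

// "a", zero-width space, "b", non-breaking space, "c", byte-order mark
console.log(findInvisible("a\u200Bb\u00A0c\uFEFF"));
// index 1 is U+200B, index 3 is U+00A0, index 5 is U+FEFF
```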

What does the category mean?

Every Unicode character has a general category such as uppercase letter, decimal digit, currency symbol, dash punctuation or non-spacing mark. The explorer derives this category using the browser's built-in Unicode property support, so the labels follow the Unicode database version that ships with your browser. Categories help you tell a hyphen-minus (dash punctuation) from a minus sign (math symbol), or a digit from a look-alike letter.
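A category lookup along these lines can be sketched with regular-expression property escapes, which need an ES2018 or later target. The table below covers only a handful of categories, and both the table and the function name generalCategory are assumptions for illustration, not the tool's own code.

```typescript
// A short, illustrative subset of the Unicode general categories.
const CATEGORIES: [string, RegExp][] = [
  ["Letter, uppercase", /^\p{Lu}$/u],
  ["Letter, lowercase", /^\p{Ll}$/u],
  ["Number, decimal digit", /^\p{Nd}$/u],
  ["Symbol, currency", /^\p{Sc}$/u],
  ["Symbol, math", /^\p{Sm}$/u],
  ["Punctuation, dash", /^\p{Pd}$/u],
  ["Mark, non-spacing", /^\p{Mn}$/u],
  ["Separator, space", /^\p{Zs}$/u],
];

// Hypothetical helper: return the label of the first matching category.
function generalCategory(ch: string): string {
  for (const [label, pattern] of CATEGORIES) {
    if (pattern.test(ch)) return label;
  }
  return "Other (not in this short list)";
}

console.log(generalCategory("A"));      // "Letter, uppercase"
console.log(generalCategory("7"));      // "Number, decimal digit"
console.log(generalCategory("\u20AC")); // "Symbol, currency" (euro sign)
console.log(generalCategory("-"));      // "Punctuation, dash" (hyphen-minus)
console.log(generalCategory("\u2212")); // "Symbol, math" (minus sign)
console.log(generalCategory("\u0301")); // "Mark, non-spacing" (combining acute)
```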

Is my text uploaded anywhere?

No. All of the analysis — code points, categories, UTF-8 encoding and HTML entities — happens in your browser using built-in JavaScript and Unicode features. Nothing you paste is sent to a server, logged or stored, so it is safe for private or sensitive text.

What is the Unicode Character Explorer?

The Unicode Character Explorer is a free tool on Gera Tools. Paste any text to break it down code point by code point: Unicode name and category, hex and decimal value, HTML entity, UTF-8 bytes and JavaScript escapes. It also spots hidden and invisible characters, and it runs entirely in your browser with nothing uploaded.

Unicode Character Explorer

Name: Unicode Character Explorer
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

The Unicode Character Explorer takes any text you paste and breaks it down one code point at a time, showing the Unicode name hint, general category, hex and decimal value, HTML entity and the exact UTF-8 bytes for every character. It is built for developers debugging encoding bugs, writers tracking down a stray character that breaks a layout, localisation teams checking non-Latin scripts, and anyone curious about what is really inside a string. Text on a screen hides a lot: combining accents and look-alike letters pass for ordinary characters, while zero-width joiners and control codes do not show at all. This tool makes that invisible structure visible.

How it works

When you type or paste, the explorer iterates your text by code point rather than by code unit, so surrogate pairs (the way characters above U+FFFF such as emoji are stored in JavaScript) are handled correctly and never split in half. For each code point it computes several things at once. The hex value is the familiar U+XXXX form, and the decimal value is the same number in base ten. The category comes from the browser’s built-in Unicode property escapes, mapping each character to labels like Letter uppercase, Number decimal digit, Symbol currency or Mark non-spacing. A block hint places the character in a rough region of Unicode — Basic Latin, Cyrillic, Hiragana, Currency Symbols, Emoticons and so on.
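The difference between code units and code points can be seen in a few lines. This is a minimal sketch with an arbitrary sample string, not the explorer's own code.

```typescript
// "a" followed by the grinning face emoji, U+1F600.
const sample = "a\u{1F600}";

const toHex = (n: number): string => n.toString(16).toUpperCase().padStart(4, "0");

// Walking by UTF-16 code unit splits the emoji into two surrogate halves.
const units: string[] = [];
for (let i = 0; i < sample.length; i++) {
  units.push(toHex(sample.charCodeAt(i)));
}

// The string iterator (used by Array.from and for...of) walks by code point,
// so the surrogate pair stays together as one entry.
const points = Array.from(sample, (ch) => toHex(ch.codePointAt(0)!));

console.log(units);  // ["0061", "D83D", "DE00"]
console.log(points); // ["0061", "1F600"]
```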

The HTML entity is generated as a numeric reference, and where a well-known named entity exists, such as the ones for ampersand, copyright or euro, the named form is shown instead. The UTF-8 bytes are produced with the standard encoding algorithm: code points up to 127 take one byte, up to 2047 take two, up to 65535 take three, and everything above that takes four bytes. The summary strip tallies code points, unique characters, UTF-16 units, total UTF-8 bytes, non-ASCII characters and invisible or control characters. Toggle between a card view and a compact table view, switch invisible characters on or off, copy a single code point, or export the whole string as code points, HTML entities, UTF-8 hex or JavaScript escapes.
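The byte thresholds and the entity rule described above can be sketched as follows. The function names and the three-entry named-entity table are assumptions for this example, and the explorer's actual table is likely larger.

```typescript
// Encode one code point to UTF-8 using the standard thresholds:
// up to 127 one byte, up to 2047 two, up to 65535 three, otherwise four.
function encodeUtf8(cp: number): number[] {
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("not a valid Unicode scalar value");
  }
  if (cp <= 0x7f) return [cp];
  if (cp <= 0x7ff) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  if (cp <= 0xffff) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Illustrative subset of named entities (assumed, not the tool's full list).
const NAMED_ENTITIES: Record<number, string> = {
  0x26: "&amp;",
  0xa9: "&copy;",
  0x20ac: "&euro;",
};

// Prefer the named form when one is known, else use a decimal reference.
function htmlEntity(cp: number): string {
  return NAMED_ENTITIES[cp] ?? `&#${cp};`;
}

console.log(encodeUtf8(0xe9));    // [195, 169], that is C3 A9
console.log(encodeUtf8(0x1f600)); // [240, 159, 152, 128], that is F0 9F 98 80
console.log(htmlEntity(0x20ac));  // "&euro;"
console.log(htmlEntity(0x2615));  // "&#9749;"
```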

Example

Paste Café ☕ and the explorer shows six entries. C, a and f are Basic Latin letters, each one UTF-8 byte. The é is U+00E9, a lowercase letter in Latin-1 Supplement, stored as two UTF-8 bytes C3 A9, with the numeric HTML entity &#233;. The space is U+0020, a space separator. The coffee cup is U+2615 in the Miscellaneous Symbols block, three UTF-8 bytes long. If your é instead came in as a plain e followed by a separate combining acute accent, the explorer would reveal it as two rows, the letter plus a non-spacing mark, which is the kind of hidden difference that causes search and comparison bugs.
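The comparison bug mentioned above is easy to reproduce. The sketch below uses the standard String.prototype.normalize method to reconcile the two spellings. Normalisation is shown here only as a common remedy and is not described as a feature of the explorer.

```typescript
const composed = "Caf\u00E9";    // é as a single code point, U+00E9
const decomposed = "Cafe\u0301"; // plain e followed by combining acute, U+0301

// The two strings look identical on screen but are not equal.
console.log(composed === decomposed);       // false
console.log(Array.from(composed).length);   // 4 code points
console.log(Array.from(decomposed).length); // 5 code points

// Normalising both sides to the same form (NFC here) makes them compare equal.
const sameAfterNfc = composed.normalize("NFC") === decomposed.normalize("NFC");
console.log(sameAfterNfc);                  // true
```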

Everything is calculated in your browser. No text is uploaded, logged or stored.