Why is the byte count larger than the character count?

UTF-8 uses a single byte only for ASCII characters. Accented Latin letters take two bytes, most CJK characters take three, and most emoji take four, so any text that contains non-ASCII characters has more bytes than characters.
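To see these widths directly, you can encode one character from each group. This is a minimal sketch that assumes a runtime with a global TextEncoder (modern browsers or Node.js); the helper name utf8ByteLength and the sample characters are illustrative, not taken from the tool itself.

```javascript
// Illustrative helper (hypothetical name): UTF-8 byte length of a string.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// One sample character from each width group, written as escapes
// so the exact code points are unambiguous.
const samples = {
  ascii: "a",          // U+0061
  accented: "\u00e9",  // é, U+00E9
  cjk: "\u4e2d",       // 中, U+4E2D
  emoji: "\u{1F30D}",  // 🌍, U+1F30D
};

// Map each sample name to the number of UTF-8 bytes it needs.
const widths = Object.fromEntries(
  Object.entries(samples).map(([name, ch]) => [name, utf8ByteLength(ch)])
);

console.log(widths); // { ascii: 1, accented: 2, cjk: 3, emoji: 4 }
```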

What is the difference between code points and UTF-16 code units?

A code point is a single Unicode value, roughly one logical character. UTF-16 code units are the 16-bit pieces that JavaScript strings are made of; astral characters (code points above U+FFFF), such as most emoji, use two units, which is why the JavaScript .length can exceed the real character count.
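The gap between the two counts is easy to reproduce. The sketch below uses an arbitrary sample string and relies on the fact that spreading a JavaScript string iterates by code point, while .length counts UTF-16 code units.

```javascript
// "hi" followed by 😀 (U+1F600), written as an escape for clarity.
const word = "hi\u{1F600}";

const codeUnits = word.length;        // UTF-16 code units
const codePoints = [...word].length;  // the string iterator walks code points

console.log(codeUnits, codePoints); // 4 3

// The emoji is stored as a surrogate pair: two 16-bit units.
const high = word.charCodeAt(2).toString(16);
const low = word.charCodeAt(3).toString(16);
console.log(high, low); // d83d de00
```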

How many bytes does an emoji take?

Most basic emoji are a single code point above U+FFFF and use four UTF-8 bytes. Compound emoji built from zero-width-joiner (ZWJ) sequences are several code points and can take a dozen or more bytes.
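As a rough illustration, here is a family emoji made of three emoji joined by two zero-width joiners (U+200D). The specific sequence is chosen as an example; it assumes a runtime with a global TextEncoder.

```javascript
const encoder = new TextEncoder();

// Family emoji: man (U+1F468) + ZWJ + woman (U+1F469) + ZWJ + girl (U+1F467).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

const report = {
  codePoints: [...family].length,            // 3 emoji + 2 joiners
  utf16Units: family.length,                 // 3 * 2 + 2 * 1
  utf8Bytes: encoder.encode(family).length,  // 3 * 4 + 2 * 3
};

console.log(report); // { codePoints: 5, utf16Units: 8, utf8Bytes: 18 }
```

What renders as one symbol on screen is five code points and 18 bytes here, because each joiner sits in the three-byte range.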

Why does the byte count matter?

Many database column limits, HTTP header sizes, cookie limits and SMS segment boundaries are measured in bytes, not characters. A field that allows 255 bytes holds only 127 two-byte accented letters or 63 four-byte emoji.
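If you need text to fit a byte budget, one possible approach is to cut it on a code point boundary so no character is split in half. The sketch below is illustrative only: truncateToBytes is a hypothetical helper, not something the tool provides, and it assumes a global TextEncoder.

```javascript
// Hypothetical helper: longest prefix of `text` that fits in `maxBytes`
// of UTF-8, cut on a code point boundary so no character is split.
function truncateToBytes(text, maxBytes) {
  const encoder = new TextEncoder();
  let used = 0;
  let out = "";
  for (const ch of text) {                    // iterates by code point
    const size = encoder.encode(ch).length;   // 1 to 4 bytes
    if (used + size > maxBytes) break;
    used += size;
    out += ch;
  }
  return out;
}

const longName = "\u00e9".repeat(200);         // 200 copies of é
const totalBytes = new TextEncoder().encode(longName).length;
const kept = [...truncateToBytes(longName, 255)].length;

console.log(totalBytes); // 400
console.log(kept);       // 127
```

This keeps code points whole but can still cut a compound emoji between its parts, so treat it as a starting point rather than a complete solution.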

Is the byte count exact?

Yes. The tool encodes your text with the browser's native TextEncoder, which produces the precise UTF-8 byte sequence defined by the Unicode standard, then reports its length.

UTF-8 Byte Counter — Gera Tools

Counting real UTF-8 bytes

Characters and bytes are not the same thing. A database column, an HTTP header, a cookie, or an SMS segment is often limited by bytes, while what you see on screen is characters. This counter reports the exact UTF-8 byte length of your text so you can tell whether it will actually fit.

How it works

The tool encodes your string with the browser’s built-in TextEncoder, which implements the official UTF-8 rules, and reports the length of the resulting byte array. UTF-8 is variable-width:

U+0000  – U+007F     1 byte   (ASCII)
U+0080  – U+07FF     2 bytes  (accented Latin, Greek, Cyrillic, Hebrew, Arabic)
U+0800  – U+FFFF     3 bytes  (most CJK, symbols)
U+10000 – U+10FFFF   4 bytes  (emoji, rare scripts)
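The ranges above can be turned into a small calculation and checked against TextEncoder. This is a sketch for well-formed text, not the tool's own code; bytesForCodePoint and utf8LengthByRange are hypothetical names.

```javascript
// Bytes needed for one code point, following the four ranges above.
function bytesForCodePoint(cp) {
  if (cp <= 0x7f) return 1;
  if (cp <= 0x7ff) return 2;
  if (cp <= 0xffff) return 3;
  return 4;
}

// Sum the widths over every code point in the string.
function utf8LengthByRange(text) {
  let total = 0;
  for (const ch of text) {
    total += bytesForCodePoint(ch.codePointAt(0));
  }
  return total;
}

const sample = "A\u00e9\u4e2d\u{1F30D}"; // A, é, 中, 🌍
const byRange = utf8LengthByRange(sample);
const byEncoder = new TextEncoder().encode(sample).length;

console.log(byRange, byEncoder); // 10 10
```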

Alongside the byte total, the tool counts code points (logical characters) and UTF-16 code units (the JavaScript .length value), and it splits characters into ASCII versus multi-byte so the difference between the counts is obvious.
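A counter of this kind could be sketched roughly as follows. The function name countText and the shape of the returned object are assumptions made for illustration; the tool's actual implementation may differ.

```javascript
// Hypothetical sketch of the counts described above.
function countText(text) {
  let codePoints = 0;
  let ascii = 0;
  let multiByte = 0;

  for (const ch of text) {              // iterates by code point
    codePoints += 1;
    if (ch.codePointAt(0) <= 0x7f) {
      ascii += 1;                       // one UTF-8 byte
    } else {
      multiByte += 1;                   // two to four UTF-8 bytes
    }
  }

  return {
    bytes: new TextEncoder().encode(text).length, // UTF-8 byte total
    codePoints,
    utf16Units: text.length,                      // JavaScript .length
    ascii,
    multiByte,
  };
}

// "Zoë 😀" with ë as U+00EB and the emoji as U+1F600.
const counts = countText("Zo\u00eb \u{1F600}");
console.log(counts);
// bytes: 9, codePoints: 5, utf16Units: 6, ascii: 3, multiByte: 2
```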

Tips and notes

If a “255 character” field rejects your text, the limit may well be 255 bytes. For example, café 🌍 is 6 characters but 10 bytes: four for the ASCII letters and the space, two for é, and four for the emoji. Watch the multi-byte count: every non-ASCII character costs at least two bytes, and most emoji cost four. When a system reports a JavaScript .length, remember that it counts UTF-16 code units, so an emoji such as 🌍 counts as 2 there, as 4 in UTF-8 bytes, and as just 1 actual character.