What is a byte-order mark?

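A byte-order mark (BOM) is an optional prefix, the character U+FEFF placed at the start of the stream, that signals endianness: the bytes FE FF mean big-endian UTF-16, and FF FE mean little-endian. In Auto mode the tool reads the BOM, uses it to choose the byte order, and strips it so it does not appear in the decoded text.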

What if there is no BOM?

Without a BOM, the bytes alone cannot reliably reveal the byte order, so in Auto mode the tool assumes big-endian, which is what the Unicode Standard and RFC 2781 prescribe for unmarked UTF-16. You can override this by forcing big- or little-endian.

Does it handle emoji and astral characters?

Yes. UTF-16 stores characters above U+FFFF as surrogate pairs of two code units. The decoder reassembles those pairs, and the resulting UTF-8 output uses the correct four-byte sequence.
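To make the reassembly concrete, here is the arithmetic as a short Python sketch. The tool itself runs in JavaScript; `combine_surrogates` and `utf8_four_bytes` are hypothetical helper names used only for illustration. A high surrogate (D800..DBFF) and a low surrogate (DC00..DFFF) each carry 10 bits of the code point minus 0x10000.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate contributes 10 bits; adding 0x10000 reaches the astral range.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf8_four_bytes(cp: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as its four UTF-8 bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the four-byte range")
    return bytes([
        0xF0 | (cp >> 18),           # lead byte: top 3 bits
        0x80 | ((cp >> 12) & 0x3F),  # continuation: next 6 bits
        0x80 | ((cp >> 6) & 0x3F),   # continuation: next 6 bits
        0x80 | (cp & 0x3F),          # continuation: low 6 bits
    ])


# U+1F600 (grinning face) is stored in UTF-16 as D83D DE00.
cp = combine_surrogates(0xD83D, 0xDE00)
print(hex(cp))                       # 0x1f600
print(utf8_four_bytes(cp).hex(" "))  # f0 9f 98 80
```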

Why must the byte count be even?

UTF-16 stores each code unit in exactly two bytes, so a valid stream always has an even number of bytes after any BOM. An odd count means the data is truncated or not really UTF-16, and the tool reports an error.

What is the difference between UTF-16 and UTF-8?

Both encode the same Unicode characters, but UTF-16 uses 2 or 4 bytes per code point while UTF-8 uses 1 to 4 bytes and is ASCII-compatible. This tool reads the UTF-16 bytes and re-emits the equivalent UTF-8 bytes.
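A quick way to see the size difference is to encode a few characters both ways. This Python sketch uses the `utf-16-be` codec rather than `utf-16`, because the latter prepends a BOM that would inflate the count; `encoded_sizes` is a hypothetical helper.

```python
from typing import Tuple


def encoded_sizes(ch: str) -> Tuple[int, int]:
    """Return (UTF-16 byte count, UTF-8 byte count) for a string, without a BOM."""
    return len(ch.encode("utf-16-be")), len(ch.encode("utf-8"))


# "A", "é", "€" and a grinning-face emoji, written as escapes.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    utf16_len, utf8_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-16 {utf16_len} bytes, UTF-8 {utf8_len} bytes")
```

ASCII “A” takes 2 bytes in UTF-16 but 1 in UTF-8, “€” (U+20AC) takes 2 versus 3, and an astral character such as U+1F600 takes 4 in both.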

UTF-16 to UTF-8 Converter

Transcoding UTF-16 to UTF-8

UTF-16 and UTF-8 are two ways to store the same Unicode text. UTF-16, common in Windows APIs, Java strings and some file formats, uses two bytes per code unit and can be big- or little-endian. UTF-8 is the byte-oriented, ASCII-compatible encoding the web standardised on; because its code unit is a single byte, it has no byte order to declare. This tool reads a UTF-16 byte sequence and gives you both the human-readable text and its UTF-8 bytes.

How it works

First the hex input is parsed into raw bytes. The tool inspects the first two bytes for a byte-order mark:

FE FF -> UTF-16 big-endian, strip BOM
FF FE -> UTF-16 little-endian, strip BOM
none  -> assume big-endian (or use your forced choice)
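The BOM rules above fit in a small function. This is a Python sketch, not the tool's own JavaScript: the name `detect_byte_order` and its `forced` parameter are hypothetical, and letting a BOM take precedence over a forced choice is an assumption, since the table only says what happens when there is no BOM.

```python
from typing import Optional, Tuple


def detect_byte_order(data: bytes, forced: Optional[str] = None) -> Tuple[str, int]:
    """Return (byte_order, bom_length) for a UTF-16 byte string.

    A BOM, when present, decides the order (assumption: it wins over `forced`).
    Otherwise the hypothetical `forced` value ("big" or "little") is used,
    falling back to big-endian as in Auto mode.
    """
    if data[:2] == b"\xfe\xff":
        return "big", 2
    if data[:2] == b"\xff\xfe":
        return "little", 2
    return (forced or "big"), 0


print(detect_byte_order(bytes.fromhex("FE FF 00 47")))  # ('big', 2)
```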

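The remaining bytes are paired into 16-bit code units using the chosen endianness and decoded into a JavaScript string. JavaScript strings are themselves sequences of UTF-16 code units, so surrogate pairs carry over unchanged. Finally, TextEncoder re-encodes the string as UTF-8: it treats each valid surrogate pair as a single astral code point and emits its four-byte sequence (TextEncoder itself replaces any unpaired surrogate with U+FFFD, bytes EF BF BD).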
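For readers who want to reproduce the pipeline outside the browser, here is a minimal Python sketch of the same steps. It is not the tool's implementation: the function name `utf16_hex_to_utf8` and its `forced` parameter are hypothetical, and Python's built-in codecs stand in for the JavaScript string and TextEncoder. One behavioural difference: with default strict error handling, Python raises `UnicodeDecodeError` on an unpaired surrogate, whereas TextEncoder substitutes U+FFFD.

```python
import codecs
from typing import Tuple


def utf16_hex_to_utf8(hex_input: str, forced: str = "big") -> Tuple[str, bytes]:
    """Decode hex-encoded UTF-16 and return (text, utf8_bytes)."""
    # 1. Parse the hex string into raw bytes (whitespace between bytes is allowed).
    data = bytes.fromhex(hex_input)

    # 2. Detect and strip a BOM; without one, fall back to `forced`.
    if data.startswith(codecs.BOM_UTF16_BE):
        order, data = "big", data[2:]
    elif data.startswith(codecs.BOM_UTF16_LE):
        order, data = "little", data[2:]
    else:
        order = forced

    # 3. Every code unit is two bytes, so the remaining length must be even.
    if len(data) % 2:
        raise ValueError(f"odd byte count after BOM ({len(data)}): truncated or not UTF-16")

    # 4. Decode the code units; surrogate pairs become single astral characters.
    text = data.decode("utf-16-be" if order == "big" else "utf-16-le")

    # 5. Re-encode as UTF-8 (astral characters take four bytes).
    return text, text.encode("utf-8")


text, utf8 = utf16_hex_to_utf8("FE FF 00 47 00 65 00 72 00 61")
print(text)                   # Gera
print(utf8.hex(" ").upper())  # 47 65 72 61
```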

Example and notes

The bytes FE FF 00 47 00 65 00 72 00 61 carry a big-endian BOM followed by the code units for “Gera”; they decode to that word, whose UTF-8 bytes are 47 65 72 61. Because the bytes must pair up into 16-bit units, the byte count after the BOM must be even; an odd count signals truncated or non-UTF-16 input and is flagged. When no BOM is present, double-check the endianness: reading the same code units as little-endian swaps the high and low bytes, so 00 47 becomes U+4700 instead of U+0047 and “Gera” turns into Chinese-looking nonsense.
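The endianness pitfall is easy to reproduce. This Python sketch decodes the BOM-less code units for “Gera” both ways; the little-endian reading yields CJK code points because each pair's high and low bytes are swapped.

```python
# The code units for "Gera" without a BOM.
data = bytes.fromhex("00 47 00 65 00 72 00 61")

right = data.decode("utf-16-be")  # correct byte order
wrong = data.decode("utf-16-le")  # high and low bytes swapped

print(right)                                       # Gera
print(" ".join(f"U+{ord(c):04X}" for c in wrong))  # U+4700 U+6500 U+7200 U+6100
```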