Transcoding UTF-16 to UTF-8
UTF-16 and UTF-8 are two ways to store the same Unicode text. UTF-16, common in Windows APIs, Java strings and some file formats, uses 16-bit code units (one for most characters, two for those outside the Basic Multilingual Plane) stored in either big- or little-endian byte order. UTF-8 is the ASCII-compatible, variable-length encoding the web standardised on. This tool reads a UTF-16 byte sequence and gives you both the human-readable text and its UTF-8 bytes.
How it works
First the hex input is parsed into raw bytes. The tool inspects the first two bytes for a byte-order mark:
FE FF -> UTF-16 big-endian, strip BOM
FF FE -> UTF-16 little-endian, strip BOM
none -> assume big-endian (or use your forced choice)
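The parsing and BOM check can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the names parseHex and detectBom, the returned object shape, and the "be"/"le" labels are all assumptions made for the example.

```javascript
// Hypothetical helper: turn "FE FF 00 47" (whitespace optional) into raw bytes.
function parseHex(hex) {
  const clean = hex.replace(/\s+/g, "");
  if (clean.length % 2 !== 0 || /[^0-9a-fA-F]/.test(clean)) {
    throw new Error("input is not a whole number of hex bytes");
  }
  const bytes = new Uint8Array(clean.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(clean.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Hypothetical helper: inspect the first two bytes for a byte-order mark.
// `fallback` stands in for the forced choice and defaults to big-endian.
function detectBom(bytes, fallback = "be") {
  if (bytes.length >= 2) {
    if (bytes[0] === 0xfe && bytes[1] === 0xff) {
      return { endian: "be", bom: true, payload: bytes.subarray(2) };
    }
    if (bytes[0] === 0xff && bytes[1] === 0xfe) {
      return { endian: "le", bom: true, payload: bytes.subarray(2) };
    }
  }
  // No BOM: keep every byte and use the fallback endianness.
  return { endian: fallback, bom: false, payload: bytes };
}
```

Calling detectBom(parseHex("FF FE 47 00")) reports little-endian and leaves the two payload bytes 47 00 for the decoding step.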
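The remaining bytes are paired into 16-bit code units using the chosen endianness and decoded into a JavaScript string. JavaScript strings are themselves sequences of UTF-16 code units, so surrogate pairs pass through intact and are treated as single astral characters. Finally, the string is re-encoded with TextEncoder to produce the UTF-8 byte sequence: each surrogate pair becomes one four-byte sequence, and TextEncoder replaces any unpaired surrogate with U+FFFD (EF BF BD).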
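A rough sketch of that pipeline is shown here, assuming the BOM has already been stripped. The function names utf16ToUtf8 and toHex are illustrative and not taken from the tool. DataView.getUint16 and TextEncoder are standard APIs in browsers and Node.js.

```javascript
// Hypothetical helper: decode BOM-less UTF-16 bytes and re-encode as UTF-8.
function utf16ToUtf8(payload, littleEndian) {
  if (payload.length % 2 !== 0) {
    throw new Error("odd byte count: truncated or non-UTF-16 input");
  }
  // Respect byteOffset so this also works on a subarray of a larger buffer.
  const view = new DataView(payload.buffer, payload.byteOffset, payload.byteLength);
  let text = "";
  for (let i = 0; i < payload.length; i += 2) {
    // Each 16-bit value is appended as one UTF-16 code unit; surrogate
    // pairs end up adjacent in the string and so stay paired.
    text += String.fromCharCode(view.getUint16(i, littleEndian));
  }
  return { text, utf8: new TextEncoder().encode(text) };
}

// Hypothetical helper: format bytes as upper-case hex for display.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0").toUpperCase()).join(" ");
}

// U+1F600 is stored in big-endian UTF-16 as the surrogate pair D83D DE00.
const smiley = utf16ToUtf8(Uint8Array.from([0xd8, 0x3d, 0xde, 0x00]), false);
console.log(toHex(smiley.utf8)); // F0 9F 98 80
```

Building the string one code unit at a time is simple but slow for large inputs. It is used here only to keep the pairing step visible.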
Example and notes
The bytes FE FF 00 47 00 65 00 72 00 61 carry a big-endian BOM followed by the code units for “Gera”, and decode to that word with UTF-8 bytes 47 65 72 61. Because the data is read as 16-bit units, the byte count after the BOM must be even; an odd count signals truncated or non-UTF-16 input and is flagged. When no BOM is present, double-check the endianness: getting it wrong swaps the high and low bytes of every code unit, so “Gera” (0047 0065 0072 0061) is read as 4700 6500 7200 6100, four unrelated CJK characters.
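To see the endianness mix-up concretely, the snippet below reads the same BOM-less payload both ways. The helper names codeUnits and label are made up for this illustration.

```javascript
// Hypothetical helper: pair bytes into 16-bit code units with the given byte order.
function codeUnits(bytes, littleEndian) {
  const units = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    const first = bytes[i];
    const second = bytes[i + 1];
    units.push(littleEndian ? (second << 8) | first : (first << 8) | second);
  }
  return units;
}

// Hypothetical helper: show a code unit as U+XXXX.
function label(unit) {
  return "U+" + unit.toString(16).toUpperCase().padStart(4, "0");
}

// "Gera" in big-endian UTF-16, with the BOM already removed.
const payload = [0x00, 0x47, 0x00, 0x65, 0x00, 0x72, 0x00, 0x61];

const right = codeUnits(payload, false);
const wrong = codeUnits(payload, true);

console.log(String.fromCharCode(...right)); // Gera
console.log(wrong.map(label).join(" "));    // U+4700 U+6500 U+7200 U+6100
```

The second line of output lists code points in the CJK ideograph ranges, which is why a wrong guess shows up as Chinese-looking text instead of an error.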