Expose invisible reordering characters
Unicode has a set of bidirectional (BiDi) control characters that silently
change the visual order of text — essential for mixing right-to-left scripts like
Arabic and Hebrew with left-to-right Latin, but dangerous when abused. A single
Right-to-Left Override (U+202E) can make gpj.exe display as exe.jpg, or make
source code render differently from how it is stored (the “Trojan Source”
attack). This inspector finds every such character, names it, and shows where it
sits.
How it works
The tool scans the input code point by code point and checks each against the
complete set of Unicode bidirectional formatting and override controls:
LRM U+200E, RLM U+200F, ALM U+061C, LRE U+202A, RLE U+202B, PDF
U+202C, LRO U+202D, RLO U+202E, and the isolates LRI U+2066, RLI
U+2067, FSI U+2068, PDI U+2069. Each match is reported with its standard
name, its U+XXXX value and its index in the string. The inline view replaces
each control with a visible labelled marker so you can see exactly where the
reordering is injected, and a “stripped” output removes them all.
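The scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the names `BIDI_CONTROLS`, `scan`, `inline_view` and `strip_controls` are hypothetical, and the `[RLO]`-style marker format is an assumption. Python strings iterate by code point, so the reported index is a code point index; an implementation in a language with UTF-16 strings would need extra care.

```python
import unicodedata

# The twelve controls listed above, keyed by code point (abbreviations as in the text).
BIDI_CONTROLS = {
    0x200E: "LRM", 0x200F: "RLM", 0x061C: "ALM",
    0x202A: "LRE", 0x202B: "RLE", 0x202C: "PDF",
    0x202D: "LRO", 0x202E: "RLO",
    0x2066: "LRI", 0x2067: "RLI", 0x2068: "FSI", 0x2069: "PDI",
}

def scan(text):
    """Return (index, abbreviation, 'U+XXXX', Unicode name) for each control found."""
    return [
        (i, BIDI_CONTROLS[ord(ch)], f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) in BIDI_CONTROLS
    ]

def inline_view(text):
    """Replace each control with a visible marker such as [RLO] (assumed format)."""
    return "".join(
        f"[{BIDI_CONTROLS[ord(ch)]}]" if ord(ch) in BIDI_CONTROLS else ch
        for ch in text
    )

def strip_controls(text):
    """Return the text with every listed control removed."""
    return "".join(ch for ch in text if ord(ch) not in BIDI_CONTROLS)

sample = "abc\u202Edef\u202C"
for hit in scan(sample):
    print(hit)
print(inline_view(sample))
print(strip_controls(sample))
```

A dictionary lookup per code point keeps the scan linear in the length of the input and makes the set of controls easy to audit against the list above.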
Example
A filename stored as invoicegpj.exe (with an RLO before gpj.exe)
renders to the eye as invoiceexe.jpg. The inspector flags one character, RLO
(U+202E), sitting immediately after invoice, making it obvious the file is really a .exe.
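The filename example can be reproduced with a short Python sketch. The variable names are illustrative, and the `displayed` value is only a naive approximation (reverse everything after the RLO); a real renderer applies the full Unicode Bidirectional Algorithm, which this sketch does not implement.

```python
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE

# The name as stored: "invoice", then an invisible RLO, then "gpj.exe".
stored = "invoice" + RLO + "gpj.exe"

index = stored.index(RLO)          # code point index of the override
stripped = stored.replace(RLO, "")  # the name with the control removed

# Naive approximation of the rendering: text after the RLO is shown reversed.
displayed = stored[:index] + stored[index + 1:][::-1]

print(index)      # 7
print(stripped)   # invoicegpj.exe
print(displayed)  # invoiceexe.jpg
print(stripped.rsplit(".", 1)[-1])  # exe (the real extension)
```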
Tips and notes
- Code hosting platforms such as GitHub now warn about BiDi characters in diffs; this tool lets you check a snippet or filename before you trust it.
- PDF (U+202C) closes an embedding or override, and PDI (U+2069) closes an isolate; an opening control with no matching close is a strong red flag.
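The balance check in the last tip can be sketched with a stack. This is a simplified illustration under stated assumptions: the function name `unbalanced` is hypothetical, nesting is treated strictly (a closer must match the most recent opener), and the finer rules of the Unicode Bidirectional Algorithm, such as a PDI also ending embeddings opened inside its isolate or a paragraph break ending everything, are not modelled. For source code it is probably most useful to run the check one line at a time.

```python
EMBED_OPENERS = {"\u202A": "LRE", "\u202B": "RLE", "\u202D": "LRO", "\u202E": "RLO"}
ISOLATE_OPENERS = {"\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI"}
PDF, PDI = "\u202C", "\u2069"

def unbalanced(text):
    """Return (unclosed, stray).

    unclosed: (index, name) for openers that never get a matching close.
    stray:    (index, name) for PDF/PDI characters with nothing to close.
    """
    stack = []  # (index, name, expected closer) for each control still open
    stray = []
    for i, ch in enumerate(text):
        if ch in EMBED_OPENERS:
            stack.append((i, EMBED_OPENERS[ch], PDF))
        elif ch in ISOLATE_OPENERS:
            stack.append((i, ISOLATE_OPENERS[ch], PDI))
        elif ch in (PDF, PDI):
            if stack and stack[-1][2] == ch:
                stack.pop()  # closer matches the most recent opener
            else:
                stray.append((i, "PDF" if ch == PDF else "PDI"))
    unclosed = [(i, name) for i, name, _ in stack]
    return unclosed, stray

print(unbalanced("invoice\u202Egpj.exe"))  # ([(7, 'RLO')], [])
print(unbalanced("a\u202Eb\u202Cc"))       # ([], [])
```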