Inspect a Parquet file without spinning up a backend
A Parquet file packs its structure into a compact footer at the end of the file rather than the start. That footer is a Thrift-serialised block describing the schema, every column’s type and compression, how many rows exist, and how the data is split into row groups. This tool reads that footer right in your browser, so you can answer “what columns are in here and how is it compressed?” without Spark, DuckDB, or a Python notebook.
How it works
A valid Parquet file is bracketed by the 4-byte magic string PAR1 at both ends. Immediately before the trailing magic sits a 4-byte little-endian integer giving the footer length, and before that is the Thrift-encoded FileMetaData. The viewer:
- Confirms the trailing PAR1 magic so it never tries to parse a truncated or non-Parquet file.
- Parses the footer with a pure-JavaScript Parquet metadata reader.
- Walks the flat schema tree, keeping the leaf columns, and pairs each with its per-column metadata (codec and encodings) from the first row group.
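To make the byte layout concrete, here is a minimal sketch of the magic check and footer lookup. It is written in Python for brevity and is not the viewer's own JavaScript; the function name `locate_footer` and the exact error messages are assumptions made for illustration.

```python
import struct

MAGIC = b"PAR1"

def locate_footer(data: bytes) -> tuple[int, int]:
    """Return (start, end) byte offsets of the Thrift-encoded footer."""
    # Smallest conceivable file: leading magic + 4-byte length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[-4:] != MAGIC:
        raise ValueError("trailing PAR1 magic not found")
    # The 4 bytes just before the trailing magic hold the footer length
    # as an unsigned little-endian integer.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - footer_len
    # The footer has to fit after the leading magic.
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return start, end

# Synthetic example: a made-up 10-byte "footer", not real Thrift data.
fake_footer = b"\x00" * 10
blob = MAGIC + b"rows" + fake_footer + struct.pack("<I", len(fake_footer)) + MAGIC
start, end = locate_footer(blob)
assert blob[start:end] == fake_footer
```

The slice `data[start:end]` is what would be handed to a Thrift decoder; nothing before `start` needs to be touched.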
Because only the footer is decoded — never the data pages — inspection is near-instant even on very large files.
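The schema walk can be sketched in the same spirit. In the footer, the schema is stored as a flat, depth-first list in which group nodes carry a child count and leaves carry a physical type. The `SchemaElement` class below is a simplified, hypothetical stand-in for the Thrift struct of the same name, and `leaf_columns` is an illustrative helper, not the viewer's code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SchemaElement:
    # Simplified stand-in for the Thrift SchemaElement struct.
    name: str
    type: Optional[str] = None          # physical type, set on leaves
    num_children: Optional[int] = None  # child count, set on group nodes

def leaf_columns(schema: list[SchemaElement]) -> list[tuple[str, str]]:
    """Return (dotted_path, physical_type) for each leaf, in file order."""
    leaves: list[tuple[str, str]] = []

    def walk(index: int, path: tuple[str, ...]) -> int:
        """Visit the element at `index`; return the index after its subtree."""
        element = schema[index]
        index += 1
        if element.num_children:
            # Group node: its children follow immediately in the flat list.
            for _ in range(element.num_children):
                index = walk(index, path + (element.name,))
        else:
            full_path = ".".join(path + (element.name,))
            leaves.append((full_path, element.type or "UNKNOWN"))
        return index

    # Element 0 is the root group; its name is not part of column paths.
    index = 1
    for _ in range(schema[0].num_children or 0):
        index = walk(index, ())
    return leaves

# Hypothetical schema with one nested group.
example = [
    SchemaElement("schema", num_children=2),
    SchemaElement("id", type="INT64"),
    SchemaElement("address", num_children=2),
    SchemaElement("city", type="BYTE_ARRAY"),
    SchemaElement("zip", type="INT32"),
]
columns = leaf_columns(example)
# columns == [("id", "INT64"), ("address.city", "BYTE_ARRAY"), ("address.zip", "INT32")]
```

Because a row group lists its column chunks in the same leaf order, pairing each leaf with its codec and encodings can be as simple as zipping the two lists, assuming the file is well formed.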
Tips and notes
- Physical vs logical types: the physical type is how bytes are stored (INT32, BYTE_ARRAY, etc.), while the logical/converted type tells you the intended meaning (UTF8, DECIMAL, TIMESTAMP_MICROS).
- Compression is per column: Parquet lets each column use a different codec, so the table shows the codec recorded for each one rather than a single file-wide value.
- Key/value metadata often carries the serialised Arrow schema or a writer’s custom tags — handy for debugging cross-tool compatibility.
- Files up to 200 MB are accepted; the limit only guards browser memory, since the data pages themselves are never decoded.