Is my Parquet file uploaded anywhere?

No. The file is read locally with the browser File API and parsed in-page. Nothing is sent to a server, so even sensitive data files are safe to inspect.

What metadata can it show?

It reads the file footer to report total row count, number of row groups, the format version, who created the file, each column's physical and logical type, the compression codec, the encodings used, and any custom key/value metadata such as an Arrow schema.

Does it decode the actual data?

No. Only the footer is parsed, so it is fast regardless of how much data the file holds. The data pages themselves are never read or decompressed.

Which compression codecs are recognised?

Standard Parquet codecs including SNAPPY, GZIP, ZSTD, BROTLI, LZ4, and UNCOMPRESSED are reported exactly as recorded in the column metadata.

Why does it say the file is not valid Parquet?

Every Parquet file ends with the 4-byte magic string PAR1. If those bytes are missing the file is truncated or is not Parquet, and the viewer stops rather than show misleading results.

What is the Parquet Schema Viewer?

Read an Apache Parquet file's footer in your browser to view column names, physical and logical types, row counts, row-group layout, and the compression codec per column. It runs free in your browser on Gera Tools, and the file is never uploaded.

Parquet Schema Viewer

Name: Parquet Schema Viewer
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Inspect a Parquet file without spinning up a backend

A Parquet file packs its entire structure into a compact footer at the end of the file rather than the start. Unlike CSV or JSON, you cannot just open it in a text editor and understand the shape. That footer is a Thrift-serialised block describing the schema, every column’s type and compression, how many rows exist, and how the data is split into row groups. This tool reads that footer right in your browser, so you can answer “what columns are in here and how is it compressed?” without Spark, DuckDB, or a Python notebook.

How the file format works

Apache Parquet uses a columnar storage format: within each row group, all values for column A are stored together, then all values for column B, and so on. This makes column scans and aggregations fast (you only read the columns you query) but means you must parse structured metadata to understand what the file contains before accessing any data.
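To make the columnar idea concrete, here is a minimal sketch that regroups a few row records into per-column lists. The records, column names, and the `to_columns` helper are invented for illustration; real Parquet additionally encodes and compresses each column chunk.

```python
# Three records, as a row-oriented format such as CSV would hold them.
# (Hypothetical sample data.)
rows = [
    {"id": 1, "city": "Oslo", "temp": 3.5},
    {"id": 2, "city": "Lima", "temp": 19.0},
    {"id": 3, "city": "Pune", "temp": 28.5},
]


def to_columns(records):
    """Regroup row records so that each column's values sit together."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column only touches that column's values.
mean_temp = sum(columns["temp"]) / len(columns["temp"])
print(columns["city"])
print(mean_temp)
```

The aggregate never looks at `id` or `city`, which is the property that lets a columnar reader skip the columns a query does not need.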

A valid Parquet file has this layout:

[4-byte magic: PAR1]
[row group 1 data pages]
[row group 2 data pages]
...
[row group N data pages]
[Thrift-encoded FileMetaData footer]
[4-byte footer length (little-endian int32)]
[4-byte magic: PAR1]

The tool reads the last bytes of the file to find the trailing PAR1 magic, reads the 4-byte footer length, then reads exactly that many bytes for the FileMetaData Thrift block — nothing else. Data pages are never loaded or decompressed.
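The tail-reading logic described above can be sketched in a few lines. This is an illustration of the layout, not the tool's actual source: the function name `locate_footer` and the synthetic sample bytes are assumptions, and a browser version would presumably apply the same arithmetic to slices of the selected file rather than to an in-memory byte string.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"


def locate_footer(data: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the FileMetaData block inside `data`."""
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[-4:] != MAGIC:
        raise ValueError("trailing PAR1 magic is missing")
    # The footer length sits in the 4 bytes just before the trailing magic,
    # little-endian. It is read as unsigned here so a corrupt value cannot
    # come out negative.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    footer_start = len(data) - 8 - footer_len
    if footer_start < 4:
        raise ValueError("footer length is larger than the file allows")
    return footer_start, footer_len


# Synthetic stand-in for a file: the "footer" is placeholder bytes,
# not real Thrift, so only the framing is exercised.
fake_footer = b"FOOTERBYTES"
sample = (
    MAGIC
    + b"\x00" * 20  # pretend data pages
    + fake_footer
    + struct.pack("<I", len(fake_footer))
    + MAGIC
)

offset, length = locate_footer(sample)
print(offset, length)
print(sample[offset:offset + length])
```

Only the final 8 bytes and the footer slice are ever examined, which is why the size of the data pages does not affect the work done.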

Step-by-step parsing

Confirm the trailing PAR1 magic. If it is missing, the file is truncated or is not Parquet, and the viewer reports the problem immediately rather than attempting to parse garbage.
Read the footer length from the 4 bytes immediately before the trailing magic.
Decode the Thrift FileMetaData, which contains the schema, row group descriptors, and custom key/value metadata.
Walk the flat schema tree, picking out the leaf nodes (actual data columns) and their physical and logical types.
Pair each leaf column with its compression codec and encodings from the first row group’s column metadata.

Because only the footer is decoded, inspection is near-instant regardless of how much data the file holds.
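The schema-walking step can be sketched as follows. Parquet stores the schema as a flat, depth-first list in which each group node records how many children it has. The `SchemaElement` class below is a simplified stand-in for the Thrift structure (the real one carries more fields), and `leaf_columns` and the sample schema are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SchemaElement:
    """Simplified stand-in for one entry of the flat schema list."""
    name: str
    num_children: int = 0                 # 0 means a leaf, i.e. a data column
    physical_type: Optional[str] = None   # only set on leaves


def leaf_columns(elements: List[SchemaElement]) -> List[Tuple[str, Optional[str]]]:
    """Walk the depth-first flat schema and return (dotted path, type) per leaf."""
    leaves = []
    pos = 1  # index 0 is the root group, which is not a column itself

    def visit(prefix):
        nonlocal pos
        element = elements[pos]
        pos += 1
        path = prefix + [element.name]
        if element.num_children == 0:
            leaves.append((".".join(path), element.physical_type))
        else:
            # The next `num_children` subtrees belong to this group.
            for _ in range(element.num_children):
                visit(path)

    for _ in range(elements[0].num_children):
        visit([])
    return leaves


# Hypothetical schema: two flat columns plus one nested group.
schema = [
    SchemaElement("schema", num_children=3),
    SchemaElement("id", physical_type="INT64"),
    SchemaElement("name", physical_type="BYTE_ARRAY"),
    SchemaElement("address", num_children=2),
    SchemaElement("city", physical_type="BYTE_ARRAY"),
    SchemaElement("zip", physical_type="INT32"),
]

for path, physical in leaf_columns(schema):
    print(path, physical)
```

Because the leaves come out in the same order as the column chunks inside a row group, the resulting list can be paired position by position with the first row group's column metadata.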

Understanding the output

Physical type is how bytes are physically stored: BOOLEAN, INT32, INT64, INT96 (deprecated), FLOAT, DOUBLE, BYTE_ARRAY, or FIXED_LEN_BYTE_ARRAY. This is the low-level storage primitive.

Logical (or converted) type is the intended semantic meaning overlaid on the physical type: UTF8 means the BYTE_ARRAY should be interpreted as a UTF-8 string; TIMESTAMP_MICROS means the INT64 is a microsecond epoch timestamp; DECIMAL carries a precision and scale annotation. When you see a column annotated as BYTE_ARRAY / UTF8, you know it is a string column.
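As an illustration of how the two layers combine, the sketch below turns a physical type ID plus an optional annotation into the kind of label described above, and shows what interpreting an INT64 as TIMESTAMP_MICROS means in practice. The numeric IDs follow the Parquet Thrift definition; the `describe` and `micros_to_datetime` helpers and the sample values are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

# Physical type IDs as defined in the Parquet Thrift schema.
# INT96 is deprecated but still appears in older files.
PHYSICAL_TYPES = {
    0: "BOOLEAN",
    1: "INT32",
    2: "INT64",
    3: "INT96",
    4: "FLOAT",
    5: "DOUBLE",
    6: "BYTE_ARRAY",
    7: "FIXED_LEN_BYTE_ARRAY",
}


def describe(physical_id, annotation=None, precision=None, scale=None):
    """Build a 'PHYSICAL / LOGICAL' label for one column."""
    physical = PHYSICAL_TYPES.get(physical_id, f"UNKNOWN({physical_id})")
    if annotation is None:
        return physical
    if annotation == "DECIMAL":
        return f"{physical} / DECIMAL({precision}, {scale})"
    return f"{physical} / {annotation}"


def micros_to_datetime(value):
    """Interpret a raw INT64 as microseconds since the Unix epoch (UTC)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=value)


print(describe(6, "UTF8"))
print(describe(2, "TIMESTAMP_MICROS"))
print(describe(7, "DECIMAL", precision=10, scale=2))
print(describe(5))
print(micros_to_datetime(1_700_000_000_000_000).isoformat())
```

The stored bytes are the same with or without the annotation; the logical type only tells a reader how to present them.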

Compression codec is reported per column — Parquet allows different columns to use different codecs in the same file. Common values: SNAPPY (fast, moderate compression), GZIP (slower, better compression), ZSTD (modern balance of speed and ratio), BROTLI, LZ4, and UNCOMPRESSED.
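In the footer each column chunk stores its codec as a small integer. A minimal sketch of turning those IDs into names per column, and of noticing a mixed-codec file, might look like this. The ID table follows the Parquet Thrift definition; the `codecs_by_column` helper and the sample column list are hypothetical.

```python
# Compression codec IDs as defined in the Parquet Thrift schema.
CODECS = {
    0: "UNCOMPRESSED",
    1: "SNAPPY",
    2: "GZIP",
    3: "LZO",
    4: "BROTLI",
    5: "LZ4",
    6: "ZSTD",
    7: "LZ4_RAW",
}


def codecs_by_column(column_chunks):
    """Map each column path to its codec name.

    `column_chunks` is an iterable of (path, codec_id) pairs, as they
    might be collected from one row group's column metadata.
    """
    return {
        path: CODECS.get(codec_id, f"UNKNOWN({codec_id})")
        for path, codec_id in column_chunks
    }


# Hypothetical row group in which columns use different codecs.
first_row_group = [("id", 1), ("name", 6), ("payload", 0)]

report = codecs_by_column(first_row_group)
uses_mixed_codecs = len(set(report.values())) > 1

print(report)
print(uses_mixed_codecs)
```

Unknown IDs are surfaced rather than guessed at, so a file written with a codec this table does not list still produces a readable report.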

Key/value metadata in the file footer often carries a serialised Apache Arrow schema, writer-version tags, or application-specific metadata. This is useful when debugging why two tools read the same Parquet file differently.
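A rough sketch of summarising those key/value entries is shown below. It relies on two conventions used by the Arrow and pandas writers: the Arrow schema is stored base64-encoded under the key `ARROW:schema`, and pandas stores a JSON document under the key `pandas`. The `summarise_key_value` helper and the sample entries are invented for illustration, and the Arrow payload here is placeholder bytes rather than a real schema.

```python
import base64
import json


def summarise_key_value(entries):
    """Return a short human-readable summary for each metadata entry."""
    summary = {}
    for key, value in entries.items():
        if key == "ARROW:schema":
            # Stored as base64 text; the decoded bytes are an Arrow schema
            # message, which this sketch does not attempt to parse.
            raw = base64.b64decode(value)
            summary[key] = f"Arrow schema, {len(raw)} bytes after base64 decoding"
            continue
        try:
            parsed = json.loads(value)
        except ValueError:
            parsed = None
        if isinstance(parsed, dict):
            summary[key] = f"JSON object with keys {sorted(parsed)}"
        else:
            summary[key] = f"text: {value}"
    return summary


# Hypothetical footer entries.
entries = {
    "ARROW:schema": base64.b64encode(b"\x00" * 16).decode("ascii"),
    "pandas": '{"columns": [], "index_columns": []}',
    "writer.note": "nightly export",
}

for key, line in summarise_key_value(entries).items():
    print(key, "->", line)
```

Comparing these summaries for two files is often enough to see that one was written with an embedded Arrow schema and the other was not.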

Practical use cases

Before running a query: quickly confirm column names and types without loading the full file into Pandas, DuckDB, or Spark.
Cross-tool compatibility: check whether a file written by PySpark is readable by Arrow/DuckDB by inspecting logical types and metadata.
Data pipeline debugging: spot a missing column or a type mismatch that explains a downstream error, without transferring the file to a compute environment.
Audit and documentation: record the schema of a delivered data file without needing a runtime environment.

Files up to 200 MB are accepted; the limit guards browser memory, even though only the footer is actually read. Your file never leaves your device.