Spot anomalies in tabular data
Bad data hides in spreadsheets — a price typed with an extra zero, a sensor that spiked, a duplicated row. This tool reads a CSV, finds the numeric columns, and flags the values that sit far outside the rest of their column using one of two standard statistical methods: the IQR fence or the Z-score. Everything is computed locally, so you can vet a confidential dataset without uploading it.
How it works
The CSV is parsed in the browser (handling quoted fields and commas inside quotes). For each column the tool decides whether it is numeric by checking that most non-empty cells parse as numbers. Then:
- IQR method: the column is sorted, Q1 and Q3 are taken at the 25th and 75th percentiles (linear interpolation), and IQR = Q3 − Q1. A value is an outlier if it is below Q1 − k·IQR or above Q3 + k·IQR. The multiplier k defaults to 1.5; 3.0 flags only extreme outliers.
- Z-score method: the column’s mean μ and sample standard deviation σ are computed, and each value’s z = (x − μ) / σ. A value is an outlier when |z| exceeds your threshold (default 3).
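The two methods can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's own code: the function names are hypothetical, and the interpolation convention (position p·(n − 1) in the sorted column) is an assumption, since the description only says "linear interpolation".

```python
import statistics

def percentile(sorted_vals, p):
    """Linear interpolation at position p * (n - 1); assumes a non-empty list.

    The exact convention is an assumption; other conventions give
    slightly different quartiles on small columns.
    """
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def iqr_outliers(values, k=1.5):
    """Return (flagged values, (lower fence, upper fence))."""
    s = sorted(values)
    q1, q3 = percentile(s, 0.25), percentile(s, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper], (lower, upper)

def zscore_outliers(values, threshold=3.0):
    """Return the values whose |z| exceeds the threshold."""
    if len(values) < 2:
        return []  # a sample standard deviation needs at least two values
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [x for x in values if abs((x - mu) / sigma) > threshold]

values = [10, 12, 11, 13, 12, 11, 10, 12, 100]  # made-up example column
flagged, bounds = iqr_outliers(values)
print(flagged, bounds)  # [100] (9.5, 13.5)
```

Returning the fences alongside the flagged values makes it easy to display the bounds for each column.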
Empty and non-numeric cells are ignored in the statistics and never flagged.
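The column-detection step might look roughly like the sketch below. It rests on several assumptions not stated above: the first row is treated as a header, "most" is read as more than half of the non-empty cells, and the helper names `to_number` and `numeric_columns` are hypothetical. Python's standard `csv` module stands in for the browser parser and handles quoted fields with embedded commas.

```python
import csv
import io
import math

def to_number(cell):
    """Return the cell as a finite float, or None if it is empty or non-numeric."""
    try:
        x = float(cell.strip())
    except ValueError:
        return None
    # float() also accepts "nan" and "inf"; treat those as non-numeric here.
    return x if math.isfinite(x) else None

def numeric_columns(csv_text, min_ratio=0.5):
    """Map each numeric column name to its parsed values.

    A column counts as numeric when more than min_ratio of its non-empty
    cells parse as numbers (the 0.5 cutoff is an assumption).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, body = rows[0], rows[1:]  # assumes the first row is a header
    result = {}
    for i, name in enumerate(header):
        cells = [row[i].strip() for row in body if i < len(row)]
        non_empty = [c for c in cells if c]
        numbers = [n for n in map(to_number, non_empty) if n is not None]
        if non_empty and len(numbers) / len(non_empty) > min_ratio:
            result[name] = numbers  # empty and non-numeric cells are dropped
    return result

sample = 'name,price,note\n"Widget, large",10.5,ok\nGadget,,n/a\nGizmo,abc,7\nThing,12,\n'
print(numeric_columns(sample))  # {'price': [10.5, 12.0]}
```

Only the parsed numbers are passed on, so empty and non-numeric cells take no part in the statistics and cannot be flagged.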
Tips and notes
- IQR is the safer default for skewed or heavy-tailed data because quartiles are not dragged around by the outliers themselves.
- With Z-score, a single huge value inflates σ and can hide smaller anomalies. If you suspect that, switch to IQR.
- The bounds used for each column are shown so you can sanity-check whether the threshold matches your domain knowledge.
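A small worked example makes the masking effect concrete. The data below is made up, and the sketch uses Python's `statistics.quantiles` with `method="inclusive"` as an assumed stand-in for the tool's quartile interpolation.

```python
import statistics

# Made-up column: mostly 10-12, one moderate anomaly (30), one huge one (1000).
data = [10, 11, 12, 10, 11, 12, 10, 11, 12, 11, 30, 1000]

# Z-score with the default threshold of 3.
mu = statistics.mean(data)
sigma = statistics.stdev(data)  # sample standard deviation, inflated by 1000
z_flagged = [x for x in data if abs((x - mu) / sigma) > 3]

# IQR fences with the default multiplier k = 1.5.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_flagged = [x for x in data if x < lower or x > upper]

print(z_flagged)     # [1000]
print(iqr_flagged)   # [30, 1000]
print(lower, upper)  # 8.875 13.875
```

Here the value 1000 pushes σ to roughly 285, so 30 ends up with a |z| well below 1 and goes unflagged, while the IQR fences (8.875 to 13.875) catch both values.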