How does the IQR method work?

It sorts a column, finds the first and third quartiles (Q1 and Q3), and computes the interquartile range, IQR = Q3 − Q1. Any value below Q1 − k·IQR or above Q3 + k·IQR is flagged as an outlier. The multiplier k defaults to 1.5 (the standard fence); setting it to 3.0 flags only extreme outliers.

How does the Z-score method work?

It computes each column's mean and standard deviation, then the Z-score (value minus mean, divided by standard deviation) for every cell. Values whose absolute Z-score exceeds the threshold (commonly 3) are flagged as outliers.

Which method should I use?

IQR is robust and works well on skewed data because quartiles are not pulled by extremes. Z-score assumes a roughly normal distribution and is sensitive to the very outliers you are hunting, so a few large outliers can mask others. Try both.

Is my data uploaded to a server?

No. The CSV is parsed and all statistics are computed in JavaScript in your browser. Nothing is sent to a server, so it is safe for confidential datasets.

What about empty cells or text in a numeric column?

Non-numeric and empty cells are skipped when computing a column's statistics and are never flagged. A column is treated as numeric only if most of its non-empty cells parse as numbers.
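
The rule above can be sketched roughly as follows. This is an illustration, not the tool's actual source: the function names parseCell and analyzeColumn are hypothetical, and the 0.5 cutoff is an assumed reading of "most".

```javascript
// Hypothetical helpers; the real tool's parsing rules may differ.
function parseCell(cell) {
  const text = String(cell ?? "").trim();
  if (text === "") return null;           // empty cell
  const n = Number(text);
  return Number.isFinite(n) ? n : NaN;    // NaN marks non-numeric text
}

// minNumericShare = 0.5 is an assumption standing in for "most".
function analyzeColumn(cells, minNumericShare = 0.5) {
  const parsed = cells.map(parseCell);
  const nonEmpty = parsed.filter((v) => v !== null);
  const numbers = nonEmpty.filter((v) => !Number.isNaN(v));
  const isNumeric =
    nonEmpty.length > 0 && numbers.length / nonEmpty.length > minNumericShare;
  // Only `numbers` would feed the statistics; skipped cells are never flagged.
  return { isNumeric, numbers };
}

// Two of the three non-empty cells parse, so the column counts as numeric.
const column = analyzeColumn(["12", "", "n/a", "15.5"]);
// column.isNumeric is true, column.numbers is [12, 15.5]
```

Note that Number() rejects values with thousands separators such as "1,000", so a column formatted that way would be treated as text under this sketch.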

What is the CSV Outlier Detector?

The CSV Outlier Detector is a free tool on Gera Tools. Paste CSV data and it flags anomalous values in each numeric column using the IQR fence method or Z-score thresholds, with adjustable sensitivity. Parsing and statistics run entirely in your browser, with nothing uploaded.

CSV Outlier Detector

Name: CSV Outlier Detector
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Spot anomalies in tabular data

Bad data hides in spreadsheets — a price typed with an extra zero, a sensor that spiked, a duplicated row. This tool reads a CSV, finds the numeric columns, and flags the values that sit far outside the rest of their column using one of two standard statistical methods: the IQR fence or the Z-score. Everything is computed locally, so you can vet a confidential dataset without uploading it.

How it works

The CSV is parsed in the browser (handling quoted fields and commas inside quotes). For each column the tool decides whether it is numeric by checking that most non-empty cells parse as numbers. Then:

IQR method: the column is sorted, Q1 and Q3 are taken at the 25th and 75th percentiles (linear interpolation), and IQR = Q3 − Q1. A value is an outlier if it is below Q1 − k·IQR or above Q3 + k·IQR. The multiplier k defaults to 1.5; 3.0 flags only extreme outliers.
Z-score method: the column’s mean μ and sample standard deviation σ are computed, and each value’s z = (x − μ) / σ. A value is an outlier when |z| exceeds your threshold (default 3).

Empty and non-numeric cells are ignored in the statistics and never flagged.
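
As a minimal sketch of the two methods described above (not the tool's actual code), the functions below take an array of already-parsed numbers. The names quantile, iqrOutliers and zScoreOutliers are hypothetical, and the interpolation position p · (n − 1) is an assumption, since "linear interpolation" can be defined in more than one way.

```javascript
// Percentile by linear interpolation between the two nearest ranks.
// Assumed convention: position = p * (n - 1) in the sorted array.
function quantile(sorted, p) {
  const pos = p * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// IQR fence: flag values below Q1 - k*IQR or above Q3 + k*IQR.
function iqrOutliers(values, k = 1.5) {
  const sorted = [...values].sort((a, b) => a - b); // numeric sort, copy first
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const lower = q1 - k * iqr;
  const upper = q3 + k * iqr;
  const outliers = values.filter((v) => v < lower || v > upper);
  return { q1, q3, lower, upper, outliers };
}

// Z-score: flag values whose |z| exceeds the threshold, using the
// sample standard deviation (n - 1 in the denominator).
function zScoreOutliers(values, threshold = 3) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  // A constant column has sd = 0; nothing is flagged in that case.
  const outliers =
    sd === 0 ? [] : values.filter((v) => Math.abs((v - mean) / sd) > threshold);
  return { mean, sd, outliers };
}

const sample = [1, 2, 3, 4, 100];
const byIqr = iqrOutliers(sample);   // fences are -1 and 7, so 100 is flagged
const byZ = zScoreOutliers(sample);  // z of 100 is about 1.79, nothing flagged
```

The sketch assumes the column has at least two numeric values; with fewer, the sample standard deviation is undefined.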

IQR versus Z-score: choosing the right method

The two methods have meaningfully different failure modes, and the right choice depends on the shape of your data.

IQR (interquartile range) makes no assumption about the shape of the distribution. Q1 and Q3 depend only on the values near the 25th and 75th percentiles of the sorted column, so a single enormous outlier cannot drag them far from the bulk of the data. This makes IQR the more robust choice when the data is skewed, already contains a few extreme values, or gives you no reason to assume a bell-shaped distribution. Sales figures, pageview counts, and biological measurements are often right-skewed, and IQR handles them well.

Z-score assumes the column is roughly normally distributed. When that is true, the standard deviation is a meaningful measure of spread, and flagging values more than three sigma from the mean is a principled threshold. The weakness is masking: if the column already contains a very large value, it inflates σ, and the threshold widens enough to let genuinely outlying values pass unflagged. The fix is to run both methods and compare.

Worked example

For a price column with values [9, 10, 11, 10, 9, 10, 11, 10, 10, 2500]:

With the column sorted (9, 9, 10, 10, 10, 10, 10, 11, 11, 2500), linear interpolation gives Q1 = 10, Q3 = 10.75, IQR = 0.75. The upper fence at k = 1.5 is 10.75 + 1.125 = 11.875. The value 2500 is flagged.
Mean = 259, σ ≈ 787. The Z-score of 2500 is (2500 − 259) / 787 ≈ 2.85, which falls just under the default threshold of 3. The value is not flagged by Z-score, because the outlier itself inflated σ.

This is the classic masking effect. IQR catches the error; Z-score does not. Try both.
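
The masking in this example has a hard limit behind it. When σ is the sample standard deviation, no single value in a column of n values can have |z| greater than (n − 1) / √n (a consequence of Samuelson's inequality). The sketch below, with hypothetical function names, checks the example against that bound.

```javascript
// Upper bound on |z| for any value in a sample of size n,
// assuming the sample standard deviation (n - 1 denominator).
function maxAbsZ(n) {
  return (n - 1) / Math.sqrt(n);
}

// Z-score of x relative to the column it belongs to.
function zScoreOf(values, x) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const sd = Math.sqrt(
    values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1)
  );
  return (x - mean) / sd;
}

const prices = [9, 10, 11, 10, 9, 10, 11, 10, 10, 2500];
const zOfTypo = zScoreOf(prices, 2500);      // about 2.85
const ceiling = maxAbsZ(prices.length);      // about 2.85 for n = 10
const ceilingFor11 = maxAbsZ(11);            // about 3.02
```

Under that assumption, a 10-row column can never produce a Z-score above 3, however extreme the value; the column needs at least 11 values before the default threshold can fire at all. For short columns, the IQR method or a lower Z threshold is the practical option.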

Adjusting sensitivity

The multiplier k in the IQR fence is adjustable:

k = 1.5 — the standard “mild outlier” fence; flags data that is notably beyond the bulk.
k = 3.0 — the “extreme outlier” fence; flags only the most egregious values.

For the Z-score, a threshold of 3.0 corresponds to roughly the outermost 0.3% of a normal distribution. Raise it to 3.5 if you are getting too many false positives on a legitimately fat-tailed column, or lower it to 2.5 for tighter screening.
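
To see how the multiplier changes what gets flagged, here is a small sketch that labels a value against both fences. The function names and the quartile values (Q1 = 20, Q3 = 30) are made up for illustration.

```javascript
// Fence bounds for a given multiplier k.
function fences(q1, q3, k) {
  const iqr = q3 - q1;
  return { lower: q1 - k * iqr, upper: q3 + k * iqr };
}

// "extreme" if outside the k = 3.0 fence, "mild" if only outside
// the k = 1.5 fence, otherwise "normal".
function classify(value, q1, q3) {
  const wide = fences(q1, q3, 3.0);
  if (value < wide.lower || value > wide.upper) return "extreme";
  const narrow = fences(q1, q3, 1.5);
  if (value < narrow.lower || value > narrow.upper) return "mild";
  return "normal";
}

// With Q1 = 20 and Q3 = 30 the fences are [5, 45] at k = 1.5
// and [-10, 60] at k = 3.0.
const labels = [25, 50, 70].map((v) => classify(v, 20, 30));
// labels is ["normal", "mild", "extreme"]
```

A value of 50 is flagged at the default k = 1.5 but passes at k = 3.0, which is the trade-off the multiplier controls.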

Tips and notes

IQR is the safer default for skewed or heavy-tailed data because quartiles are not dragged around by the outliers themselves.
With Z-score, a single huge value inflates σ and can hide smaller anomalies — if you suspect that, switch to IQR.
The bounds used for each column are shown so you can sanity-check whether the threshold matches your domain knowledge.
Outliers are not always errors — they may be genuine high-value records, rare events, or edge cases worth investigating. Flagging is the first step; deciding whether to remove, correct, or keep them is a domain question.