How is a duplicate decided?

By a key. In whole-row mode the key is every column joined together, so two rows are duplicates only if all cells match. In key-column mode the key is just the columns you select, so rows are duplicates when those columns agree even if other columns differ.
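As a rough illustration of the two modes, the sketch below builds a comparison key for a single row. The function name make_key and the use of column indexes are assumptions made for this example, not the tool's actual code. A tuple stands in for the "joined" key so that cells containing a delimiter cannot collide.

```python
from typing import Optional, Sequence, Tuple


def make_key(row: Sequence[str],
             key_columns: Optional[Sequence[int]] = None) -> Tuple[str, ...]:
    """Return the comparison key for one row (hypothetical helper).

    key_columns=None means whole-row mode: every cell is part of the key.
    Otherwise only the cells at the given column indexes are used.
    """
    if key_columns is None:
        return tuple(row)
    return tuple(row[i] for i in key_columns)


row_a = ["1", "Ann", "555-0100"]
row_b = ["1", "Ann", "555-0199"]

# Whole-row mode: the phone numbers differ, so the keys differ.
print(make_key(row_a) == make_key(row_b))            # False
# Key-column mode on column 0: the IDs agree, so the keys match.
print(make_key(row_a, [0]) == make_key(row_b, [0]))  # True
```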

What does keep first versus keep last do?

When several rows share a key, only one is kept. Keep first preserves the earliest occurrence in the file; keep last preserves the latest. The relative order of the surviving rows is otherwise unchanged.

How does case-insensitive matching work?

Each key value is lower-cased before comparison, so Ann and ANN are treated as the same. The kept row's original casing is preserved in the output — only the comparison ignores case.
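A minimal sketch of that behaviour, assuming plain lower-casing as described above: the folded value is used only for the lookup, while the untouched original is what gets kept. The function name is hypothetical.

```python
def dedupe_case_insensitive(values):
    """Keep the first occurrence of each value, ignoring case when comparing."""
    seen = set()
    kept = []
    for value in values:
        folded = value.lower()      # used only for the comparison
        if folded not in seen:
            seen.add(folded)
            kept.append(value)      # the original casing goes to the output
    return kept


print(dedupe_case_insensitive(["Ann", "ANN", "bob", "Bob"]))  # ['Ann', 'bob']
```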

Can it handle large files?

Yes. Detection uses a hash set keyed on the joined key columns, so it runs in a single linear pass rather than comparing every pair of rows. Everything stays in the browser with no upload.

What is the CSV Deduplicator?

Remove duplicate rows from a CSV in your browser: match on the whole row or on chosen key columns, optionally case-insensitively, and keep either the first or last occurrence. The tool reports how many rows were removed. It is free to use on Gera Tools and runs locally, so nothing is uploaded.

CSV Deduplicator — Gera Tools

Name: CSV Deduplicator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Remove duplicate rows from a CSV

The CSV Deduplicator strips repeated rows from a CSV. You decide what counts as a duplicate — an exact whole-row match, or a match on one or more key columns such as an email address or an ID. It reports how many rows it removed so you can confirm the result before exporting.

How it works

Each row is reduced to a key. In whole-row mode the key is all cells joined together; in key-column mode it is only the selected columns. If case-insensitive matching is on, key values are lower-cased before being compared. The tool then walks the rows once, tracking which keys it has already seen in a hash set: the first time a key appears the row is kept, and any later row with the same key is dropped.
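The single pass described above could look roughly like the following. This is a sketch under stated assumptions, not the tool's source: the function name, the parameters key_columns and ignore_case, and the list-of-lists row format are all chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Row = List[str]


def dedupe_keep_first(rows: Sequence[Row],
                      key_columns: Optional[Sequence[int]] = None,
                      ignore_case: bool = False) -> Tuple[List[Row], int]:
    """Return (kept_rows, removed_count), keeping the first row for each key."""
    seen = set()
    kept: List[Row] = []
    for row in rows:
        # Whole-row mode uses every cell; key-column mode uses only the chosen ones.
        cells = row if key_columns is None else [row[i] for i in key_columns]
        if ignore_case:
            cells = [cell.lower() for cell in cells]
        key = tuple(cells)
        if key in seen:
            continue            # a later row with a key already seen is dropped
        seen.add(key)
        kept.append(row)        # first time this key appears: keep the row
    return kept, len(rows) - len(kept)


rows = [["1", "Ann"], ["2", "Bob"], ["1", "Ann B."], ["2", "Bob"]]
print(dedupe_keep_first(rows, key_columns=[0]))  # ([['1', 'Ann'], ['2', 'Bob']], 2)
print(dedupe_keep_first(rows)[1])                # 1 (only the exact copy is removed)
```

Each row costs one set lookup and at most one insertion, which is where the linear running time comes from.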

The keep-last option simply reverses the scan, keeps the first occurrence in that reversed order, then restores the original ordering — so the last duplicate survives while output order is preserved. Because detection is a single linear pass over a hash set rather than an all-pairs comparison, it scales to large files.
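The reversed scan can be sketched as below. The function name and the index-based key_columns parameter are assumptions for the example.

```python
def dedupe_keep_last(rows, key_columns):
    """Keep the last row for each key while preserving original row order."""
    seen = set()
    kept_reversed = []
    for row in reversed(rows):               # scan from the end of the file
        key = tuple(row[i] for i in key_columns)
        if key not in seen:                  # first hit in reverse = last in file
            seen.add(key)
            kept_reversed.append(row)
    kept_reversed.reverse()                  # restore the original ordering
    return kept_reversed


rows = [["1", "old"], ["2", "only"], ["1", "new"]]
print(dedupe_keep_last(rows, [0]))  # [['2', 'only'], ['1', 'new']]
```

Note that the surviving row for key 1 sits at its own original position (third), not at the position of the row it replaced.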

Example

Given:

email,name
[email protected],Ann
[email protected],Bob
[email protected],Ann (dup)
[email protected],Bob

deduping on the email column with case-insensitive matching keeps the first row for each address (the first Ann row and the first Bob row) and removes the other two. The kept rows retain their original casing; only the comparison ignores case.
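The same scenario can be reproduced with Python's standard csv module. The addresses below are hypothetical placeholders on example.com, chosen so that the third row differs from the first only in case. They are not taken from the sample above.

```python
import csv
import io

# Hypothetical sample data; the addresses are placeholders for illustration.
SAMPLE = """email,name
ann@example.com,Ann
bob@example.com,Bob
ANN@example.com,Ann (dup)
bob@example.com,Bob
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)                  # the header row is not deduplicated
email_index = header.index("email")

seen = set()
kept = []
removed = 0
for row in reader:
    key = row[email_index].lower()     # case-insensitive comparison key
    if key in seen:
        removed += 1
    else:
        seen.add(key)
        kept.append(row)

print(kept)     # [['ann@example.com', 'Ann'], ['bob@example.com', 'Bob']]
print(removed)  # 2
```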

Choosing whole-row versus key-column mode

Whole-row mode treats every cell in a row as part of the key, so two rows are duplicates only if every column matches. This is the right choice when rows have been copied verbatim, for example when an export was appended to itself.

Key-column mode lets you choose which column (or columns) identifies a unique entity. Use this when:

You have an email or ID column and want to keep only one row per contact, even if the other columns differ (for example two records for the same email with different phone numbers).
You want to deduplicate on a composite key — for instance keeping one row per (customer_id, product_id) pair rather than per customer alone.
The data has minor variations in non-key fields and you want the first (or last) complete record to survive.
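To make the composite-key case concrete, here is a small sketch comparing a (customer_id, product_id) key with a customer_id-only key. The records and the helper name dedupe_by_columns are invented for the example.

```python
# Hypothetical order records; the column names follow the example in the text.
orders = [
    {"customer_id": "c1", "product_id": "p1", "qty": "1"},
    {"customer_id": "c1", "product_id": "p2", "qty": "3"},
    {"customer_id": "c1", "product_id": "p1", "qty": "5"},
    {"customer_id": "c2", "product_id": "p1", "qty": "2"},
]


def dedupe_by_columns(records, columns):
    """Keep the first record for each combination of the named columns."""
    seen = set()
    kept = []
    for record in records:
        key = tuple(record[column] for column in columns)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


per_pair = dedupe_by_columns(orders, ["customer_id", "product_id"])
per_customer = dedupe_by_columns(orders, ["customer_id"])
print(len(per_pair), len(per_customer))  # 3 2
```

With the composite key only the repeated (c1, p1) order is dropped. With customer_id alone, every order after a customer's first one is dropped.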

Keep first versus keep last

The choice matters when rows that share a key differ in their non-key columns, because it decides which version of the data survives:

Keep first — preserve the earliest version in the file. Useful when the original record is the authoritative one and later rows are partial updates or re-exports.
Keep last — preserve the most recent occurrence. Useful when the file is sorted by date and later rows are newer versions.

The relative order of surviving rows is always preserved in the output regardless of which option you choose. All processing runs locally — nothing is uploaded.