CSV Deduplicator

Remove duplicate rows from a CSV by one or more key columns


Remove duplicate rows from a CSV

The CSV Deduplicator strips repeated rows from a CSV. You decide what counts as a duplicate — an exact whole-row match, or a match on one or more key columns such as an email address or an ID. It reports how many rows it removed so you can confirm the result before exporting.

How it works

Each row is reduced to a key. In whole-row mode the key is all cells joined together; in key-column mode it is only the selected columns. If case-insensitive matching is on, key values are lower-cased before being compared. The tool then walks the rows once, tracking which keys it has already seen in a hash set: the first time a key appears the row is kept, and any later row with the same key is dropped.
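The single-pass approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the function name `dedupe_keep_first` and its parameters are hypothetical, and the key is built as a tuple of cells rather than a joined string (an assumption made here so that cells such as "ab","c" and "a","bc" cannot collide).

```python
import csv
import io


def dedupe_keep_first(rows, key_columns=None, case_insensitive=False):
    """Return (kept_rows, removed_count), keeping the first row seen for each key.

    Hypothetical helper for illustration only.
    """
    seen = set()
    kept = []
    for row in rows:
        # Whole-row mode uses every cell; key-column mode uses only the selected indices.
        cells = row if key_columns is None else [row[i] for i in key_columns]
        if case_insensitive:
            cells = [cell.lower() for cell in cells]
        key = tuple(cells)  # assumption: tuple key instead of a joined string
        if key in seen:
            continue  # a later row with an already-seen key is dropped
        seen.add(key)
        kept.append(row)
    return kept, len(rows) - len(kept)


# Made-up sample data; the header row is split off before deduplicating.
text = "id,name\n1,Ann\n2,Bob\n1,ANN\n2,Bob\n"
header, *body = list(csv.reader(io.StringIO(text)))

kept, removed = dedupe_keep_first(body, key_columns=[0])
print(kept, removed)
```

Run on this sample with the `id` column as the key, the sketch keeps the first row for each of the two ids and reports two rows removed.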

The keep-last option reverses the scan, keeps the first occurrence in that reversed order, then restores the original ordering, so the last duplicate survives while output order is preserved. Because detection is a single linear pass over the rows with a hash-set lookup per row, rather than an all-pairs comparison, the work grows roughly in proportion to the number of rows and the tool scales to large files.
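The reverse, keep-first, restore sequence for keep-last can be sketched as follows. The name `dedupe_keep_last` and the `key_of` callback are hypothetical, chosen for this example; the tool's own implementation may differ.

```python
def dedupe_keep_last(rows, key_of):
    """Keep the last row for each key while preserving original row order.

    Hypothetical helper for illustration only.
    """
    seen = set()
    kept_reversed = []
    # Scanning backwards means the first row met for a key is its last occurrence.
    for row in reversed(rows):
        key = key_of(row)
        if key not in seen:
            seen.add(key)
            kept_reversed.append(row)
    # Undo the reversal so survivors appear in their original relative order.
    kept_reversed.reverse()
    return kept_reversed


# Made-up sample data keyed on the first column.
rows = [["1", "Ann"], ["2", "Bob"], ["1", "Ann (updated)"], ["3", "Cy"]]
result = dedupe_keep_last(rows, key_of=lambda r: r[0])
print(result)
```

In this sample the later row for id 1 survives and the earlier one is dropped, while the rows for ids 2, 1 and 3 stay in the order in which the survivors appeared in the input.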

Example and notes

Given:

email,name
[email protected],Ann
[email protected],Bob
[email protected],Ann (dup)
[email protected],Bob

deduping on the email column with case-insensitive matching keeps the first [email protected] and the first [email protected], removing two rows. Use key-column mode when only an identifier needs to be unique and the other columns may legitimately differ; use whole-row mode when you want to drop only rows that repeat an earlier row in every cell.
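To make the difference between the two modes concrete, here is a small sketch that counts removed rows under each. The `example.com` addresses are illustrative stand-ins chosen for this sketch, not the addresses from the sample above, and `count_removed` is a hypothetical helper.

```python
def count_removed(rows, key_columns=None, case_insensitive=False):
    """Count rows whose key was already seen earlier in the list.

    Hypothetical helper for illustration only.
    """
    seen = set()
    removed = 0
    for row in rows:
        # None means whole-row mode; otherwise use only the selected column indices.
        cells = row if key_columns is None else [row[i] for i in key_columns]
        key = tuple(c.lower() if case_insensitive else c for c in cells)
        if key in seen:
            removed += 1
        else:
            seen.add(key)
    return removed


# Stand-in data: the third row differs from the first in case and in the name cell.
rows = [
    ["ann@example.com", "Ann"],
    ["bob@example.com", "Bob"],
    ["ANN@example.com", "Ann (dup)"],
    ["bob@example.com", "Bob"],
]

by_email = count_removed(rows, key_columns=[0], case_insensitive=True)
whole_row = count_removed(rows)
print(by_email, whole_row)
```

With these stand-in rows, key-column mode on the email column (case-insensitive) removes two rows, while whole-row mode removes only the one row that repeats an earlier row in every cell.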
