Remove duplicate rows from a CSV
The CSV Deduplicator strips repeated rows from a CSV. You decide what counts as a duplicate — an exact whole-row match, or a match on one or more key columns such as an email address or an ID. It reports how many rows it removed so you can confirm the result before exporting.
How it works
Each row is reduced to a key. In whole-row mode the key is all cells joined together; in key-column mode it is only the selected columns. If case-insensitive matching is on, key values are lower-cased before being compared. The tool then walks the rows once, tracking which keys it has already seen in a hash set: the first time a key appears the row is kept, and any later row with the same key is dropped.
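The single pass described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's actual code: the names `row_key` and `dedupe`, the parameters `key_columns` and `ignore_case`, and the sample data are all hypothetical. The sketch also builds each key as a tuple of cells rather than a joined string, an assumption made here so that cell boundaries stay distinct (a row `ab,c` cannot collide with `a,bc`).

```python
import csv
import io

def row_key(row, key_columns=None, ignore_case=False):
    """Reduce a row (a list of cells) to a hashable key.

    key_columns=None means whole-row mode; otherwise it is a list of
    column indexes (hypothetical parameter, for illustration only).
    """
    cells = row if key_columns is None else [row[i] for i in key_columns]
    if ignore_case:
        cells = [c.lower() for c in cells]
    return tuple(cells)

def dedupe(rows, key_columns=None, ignore_case=False):
    """Keep the first row for each key; return (kept_rows, removed_count)."""
    seen = set()   # keys already encountered
    kept = []
    for row in rows:
        key = row_key(row, key_columns, ignore_case)
        if key in seen:
            continue          # later row with a known key: drop it
        seen.add(key)
        kept.append(row)      # first time this key appears: keep it
    return kept, len(rows) - len(kept)

# Hypothetical sample data: ids repeat, names may differ.
text = "id,name\n1,Ann\n2,Bob\n1,Ann B.\n2,Bob\n"
header, *rows = list(csv.reader(io.StringIO(text)))

kept, removed = dedupe(rows, key_columns=[0])
print(kept, removed)  # [['1', 'Ann'], ['2', 'Bob']] 2
```

In this sample, whole-row mode (`dedupe(rows)`) would remove only one row, because `1,Ann` and `1,Ann B.` differ in the name column.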
The keep-last option reverses the scan, keeps the first occurrence in that reversed order, then restores the original ordering, so the last duplicate survives while output order is preserved. Because detection is a single linear pass over a hash set rather than an all-pairs comparison, the running time grows roughly in proportion to the number of rows instead of with its square, which is what lets it handle large files.
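The keep-last behaviour can be illustrated with a short Python sketch. It is an assumption-laden example rather than the tool's real implementation: the function name `dedupe_keep_last`, its `key_of` parameter, and the sample rows are hypothetical.

```python
def dedupe_keep_last(rows, key_of=tuple):
    """Keep the last row for each key while preserving original order.

    key_of maps a row to a hashable key (hypothetical parameter);
    the default treats the whole row as the key.
    """
    seen = set()
    kept_reversed = []
    for row in reversed(rows):        # scan from the end
        key = key_of(row)
        if key not in seen:           # first hit in reverse = last in original
            seen.add(key)
            kept_reversed.append(row)
    kept_reversed.reverse()           # restore the original ordering
    return kept_reversed

# Hypothetical sample: id "1" appears twice; the later row should survive.
rows = [["1", "Ann"], ["2", "Bob"], ["1", "Ann B."]]
result = dedupe_keep_last(rows, key_of=lambda r: r[0])
print(result)  # [['2', 'Bob'], ['1', 'Ann B.']]
```

The surviving rows appear in the same relative order they had in the input: `2,Bob` came before the last `1` row, so it stays first.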
Example and notes
Given:
email,name
[email protected],Ann
[email protected],Bob
[email protected],Ann (dup)
[email protected],Bob
deduping on the email column with case-insensitive matching keeps the first
[email protected] and the first [email protected], removing two rows. Use key-column mode when
only an identifier needs to be unique and the other columns may legitimately
differ; use whole-row mode when you want to drop only rows that are identical in every cell.