AI Training Data Rights Checker

Check whether using a dataset for AI training is lawful under copyright and data-protection law


Training an AI model on data you did not create raises two separate legal questions at once. First, does copyright law allow you to mine the works? Second, if the data contains personal information, do you have a lawful basis under data-protection law? This checker assesses both axes and combines them into a single verdict.

How it works

The tool runs two independent legal tests and reports the worse of the two outcomes:

  • Copyright / TDM. If the source is your own content, or is licensed on terms that permit training, you are clear. Otherwise the use must fall within a text-and-data-mining (TDM) exception. Under the EU DSM Directive, Article 3 covers research organisations and cannot be overridden by an opt-out, while Article 4 covers commercial use unless the rightsholder has made a machine-readable opt-out. Under section 29A of the UK CDPA, only non-commercial research is covered, so commercial training needs a licence.
  • Data protection. If personal data is present, you need a GDPR lawful basis: specific consent, or legitimate interests with a documented balancing test. With no lawful basis, the use is blocked.

The overall status is the more restrictive of the two verdicts.
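
The two tests and the way they combine can be sketched in a few lines of Python. This is a minimal illustration of the rules described above, not the tool's actual implementation: the function names, the string values for `source`, `jurisdiction`, `purpose`, and `lawful_basis`, and the three-level `Verdict` scale are all hypothetical. Two simplifications are assumptions rather than statements from this page: consent is mapped to a plain "permitted" result, and "research" stands in for both a research organisation under Article 3 and non-commercial research under section 29A.

```python
from enum import IntEnum
from typing import Optional


class Verdict(IntEnum):
    # Ordered from least to most restrictive so that max() picks the worse one.
    PERMITTED = 0
    PERMITTED_WITH_CONDITIONS = 1
    BLOCKED = 2


def copyright_verdict(source: str, jurisdiction: str, purpose: str,
                      opt_out_present: bool = False) -> Verdict:
    """Copyright / TDM axis. All argument values are hypothetical labels:
    source in {"own", "licensed", "third_party"}, jurisdiction in {"EU", "UK"},
    purpose in {"research", "commercial"}."""
    if source in ("own", "licensed"):
        return Verdict.PERMITTED
    if jurisdiction == "EU":
        if purpose == "research":
            # DSM Directive Article 3: an opt-out does not override it.
            return Verdict.PERMITTED
        # DSM Directive Article 4: allowed unless a machine-readable opt-out exists.
        return Verdict.BLOCKED if opt_out_present else Verdict.PERMITTED
    if jurisdiction == "UK":
        # CDPA section 29A: non-commercial research only.
        return Verdict.PERMITTED if purpose == "research" else Verdict.BLOCKED
    raise ValueError(f"unsupported jurisdiction: {jurisdiction!r}")


def data_protection_verdict(has_personal_data: bool,
                            lawful_basis: Optional[str] = None) -> Verdict:
    """Data-protection axis. lawful_basis is "consent", "legitimate_interests"
    (assumed to include a documented balancing test), or None."""
    if not has_personal_data:
        return Verdict.PERMITTED
    if lawful_basis == "consent":
        # Assumption: the page does not say how consent is scored.
        return Verdict.PERMITTED
    if lawful_basis == "legitimate_interests":
        return Verdict.PERMITTED_WITH_CONDITIONS
    return Verdict.BLOCKED  # personal data with no lawful basis


def overall_verdict(copyright: Verdict, data_protection: Verdict) -> Verdict:
    # The overall status is the more restrictive of the two verdicts.
    return max(copyright, data_protection)


# Commercial EU web scrape, no opt-out, personal data under legitimate interests.
eu = overall_verdict(
    copyright_verdict("third_party", "EU", "commercial"),
    data_protection_verdict(True, "legitimate_interests"),
)
print(eu.name)  # PERMITTED_WITH_CONDITIONS
```

Ordering the verdicts by severity keeps the combination step to a single `max()` call, and a further axis could be added without changing how the results are merged.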

Example and notes

A commercial team scraping the open web in the EU, with no opt-out signals present, gets a permitted copyright verdict under Article 4. If the data includes people’s posts, the team still needs a lawful basis; relying on legitimate interests with a balancing test moves the overall result to permitted-with-conditions. The same scrape for a commercial model in the UK is blocked on the copyright axis unless the team holds a licence. This is a scoping aid, not legal advice: database rights, site terms, and unsettled case law can change the answer.
