Training an AI model on data you did not create raises two separate legal questions: does copyright law allow you to mine the works, and, if the data contains personal information, do you have a lawful basis under data-protection law? This checker assesses each axis and combines the results into a single verdict.
How it works
The tool runs two independent legal tests, one per axis:
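- Copyright / TDM. If the source is your own content, or is covered by a licence that permits training, you are clear. Otherwise the use has to fit a text-and-data-mining (TDM) exception. Under the EU DSM Directive (Directive (EU) 2019/790), Article 3 covers research organisations and cultural heritage institutions mining for scientific research, and rightsholders cannot opt out of it. Article 4 covers any other use, including commercial training, unless the rightsholder has reserved their rights with a machine-readable opt-out. Both articles presuppose lawful access to the works. Under section 29A of the UK CDPA, only research for a non-commercial purpose is covered, so commercial training needs a licence.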
- Data protection. If personal data is present, you need a GDPR lawful basis: specific consent, or legitimate interests supported by a documented balancing test. With no basis, the use is blocked.
The overall status is the more restrictive of the two verdicts.
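The two tests and the "more restrictive wins" rule can be sketched in a few lines of Python. This is an illustrative sketch, not the checker's actual code: the `Status`, `Scenario`, `copyright_status`, `data_protection_status`, and `overall_status` names are all hypothetical, and two mappings are assumptions rather than something stated above, namely that consent yields a plain permitted verdict and that legitimate interests without a documented balancing test counts as no basis.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Status(IntEnum):
    """Verdicts ordered from least to most restrictive, so max() picks the stricter."""
    PERMITTED = 0
    PERMITTED_WITH_CONDITIONS = 1
    BLOCKED = 2


@dataclass
class Scenario:
    """Hypothetical input shape; every field name is an assumption."""
    jurisdiction: str                      # "EU" or "UK"
    own_or_licensed: bool = False          # own content, or a licence permitting training
    commercial: bool = True
    research_organisation: bool = False    # qualifies for DSM Article 3
    opt_out_present: bool = False          # machine-readable reservation (DSM Article 4)
    personal_data: bool = False
    lawful_basis: Optional[str] = None     # "consent", "legitimate_interests", or None
    balancing_test_documented: bool = False


def copyright_status(s: Scenario) -> Status:
    if s.own_or_licensed:
        return Status.PERMITTED
    if s.jurisdiction == "EU":
        if s.research_organisation:
            return Status.PERMITTED        # Article 3: opt-outs do not apply
        # Article 4: allowed unless rights were reserved in machine-readable form
        return Status.BLOCKED if s.opt_out_present else Status.PERMITTED
    if s.jurisdiction == "UK":
        # CDPA s.29A: non-commercial research only
        return Status.BLOCKED if s.commercial else Status.PERMITTED
    raise ValueError(f"unsupported jurisdiction: {s.jurisdiction!r}")


def data_protection_status(s: Scenario) -> Status:
    if not s.personal_data:
        return Status.PERMITTED
    if s.lawful_basis == "consent":
        return Status.PERMITTED            # assumption: consent carries no extra conditions
    if s.lawful_basis == "legitimate_interests" and s.balancing_test_documented:
        return Status.PERMITTED_WITH_CONDITIONS
    return Status.BLOCKED                  # no usable lawful basis


def overall_status(s: Scenario) -> Status:
    # The overall verdict is the more restrictive of the two axes.
    return max(copyright_status(s), data_protection_status(s))


eu_scrape = Scenario(
    jurisdiction="EU",
    personal_data=True,
    lawful_basis="legitimate_interests",
    balancing_test_documented=True,
)
print(overall_status(eu_scrape).name)      # PERMITTED_WITH_CONDITIONS
```

Modelling the verdicts as an ordered enum keeps the combination step to a single `max()` call, and adding a third axis later would only mean passing one more value to it.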
Example and notes
A commercial team scraping the open web in the EU, with no opt-out signals present, gets a permitted copyright verdict under Article 4. If the scraped data includes people’s posts, the team still needs a lawful basis, and relying on legitimate interests with a documented balancing test moves the overall result to permitted-with-conditions. The same scrape for a commercial model in the UK is blocked on the copyright axis unless a licence is obtained. This is a scoping aid, not legal advice: database rights, site terms, and unsettled case law can change the answer.