What is a text-and-data-mining exception?

A TDM exception lets you copy works in order to analyse them computationally, including for AI training, without the rightsholder's permission, provided defined conditions are met. The EU and UK both have such exceptions, but they differ sharply on whether commercial use is covered.

How does the EU DSM Directive treat AI training?

Article 3 lets research organisations and cultural heritage institutions mine works they can lawfully access for scientific research, and rightsholders cannot opt out of it. Article 4 lets anyone, including commercial actors, mine lawfully accessed works unless the rightsholder has reserved that use, for online content by means of a machine-readable opt-out such as a TDMRep signal.

Can I train a commercial model on scraped data in the UK?

Not under the current statutory exception. Section 29A of the UK CDPA only covers computational analysis for non-commercial research, so commercial AI training on third-party copyright works needs a licence. Without one, the tool flags that path as blocked.

When do I need a GDPR lawful basis?

Whenever the training data contains personal data about identifiable people. Consent specific to training, or legitimate interests backed by a documented balancing test with transparency and an objection route, can provide a basis. Special-category data needs an additional Article 9 condition.

Is this legal advice?

No. TDM and AI training law is evolving and contested, and database copyright, the sui generis database right, and contractual terms may also apply. Treat the result as a scoping aid and take qualified legal advice before training on third-party data.

What is the AI Training Data Rights Checker?

Answer questions about data source, region, purpose, TDM opt-out signals, and personal-data presence to assess copyright text-and-data-mining exceptions under the EU DSM Directive and UK CDPA, plus a GDPR lawful-basis check, for machine-learning teams. It runs free in your browser on Gera Tools, with nothing uploaded.

AI Training Data Rights Checker

Name: AI Training Data Rights Checker
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Training an AI model on data you did not create raises two separate legal questions at once: do you have the copyright right to mine the works, and if the data contains personal information, do you have a lawful basis under data-protection law? This checker assesses both axes and combines them into a single verdict.

How it works

The tool runs two independent legal tests:

Copyright / TDM. If you created the content yourself or hold a licence that permits training, you are clear. Otherwise the use must fall within a text-and-data-mining exception. Under the EU DSM Directive, Article 3 covers research organisations and rightsholders cannot opt out, while Article 4 covers anyone, including commercial users, unless a machine-readable opt-out is present. Under section 29A of the UK CDPA, only non-commercial research is covered, so commercial training needs a licence.
Data protection. If personal data is present, you need a GDPR lawful basis: consent specific to training, or legitimate interests backed by a documented balancing test. Without a basis, the use is blocked.

The overall status is the more restrictive of the two verdicts.
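The two tests can be sketched as a pair of Python functions. This is a minimal illustration under stated assumptions, not the checker's actual code: the function names and the input labels (`"own"`, `"licensed"`, `"third_party"`, `"research_org"`, `"non_commercial_research"`, `"commercial"`, `"consent"`, `"legitimate_interests"`) are hypothetical, and each branch is a deliberate simplification of the legal tests described above.

```python
from __future__ import annotations

from enum import IntEnum


class Verdict(IntEnum):
    # Ordered from least to most restrictive.
    PERMITTED = 0
    PERMITTED_WITH_CONDITIONS = 1
    BLOCKED = 2


def copyright_verdict(source: str, region: str, purpose: str,
                      opt_out: bool, lawful_access: bool = True) -> tuple[Verdict, str]:
    """Copyright/TDM axis (hypothetical input labels).

    source:  "own", "licensed" or "third_party"
    region:  "EU" or "UK"
    purpose: "research_org", "non_commercial_research" or "commercial"
    """
    if source in ("own", "licensed"):
        return Verdict.PERMITTED, "own content or a licence that permits training"
    if not lawful_access:
        # Both exceptions only cover works you can lawfully access.
        return Verdict.BLOCKED, "no lawful access to the works"
    if region == "EU":
        if purpose == "research_org":
            return Verdict.PERMITTED, "DSM Art. 3: rightsholders cannot opt out"
        if opt_out:
            return Verdict.BLOCKED, "DSM Art. 4: use reserved by a machine-readable opt-out"
        return Verdict.PERMITTED, "DSM Art. 4: no opt-out present"
    if region == "UK":
        # s29A has no opt-out mechanism but excludes commercial purposes entirely.
        if purpose in ("research_org", "non_commercial_research"):
            return Verdict.PERMITTED, "CDPA s29A: non-commercial research"
        return Verdict.BLOCKED, "CDPA s29A excludes commercial use: licence needed"
    raise ValueError(f"unsupported region: {region!r}")


def data_protection_verdict(has_personal_data: bool, lawful_basis: str | None = None,
                            balancing_test_documented: bool = False,
                            special_category: bool = False,
                            article9_condition: bool = False) -> tuple[Verdict, str]:
    """GDPR axis. lawful_basis is a hypothetical label: "consent" or "legitimate_interests"."""
    if not has_personal_data:
        return Verdict.PERMITTED, "no personal data"
    # Legitimate interests only counts here if the balancing test is documented.
    has_basis = lawful_basis == "consent" or (
        lawful_basis == "legitimate_interests" and balancing_test_documented)
    if not has_basis:
        return Verdict.BLOCKED, "no GDPR lawful basis"
    if special_category and not article9_condition:
        return Verdict.BLOCKED, "special-category data needs an Article 9 condition"
    return Verdict.PERMITTED_WITH_CONDITIONS, f"lawful basis: {lawful_basis}"
```

For a commercial EU scrape with no opt-out, `copyright_verdict("third_party", "EU", "commercial", opt_out=False)` returns a permitted verdict under Article 4, while the same call with `region="UK"` returns blocked. Encoding the verdicts as an ordered `IntEnum` means the more restrictive result is simply the larger value.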

The TDM opt-out signal: what it is and why it matters

Under EU DSM Directive Article 4, a rightsholder can block text-and-data mining of a work, including commercial mining, by attaching a machine-readable opt-out signal to it. The most widely discussed standard is TDMRep (the TDM Reservation Protocol, developed by a W3C Community Group), which lets sites declare opt-outs in an HTTP header, an HTML meta tag, or a site-wide manifest file. If you are scraping the open web for commercial AI training in the EU, you must respect any machine-readable opt-out you encounter: the Article 4 exception only applies where no such reservation exists. This is the key practical difference between Article 3 (research organisations, where an opt-out does not apply) and Article 4 (everyone else, where an opt-out blocks you).
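A crawler can check for the per-page form of the signal before keeping content for training. The sketch below uses only the Python standard library and looks for a `tdm-reservation` HTTP header, or an HTML `<meta name="tdm-reservation">` tag, set to `1`. These field names are assumed from the TDMRep specification and should be verified against its current version; the function name `tdm_reserved` is hypothetical, and the site-wide `/.well-known/tdmrep.json` manifest is deliberately not handled here.

```python
from __future__ import annotations

from collections.abc import Mapping
from html.parser import HTMLParser


class _TdmMetaParser(HTMLParser):
    """Collects the content of <meta name="tdm-reservation"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "tdm-reservation":
            self.values.append(attr_map.get("content", "").strip())


def tdm_reserved(headers: Mapping[str, str], html: str = "") -> bool:
    """Return True if a TDMRep-style reservation (value "1") is found.

    Assumption: "tdm-reservation: 1" means TDM rights are reserved.
    Only the HTTP header and the HTML meta tag are checked.
    """
    # HTTP header names are case-insensitive.
    for name, value in headers.items():
        if name.lower() == "tdm-reservation" and value.strip() == "1":
            return True
    parser = _TdmMetaParser()
    parser.feed(html)
    parser.close()
    return "1" in parser.values
```

In practice you would pass the response headers and decoded body from your HTTP client, for example `resp.headers` and `resp.read().decode()` from `urllib.request.urlopen`, and skip any page for which the function returns `True`. A `False` result only means these two signals were absent, not that no reservation exists.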

How the copyright and data-protection axes combine

The two verdicts combine as follows:

Copyright verdict	Data-protection verdict	Overall status
Permitted	No personal data	Permitted
Permitted	Lawful basis identified	Permitted with conditions
Permitted	No lawful basis	Blocked
Blocked (no licence or exception)	Any	Blocked

For example, a commercial team in the EU that scrapes public forums where users have posted personal data, with no TDM opt-out set, gets a permitted copyright verdict under Article 4 but must still establish a GDPR lawful basis, typically legitimate interests with a documented balancing test. That moves the result to “permitted with conditions.” The same scenario under UK law is blocked on the copyright axis regardless, because section 29A only covers non-commercial research.
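The combination rule in the table reduces to taking the maximum of two ordered values. A minimal Python sketch, assuming a hypothetical `Verdict` ordering from least to most restrictive, reproduces the EU and UK outcomes of the forum-scraping example:

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Higher value = more restrictive, so max() returns the stricter verdict.
    PERMITTED = 0                   # e.g. "No personal data" on the data axis
    PERMITTED_WITH_CONDITIONS = 1   # e.g. "Lawful basis identified"
    BLOCKED = 2                     # "No lawful basis", or no licence/exception


def overall_status(copyright_v: Verdict, data_protection_v: Verdict) -> Verdict:
    # The more restrictive verdict wins, so a copyright block overrides
    # any data-protection result.
    return max(copyright_v, data_protection_v)


# EU: Article 4 permits the scrape (no opt-out), but a GDPR basis is needed.
eu = overall_status(Verdict.PERMITTED, Verdict.PERMITTED_WITH_CONDITIONS)
# UK: section 29A does not cover commercial training, so copyright blocks.
uk = overall_status(Verdict.BLOCKED, Verdict.PERMITTED_WITH_CONDITIONS)
print(eu.name, uk.name)  # PERMITTED_WITH_CONDITIONS BLOCKED
```

Ordering the verdicts once keeps the table and the code in step: adding a new row only requires deciding where its verdict sits in the ordering.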

What the checker does not cover

Database copyright, the sui generis database right, and contractual terms of service are separate layers that can restrict mining even where a TDM exception would otherwise apply. Scraping a site whose terms explicitly forbid it can expose you to breach-of-contract claims even if the copyright question resolves in your favour. This tool is a scoping aid, so take qualified legal advice before training on third-party data at scale.