Why document model lineage?

Regulators (for example, under the EU AI Act), customers, and auditors increasingly ask where a model came from: its base model, what it was fine-tuned on, and what data trained it. A lineage record answers those questions and surfaces supply-chain risk.

What is a base model versus a fine-tuned model?

A base model is a foundation model you build on (e.g. an open-weights or API model). A fine-tuned model is one you adapted with your own data. The lineage shows the chain from your deployed model back to its origins.

What counts as training data provenance?

The source and licensing of data used to train or fine-tune the model — public datasets, licensed data, customer data, or synthetic data. Unclear provenance is a common compliance and copyright risk.

Does this replace a model card?

No, but it captures the lineage section a model card needs. You can paste the exported record into a fuller model card alongside performance, bias, and intended-use sections.

Is my data sent to a server?

No. Everything is entered and assembled in your browser. Nothing is sent to a server, so you can document internal models safely.

What is the AI Model Lineage Tracker?

Build a structured model lineage document for your AI stack — capturing base models, fine-tuning sources, training data provenance, version history, and dependency chains for compliance and audit purposes. It runs free in your browser on Gera Tools, with nothing uploaded.

AI Model Lineage Tracker

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

When an auditor, a customer, or a regulator asks “where did this model come from?”, most teams cannot answer cleanly. The AI Model Lineage Tracker builds a structured record of every model in your stack — its base model, fine-tuning sources, training-data provenance, and dependency chain — so the answer is one document away.

How it works

You add each model as a row: its name and version, the base model it builds on, any fine-tuning source and data, and the provenance of the training data (public, licensed, customer, or synthetic). You can mark dependencies — which deployed models call or build on others — and the tool renders the lineage chain.
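The row-and-chain idea above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the `ModelRecord` class, its field names, the `lineage_chain` helper, and the sample models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRecord:
    """One row of the lineage record (field names are assumptions)."""
    name: str
    version: str
    base_model: Optional[str] = None          # name of the model this one builds on
    fine_tuning_source: Optional[str] = None  # what adapted the model
    data_provenance: str = "public"           # public, licensed, customer, or synthetic


def lineage_chain(records: dict[str, ModelRecord], name: str) -> list[str]:
    """Walk from a deployed model back to its origin via base_model links."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in lineage at {current!r}")
        seen.add(current)
        record = records.get(current)
        if record is None:
            # The base model is named but has no row of its own: keep the name and stop.
            chain.append(current)
            break
        chain.append(f"{record.name} {record.version}")
        current = record.base_model
    return chain


# Hypothetical sample data.
records = {
    "support-bot": ModelRecord(
        "support-bot", "1.2.0",
        base_model="acme-base",
        fine_tuning_source="support tickets",
        data_provenance="customer",
    ),
    "acme-base": ModelRecord("acme-base", "7b-2024-05"),
}

print(" <- ".join(lineage_chain(records, "support-bot")))
# prints: support-bot 1.2.0 <- acme-base 7b-2024-05
```

Keying the records by name keeps each lookup simple, and the cycle check guards against a row that accidentally lists itself (or one of its descendants) as its base model.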

The output is a clean, copyable lineage document. It maps onto the provenance and data-governance questions in the EU AI Act’s technical-documentation requirements and the “training data” and “model details” sections of a standard model card.

Why AI supply chain documentation has become urgent

The EU AI Act classifies AI systems by risk tier and requires technical documentation for high-risk systems, including training data descriptions and provenance records. Outside of the EU, customers in regulated industries — finance, healthcare, defence — now routinely ask suppliers for AI lineage records as part of vendor due diligence. An undocumented model is not just a compliance gap; it is a competitive disadvantage when procurement teams require written evidence before they can approve a supplier.

Independently of regulation, lineage documentation catches supply-chain risks early. A model fine-tuned on customer data that was licensed for one purpose but used for another creates liability. A model that depends on a third-party foundation model that gets deprecated or repriced creates operational risk. Tracking these relationships makes them visible while there is still time to act.
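To illustrate the operational-risk case, here is a minimal sketch of how recorded dependencies could answer "what is affected if this base model is deprecated?". The `depends_on` map, the model names, and the `affected_by` helper are assumptions made up for this example.

```python
from collections import deque

# Hypothetical dependency map: each model -> the models it calls or builds on.
depends_on = {
    "support-bot": ["acme-base"],
    "ticket-router": ["support-bot"],
    "search-ranker": ["other-base"],
}


def affected_by(depends_on: dict[str, list[str]], changed: str) -> list[str]:
    """Return every model that directly or transitively depends on `changed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents: dict[str, list[str]] = {}
    for model, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(model)

    # Breadth-first walk over the inverted map.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for model in dependents.get(current, []):
            if model not in affected:
                affected.add(model)
                queue.append(model)
    return sorted(affected)


print(affected_by(depends_on, "acme-base"))
# prints: ['support-bot', 'ticket-router']
```

The walk is transitive on purpose: `ticket-router` never touches `acme-base` directly, but it still inherits the risk through `support-bot`.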

What the tool captures for each model

Field	Why it matters
Name and version	Makes the record auditable and reproducible
Base model	Shows which foundation you are building on and its licensing
Fine-tuning source	Identifies what adapted the model and under what data rights
Training data provenance	Public / licensed / customer / synthetic — the most common compliance question
Downstream dependencies	Which of your other models or systems depend on this one

Tips for building a useful lineage record

The highest-value field is training-data provenance — it is the one most teams cannot reconstruct after the fact, so capture it while you still can. The second is version history: record the exact version of each base and fine-tuned model, because “we use a GPT-class model” without a version number is not auditable. Treat the lineage record as living documentation and update it whenever you swap a base model, retrain, or change a data source. Everything is assembled in your browser and nothing is uploaded, so you can document internal or restricted models safely.
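If you keep the exported record in a structured form, a small check can flag the gaps described above before an auditor does. The sketch below assumes each record is a plain dictionary; the key names and the `audit_gaps` helper are hypothetical and do not reflect the tool's export format.

```python
# The four provenance categories named in this article.
ALLOWED_PROVENANCE = {"public", "licensed", "customer", "synthetic"}


def audit_gaps(record: dict) -> list[str]:
    """List the reasons a lineage record would be hard to audit."""
    gaps = []
    if not (record.get("version") or "").strip():
        gaps.append("missing version")
    # A named base model without an exact version is not auditable.
    if record.get("base_model") and not (record.get("base_model_version") or "").strip():
        gaps.append("base model has no version")
    if record.get("data_provenance") not in ALLOWED_PROVENANCE:
        gaps.append("training-data provenance not recorded")
    return gaps


# Hypothetical record with an unversioned base model.
example = {
    "name": "support-bot",
    "version": "1.2.0",
    "base_model": "GPT-class model",
    "base_model_version": "",
    "data_provenance": "customer",
}

print(audit_gaps(example))
# prints: ['base model has no version']
```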