AI discrimination test builder
If an AI system makes or informs a decision about a person — who gets hired, what price they pay, whether their content is removed — you have a legal and ethical duty to check that it is not treating protected groups differently. The cleanest way to detect that is paired testing: send the model two requests that are identical in every respect except a single protected attribute, then see whether the outputs diverge. This tool builds those matched prompt sets for you.
How it works
You paste your real decision prompt and pick the protected characteristics you want to probe — gender, race, age, disability, religion, pregnancy, or name-based proxy signals. For each characteristic the builder produces a set of otherwise-identical variant prompts that change only that one attribute and explicitly hold all other qualifications constant. It then assembles a runnable test harness with the recommended method: run each prompt at least twenty times, record the outcome, compute the positive-outcome rate per group, and apply the four-fifths rule to flag adverse impact.
Tips and notes
- Test name proxies, not just explicit attributes. Models often infer ethnicity or gender from a name even when you never state it — that is where real-world bias hides.
- Volume matters. A single generation is noise. Aggregate over many runs at a fixed temperature so you are measuring the model, not luck.
- A passing test is not a clean bill of health. Paired testing catches first-order disparities; it cannot catch intersectional or context-dependent bias. Treat it as one layer of a broader fairness audit.