Which categories does it screen?

Violence and threats, hate and harassment, self-harm, sexual content, personal data (PII), bias and stereotyping, misinformation, and illegal or dangerous instructions. Each finding includes a severity, a short reason, and the quoted snippet that triggered it.

What does the severity threshold change?

It tunes sensitivity. At a low threshold the model flags even mild or borderline concerns, useful for strict moderation. At high it only flags clear violations, reducing false positives. Medium balances the two. Pick the level that matches your tolerance.

Is this a substitute for a compliance review?

No. It is a screening aid that surfaces likely issues quickly, but model moderation is imperfect — it misses things and over-flags others. Treat every flagged item as a prompt for human judgement, and never make a final policy or legal decision on the model output alone.

Where does my API key go?

It stays in your browser tab and is sent directly to OpenAI or Anthropic with the request you trigger. It is never stored, logged, or routed through any Gera server, and refreshing the tab clears it.

Who pays for the API calls?

You do, on your own provider account. Each check is one real API call billed at your usage rate. The tool itself is free — your only cost is the tokens consumed on your key.

What is the Content Safety Checker (BYO Key)?

Paste content and the tool prompts your own OpenAI or Anthropic key for a structured safety audit across violence, hate, self-harm, sexual content, PII, bias, and misinformation, with severity flags and quoted evidence as JSON. Client-side; your key stays in your browser. It runs free in your browser on Gera Tools, with nothing uploaded.

Content Safety Checker (BYO Key)

Name: Content Safety Checker (BYO Key)
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Get one useful tool a week

Like this tool? Enter your email and we'll send you one genuinely useful Gera tool a week — plus a link to come back to this one. No spam, one-click unsubscribe any time.

Before user comments, generated copy, or community posts go live, it helps to screen them for content that could be harmful, biased, or expose personal data. This tool runs a structured safety audit on any text using your own OpenAI or Anthropic key and returns a categorised report with quoted evidence, all in your browser.

How it works

Pick a provider and model, paste your API key, and choose a severity threshold. Paste the content and the tool builds an audit prompt covering eight categories — violence, hate, self-harm, sexual content, PII, bias, misinformation, and dangerous instructions. The model returns a single JSON object: an overall verdict (safe, review, or unsafe) plus a findings array where each entry names the category, a severity, a short reason, and the exact snippet that triggered it. The prompt instructs the model to quote evidence and to never comply with or repeat any harmful instructions the content itself might contain. It makes one direct request to the provider and shows the report to copy.

For Anthropic, the request includes the official direct-browser-access header so it works straight from the page.

Using it in a workflow

The structured JSON makes this practical for triage rather than a one-off check. Run user-generated content through it and route anything with a verdict of “review” or “unsafe” to a human moderator, while letting clearly safe items pass faster. The per-category severity lets you set different rules — for example, auto-hold anything flagged for PII or self-harm at any severity, but only escalate medium-or-higher bias findings. Keep the threshold consistent so your queue stays calibrated.

Notes and limits

LLM moderation is a useful filter, not a guarantee. It produces both false positives — flagging benign text — and false negatives, missing genuinely harmful content, especially when phrasing is indirect or in another language. The severity threshold helps you trade those off, but it cannot eliminate them. Always keep a human in the loop for consequential decisions, comply with your jurisdiction’s legal obligations independently of this tool, and remember that a “safe” verdict is the model’s opinion, not a clearance.