How does the classifier work?

It uses offline keyword and phrase pattern matching across six sensitive categories. It does not use a model or send your text anywhere — matching runs entirely in your browser.

Is this accurate enough for live moderation?

No. Pattern matching is a documentation and prototyping aid, not a production moderation classifier. It will miss obfuscated content and flag benign mentions. Use a trained model with human review for live moderation.
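Both failure modes are easy to reproduce. The sketch below uses Python's standard `re` module and a hypothetical three-word violence pattern (not the tool's actual list), matched on whole words and ignoring case.

```python
import re

# Hypothetical violence pattern for illustration; not the tool's real keyword list.
VIOLENCE = re.compile(r"\b(kill|shooting|stabbing)\b", re.IGNORECASE)


def flags_violence(text: str) -> bool:
    """True if any whole-word violence keyword appears in the text."""
    return VIOLENCE.search(text) is not None


# Missed: trivial obfuscation defeats an exact keyword match.
print(flags_violence("I will k1ll you"))     # False
print(flags_violence("I will k.i.l.l you"))  # False

# Over-flagged: a neutral news sentence contains a keyword.
print(flags_violence("Police are investigating a shooting downtown."))  # True
```

Normalising look-alike characters might catch the first threat but not every variant, and no keyword list can tell the news sentence apart from a threat; that distinction depends on context.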

What categories does it cover?

Political, religious, adult/sexual, self-harm, violence, and medical advice. These reflect common sensitive-topic taxonomies used in content policies.

Why would I use a keyword-based tool at all?

To document policy intent, build a test corpus, and quickly see which categories a sample triggers when drafting moderation rules — before investing in a full model.
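One way to use a sample corpus while drafting rules is as a regression check: record which categories each sample is expected to trigger, then rerun the samples whenever the patterns change. This is a minimal Python sketch; the two-category `RULES` table and the `triggered` and `check_corpus` helpers are hypothetical, not part of the tool.

```python
import re

# Hypothetical rules for two of the six categories, for illustration only.
RULES = {
    "violence": re.compile(r"\b(shooting|stabbing)\b", re.IGNORECASE),
    "medical_advice": re.compile(r"\b(dosage|you\s+should\s+take)\b", re.IGNORECASE),
}


def triggered(text):
    """Sorted list of categories whose pattern matches the text."""
    return sorted(cat for cat, rx in RULES.items() if rx.search(text))


# Each sample records the categories it is expected to trigger (presence, not violation).
corpus = [
    {"text": "Double the dosage if the pain persists.", "expected": ["medical_advice"]},
    {"text": "Police reported a shooting near the station.", "expected": ["violence"]},
    {"text": "The weather was lovely today.", "expected": []},
]


def check_corpus(samples):
    """Return the samples whose triggered categories differ from what was recorded."""
    failures = []
    for sample in samples:
        got = triggered(sample["text"])
        if got != sample["expected"]:
            failures.append({**sample, "got": got})
    return failures


print(check_corpus(corpus))  # []
```

A non-empty result lists exactly which samples changed, which is the cue to decide whether the rule or the recorded expectation needs updating.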

Is my text uploaded anywhere?

No. All matching happens locally in your browser, so you can classify confidential text without it leaving your device.

What is the AI Sensitive Topic Classifier?

Paste text and classify it across sensitive topic categories (political, religious, adult, self-harm, violence, medical advice) using offline pattern matching — useful for building content moderation policy documentation. It runs free in your browser on Gera Tools, with nothing uploaded.

AI Sensitive Topic Classifier

Name: AI Sensitive Topic Classifier
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

When you are drafting a content-moderation policy, you need a fast, offline way to see which sensitive categories a piece of text touches. The AI Sensitive Topic Classifier matches pasted text against six common sensitive-topic categories using local pattern matching — useful for documenting policy intent and building test corpora.

How it works

You paste text and the tool checks it against keyword and phrase patterns for six categories: political, religious, adult/sexual, self-harm, violence, and medical advice. For each category it reports whether the text matched and how many distinct signals it found, giving a rough sense of how strongly the topic is present.
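The matching and reporting described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: the pattern lists are placeholders, keywords and phrases are assumed to match whole words case-insensitively, and a multi-word phrase is assumed to tolerate any run of whitespace between its words. "Distinct signals" is taken to mean the number of different patterns that matched, so repeating one keyword does not raise the count.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Placeholder patterns: the tool's real keyword and phrase lists are not shown here.
PATTERNS: dict[str, list[str]] = {
    "political": ["election", "ballot", "campaign ad"],
    "religious": ["prayer", "scripture", "church"],
    "adult_sexual": ["explicit", "nsfw"],
    "self_harm": ["self-harm", "hurt myself"],
    "violence": ["shooting", "stabbing"],
    "medical_advice": ["dosage", "you should take"],
}


@dataclass
class CategoryResult:
    matched: bool
    signals: int     # number of distinct patterns that matched
    hits: list[str]  # the matching patterns themselves, handy for policy notes


def compile_patterns(patterns: dict[str, list[str]]) -> dict[str, list[tuple[str, re.Pattern]]]:
    """Turn each keyword or phrase into a whole-word, case-insensitive regex."""
    compiled = {}
    for category, phrases in patterns.items():
        compiled[category] = []
        for phrase in phrases:
            # Escape each word and allow any whitespace between the words of a phrase.
            body = r"\s+".join(re.escape(word) for word in phrase.split())
            compiled[category].append((phrase, re.compile(rf"\b{body}\b", re.IGNORECASE)))
    return compiled


def classify(text: str, compiled) -> dict[str, CategoryResult]:
    """Report, per category, whether anything matched and how many distinct patterns did."""
    results = {}
    for category, rules in compiled.items():
        hits = [phrase for phrase, rx in rules if rx.search(text)]
        results[category] = CategoryResult(matched=bool(hits), signals=len(hits), hits=hits)
    return results


compiled = compile_patterns(PATTERNS)
report = classify("The election campaign ad mentioned the ballot twice.", compiled)
print(report["political"])
# CategoryResult(matched=True, signals=3, hits=['election', 'ballot', 'campaign ad'])
```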

All matching runs in your browser. No model is called and nothing is uploaded, so you can classify confidential or sensitive text without it leaving your device.

What the six categories cover and how they differ

Political covers electoral content, partisan commentary, advocacy on government policy, and political parties or figures in clearly partisan contexts. It is sensitive because platforms must decide whether to carry political ads or amplify political content, and how to treat political misinformation differently from ordinary factual error.

Religious covers references to faith traditions, religious practices, sacred figures, and theological claims. Sensitivity here is about equal treatment — moderation that disproportionately restricts one tradition’s expression and not another’s can violate policy consistency.

Adult/sexual covers explicit or suggestive content. The threshold for what is acceptable varies enormously by platform and jurisdiction, so the classifier flags the presence of such content, not whether it violates a particular policy.

Self-harm is categorically different from the others. Established clinical and safety guidance is that content flagged for self-harm should always be routed to human review and should surface support resources (helpline numbers, mental health signposting), regardless of whether the particular content turns out to be harmful. A journalist reporting on self-harm, a researcher discussing the topic, and someone in crisis may all trigger the same classifier. Context is everything, and the consequence of a false negative is potentially severe.
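The routing rule above is simple to encode. The Python sketch below treats any self-harm signal as sufficient, with no threshold. `route`, `RoutingDecision` and the placeholder `SUPPORT_MESSAGE` are hypothetical names; a real deployment would substitute verified, region-appropriate helpline details.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder text: replace with verified, region-appropriate helpline details.
SUPPORT_MESSAGE = "Support is available. Please consider contacting a local crisis line."


@dataclass
class RoutingDecision:
    human_review: bool = False
    support_message: Optional[str] = None
    reasons: List[str] = field(default_factory=list)


def route(signals: dict) -> RoutingDecision:
    """Route classifier output. `signals` maps category name -> distinct signal count."""
    decision = RoutingDecision()
    # Any self-harm signal at all triggers review and support: no threshold,
    # because a missed case costs far more than an unnecessary review.
    if signals.get("self_harm", 0) > 0:
        decision.human_review = True
        decision.support_message = SUPPORT_MESSAGE
        decision.reasons.append("self_harm signal present")
    # Other categories would be routed by their own policy rules (not shown).
    return decision


print(route({"self_harm": 1, "violence": 0}).human_review)  # True
print(route({"violence": 2}).support_message)               # None
```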

Violence covers graphic or threatening violence — distinct from news reporting, fiction, or historical context that references violence. The pattern matcher cannot reliably make that distinction, which is why this is a prototyping tool rather than a production classifier.

Medical advice covers content that gives specific diagnostic or treatment guidance. Many platforms prohibit user-generated medical advice, but the same words appear in legitimate professional contexts. Again, the classifier identifies presence; policy determines whether the specific instance is a violation.
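Keeping detection and policy separate makes that split explicit: the classifier reports which categories are present, and a policy table maintained by whoever owns the policy decides what presence means in each context. The Python sketch below is hypothetical throughout; the context names, actions and default are illustrative choices, not recommendations.

```python
# Hypothetical policy table: context -> category -> action.
# The classifier only says which categories are present; these entries decide the outcome.
POLICY = {
    "consumer_app": {"medical_advice": "review", "adult_sexual": "remove", "political": "allow"},
    "clinician_forum": {"medical_advice": "allow", "adult_sexual": "remove", "political": "allow"},
}

DEFAULT_ACTION = "review"  # categories with no written rule go to a person


def decide(context: str, present: list) -> dict:
    """Map each detected category to the action the policy assigns in this context."""
    rules = POLICY[context]
    return {category: rules.get(category, DEFAULT_ACTION) for category in present}


print(decide("consumer_app", ["medical_advice"]))     # {'medical_advice': 'review'}
print(decide("clinician_forum", ["medical_advice"]))  # {'medical_advice': 'allow'}
print(decide("clinician_forum", ["self_harm"]))       # {'self_harm': 'review'}
```

The same classifier output produces different actions in different contexts: presence is detected, while violation is decided by policy.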

What it is — and is not — for

This is a documentation and prototyping aid. It helps you draft moderation rules, build a labelled test set, and demonstrate which categories a sample triggers. It is not a production moderation classifier: keyword matching misses obfuscated or context-dependent content and will over-flag benign mentions (a news article about violence is not violent content). Live moderation needs a trained model plus human review.
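With a labelled test set, per-category precision and recall make the over-flagging and the misses measurable. The Python sketch below assumes each sample pairs the categories the matcher flagged with the categories a human reviewer judged to be violating; `per_category_scores` is a hypothetical helper and the four samples are made up to mirror the cases in this section.

```python
from collections import Counter


def per_category_scores(samples, categories):
    """Precision and recall per category.

    samples: list of (flagged, violating) pairs of category sets, where `flagged`
    comes from the matcher and `violating` from a human reviewer.
    Returns {category: (precision, recall)}, with None where a denominator is zero.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for flagged, violating in samples:
        for cat in categories:
            if cat in flagged and cat in violating:
                tp[cat] += 1
            elif cat in flagged:
                fp[cat] += 1  # over-flagged, e.g. a news report
            elif cat in violating:
                fn[cat] += 1  # missed, e.g. obfuscated text
    scores = {}
    for cat in categories:
        flagged_total = tp[cat] + fp[cat]
        violating_total = tp[cat] + fn[cat]
        precision = tp[cat] / flagged_total if flagged_total else None
        recall = tp[cat] / violating_total if violating_total else None
        scores[cat] = (precision, recall)
    return scores


# Made-up samples mirroring the cases discussed above.
samples = [
    ({"violence"}, {"violence"}),  # explicit threat, correctly flagged
    ({"violence"}, set()),         # news article about violence, over-flagged
    (set(), {"violence"}),         # obfuscated threat, missed
    (set(), set()),                # unrelated text, correctly ignored
]
print(per_category_scores(samples, ["violence"]))  # {'violence': (0.5, 0.5)}
```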

Tips and notes

Treat a category hit as “this text mentions the topic,” not “this text violates policy” — context decides the latter. The self-harm category is the one where false negatives matter most, so if you are building a real system, route any self-harm signal to human review and surface support resources rather than relying on automation alone. Use the results to write down your policy’s intent for each category, then validate that intent against real examples. Everything runs locally and nothing is uploaded.