When you are drafting a content-moderation policy, you need a fast, offline way to see which sensitive categories a piece of text touches. The AI Sensitive Topic Classifier matches pasted text against six common sensitive-topic categories using local pattern matching, which is useful for documenting policy intent and building test corpora.
How it works
You paste text and the tool checks it against keyword and phrase patterns for six categories: political, religious, adult/sexual, self-harm, violence, and medical advice. For each category it reports whether the text matched and how many distinct signals it found, giving a rough sense of how strongly the topic is present.
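The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the pattern lists, the category keys, and the `classify` function are all hypothetical, and it assumes a "distinct signal" means one pattern that matched at least once.

```python
import re

# Hypothetical pattern lists for illustration only; the tool's real
# patterns are not given in this text.
CATEGORY_PATTERNS = {
    "political": [r"\belection\b", r"\bsenator\b", r"\bpolitical party\b"],
    "religious": [r"\bchurch\b", r"\bmosque\b", r"\bprayer\b"],
    "adult_sexual": [r"\bexplicit\b", r"\bpornograph\w*"],
    "self_harm": [r"\bself[- ]harm\b", r"\bsuicid\w*"],
    "violence": [r"\bshooting\b", r"\bassault\w*", r"\bstabb\w*"],
    "medical_advice": [r"\bdosage\b", r"\bdiagnos\w*", r"\bprescri\w*"],
}


def classify(text):
    """Return, per category, whether the text matched and how many
    distinct patterns (signals) were found."""
    report = {}
    for category, patterns in CATEGORY_PATTERNS.items():
        # Count each pattern at most once, however often it occurs.
        signals = sum(
            1 for pattern in patterns
            if re.search(pattern, text, re.IGNORECASE)
        )
        report[category] = {"matched": signals > 0, "signals": signals}
    return report


if __name__ == "__main__":
    sample = "The senator spoke after the election about the shooting."
    for category, result in classify(sample).items():
        print(category, result)
```

Counting each pattern once keeps the number a measure of breadth (how many different cues appear) and stops a single repeated word from inflating it.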
All matching runs in your browser. No model is called and nothing is uploaded, so you can classify confidential or sensitive text without it leaving your device.
What it is — and is not — for
This is a documentation and prototyping aid. It helps you draft moderation rules, build a labelled test set, and demonstrate which categories a sample triggers. It is not a production moderation classifier: keyword matching misses obfuscated or context-dependent content and will over-flag benign mentions (a news article about violence is not violent content). Live moderation needs a trained model plus human review.
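Both failure modes are easy to reproduce with a small labelled test set. The sketch below is illustrative only: the two patterns, the sample sentences, and the `flags_violence` and `evaluate` helpers are assumptions made for the example, not part of the tool.

```python
import re

# Hypothetical patterns for a single category, for illustration only.
VIOLENCE_PATTERNS = [r"\bshooting\b", r"\bkill\w*"]


def flags_violence(text):
    """True if any violence pattern matches the text."""
    return any(re.search(p, text, re.IGNORECASE) for p in VIOLENCE_PATTERNS)


# Each case pairs a sample with a human label:
# True means the text really is violent content.
LABELLED_CASES = [
    ("Police are investigating a shooting downtown.", False),  # benign news mention
    ("I will k1ll you tomorrow.", True),                       # obfuscated spelling
    ("The recipe calls for two eggs.", False),                 # unrelated text
]


def evaluate(cases):
    """Split the matcher's mistakes into false positives and false negatives."""
    false_positives = [t for t, label in cases if flags_violence(t) and not label]
    false_negatives = [t for t, label in cases if not flags_violence(t) and label]
    return false_positives, false_negatives


if __name__ == "__main__":
    fp, fn = evaluate(LABELLED_CASES)
    print("over-flagged:", fp)
    print("missed:", fn)
```

Here the news sentence is over-flagged and the obfuscated threat is missed, which is the gap a trained model and human review are meant to close.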
Tips and notes
Treat a category hit as “this text mentions the topic,” not “this text violates policy”; context decides the latter. Self-harm is the category where false negatives matter most, so if you are building a real system, route any self-harm signal to human review and surface support resources rather than relying on automation alone. Use the results to write down your policy’s intent for each category, then validate that intent against real examples.
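If you carry these tips into a prototype, the routing rule might look something like the sketch below. It assumes a hypothetical input shape (a dict of category name to signal count) and made-up action names; a real system would define its own.

```python
def route(signals):
    """Map per-category signal counts to follow-up actions.

    `signals` is a hypothetical dict of category name -> number of
    matched signals. The action names are illustrative placeholders.
    """
    actions = []

    # Any self-harm signal goes to a person and surfaces support
    # resources, rather than being left to automation alone.
    if signals.get("self_harm", 0) > 0:
        actions.append("human_review")
        actions.append("show_support_resources")

    # Other hits only mean the topic is mentioned; whether the text
    # violates policy still depends on context.
    mentioned = [
        category for category, count in signals.items()
        if count > 0 and category != "self_harm"
    ]
    if mentioned:
        actions.append("needs_context_check")

    return actions


if __name__ == "__main__":
    print(route({"self_harm": 1, "violence": 0}))
    print(route({"violence": 2}))
```

Keeping the self-harm branch separate from the generic one makes it harder to weaken that rule by accident when the other categories are tuned.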