What is a denial-of-wallet attack?

It is an abuse pattern against metered services, AI APIs among them, in which an attacker drives up your provider bill rather than taking you offline. Because each call costs real money, a flood of expensive requests can be financially damaging even at modest request rates, which is why cost caps matter as much as rate limits.

Why are AI endpoints riskier than ordinary APIs?

Each call can be expensive and slow, outputs can leak system prompts via prompt injection, and the endpoint is attractive for scraping model behaviour. Standard rate limiting is necessary but not sufficient; you also need cost caps and content-level controls.

Should public AI endpoints exist at all?

They can, but only with strict limits, a hard daily cost cap, bot defences, and ideally a lightweight challenge such as a CAPTCHA. Unauthenticated access to a metered model is the highest-risk configuration, so the advisor tightens its recommendations sharply when there is no auth.

How are the suggested numbers derived?

They start from your expected per-user volume and scale it by a safety multiplier, then cross-check against your cost per call so the daily cap stays within a sane budget. They are sensible defaults to start from, not guarantees — tune against real traffic.

Do rate limits stop prompt injection?

No. Rate limits slow abuse but do not prevent a single crafted request from manipulating the model. You also need input/output filtering, system-prompt isolation, and not trusting model output as commands, which the advisor lists alongside the limits.

What is the AI API Rate Limit Security Advisor?

Describe your AI API endpoint and receive recommended rate limit settings, abuse detection patterns, cost-cap mechanisms, and authentication requirements to help defend against prompt injection, scraping, and denial-of-wallet attacks. It runs free in your browser on Gera Tools, with nothing uploaded.

AI API Rate Limit Security Advisor

Name: AI API Rate Limit Security Advisor
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

A metered AI endpoint has a failure mode ordinary APIs don’t: an attacker can run up your bill — a denial-of-wallet attack — without ever taking you offline. On top of that, AI endpoints invite prompt injection and behaviour scraping. This advisor takes a short description of your endpoint and returns a concrete starting configuration: per-minute and per-day rate limits, a daily cost cap, authentication requirements, and the abuse controls that match your risk.

How it works

You tell the tool whether the endpoint is public or authenticated, the expected requests per user per day, and your provider cost per call. It scales your expected volume by a safety multiplier to set rate limits that absorb legitimate bursts while cutting off floods, then uses your cost-per-call to propose a daily spend cap that keeps a worst-case day within budget. Public endpoints get much tighter limits, mandatory bot defences, and a recommendation to add auth. Alongside the numbers it lists the content-level controls — input/output filtering, system-prompt isolation, never trusting model output as commands — that rate limiting alone cannot provide.
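The derivation described above can be sketched in a few lines. This is an illustration only, not the advisor's actual code: the function name, the Suggestion fields, and both multiplier values are assumptions chosen for the example, since the text does not state the multipliers the tool uses.

```python
from dataclasses import dataclass

# Hypothetical multipliers; the advisor's real values are not given in the text.
AUTH_MULTIPLIER = 4.0     # authenticated endpoints: generous headroom for bursts
PUBLIC_MULTIPLIER = 1.5   # public endpoints: much tighter


@dataclass
class Suggestion:
    per_day_limit: int
    daily_cost_cap: float
    require_bot_defences: bool
    recommend_auth: bool


def suggest_limits(public: bool, requests_per_user_per_day: int,
                   cost_per_call: float) -> Suggestion:
    """Scale expected volume by a safety multiplier, then price the worst-case day."""
    multiplier = PUBLIC_MULTIPLIER if public else AUTH_MULTIPLIER
    per_day = max(1, round(requests_per_user_per_day * multiplier))
    # Worst case: a key uses its whole daily allowance at the stated cost per call.
    cap = round(per_day * cost_per_call, 2)
    return Suggestion(per_day, cap, require_bot_defences=public, recommend_auth=public)
```

With these assumed multipliers, an authenticated endpoint at 20 requests per user per day gets a larger allowance than the same endpoint exposed publicly, and the public case also switches on the bot-defence and add-auth recommendations.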

Threat categories specific to AI endpoints

Standard API security covers availability and access control. AI endpoints have three additional threat surfaces:

Denial-of-wallet. Each inference call costs money, and the cost scales with the input and output token count. An attacker who crafts very long prompts or triggers verbose responses can maximise cost per request. A per-request token cap is therefore as important as a per-minute request cap.
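A per-request token cap might look something like the sketch below. All four constants are hypothetical placeholders (real limits and prices depend on your provider and model), and the token counts are assumed to come from whatever tokenizer your provider documents.

```python
MAX_INPUT_TOKENS = 2_000      # hypothetical cap on prompt size
MAX_OUTPUT_TOKENS = 500       # hypothetical ceiling on the output budget
PRICE_PER_1K_INPUT = 0.001    # hypothetical prices, in pounds per 1,000 tokens
PRICE_PER_1K_OUTPUT = 0.002


def worst_case_cost(input_tokens: int, max_output_tokens: int) -> float:
    """Upper bound on one call's cost if the model uses its full output budget."""
    return ((input_tokens / 1000) * PRICE_PER_1K_INPUT
            + (max_output_tokens / 1000) * PRICE_PER_1K_OUTPUT)


def admit_request(input_tokens: int, requested_output_tokens: int) -> tuple[bool, int]:
    """Reject oversized prompts; clamp the output budget instead of trusting the caller."""
    if input_tokens > MAX_INPUT_TOKENS:
        return False, 0
    return True, min(requested_output_tokens, MAX_OUTPUT_TOKENS)
```

Bounding both sides of the call means the most an attacker can spend per request is known in advance, which is what makes a per-minute or per-day request limit translate into a predictable cost ceiling.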

Prompt injection. A user who crafts input designed to override or manipulate your system prompt can change the model’s behaviour, exfiltrate your instructions, or make the model act outside its intended scope. Rate limiting does nothing against a single well-crafted request. Input filtering, output monitoring, and treating model output as untrusted data are the defences here.
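One way to treat model output as untrusted data is to validate it against an allowlist before the application acts on it. The sketch below assumes, purely for illustration, that the model is asked to reply with a small JSON object; the action names and the 200-character argument limit are hypothetical.

```python
import json
from typing import Optional

# Hypothetical allowlist: the only actions the application will ever perform.
ALLOWED_ACTIONS = {"search_docs", "show_help"}


def parse_model_action(model_output: str) -> Optional[dict]:
    """Validate model output as data; return None unless it names an allowed action."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    action = data.get("action")
    argument = data.get("argument", "")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return None
    if not isinstance(argument, str):
        return None
    # Only the validated fields are passed on; anything else in the output is dropped.
    return {"action": action, "argument": argument[:200]}
```

An injected instruction can still change what the model says, but with this shape it cannot make the application run an action that is not on the list.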

Behaviour scraping and model extraction. A sophisticated attacker may query your endpoint systematically to build a dataset of prompt-response pairs and reconstruct approximations of your fine-tuned model or your system prompt. Rate limits slow this but don’t stop a patient adversary. Techniques like instruction confidentiality (“do not reveal your instructions”) provide partial defence; per-key monitoring for systematic patterns is a stronger one.
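Per-key monitoring can start from something as crude as the heuristic sketched here: flag keys whose prompts are mostly variations on a single template. The thresholds and the prefix comparison are assumptions for illustration, and a legitimate integration that always sends the same preamble would also trip this check, so treat a flag as a prompt for review rather than proof of scraping.

```python
from collections import Counter, defaultdict

MIN_REQUESTS = 50       # hypothetical: ignore keys with too little traffic to judge
TEMPLATE_SHARE = 0.8    # hypothetical: share of prompts starting with the same prefix
PREFIX_LEN = 30         # hypothetical: characters compared as the "template"


def flag_systematic_keys(log: list[tuple[str, str]]) -> set[str]:
    """Return keys whose prompts are mostly variations on one template.

    `log` is a list of (api_key, prompt) pairs for the monitoring window.
    """
    prefixes: dict[str, Counter] = defaultdict(Counter)
    for key, prompt in log:
        prefixes[key][prompt[:PREFIX_LEN]] += 1

    flagged = set()
    for key, counts in prefixes.items():
        total = sum(counts.values())
        top = counts.most_common(1)[0][1]  # count of the most frequent prefix
        if total >= MIN_REQUESTS and top / total >= TEMPLATE_SHARE:
            flagged.add(key)
    return flagged
```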

Setting the numbers: a worked example

Suppose your chatbot is authenticated, users are expected to make about 20 requests per day, and each call costs roughly £0.01.

Per-minute limit: set at roughly 3–5× the expected burst. Assuming a user typing quickly sends one or two requests per minute, that suggests about 5–8 requests per minute per user.
Per-day limit: set at 3–5× expected volume, so 60–100 requests per day per key — enough for heavy legitimate use without unlimited spend.
Daily cost cap: at £0.01/call and 100 calls, that’s £1 per user per day. Setting the cap at £2–3 per user gives headroom for calls that cost more than the £0.01 estimate, such as long prompts or verbose responses, while still stopping runaway spend.

These are illustrative starting points, not recommendations for your specific case. Tune against real traffic and your provider’s pricing.
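The per-day arithmetic in this example can be reproduced with a short helper. The function name is made up for the illustration, and money is held in whole pence to avoid floating-point rounding.

```python
def worked_example(expected_per_day: int, cost_pence: int,
                   low_mult: int = 3, high_mult: int = 5) -> dict:
    """Reproduce the per-day limit range and the spend at the upper limit."""
    low = expected_per_day * low_mult
    high = expected_per_day * high_mult
    return {
        "per_day_limit_range": (low, high),
        "spend_at_upper_limit_pence": high * cost_pence,
    }


# 20 requests per day at 1p per call, as in the example above.
result = worked_example(expected_per_day=20, cost_pence=1)
print(result)
```

For the figures in the example this gives a range of 60 to 100 requests per day and 100 pence (£1) of spend at the upper limit.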

Tips and notes

Cap cost, not just rate. A few expensive calls can hurt more than many cheap ones; a hard daily spend cap is your real backstop.
Authenticate metered endpoints. Per-key limits and attribution are far stronger than per-IP limits on a public route.
Rate limits don’t stop injection. Layer input/output filtering and treat model output as data, never as commands.
Start strict, loosen with data. It’s easier to relax a limit than to recover from a runaway bill.
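A hard daily spend cap can be sketched as a small check that runs before each provider call. This is a minimal in-memory illustration under stated assumptions: the class and method names are hypothetical, state lives in a single process, and a real deployment would need shared storage and would pass in a worst-case cost estimate for each call.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailySpendCap:
    """Track spend per key per calendar day; refuse calls that would exceed the cap."""

    def __init__(self, cap_pence: int):
        self.cap_pence = cap_pence
        self._spent = defaultdict(int)  # (key, day) -> pence spent so far

    def try_spend(self, key: str, cost_pence: int, today: Optional[date] = None) -> bool:
        day = today or date.today()
        if self._spent[(key, day)] + cost_pence > self.cap_pence:
            return False  # hard stop: do not call the provider
        self._spent[(key, day)] += cost_pence
        return True
```

Checking the cap before the call, rather than tallying spend afterwards, is what makes it a backstop: once a key reaches its cap for the day, no further provider cost is incurred for that key.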