Before you embed a large corpus you want two numbers up front: how much will it cost, and how long will it take under your rate limits. This planner gives you both, plus the batch size that gets you there fastest — all computed locally in your browser.
How it works
Token volume is estimated as the average characters per document ÷ ~4 (a common heuristic for English text), multiplied by your document count. Cost is that token total ÷ 1,000,000, times the selected model’s per-million-token price. Embeddings bill on input tokens only, so there’s no output cost to add.
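As a rough illustration of that arithmetic, here is a minimal Python sketch. The function names, the 2,000-character document size, and the $0.02 per-million-token price are placeholders chosen for the example, not values taken from the tool or from any provider's price list.

```python
def estimate_tokens(avg_chars_per_doc: float, doc_count: int,
                    chars_per_token: float = 4.0) -> float:
    """Estimate total tokens using the characters-per-token heuristic."""
    return avg_chars_per_doc / chars_per_token * doc_count


def estimate_cost(total_tokens: float, price_per_million_usd: float) -> float:
    """Input-only cost: embeddings have no output tokens to bill."""
    return total_tokens / 1_000_000 * price_per_million_usd


# Hypothetical corpus: 100,000 documents of about 2,000 characters each.
total_tokens = estimate_tokens(avg_chars_per_doc=2_000, doc_count=100_000)

# Hypothetical price; substitute your model's real per-million-token rate.
cost_usd = estimate_cost(total_tokens, price_per_million_usd=0.02)

print(f"{total_tokens:,.0f} tokens, about ${cost_usd:.2f}")
```

With these placeholder inputs the estimate comes to 50 million tokens. Swap in your own figures to reproduce what the planner shows.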
For scheduling, the tool finds the optimal batch size: the most inputs it can pack into one request without exceeding the model’s input cap (commonly 2048) or the per-request token budget implied by your tokens-per-minute ÷ requests-per-minute ratio. From there it derives the number of requests and the run time, computing the time under both your RPM and TPM ceilings and reporting whichever is larger. The binding bottleneck is labelled so you know exactly which limit to ask your provider to raise.
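The scheduling logic described above can be sketched as follows. This is one reading of the description, not the tool's actual source: the names `Plan` and `plan_run` are hypothetical, the 2048 default is the "commonly 2048" cap mentioned above, and flooring the batch size at one input (for a document larger than the per-request budget) is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    batch_size: int
    requests: int
    minutes_rpm: float  # run time if only the request limit applied
    minutes_tpm: float  # run time if only the token limit applied
    minutes: float      # reported run time: the larger of the two


def plan_run(doc_count: int, tokens_per_doc: float, rpm: float, tpm: float,
             max_inputs: int = 2048) -> Plan:
    # Per-request token budget implied by the two rate limits.
    per_request_budget = tpm / rpm
    # How many documents fit in that budget.
    fits_in_budget = math.floor(per_request_budget / tokens_per_doc)
    # Respect the input cap too; assumption: never go below one input.
    batch_size = max(1, min(max_inputs, fits_in_budget))
    requests = math.ceil(doc_count / batch_size)
    minutes_rpm = requests / rpm
    minutes_tpm = doc_count * tokens_per_doc / tpm
    return Plan(batch_size, requests, minutes_rpm, minutes_tpm,
                max(minutes_rpm, minutes_tpm))


# Mid-sized documents: the token budget sets the batch size (4 per request).
medium = plan_run(doc_count=100_000, tokens_per_doc=500, rpm=500, tpm=1_000_000)

# Tiny documents: the input cap sets the batch size (2048 per request).
tiny = plan_run(doc_count=100_000, tokens_per_doc=10, rpm=100, tpm=5_000_000)

print(medium)
print(tiny)
```

When the token budget sets the batch size, the two run times tend to land close together. When the input cap sets it, the request limit usually ends up dominating.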
Reading the bottleneck
- Request (RPM) bound — you have lots of small documents and run out of requests before tokens. Bigger batches help most here.
- Token (TPM) bound — your documents are large and you saturate the token budget. Only a higher TPM tier (or fewer/shorter chunks) speeds things up.
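To make the two cases concrete, here is a small Python sketch that labels the bottleneck by comparing the time each limit would take on its own. The function name `binding_limit` and the "balanced" label for an exact tie are assumptions for illustration; the tool may handle ties differently.

```python
def binding_limit(requests: int, total_tokens: float,
                  rpm: float, tpm: float) -> str:
    """Return which rate limit takes longer to clear the whole job."""
    minutes_rpm = requests / rpm      # time needed under the request limit
    minutes_tpm = total_tokens / tpm  # time needed under the token limit
    if minutes_rpm > minutes_tpm:
        return "RPM"
    if minutes_tpm > minutes_rpm:
        return "TPM"
    return "balanced"  # assumption: exact ties get their own label


# Many tiny documents: 49 requests carrying 1M tokens in total.
small_docs = binding_limit(requests=49, total_tokens=1_000_000,
                           rpm=100, tpm=5_000_000)

# Large documents: 100,000 requests carrying 50M tokens in total.
large_docs = binding_limit(requests=100_000, total_tokens=50_000_000,
                           rpm=3_000, tpm=1_000_000)

print(small_docs, large_docs)
```

All figures here are made-up inputs chosen so that the first case is request bound and the second is token bound.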
Tips
- Chunk size is a lever on both axes: smaller chunks cost fewer tokens each but produce more documents, shifting you toward the RPM bottleneck.
- If you can tolerate latency, check whether your provider offers an asynchronous batch endpoint; where available, it is typically discounted, often to roughly half the cost.
- To calibrate the estimate, run your real tokenizer once on a representative document, then re-run the planner with that measured token count.
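The calibration tip can be sketched in a few lines of Python. The sample figures below (8,400 characters tokenizing to 2,400 tokens) are invented for illustration; in practice the token count would come from your model's own tokenizer, which is not shown here.

```python
def calibrated_chars_per_token(sample_chars: int, sample_tokens: int) -> float:
    """Measured characters-per-token ratio from one representative document."""
    if sample_tokens <= 0:
        raise ValueError("sample_tokens must be positive")
    return sample_chars / sample_tokens


def calibrated_total_tokens(avg_chars_per_doc: float, doc_count: int,
                            chars_per_token: float) -> float:
    """Same estimate as before, using the measured ratio instead of ~4."""
    return avg_chars_per_doc / chars_per_token * doc_count


# Hypothetical measurement: an 8,400-character sample yields 2,400 tokens.
ratio = calibrated_chars_per_token(sample_chars=8_400, sample_tokens=2_400)

# Re-estimate a hypothetical corpus of 100,000 documents of 2,100 characters.
tokens = calibrated_total_tokens(avg_chars_per_doc=2_100, doc_count=100_000,
                                 chars_per_token=ratio)

print(f"{ratio:.2f} chars/token, {tokens:,.0f} tokens")
```

A ratio below 4 means your text is denser in tokens than the default heuristic assumes, so both cost and run time will come out higher than the first estimate.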