Good retrieval is rarely about embeddings alone. The metadata you attach to each chunk — title, category, keywords, date — is what lets a vector store filter and re-rank results instead of returning the nearest-but-irrelevant neighbour. This tool generates that metadata for you, using your own OpenAI or Anthropic key, and hands back clean JSON to drop into your ingestion pipeline.
How it works
Choose a provider and model, paste your API key, and drop in a document chunk. Select which fields you want — title, summary, keywords, category, date. The tool sends one direct request from your browser asking the model to return a strict JSON object with exactly those fields. The response is parsed, validated, and shown as formatted JSON you can copy verbatim into the metadata field of your upsert call.
Your key never reaches a Gera server — it is held only in the tab and sent straight to the provider (with the official direct-browser-access header for Anthropic). Refreshing clears it.
Using the output
- Merge the JSON into each chunk’s metadata before calling your vector DB’s upsert (Pinecone, Weaviate, Qdrant, pgvector, etc.).
- Use
categoryanddateas filter predicates at query time to narrow the candidate set. - Keep
keywordsfor hybrid (dense + sparse) search or BM25 boosting.
Tips
- Cheaper, faster models (gpt-4o-mini, claude-3-5-haiku) are usually plenty for short tagging tasks and minimise cost.
- Generate only the fields your retrieval layer uses — every extra field is tokens spent and noise stored.
- Tag at the chunk level, not the whole document, so per-chunk filtering stays meaningful.