Why decay similarity scores by age?

Pure vector similarity ignores time, so a perfectly-matching but stale document can outrank a slightly-less-similar fresh one. In domains like news, pricing, or policy, recency matters, and decay folds age into the ranking.

What does half-life mean here?

Half-life is the age at which the freshness factor drops to 0.5 in the exponential model. A 30-day half-life means a 30-day-old document keeps half its score, a 60-day-old document a quarter, and so on.

How do the three decay models differ?

Exponential decays smoothly and never reaches zero, good for gradual staleness. Linear drops to zero at a chosen cutoff age. Step keeps full score until a cutoff then applies a fixed penalty, useful when freshness is binary (in-window vs out-of-window).

Does this replace the vector search?

No. It is a re-ranking layer applied after retrieval. You run your normal similarity search, then multiply each result's score by the freshness factor here to get the final order.

Is anything sent to a server?

No. All decay math runs locally in your browser. Your ages and scores never leave the page.

What is the Embedding Freshness Decay Calculator?

Apply exponential, linear, or step time decay to vector similarity scores based on document age, with a configurable half-life or cutoff. Re-rank results so fresher documents win in time-sensitive retrieval, and see the adjusted scores side by side. It runs free in your browser on Gera Tools, with nothing uploaded.

Embedding Freshness Decay Calculator

Name: Embedding Freshness Decay Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Vector similarity is blind to time. In domains where freshness matters — news, prices, policies, support docs — a stale document that happens to embed close to the query can beat a newer, slightly-less-similar one. A freshness decay layer fixes that by multiplying each similarity score by a factor that shrinks with document age. This calculator lets you model that factor and see the re-ranked result before you wire it into your pipeline.

How it works

For each document you supply an age in days and a base similarity from your vector search. The tool computes a freshness factor with the model you choose and multiplies it into the score:

Exponential: factor = 0.5 ^ (age / half-life) — smooth, never zero.
Linear: factor = max(0, 1 − age / cutoff) — straight-line drop to zero at the cutoff.
Step: factor = 1 while age ≤ cutoff, otherwise a fixed penalty factor between 0 and 1.

It then re-sorts the documents by the adjusted score so you can compare the original and recency-aware rankings.
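The three formulas and the re-sort can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own source: the function names, the `age_days` and `similarity` dictionary keys, and the default step penalty of 0.5 are all assumptions made for the example.

```python
def exponential_factor(age_days: float, half_life_days: float) -> float:
    """0.5 ** (age / half-life): smooth, never reaches zero."""
    return 0.5 ** (age_days / half_life_days)


def linear_factor(age_days: float, cutoff_days: float) -> float:
    """Straight-line drop from 1 at age 0 to 0 at the cutoff."""
    return max(0.0, 1.0 - age_days / cutoff_days)


def step_factor(age_days: float, cutoff_days: float, penalty: float = 0.5) -> float:
    """Full score inside the window, a fixed penalty factor outside it.

    The 0.5 default is an arbitrary placeholder chosen for this sketch.
    """
    return 1.0 if age_days <= cutoff_days else penalty


def rerank(docs, factor_fn):
    """Return (adjusted_score, doc) pairs sorted best-first.

    Each doc is assumed to be a dict with 'age_days' and 'similarity' keys.
    """
    scored = [(doc["similarity"] * factor_fn(doc["age_days"]), doc) for doc in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Passing the model in as a function keeps the re-rank step identical for all three: for instance, `rerank(docs, lambda age: exponential_factor(age, 30))` applies a 30-day half-life.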

Worked example

Suppose a user query about a software pricing plan retrieves these three candidates from a vector search:

Document	Age (days)	Base similarity
Pricing guide 2023 (stale)	540	0.91
Feature comparison Jan 2024	170	0.87
Current pricing page	14	0.83

Without decay, the 2023 guide wins on similarity and is the top result — even though the pricing information it contains is likely outdated. With exponential decay and a 60-day half-life:

factor (540 days) = 0.5^(540/60) = 0.5^9 ≈ 0.002      → score ≈ 0.0018
factor (170 days) = 0.5^(170/60) = 0.5^2.833 ≈ 0.140  → score ≈ 0.122
factor (14 days)  = 0.5^(14/60) = 0.5^0.233 ≈ 0.851   → score ≈ 0.706

The current pricing page now tops the ranking despite having the lowest base similarity. For a question about current pricing, this is the right result.
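To check the arithmetic yourself, the example can be reproduced with a short script. The titles, ages, and similarities come from the table above; the `decayed` helper and the constant names are hypothetical and exist only for this sketch.

```python
HALF_LIFE_DAYS = 60

# (title, age in days, base similarity) copied from the table above.
CANDIDATES = [
    ("Pricing guide 2023 (stale)", 540, 0.91),
    ("Feature comparison Jan 2024", 170, 0.87),
    ("Current pricing page", 14, 0.83),
]


def decayed(candidates, half_life_days):
    """Return (title, factor, adjusted score) rows sorted best-first."""
    rows = []
    for title, age_days, similarity in candidates:
        factor = 0.5 ** (age_days / half_life_days)
        rows.append((title, factor, similarity * factor))
    return sorted(rows, key=lambda row: row[2], reverse=True)


for title, factor, score in decayed(CANDIDATES, HALF_LIFE_DAYS):
    print(f"{title}: factor={factor:.3f}, adjusted={score:.3f}")
```

Changing `HALF_LIFE_DAYS` is a quick way to see how the ordering responds to a gentler or harsher decay.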

Choosing a decay model and half-life

The right half-life depends entirely on how quickly your domain’s content expires:

Domain	Recommended half-life	Reason
Financial news, live prices	Hours to 1 day	Information is materially wrong within hours
News and current events	1–3 days	Stories develop and context changes rapidly
Software documentation	30–90 days	Versions ship; API signatures and UI change
Product descriptions	60–180 days	Pricing and features update quarterly or less often
Legal or policy references	90–365 days	Amendments are periodic but significant
Scientific reference material	Years	Fundamental data rarely changes

Use the step model when freshness is binary: content within a defined window is fully valid, anything outside is uniformly penalised. This suits domains like legal citation periods or regulatory filing dates.
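If a range from the table is hard to pin down, one option is to work backwards from a retention target: decide what fraction of its score a document should keep at a given age, then solve the exponential formula for the half-life. The helper below is a sketch of that rearrangement; `half_life_for` is a hypothetical name, not something the calculator exposes.

```python
import math


def half_life_for(age_days: float, target_factor: float) -> float:
    """Half-life that makes the exponential factor equal target_factor at age_days.

    Rearranged from factor = 0.5 ** (age / half_life).
    """
    if not 0.0 < target_factor < 1.0:
        raise ValueError("target_factor must be strictly between 0 and 1")
    return age_days * math.log(0.5) / math.log(target_factor)
```

For example, `half_life_for(90, 0.25)` asks for a quarter of the score to survive at 90 days, which works out to a 45-day half-life because a quarter is two halvings.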

Implementation notes

Apply decay as a post-retrieval re-rank, not inside the approximate nearest-neighbour search itself. ANN indexes search on raw vectors; inject decay after retrieving the top-K candidates.
Store timestamps as metadata on each document in the vector database. Computing age at query time from a stored created_at field is cheap, whereas an age stored as a static number is out of date as soon as time passes and has to be rewritten on a schedule.
If fresh but obviously irrelevant documents begin to dominate the results, your half-life is too short. Lengthen it until relevance and freshness are balanced on your evaluation set.
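A post-retrieval re-rank that derives age from a stored timestamp might look like the sketch below. It assumes each hit returned by your vector search is a dict with `score` and `created_at` keys, that `created_at` is an ISO 8601 string, and that timestamps without a zone are UTC. None of that reflects any particular vector database's API, so adapt the field names to your own metadata.

```python
from datetime import datetime, timezone
from typing import Optional

SECONDS_PER_DAY = 86_400


def age_in_days(created_at: str, now: Optional[datetime] = None) -> float:
    """Age of a document in days, computed at query time."""
    created = datetime.fromisoformat(created_at)
    if created.tzinfo is None:
        # Assumption for this sketch: naive timestamps are UTC.
        created = created.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # Clamp at zero so a future-dated document is not boosted above its raw score.
    return max(0.0, (now - created).total_seconds() / SECONDS_PER_DAY)


def rerank_top_k(hits, half_life_days: float, now: Optional[datetime] = None):
    """Apply exponential decay to already-retrieved hits and sort best-first."""
    rescored = []
    for hit in hits:
        factor = 0.5 ** (age_in_days(hit["created_at"], now) / half_life_days)
        rescored.append({**hit, "adjusted_score": hit["score"] * factor})
    return sorted(rescored, key=lambda h: h["adjusted_score"], reverse=True)
```

Accepting `now` as a parameter keeps the function deterministic in tests and lets you replay an evaluation set as of a fixed date while you tune the half-life.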