Visualize semantic similarity between texts
The Semantic Similarity Matrix embeds each of your texts using your own OpenAI API key, then renders an N×N heatmap of the cosine similarity between every pair. It turns a list of strings into an at-a-glance picture of which items are semantically close, which is useful for deduplication, dataset auditing, and cluster discovery.
How it works
All texts are sent in one batched request to the OpenAI Embeddings API. The returned vectors are kept in browser memory, and cosine similarity is computed locally for each pair: the dot product of the two vectors divided by the product of their magnitudes. Values range from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). Each cell is colour-coded, with brighter green for higher similarity, so patterns jump out immediately. The diagonal is always 1.0, because each text is compared with itself.
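The pipeline above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the tool's actual source: the helper names (`embedAll`, `cosineSimilarity`, `similarityMatrix`, `cellColor`) are hypothetical, the model name `text-embedding-3-small` is an assumed default because the tool's model is not stated here, and the linear green scale is just one possible colour mapping. The request targets the public `POST /v1/embeddings` endpoint, which accepts an array of strings as `input` and returns one embedding per string, each tagged with an `index`.

```typescript
// Hypothetical helper: embed all texts in ONE batched request.
// Assumption: the model name is a placeholder, not necessarily the tool's choice.
async function embedAll(
  texts: string[],
  apiKey: string,
  model = "text-embedding-3-small",
): Promise<number[][]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, input: texts }),
  });
  if (!res.ok) throw new Error(`Embeddings request failed: ${res.status}`);
  const json = (await res.json()) as { data: { index: number; embedding: number[] }[] };
  // Sort by index so row i of the matrix always corresponds to texts[i].
  return json.data.sort((a, b) => a.index - b.index).map((d) => d.embedding);
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("Undefined for a zero vector");
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Build the symmetric N×N matrix. Only the upper triangle is computed and
// mirrored; the diagonal is set to 1 directly (each vector against itself).
function similarityMatrix(vectors: number[][]): number[][] {
  const n = vectors.length;
  const m: number[][] = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  for (let i = 0; i < n; i++) {
    m[i][i] = 1;
    for (let j = i + 1; j < n; j++) {
      const s = cosineSimilarity(vectors[i], vectors[j]);
      m[i][j] = s;
      m[j][i] = s;
    }
  }
  return m;
}

// Assumed colour scale: map [-1, 1] linearly onto black..bright green.
function cellColor(s: number): string {
  const t = Math.min(1, Math.max(0, (s + 1) / 2));
  return `rgb(0, ${Math.round(255 * t)}, 0)`;
}
```

Because cosine similarity is symmetric, computing only the upper triangle halves the work, and writing 1 on the diagonal avoids floating-point noise that can leave a self-comparison a hair below 1.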
Tips and notes
Look for off-diagonal cells that are unexpectedly bright: those are likely near-duplicates you can collapse before indexing. If two texts you expect to be unrelated score high, your embedding model may be picking up shared surface vocabulary rather than meaning. Keep the set small (under ~20 lines) so the grid stays readable. The embeddings request runs against OpenAI directly with your key; nothing is stored.
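To act on those bright off-diagonal cells programmatically, you could list every pair whose score exceeds a cutoff. The sketch below assumes you already have the N×N matrix as `number[][]`; `findNearDuplicates` and its 0.9 default threshold are hypothetical choices to tune for your own data and model, and the matrix in the usage lines holds made-up scores for illustration, not real model output.

```typescript
// A candidate near-duplicate pair: indices into the input list plus the score.
interface SimilarPair {
  i: number;
  j: number;
  score: number;
}

// Scan the upper triangle (j > i) so each off-diagonal pair is reported once
// and the always-1.0 diagonal is skipped. The threshold is an assumption.
function findNearDuplicates(matrix: number[][], threshold = 0.9): SimilarPair[] {
  const pairs: SimilarPair[] = [];
  for (let i = 0; i < matrix.length; i++) {
    for (let j = i + 1; j < matrix.length; j++) {
      if (matrix[i][j] >= threshold) pairs.push({ i, j, score: matrix[i][j] });
    }
  }
  // Brightest cells first.
  return pairs.sort((a, b) => b.score - a.score);
}

// Usage with made-up scores for illustration only.
const texts = ["reset my password", "how do I reset my password?", "pricing plans"];
const scores = [
  [1.0, 0.95, 0.2],
  [0.95, 1.0, 0.25],
  [0.2, 0.25, 1.0],
];
for (const { i, j, score } of findNearDuplicates(scores)) {
  // prints: reset my password <-> how do I reset my password?: 0.95
  console.log(`${texts[i]} <-> ${texts[j]}: ${score.toFixed(2)}`);
}
```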