Embeddings are lists of hundreds or thousands of numbers — impossible to read directly. This tool projects them down to two dimensions with PCA and draws a scatter plot, so you can literally see whether semantically similar items land near each other. Everything runs in your browser.
How it works
Paste a JSON array of equal-length numeric arrays. The tool centres the data, builds the covariance matrix, and uses power iteration to extract the top two principal components: the two orthogonal directions along which your vectors vary most. Every vector is projected onto those two axes and plotted. The percentage of total variance each component captures is shown underneath and tells you how trustworthy the flattened view is. If the two percentages together account for most of the variance, the 2D picture reflects the real geometry well; if they are low, distances on the plot can be misleading.
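The steps above can be sketched in plain Python. This is a minimal illustration, not the tool's actual source: the function names, the seeded random start vector, the fixed iteration count, and the use of deflation to reach the second component are all assumptions made for the example. It also assumes at least two dimensions per vector.

```python
import math
import random


def center(vectors):
    """Subtract the per-dimension mean from every vector."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    return [[v[j] - means[j] for j in range(d)] for v in vectors]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centred rows."""
    n, d = len(centered), len(centered[0])
    denom = max(n - 1, 1)
    return [[sum(r[i] * r[j] for r in centered) / denom for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def power_iteration(matrix, previous=(), iterations=500, seed=0):
    """Return (eigenvalue, unit eigenvector) for the dominant direction.

    The start vector comes from a seeded generator (an assumption) so the
    result is repeatable, and is made orthogonal to components already found.
    """
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in matrix]
    for prev in previous:
        dot = sum(a * b for a, b in zip(vec, prev))
        vec = [a - dot * b for a, b in zip(vec, prev)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # no variance left in this matrix
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient: the variance captured along vec
    eigenvalue = sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


def pca_2d(vectors):
    """Project vectors to 2D; return (points, explained variance ratios)."""
    centered = center(vectors)
    cov = covariance(centered)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))  # total variance = trace
    components, ratios = [], []
    for _ in range(2):
        eigenvalue, vec = power_iteration(cov, previous=components)
        components.append(vec)
        ratios.append(eigenvalue / total if total > 0 else 0.0)
        # Deflation: remove the variance already explained by this component
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    points = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return points, ratios
```

Calling `pca_2d` on the parsed JSON array returns one `[x, y]` pair per input vector plus the two variance ratios; multiplying the ratios by 100 gives percentages of the kind shown under the plot. The sign of each axis is arbitrary, so a mirrored plot represents the same geometry.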
Add labels (one per line, matching the vector order) and points sharing a label are drawn in the same colour, making clusters jump out immediately.
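One way to turn a block of label text into per-point colours is sketched below. The palette values and the `assign_colours` helper are hypothetical, chosen only for illustration; the tool may pick its colours differently.

```python
# Hypothetical palette; any list of distinct colours would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def assign_colours(label_text, n_points, palette=PALETTE):
    """Map one label per line to one colour per point.

    Points that share a label share a colour. Colours are reused
    (cycled) if there are more distinct labels than palette entries.
    """
    labels = [line.strip() for line in label_text.strip().splitlines()]
    if len(labels) != n_points:
        raise ValueError(
            f"expected {n_points} labels, got {len(labels)}"
        )
    colour_of = {}
    for label in labels:
        if label not in colour_of:
            # First time this label is seen: give it the next palette entry
            colour_of[label] = palette[len(colour_of) % len(palette)]
    return [colour_of[label] for label in labels]
```

Checking that the label count matches the vector count up front avoids silently colouring points with the wrong label when a line is missing.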
Reading the plot
Points that sit close together have similar embeddings, at least along the two plotted directions, so the model treats those items as semantically related. Well-separated coloured groups mean your embedding model cleanly distinguishes those categories, which is exactly what you want before building a retrieval or classification system on top. If everything piles into one blob and the variance percentages are low, the discriminating signal lives in dimensions PCA’s first two components miss; that is the cue to try t-SNE or UMAP.
Tips
- Use it to debug retrieval: embed a query and a handful of candidate chunks, label them, and check the query lands nearest the chunks you expect.
- Compare embedding models by plotting the same labelled items from each — the model with tighter, more separated clusters is usually the better choice for your domain.
- PCA is deterministic, so the same input always gives the same plot — handy for documenting results.
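For the retrieval-debugging tip, a numeric cross-check in the original high-dimensional space can complement the plot, since a 2D projection can distort distances. The sketch below assumes your retrieval system ranks by cosine similarity (a common choice, but an assumption here); `rank_chunks` and the example labels are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_chunks(query, chunks):
    """Return (label, similarity) pairs, most similar to the query first.

    `chunks` maps a label to its embedding vector.
    """
    scored = [(label, cosine_similarity(query, vec))
              for label, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy three-dimensional vectors standing in for real embeddings
ranking = rank_chunks(
    [1.0, 0.0, 0.0],
    {"a": [0.9, 0.1, 0.0], "b": [0.0, 1.0, 0.0], "c": [-1.0, 0.0, 0.0]},
)
print([label for label, _ in ranking])
```

If the ranking here disagrees with what the plot suggests, trust the ranking: it uses every dimension, while the plot uses only two.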