Sentence-Boundary Splitter

Split text into clean sentences for RAG indexing or evaluation.


Split text into clean sentences

The Sentence-Boundary Splitter segments a block of text into individual sentences using boundary-detection heuristics that handle common edge cases. Unlike a naive split on the period character, it does not break on abbreviations (Dr., Inc., e.g.), decimal numbers (3.14), or ellipses (...), and it keeps a closing quote attached to the sentence it ends. The result is a clean, numbered list of sentences ready for indexing, labelling, or evaluation.
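To see why these edge cases matter, here is what a naive split on the period character does to a short passage. This is a Python sketch; the sample text is made up for illustration.

```python
# Naive approach: cut the text at every period character.
text = "Dr. Smith paid 3.14 dollars. He left... Then he came back."
naive = [part.strip() for part in text.split(".") if part.strip()]
print(naive)
# ['Dr', 'Smith paid 3', '14 dollars', 'He left', 'Then he came back']
```

The abbreviation and the decimal each produce a spurious fragment, the ellipsis is treated as a boundary, and every sentence loses its terminal punctuation.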

How it works

A sentence boundary is detected when a terminal punctuation mark (., !, or ?, plus any closing quote or bracket) is followed by whitespace and the start of a new sentence. Before splitting, the tool masks known abbreviations and decimal numbers so their periods are not treated as boundaries. After splitting, each fragment is trimmed and empty fragments are discarded. Everything runs locally in your browser, so your text is never uploaded and even large documents are processed quickly.
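The mask, split, and clean-up steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the tool's actual source: the abbreviation list, the private-use mask character, and the rule that a new sentence starts with a capital letter or digit (optionally after an opening quote or bracket) are illustrative choices. In line with the edge cases listed earlier, it also masks three-dot ellipses. One known trade-off: a sentence that genuinely ends with a masked abbreviation such as "etc." is merged with the next one.

```python
import re

# Illustrative abbreviation list (an assumption; the tool's real list is not published).
ABBREVIATIONS = ["Dr", "Mr", "Mrs", "Ms", "Prof", "Inc", "Ltd", "etc", "vs", "e.g", "i.e"]

# Private-use character that temporarily replaces "protected" periods.
MASK = "\ue000"

# A listed abbreviation as a whole word, followed by its period ("Dr.", "e.g.").
ABBREV_RE = re.compile(r"\b(?:" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")\.")
# A period with a digit on both sides, as in 3.14.
DECIMAL_RE = re.compile(r"(?<=\d)\.(?=\d)")
# A three-dot ellipsis.
ELLIPSIS_RE = re.compile(r"\.\.\.")
# Group 1: terminal mark plus any closing quotes/brackets. Then whitespace, then an
# assumed sentence start: an optional opening quote/bracket and a capital or digit.
BOUNDARY_RE = re.compile(
    r"([.!?][\"'\u201d\u2019)\]]*)\s+(?=[\"'\u201c\u2018(\[]?[A-Z0-9])"
)


def split_sentences(text: str) -> list[str]:
    # 1. Mask periods that must not count as boundaries.
    masked = ABBREV_RE.sub(lambda m: m.group(0).replace(".", MASK), text)
    masked = DECIMAL_RE.sub(MASK, masked)
    masked = ELLIPSIS_RE.sub(MASK * 3, masked)

    # 2. Cut after each boundary's punctuation (and closing quotes), skipping the whitespace.
    fragments = []
    start = 0
    for m in BOUNDARY_RE.finditer(masked):
        fragments.append(masked[start:m.end(1)])
        start = m.end()
    fragments.append(masked[start:])

    # 3. Restore masked periods, trim each fragment, and drop empty ones.
    sentences = [f.replace(MASK, ".").strip() for f in fragments]
    return [s for s in sentences if s]


if __name__ == "__main__":
    sample = 'Dr. Smith paid $3.14 for tea. He said "Wow!" Then he left... Was it worth it? Yes.'
    for n, sentence in enumerate(split_sentences(sample), start=1):
        print(n, sentence)
```

On the sample text this sketch should print four numbered sentences: `Dr. Smith paid $3.14 for tea.`, `He said "Wow!"`, `Then he left... Was it worth it?`, and `Yes.`. Masking with a placeholder character, rather than writing a more complex boundary regex, keeps each edge case in its own small pattern that can be extended independently.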

Tips and notes

For RAG pipelines, sentence-level chunking pairs well with a small overlap: index each sentence on its own, but store a window of its neighbouring sentences with it so that a retrieved sentence can be returned together with its surrounding context. If your text uses non-standard spacing (no space after a period, for example), some boundaries may be missed, so normalise the spacing first for best results. The numbered output makes it easy to spot over-merged or over-split units before you commit them to an index.
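Both tips can be sketched in Python as a pre-processing step and a post-processing step around any sentence splitter. This is an illustrative sketch, not part of the tool: `normalise_spacing`, `sentence_windows`, and the `id`/`embed_text`/`context` field names are hypothetical, and the spacing rule (only insert a space after two lowercase letters) is an assumed heuristic meant to leave initialisms such as "U.S.A" untouched.

```python
import re


def normalise_spacing(text: str) -> str:
    # Hypothetical heuristic: insert a space where ., ! or ? is glued to a capital
    # ("ended.Then"), but only after two lowercase letters so "U.S.A" is left alone.
    text = re.sub(r"(?<=[a-z][a-z][.!?])(?=[A-Z])", " ", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def sentence_windows(sentences: list[str], overlap: int = 1) -> list[dict]:
    """Pair each sentence (the unit to embed) with a window of its neighbours."""
    windows = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - overlap)
        hi = min(len(sentences), i + overlap + 1)
        windows.append({
            "id": i,
            "embed_text": sentence,                 # embedded and matched against queries
            "context": " ".join(sentences[lo:hi]),  # returned to the LLM on a hit
        })
    return windows


if __name__ == "__main__":
    print(normalise_spacing("It ended.Then it rained.  Twice."))
    for w in sentence_windows(["A.", "B.", "C."], overlap=1):
        print(w)
```

Embedding only the single sentence keeps the vector focused on one claim, while the stored window gives the model enough surrounding text to interpret it; `overlap` controls how many neighbours are kept on each side.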
