Split text into clean sentences
The Sentence-Boundary Splitter segments a block of text into individual
sentences using boundary-detection heuristics that respect common edge cases.
Unlike a naive split on the period character, it does not break on abbreviations
(Dr., Inc., e.g.), decimal numbers (3.14), ellipses (...), or trailing
quotes. The result is a clean, numbered list of sentences ready for indexing,
labelling, or evaluation.
How it works
A sentence boundary is detected when a terminal punctuation mark (".", "!", or
"?"), optionally followed by a closing quote or bracket, is followed by whitespace and the start
of a new sentence. Before splitting, the tool masks known abbreviations and
decimal numbers so their periods are not treated as boundaries. After splitting,
each fragment is trimmed and empty fragments are discarded. Everything runs in
your browser, so your text stays private and even large documents are processed without an upload.
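The mask, split, and trim steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the tool's actual code: the abbreviation list, the placeholder characters, the function name split_sentences, and the rule that a new sentence starts with an uppercase letter or digit are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation list; the tool's real list is not documented here.
ABBREVIATIONS = ("Dr.", "Mr.", "Mrs.", "Ms.", "Prof.", "Inc.", "e.g.", "i.e.")
MASK = "\u2024"   # stand-in for a protected period (assumed absent from the input)
SPLIT = "\x00"    # temporary boundary marker (assumed absent from the input)

ABBR_RE = re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, ABBREVIATIONS)) + ")")
# A period between two digits (3.14) or a run of two or more periods (...).
PROTECTED_RE = re.compile(r"(?<=\d)\.(?=\d)|\.{2,}")
# Terminal mark plus optional closing quotes/brackets, then whitespace, then
# something that looks like a sentence start (assumed: uppercase letter or digit).
BOUNDARY_RE = re.compile(
    r"""([.!?]["'\u201d\u2019)\]]*)\s+(?=["'\u201c\u2018(\[]?[A-Z0-9])"""
)


def _mask(match):
    """Replace every period in the matched text with the placeholder."""
    return match.group(0).replace(".", MASK)


def split_sentences(text):
    # 1. Mask periods that must not count as boundaries.
    masked = ABBR_RE.sub(_mask, text)
    masked = PROTECTED_RE.sub(_mask, masked)
    # 2. Mark each boundary, keeping the punctuation with the left-hand sentence.
    marked = BOUNDARY_RE.sub(lambda m: m.group(1) + SPLIT, masked)
    # 3. Split, restore masked periods, trim, and drop empty fragments.
    fragments = (f.replace(MASK, ".").strip() for f in marked.split(SPLIT))
    return [f for f in fragments if f]


sample = "Dr. Smith paid 3.14 dollars to Acme Inc. yesterday. Was it worth it? Yes!"
for number, sentence in enumerate(split_sentences(sample), start=1):
    print(f"{number}. {sentence}")
# 1. Dr. Smith paid 3.14 dollars to Acme Inc. yesterday.
# 2. Was it worth it?
# 3. Yes!
```

One trade-off of masking is visible in this sketch: an abbreviation such as "Inc." that genuinely ends a sentence will not produce a boundary, so the abbreviation list is best kept short and reviewed against the numbered output.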
Tips and notes
For RAG pipelines, sentence-level chunking pairs well with a small overlap: index each sentence, but also store its neighbours’ embeddings to preserve context. Because a boundary requires whitespace after the terminal punctuation, text with non-standard punctuation (no space after the period, for example) may yield missed boundaries, so normalise spacing first for best results. The numbered output makes it easy to spot over-merged or over-split units before you commit them to an index.
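As a rough illustration of the two tips above, the following Python sketch normalises missing spaces before splitting and then pairs each sentence with its neighbours. The function names normalise_spacing and with_neighbours, the radius parameter, and the record layout are hypothetical. The sketch stores neighbouring text rather than embeddings, on the assumption that you would embed the context field with whatever model your pipeline already uses.

```python
import re


def normalise_spacing(text):
    """Insert a missing space after ., ! or ? when a lowercase letter
    precedes the mark and an uppercase letter follows it directly.
    This is a heuristic: it leaves "U.S.A." and "example.com" untouched."""
    return re.sub(r"(?<=[a-z])([.!?])(?=[A-Z])", r"\1 ", text)


def with_neighbours(sentences, radius=1):
    """Pair each sentence with a context window of up to `radius`
    sentences on either side (assumed record layout, for illustration)."""
    records = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - radius)
        window = sentences[start : i + radius + 1]
        records.append(
            {"index": i, "sentence": sentence, "context": " ".join(window)}
        )
    return records


print(normalise_spacing("It ended.Then it began."))
# It ended. Then it began.

for record in with_neighbours(["A.", "B.", "C."]):
    print(record["index"], record["sentence"], "|", record["context"])
# 0 A. | A. B.
# 1 B. | A. B. C.
# 2 C. | B. C.
```

Keeping the sentence and its context as separate fields lets you retrieve on the precise sentence while still passing the wider window to the model.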