The Caption & Transcript Word-Count & Reading-Time tool turns a raw caption file or transcript into the numbers caption editors actually need: how many words, how long it takes to read, and which cues are too dense or too long to read comfortably on screen.
How it works
The tool first detects the format:
- WebVTT: begins with the WEBVTT header; timestamps use a dot decimal (00:00:01.000).
- SRT: numeric cue counters and comma decimal timestamps (00:00:01,000).
- Plain text: anything else.
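The detection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function name detect_format and the regular expression are assumptions, and the SRT check looks only at the comma-decimal timestamp rather than also verifying the numeric cue counter.

```python
import re

# Assumed pattern for an SRT timing line with comma decimals,
# e.g. "00:00:01,000 --> 00:00:02,000".
SRT_TIMING = re.compile(
    r"\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}"
)

def detect_format(text: str) -> str:
    """Return 'vtt', 'srt', or 'text' (hypothetical helper)."""
    # Drop a byte-order mark and leading whitespace before checking the header.
    head = text.lstrip("\ufeff").lstrip()
    if head.startswith("WEBVTT"):
        return "vtt"
    if SRT_TIMING.search(head):
        return "srt"
    return "text"  # anything else is treated as plain text
```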
It strips cue numbers, timestamp lines, and inline VTT tags, then counts words (runs of non-whitespace characters) and characters. Reading time in minutes is the word count ÷ 180, a fixed pace of 180 words per minute (wpm). For timed formats it parses each cue’s start and end times into seconds and computes:
- Words per cue and the average across all cues.
- Characters per second = cue characters ÷ cue duration. Cues above ~32 cps are flagged as too fast.
- Line length: any line longer than 42 characters is flagged, since 42 characters per line is a widely used broadcast captioning limit.
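The stripping, counting, and reading-time steps can be sketched as follows. This is a hedged illustration under stated assumptions: the names caption_text and count_stats, the regular expressions, and the choice to count characters without line breaks are all hypothetical, and skipping the WEBVTT header line is an added assumption.

```python
import re

# Assumed patterns; the tool's real rules may differ.
TIMING_LINE = re.compile(r"^\s*(?:\d{2}:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+")
CUE_NUMBER = re.compile(r"^\s*\d+\s*$")   # a line holding only a cue counter
INLINE_TAG = re.compile(r"<[^>]+>")       # e.g. <b>, </b>, <v Speaker>
WPM = 180                                 # reading pace stated in the text

def caption_text(raw: str) -> str:
    """Remove the header, cue numbers, timing lines, and inline tags."""
    kept = []
    for i, line in enumerate(raw.splitlines()):
        if i == 0 and line.lstrip("\ufeff").startswith("WEBVTT"):
            continue  # assumption: the VTT header is not caption text
        if TIMING_LINE.match(line) or CUE_NUMBER.match(line):
            # Limitation: a caption line that is only digits is also dropped.
            continue
        kept.append(INLINE_TAG.sub("", line))
    return "\n".join(kept)

def count_stats(raw: str) -> dict:
    """Count words and characters and estimate reading time in minutes."""
    text = caption_text(raw)
    words = len(text.split())  # runs of non-whitespace
    # Assumption: characters are counted per line, excluding line breaks.
    chars = sum(len(line.strip()) for line in text.splitlines())
    return {"words": words, "chars": chars, "minutes": words / WPM}
```

Under these assumptions, a 180-word transcript yields a reading time of exactly 1.0 minute.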
Example
A VTT cue:
00:00:01.000 --> 00:00:02.000
This single line is far too long to fit on a caption safely.
The line is 60 characters long (over the 42-character limit) and is on screen for 1 second, which gives 60 cps (well over 32), so it is flagged twice. To fix it, split the text across two cues, shorten the lines, or give the cue more time on screen.
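The per-cue checks applied to this example can be sketched in Python. This is an illustrative sketch only: analyze_cue, CueReport, and to_seconds are hypothetical names, timestamps are assumed to include the hours field, and a cue's character count is assumed to be the sum of its line lengths (spaces included, line breaks excluded).

```python
import re
from dataclasses import dataclass

# Accepts both VTT (dot) and SRT (comma) decimals; hours are assumed present.
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[.,](\d{3})\s+-->\s+"
    r"(\d{2}):(\d{2}):(\d{2})[.,](\d{3})"
)
MAX_CPS = 32   # reading-speed threshold from the text
MAX_LINE = 42  # line-length threshold from the text

def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

@dataclass
class CueReport:
    words: int
    cps: float
    too_fast: bool
    long_lines: list

def analyze_cue(timing_line: str, text_lines: list) -> CueReport:
    """Compute words, characters per second, and flags for one cue."""
    match = TIMING.search(timing_line)
    if match is None:
        raise ValueError(f"not a timing line: {timing_line!r}")
    parts = match.groups()
    duration = to_seconds(*parts[4:]) - to_seconds(*parts[:4])
    chars = sum(len(line) for line in text_lines)
    # A zero or negative duration is treated as infinitely fast.
    cps = chars / duration if duration > 0 else float("inf")
    return CueReport(
        words=sum(len(line.split()) for line in text_lines),
        cps=cps,
        too_fast=cps > MAX_CPS,
        long_lines=[line for line in text_lines if len(line) > MAX_LINE],
    )

report = analyze_cue(
    "00:00:01.000 --> 00:00:02.000",
    ["This single line is far too long to fit on a caption safely."],
)
print(report.cps, report.too_fast, len(report.long_lines))  # 60.0 True 1
```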
Tips
- Keep cues to two lines of at most 42 characters each, and aim for a reading speed your audience can follow.
- Reading time at 180 wpm is an estimate for the transcript read as prose; broadcast captions are timed to the audio, independently of that estimate.
- Use the per-cue flags to find exactly where to re-break or re-time before exporting.